Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-15 Thread Chris Redekop
> > Linas, could you capture the output of pg_controldata *and* increase the
> > log level to DEBUG1 on the standby? We should then see nextXid value of
> > the checkpoint the recovery is starting from.
>
> I'll try to do that whenever I'm in that territory again... Incidentally,
> recently there was a lot of unrelated-to-this-post work to polish things up
> for a talk being given at PGWest 2011 today :)
>
> > I also checked what rsync does when a file vanishes after rsync computed
> > the file list, but before it is sent. rsync 3.0.7 on OSX, at least,
> > complains loudly, and doesn't sync the file. It BTW also exits non-zero,
> > with a special exit code for precisely that failure case.
>
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
>
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.
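Such a driver can be sketched as follows (a minimal illustration of the pattern the docs describe, not anyone's actual backup script; the rsync arguments shown in the comment are placeholders):

```python
import subprocess
import sys

# rsync exit code 24 means "partial transfer due to vanished source files",
# which is expected while the database keeps writing during a hot backup.
RSYNC_VANISHED = 24

def run_rsync(args):
    """Run a command and treat rsync's 'vanished source files' exit code
    (24) as success; propagate every other exit code unchanged."""
    rc = subprocess.call(args)
    if rc == RSYNC_VANISHED:
        print("warning: some source files vanished during rsync; ignoring",
              file=sys.stderr)
        return 0
    return rc

# Hypothetical invocation (paths are placeholders):
#   sys.exit(run_rsync(["rsync", "-a", "/var/lib/pgsql/data/",
#                       "standby:/var/lib/pgsql/data/"]))
```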

I also am running into this issue and can reproduce it very reliably.  For
me, however, it happens even when doing the "fast backup" like so:
pg_start_backup('whatever', true)...my traffic is more write-heavy than
linas's, though, so that might have something to do with it.  Yesterday it
reliably errored out on pg_clog every time, but today it is failing
sporadically on pg_subtrans (which seems to be past where the pg_clog error
was)... the only thing that has changed is that I've changed the log level
to debug1... I wouldn't think that could be related, though.  I've linked
the requested pg_controldata and debug1 logs for both errors.
 Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
pg_controldata, and then the postgres debug1 log produced from a subsequent
startup attempt.

pg_clog: http://pastebin.com/mTfdcjwH
pg_subtrans: http://pastebin.com/qAXEHAQt

Any workarounds would be very much appreciated... would copying clog+subtrans
before or after the rest of the data directory (or something like that) make
any difference?

Thanks!


[HACKERS] plpython SPI cursors

2011-10-15 Thread Jan Urbański
Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently, when trying to process a large table in PL/Python you have
to slurp it all into memory (that's what plpy.execute does).

This patch allows reading the result set in smaller chunks, using a SPI
cursor behind the scenes.

Example usage:

cursor = plpy.cursor("select a, b from hugetable")
for row in cursor:
    plpy.info("a is %s and b is %s" % (row['a'], row['b']))
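Since plpy only exists inside the server, the chunked-fetch semantics the patch adds can be illustrated with a plain-Python stand-in for the cursor object (MockCursor is purely illustrative and not part of the patch):

```python
class MockCursor:
    """Mimics a plpy.cursor result object: fetch(n) returns at most n rows
    and an empty list once the result set is exhausted; iterating yields
    one row (a dict) at a time."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def fetch(self, n):
        batch = self._rows[self._pos:self._pos + n]
        self._pos += len(batch)
        return batch

    def __iter__(self):
        while True:
            batch = self.fetch(1)
            if not batch:
                return
            yield batch[0]

cursor = MockCursor([{'a': i, 'b': i * 2} for i in range(5)])
first_two = cursor.fetch(2)          # first batch of at most 2 rows
rest = [row['a'] for row in cursor]  # iterator picks up where fetch left off
```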

The patch itself is simple, but there's a lot of boilerplate dedicated
to opening a subtransaction and handling prepared plans. I'd like to do
some refactoring of the way PL/Python uses SPI to reduce the amount of
boilerplate needed, but that'll come as a separate patch (just before
the patch to split plpython.c in smaller chunks).

This feature has been sponsored by Nomao.

Cheers,
Jan

PS: I already added it to the November CF.

J
>From 9ad14957e7b4ae19667df3bb8cc2aa5ef5bf96c8 Mon Sep 17 00:00:00 2001
From: Jan Urbański
Date: Tue, 13 Sep 2011 14:42:41 +0200
Subject: [PATCH] Add cursor support to plpythonu.

Exposes SPI cursors as plpythonu objects allowing processing large
result sets without loading them entirely into memory, as plpy.execute
is doing.
---
 doc/src/sgml/plpython.sgml |   80 
 src/pl/plpython/expected/plpython_spi.out  |  151 +++
 src/pl/plpython/expected/plpython_test.out |6 +-
 src/pl/plpython/plpython.c |  605 
 src/pl/plpython/sql/plpython_spi.sql   |  116 ++
 5 files changed, 955 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index eda2bbf..d08c3d1 100644
*** a/doc/src/sgml/plpython.sgml
--- b/doc/src/sgml/plpython.sgml
*** $$ LANGUAGE plpythonu;
*** 892,897 
--- 892,906 

  

+ Note that calling plpy.execute will cause the entire
+ result set to be read into memory. Only use that function when you are sure
+ that the result set will be relatively small.  If you don't want to risk
+ excessive memory usage when fetching large results,
+ use plpy.cursor rather
+ than plpy.execute.
+   
+ 
+   
 For example:
  
  rv = plpy.execute("SELECT * FROM my_table", 5)
*** $$ LANGUAGE plpythonu;
*** 958,963 
--- 967,1043 
  

  
+   
+ Accessing data with cursors
+ 
+   
+ The plpy.cursor function accepts the same arguments
+ as plpy.execute (except for limit)
+ and returns a cursor object, which allows you to process large result sets
+ in smaller chunks.  As with plpy.execute, either a query
+ string or a plan object along with a list of arguments can be used.  The
+ cursor object provides a fetch method that accepts an
+ integer parameter and returns a result object.  Each time you
+ call fetch, the returned object will contain the next
+ batch of rows, never larger than the parameter value.  Once all rows are
+ exhausted, fetch starts returning an empty result
+ object.  Cursor objects also provide an
+ http://docs.python.org/library/stdtypes.html#iterator-types iterator
+ interface, yielding one row at a time until all rows are exhausted.
+ Data fetched that way is not returned as result objects, but rather as
+ dictionaries, each dictionary corresponding to a single result row.
+   
+ 
+   
+ Cursors are automatically disposed of, but if you want to explicitly
+ release all resources held by a cursor, use the close
+ method.  Once closed, a cursor cannot be fetched from anymore.
+   
+ 
+
+ 
+   Do not confuse objects created by plpy.cursor with
+   DBAPI cursors as defined by
+   the http://www.python.org/dev/peps/pep-0249/ Python Database API specification.
+   They don't have anything in common except for the name.
+ 
+
+ 
+   
+ An example of two ways of processing data from a large table would be:
+ 
+ CREATE FUNCTION count_odd_iterator() RETURNS integer AS $$
+ odd = 0
+ for row in plpy.cursor("select num from largetable"):
+     if row['num'] % 2:
+         odd += 1
+ return odd
+ $$ LANGUAGE plpythonu;
+ 
+ CREATE FUNCTION count_odd_fetch(batch_size integer) RETURNS integer AS $$
+ odd = 0
+ cursor = plpy.cursor("select num from largetable")
+ while True:
+     rows = cursor.fetch(batch_size)
+     if not rows:
+         break
+     for row in rows:
+         if row['num'] % 2:
+             odd += 1
+ return odd
+ $$ LANGUAGE plpythonu;
+ 
+ CREATE FUNCTION count_odd_prepared() RETURNS integer AS $$
+ odd = 0
+ plan = plpy.prepare("select num from largetable where num % $1 != 0", ["integer"])
+ rows = list(plpy.cursor(plan, [2]))
+ 
+ return len(rows)
+ $$ LANGUAGE plpythonu;
+ 
+   
+   
+ 

 Trapping Errors
  
diff --git a/src/pl/plpython/expected/plpython_spi.out b/src/pl/plpython/expected/plpython_spi.out
index 7f4ae5c..3b4d7a3 100644
*** a/src/pl/plpython/expected/plpython_spi.out
--

Re: [HACKERS] COUNT(*) and index-only scans

2011-10-15 Thread Jeff Janes
On Mon, Oct 10, 2011 at 9:48 PM, Kevin Grittner
 wrote:
>> Jeff Janes  wrote:
>> Kevin Grittner  wrote:
>
>>> create table t (id int not null primary key);
>>> insert into t select generate_series(1, 100);
>>> vacuum freeze analyze;
>>> explain analyze select count(*) from t
>>> where id between 50 and 500010;
>>>
>>> That gives you an index-only scan; but without the WHERE clause it
>>> uses a seq scan.
>>
>> If you convert the where clause to "where id is not null" it uses
>> the index only scan again, but only if you nudge it too with
>> enable_seqscan=off.

With a recent commit from (I assume) Tom, the "where id is not null"
is no longer needed.

> Clever way to get a full-table test.
>
> It turns out that for the above, with your trick to use the index
> only scan, it comes out 12% faster to do a seqscan, even when the
> table and index are fully cached (based on the average time of ten
> runs each way).  There's very little overlap, so the difference looks
> real.  But that's on a very narrow record, having just the one column
> used in the index.  I added one wide column like this:
>
> alter table t add column x text;
> update t set x = (repeat(random()::text, (random() * 100)::int));
> cluster t USING t_pkey;
> vacuum freeze analyze;
>
> With that change the index-only scan time remained unchanged, while
> the seqscan time grew to about 2.6 times the index only scan time.
> That was mildly surprising for me, considering it was all still
> cached.

I used the pgbench_accounts table from pgbench -i -s 50, where all
data fits in shared_buffers, using the -f switch with either

set enable_seqscan=off;
select count(*) from pgbench_accounts;

or

set enable_indexonlyscan=off;
select count(*) from pgbench_accounts;


With just a single client, it was a toss-up.  But with 8 concurrent
clients on an 8-CPU machine, the index-only scan was 50% faster.  So
that is a nice win, even if well-designed apps probably shouldn't be
endlessly counting rows of an unchanging table using all available
CPUs in the first place.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pushing ScalarArrayOpExpr support into the btree index AM

2011-10-15 Thread Noah Misch
On Sat, Oct 15, 2011 at 02:58:45PM -0400, Tom Lane wrote:
> [algorithm for a regular index scan satisfying "key IN (...)"]

> So, at least as far as btrees are concerned, it seems like I implemented
> the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed
> down into the index AM.  The above rules seem pretty btree-specific, so
> I don't think that we ought to have the main executor doing any of this.
> I envision doing this by marking btree as supporting ScalarArrayOpExpr
> scankeys directly, so that the executor does nothing special with them,
> and the planner treats them the same as regular scalar indexquals.

Sounds sensible.  The algorithm applies to more than ScalarArrayOpExpr; isn't it
really the ability to handle an OR'ed list of ScanKeys instead of an AND'ed one?
Would it be worth exposing the capability along those lines instead, even if the
only initial consumer is ScalarArrayOpExpr?

> In principle somebody could be doing something like
>   WHERE pointcol <@ ANY (ARRAY[list of box values])
> and expecting that to generate a bitmap indexscan on a GIST index, but
> is it likely that anyone is doing that?  (As opposed to the equivalent
> formulation with "pointcol <@ box1 OR pointcol <@ box2 ...", which would
> still work to generate OR'd bitmap scans even if we took out the
> ScalarArrayOpExpr logic.)

Removing the "key IN (...)" optimization for hash indexes would add one more
barrier to making that access method practical.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-15 Thread Jeff Janes
On Sat, Oct 15, 2011 at 2:16 PM, Jeff Janes  wrote:
> On Fri, Oct 7, 2011 at 11:40 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Please find attached a patch implementing a basic version of
>>> index-only scans.
>>
>> I'm making some progress with this, but I notice what seems like a
>> missing feature: there needs to be a way to turn it off.  Otherwise
>> performance comparisons will be difficult to impossible.
>>
>> The most obvious solution is a planner control GUC, perhaps
>> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?
>>
>
> Currently I can't force an indexonlyscan over an indexscan, because of
> the way the enable_* variables work.

OK, scratch that.  I must have been using the wrong query (one for
which the index was not covering), as I can't reproduce the
behavior, nor, looking at the code, can I see how it could have
occurred in the first place.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pushing ScalarArrayOpExpr support into the btree index AM

2011-10-15 Thread Robert Haas
On Sat, Oct 15, 2011 at 2:58 PM, Tom Lane  wrote:
> In principle somebody could be doing something like
>        WHERE pointcol <@ ANY (ARRAY[list of box values])
> and expecting that to generate a bitmap indexscan on a GIST index, but
> is it likely that anyone is doing that?  (As opposed to the equivalent
> formulation with "pointcol <@ box1 OR pointcol <@ box2 ...", which would
> still work to generate OR'd bitmap scans even if we took out the
> ScalarArrayOpExpr logic.)

That seems like a pretty natural formulation to me, so I would be
rather reluctant to assume nobody's doing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans

2011-10-15 Thread Jeff Janes
On Fri, Oct 7, 2011 at 11:40 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Please find attached a patch implementing a basic version of
>> index-only scans.
>
> I'm making some progress with this, but I notice what seems like a
> missing feature: there needs to be a way to turn it off.  Otherwise
> performance comparisons will be difficult to impossible.
>
> The most obvious solution is a planner control GUC, perhaps
> "enable_indexonlyscan".  Anyone object, or want to bikeshed the name?
>

Currently I can't force an indexonlyscan over an indexscan, because of
the way the enable_* variables work.

Should setting enable_indexscan=off also disable index-only scans (the
current behavior) in addition to plain index scans?

By way of precedent, enable_indexscan=off does not disable Bitmap Index Scan.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Call stacks and RAISE INFO

2011-10-15 Thread Josh Berkus

> No, I don't think so.  The use-case for this sort of thing seems to me
> to be messages that are directed to the user or DBA, and don't want to
> be decorated with a lot of information about where they came from.
> That determination is usually pretty clear when you write the code.

For my case, I agree with Tom.  For example, in my recent debugging
session, I was debugging a recursive function ... one which calls
itself, up to 6 levels deep.  For that function, I want to turn context
off because there's so much it becomes unreadable, and instead I put a
nesting counter in the INFO.

I don't want to turn off context for other functions universally -- even
in the same ETL session -- because I want to know what called them,
since some of them can be called on multiple paths.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Pushing ScalarArrayOpExpr support into the btree index AM

2011-10-15 Thread Tom Lane
Almost immediately after we committed index-only scans, there were
complaints that it didn't work with "indexkey IN (...)" conditions, that
is ScalarArrayOpExpr index quals.  That's because the current code only
supports ScalarArrayOpExpr as a bitmap indexscan qual, not a regular
indexscan.  The executor performs a separate indexscan for each element of
the array, relying on the bitmap mechanism to eliminate duplicate hits on
the same heap tuple.

In principle it's not hard to see how ScalarArrayOpExpr could be supported
as a regular indexqual for btree indexes.  If the comparison operator has
<, <=, >=, or > semantics, then the array condition is equivalent to a
simple scalar condition using the greatest or least array element
respectively.  Otherwise (i.e., the operator is =):

1. Sort the array elements into the same order as the btree, eliminating
duplicates and NULL entries.  (If there are no non-null elements, the
qual condition is unsatisfiable and we can skip the indexscan.)

2. Perform an indexscan for each remaining array element, in sequence.
Since we eliminated equal values in step 1, there can be no two matches
to the same index entry, so there's no need for duplicate elimination.
What's more, because we sorted the array to match the index order, index
entries will be matched in index order, so the results of the scan can
still be expected to come out in sorted order, a critical expectation for
btree indexscans.

If we have more than one ScalarArrayOpExpr then we have to be a bit
careful about how we perform the multiple indexscans to ensure that output
ordering is preserved, but it's certainly doable.
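The two steps above can be sketched in Python, with a sorted list standing in for the index and bisect for the btree descent (purely illustrative; the real logic would live in the btree AM and operate on index pages, not Python lists):

```python
from bisect import bisect_left

def scalar_array_eq_scan(index_keys, array):
    """Emulate 'key = ANY(array)' as successive btree scans.

    index_keys: the sorted key column of the index.
    Step 1: sort the array, dropping duplicates and NULLs (None);
            an empty remainder means the qual is unsatisfiable.
    Step 2: probe the index once per remaining element, in order, so
            matches come out in index order with no duplicate hits.
    """
    elems = sorted({e for e in array if e is not None})
    if not elems:
        return []  # unsatisfiable qual: skip the indexscan entirely
    matches = []
    for e in elems:
        i = bisect_left(index_keys, e)          # descend to first match
        while i < len(index_keys) and index_keys[i] == e:
            matches.append(index_keys[i])       # each entry matched at most once
            i += 1
    return matches

keys = [1, 2, 2, 3, 5, 8, 8, 9]
result = scalar_array_eq_scan(keys, [8, None, 2, 8, 100])
# result is [2, 2, 8, 8]: index order, even though the array was not sorted
```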

So, at least as far as btrees are concerned, it seems like I implemented
the ScalarArrayOpExpr logic at the wrong level and it ought to be pushed
down into the index AM.  The above rules seem pretty btree-specific, so
I don't think that we ought to have the main executor doing any of this.
I envision doing this by marking btree as supporting ScalarArrayOpExpr
scankeys directly, so that the executor does nothing special with them,
and the planner treats them the same as regular scalar indexquals.

So that leaves me wondering whether the existing executor-level
implementation of ScalarArrayOpExpr indexquals ought to be ripped out
entirely.  It would not save all that much code in the executor proper
(basically ExecIndexEvalArrayKeys, ExecIndexAdvanceArrayKeys, and a few
lines of supporting logic).  However, there's a fair amount of cruft
in the planner to deal with the concept that ScalarArrayOpExpr is
supported as a bitmap indexscan qual but not a plain indexscan qual.
If that code is only going to get exercised for non-btree index types,
it's likely to be under-tested and hence a continuing source of bugs.

In principle somebody could be doing something like
WHERE pointcol <@ ANY (ARRAY[list of box values])
and expecting that to generate a bitmap indexscan on a GIST index, but
is it likely that anyone is doing that?  (As opposed to the equivalent
formulation with "pointcol <@ box1 OR pointcol <@ box2 ...", which would
still work to generate OR'd bitmap scans even if we took out the
ScalarArrayOpExpr logic.)

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Call stacks and RAISE INFO

2011-10-15 Thread Pavel Stehule
2011/10/15 Tom Lane :
> Pavel Stehule  writes:
>> 2011/10/15 Robert Haas :
>>> On Sat, Oct 15, 2011 at 12:24 AM, Pavel Stehule  
>>> wrote:
 I have no problem with this.  A context can be false for info and true
 for other in default. Please, use a different identifier than
 "context", that can be use for reading context in future - maybe
 "attach_context" or some similar.
>
>>> error_context?
>
>> what about show_context, hide_context, hold_context, use_context ??
>
> I still think it should be CONTEXT, period.  All the other options to
> RAISE are named directly after the message lines they control; why
> should this one be different?

I had an idea to set CONTEXT from the RAISE statement - for forwarding data
from a handled exception,

something like:

BEGIN
...
EXCEPTION WHEN ...
  GET DIAGNOSTICS _context = PG_EXCEPTION_CONTEXT;
  RAISE USING context = _context;
END;

Regards

Pavel Stehule

>
>                        regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: .psqlrc version dependence (was Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH)

2011-10-15 Thread Bruce Momjian
Tom Lane wrote:
> Peter Eisentraut  writes:
> > Contrary to what the subject suggests, I think the main reason people
> > wanted this feature was to be able to set the linestyle to unicode
> > without getting a warning from older releases about unknown linestyle or
> > something.  But in a few years, they'll have to
> > maintain .psqlrc-9.2, .psqlrc-9.3, .psqlrc-9.4, etc.  That doesn't sound
> > like a useful long-term solution either.
> 
> Well, "in a few years" they won't need that conditionality any more at
> all, so I'm not sure I believe the above argument.  The problem seems
> inherently self-limiting.
> 
> What struck me while looking at the patch is that it is conditional
> based on *psql's* version.  Not the version of the server you're
> connected to.  I'm not too sure what use-cases people have for version
> dependence here, but I'd think that the server version would enter into
> it sometimes.

The assumption is that the .psqlrc is controlling psql behavior.  Not
sure what setting would be changed based on server version, maybe psql
variables. 

I have updated the docs to indicate it is the psql version.

> (Of course, for server version to be used usefully, you'd need to
> re-execute the rc file during \c, something we don't do now.)

Yep, yuck.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


.psqlrc version dependence (was Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH)

2011-10-15 Thread Tom Lane
Peter Eisentraut  writes:
> Contrary to what the subject suggests, I think the main reason people
> wanted this feature was to be able to set the linestyle to unicode
> without getting a warning from older releases about unknown linestyle or
> something.  But in a few years, they'll have to
> maintain .psqlrc-9.2, .psqlrc-9.3, .psqlrc-9.4, etc.  That doesn't sound
> like a useful long-term solution either.

Well, "in a few years" they won't need that conditionality any more at
all, so I'm not sure I believe the above argument.  The problem seems
inherently self-limiting.

What struck me while looking at the patch is that it is conditional
based on *psql's* version.  Not the version of the server you're
connected to.  I'm not too sure what use-cases people have for version
dependence here, but I'd think that the server version would enter into
it sometimes.

(Of course, for server version to be used usefully, you'd need to
re-execute the rc file during \c, something we don't do now.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump vs malloc

2011-10-15 Thread Magnus Hagander
On Fri, Oct 14, 2011 at 21:11, Bruce Momjian  wrote:
> Magnus Hagander wrote:
>> On Wed, Jun 22, 2011 at 17:48, Tom Lane  wrote:
>> > Magnus Hagander  writes:
>> >> Something along the line of this?
>> >
>> > I think this is a seriously, seriously bad idea:
>> >
>> >> +#define strdup(x) pg_strdup(x)
>> >> +#define malloc(x) pg_malloc(x)
>> >> +#define calloc(x,y) pg_calloc(x, y)
>> >> +#define realloc(x,y) pg_realloc(x, y)
>> >
>> > as it will render the code unreadable to people expecting the normal
>> > behavior of these fundamental functions; not to mention break any
>> > call sites that have some other means of dealing with an alloc failure
>> > besides going belly-up. ?Please take the trouble to do
>> > s/malloc/pg_malloc/g and so on, instead.
>>
>> Ok, I'll try that approach. This seemed like a "nicer" approach, but I
>> think once written out, i agree with your arguments :-)
>
> Where are we on this?

It's still sitting on my personal TODO list, just not with a really
high priority.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-10-15 Thread Alexander Korotkov
Hi!

Thanks for your attention to my patch!

On Sat, Oct 15, 2011 at 2:47 PM, Nathan Boley  wrote:

> > Looking now, I see that Alexander wasn't Cc'd on the review, so it's
> > possible he missed the message?
> >
>
> We've corresponded off list and have discussed my review at some length.
>
> Alex submitted an updated patch on Sep 22 to me personally ( although
> not to the list? Alex? ), with the promise of a new version with
> improved comments.
>

Oh, I didn't notice that I had posted the updated patch off-list. So here is
a repost of that version of the patch. The list of changes is below:
1) A distinct slot is used for the length histogram.
2) Standard statistics are collected for arrays.
3) Most common values and most common elements are mapped to distinct
columns of the pg_stats view, because both of them are calculated for arrays.

Right now it's hard for me to give this as much time as I would like, but I
hope to present improved comments and testing by the end of October.

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.5.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Call stacks and RAISE INFO

2011-10-15 Thread Tom Lane
Pavel Stehule  writes:
> 2011/10/15 Robert Haas :
>> On Sat, Oct 15, 2011 at 12:24 AM, Pavel Stehule  
>> wrote:
>>> I have no problem with this.  A context can be false for info and true
>>> for other in default. Please, use a different identifier than
>>> "context", that can be use for reading context in future - maybe
>>> "attach_context" or some similar.

>> error_context?

> what about show_context, hide_context, hold_context, use_context ??

I still think it should be CONTEXT, period.  All the other options to
RAISE are named directly after the message lines they control; why
should this one be different?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-10-15 Thread Bruce Momjian
Andrew Dunstan wrote:
> 
> 
> On 10/15/2011 09:37 AM, Peter Eisentraut wrote:
> > On tor, 2011-10-13 at 11:31 -0400, Bruce Momjian wrote:
> >> The attached patch changes this to use the _major_ version number for
> >> psql rc files.  Does this have to be backward-compatible?  Should I
> >> check for minor and major matches?  That is going to be confusing to
> >> document.
> > Contrary to what the subject suggests, I think the main reason people
> > wanted this feature was to be able to set the linestyle to unicode
> > without getting a warning from older releases about unknown linestyle or
> > something.  But in a few years, they'll have to
> > maintain .psqlrc-9.2, .psqlrc-9.3, .psqlrc-9.4, etc.  That doesn't sound
> > like a useful long-term solution either.
> >
> >
> 
> Wouldn't it be better to support some conditional syntax?

I suppose if we add that to psql we can remove this facility
completely.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-10-15 Thread Bruce Momjian
Peter Eisentraut wrote:
> On tor, 2011-10-13 at 11:31 -0400, Bruce Momjian wrote:
> > The attached patch changes this to use the _major_ version number for
> > psql rc files.  Does this have to be backward-compatible?  Should I
> > check for minor and major matches?  That is going to be confusing to
> > document.
> 
> Contrary to what the subject suggests, I think the main reason people
> wanted this feature was to be able to set the linestyle to unicode
> without getting a warning from older releases about unknown linestyle or
> something.  But in a few years, they'll have to
> maintain .psqlrc-9.2, .psqlrc-9.3, .psqlrc-9.4, etc.  That doesn't sound
> like a useful long-term solution either.

Well, frankly, I think the fact we were matching on minor version number
was even worse.  This is slightly better.  I guess they could use
symlinks to keep a config file for multiple versions, but I agree it
isn't a long-term great solution.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-10-15 Thread Andrew Dunstan



On 10/15/2011 09:37 AM, Peter Eisentraut wrote:

On tor, 2011-10-13 at 11:31 -0400, Bruce Momjian wrote:

The attached patch changes this to use the _major_ version number for
psql rc files.  Does this have to be backward-compatible?  Should I
check for minor and major matches?  That is going to be confusing to
document.

Contrary to what the subject suggests, I think the main reason people
wanted this feature was to be able to set the linestyle to unicode
without getting a warning from older releases about unknown linestyle or
something.  But in a few years, they'll have to
maintain .psqlrc-9.2, .psqlrc-9.3, .psqlrc-9.4, etc.  That doesn't sound
like a useful long-term solution either.




Wouldn't it be better to support some conditional syntax?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-10-15 Thread Peter Eisentraut
On tor, 2011-10-13 at 11:31 -0400, Bruce Momjian wrote:
> The attached patch changes this to use the _major_ version number for
> psql rc files.  Does this have to be backward-compatible?  Should I
> check for minor and major matches?  That is going to be confusing to
> document.

Contrary to what the subject suggests, I think the main reason people
wanted this feature was to be able to set the linestyle to unicode
without getting a warning from older releases about unknown linestyle or
something.  But in a few years, they'll have to
maintain .psqlrc-9.2, .psqlrc-9.3, .psqlrc-9.4, etc.  That doesn't sound
like a useful long-term solution either.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LIMITing number of results in a VIEW with global variables

2011-10-15 Thread Thomas Girault
Hello Florian,

> It seems dangerous for a cast to modify global state such as MU. The
> evaluation order of functions during query execution isn't always easy to
> guess, and may change depending on the execution plan.


I assumed that fuzzy2bool is called just before the final evaluation of
the WHERE clause, which needs a Boolean. My first tests suggested that this
hypothesis holds, but it might fail under alternative execution plans.


> > With this implicit cast, the query
> >  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE young(age);
> > is equivalent to
> >  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE
> > fuzzy2bool(young(age));
>
> Those set_alpha() calls seem equally dangerous. If this alpha is supposed
> to be a global parameter, why not set it *before* issuing the query?


Alternatively, we could also set the alpha value before the query:


SELECT set_alpha(0.1); SELECT age, young(age) FROM employees WHERE
young(age);


I would be very interested to know if there is a smarter way to set global
variables.

> > I can sort the results in the view 'sorted_employees' according to
> > the value MU of a fuzzy predicate thanks to the fuzzy2bool cast function.
> >
> > CREATE OR REPLACE VIEW sorted_employees AS
> >SELECT *, get_mu() as mu
> >FROM employees
> >ORDER BY mu DESC;
>
> Are you aware that an ORDER BY clause in a VIEW is only going to work
> if you do "SELECT ... FROM view"?


It is really the first time I am using views; I didn't know that!


> If the outer query is more complex
> than that, I wouldn't bet on the results coming back in the expected order.
>

I don't mind if the ordering is wrong: it is just a way to perform
filtering according to K and ALPHA.

Usually, you'd attach ORDER BY to the outermost query (or to subqueries
> containing a LIMIT clause).
>
> > The following query
> >  SELECT age, mu FROM set_k(5), set_alpha(0.1), filtered_employees
>
> Again, you seem to rely on these set_k(), set_alpha() calls being
> done before the query is executed. But postgres doesn't guarantee
> that.
>
> > WHERE young(age);
> > gives the results :
> >  age  mu
> >  24   1
> >  16   1
> > instead of :
> >  age  mu
> >  16   1
> >  21   0.89976158142
> >  24   0.60023841858
> >  26   0.40005960464
> >  26   0.40005960464
> >
> > It seems that the 'LIMIT K' instruction has side effects on the MU
> > value.
>
> The execution plan may very well vary depending on the limit. Off-hand,
> I'd guess that with a small K, one of these set_whatever() calls in one
> of your FROM clauses gets executed after the computation it's supposed
> to affect has already happened.
>
> > Why is it not working ? How to fix this issue ?
>
> Don't rely on the execution order of function calls in a SELECT statement.
> Divide your functions into two classes.
>
> The ones which have side-effects (i.e. change global state). These should
> *never* be called from SQL statements, except in the trivial case of
> "SELECT my_func(...);". Also, they should be marked with VOLATILE.
>
> And the ones without side effects. Those should be marked with IMMUTABLE,
> STABLE, or VOLATILE, depending on how they're influenced by global state
> changes. Read the documentation on these flags.
>
> Finally, don't assume that ORDER BY inside a view influences the output
> order of queries using the view. (Except in the trivial case of
> "SELECT * FROM view"). As a rule of thumb, ORDER BY in a view only makes
> sense if there's also a LIMIT clause. You are then guaranteed that the
> view returns the first LIMIT rows according to the specified order.
> Don't assume they're necessarily returned in ascending order, though.
>
> best regards,
> Florian Pflug
>

Thank you very much! I will try to follow your advice.

Thomas


Re: [HACKERS] LIMITing number of results in a VIEW with global variables

2011-10-15 Thread Thomas Girault
Hello,

Thank you for your answer Robert.

> Well, SQL, or our dialect of it anyway, doesn't have global
> variables.  So I think the above is going to throw a syntax error.
> You may have global variables in your C code, but those won't be
> visible from the SQL level.

My definition of the filtered_employees view was wrong.
The correct one is:

CREATE OR REPLACE VIEW filtered_employees AS
SELECT *, get_mu() as mu
FROM employees
ORDER BY mu DESC
LIMIT get_k();


Note that access to the global C variables (K, MU and ALPHA) from SQL works
well with my definitions of get_k(), get_mu(), etc. There is no syntax
error here; however, the view is not working because it has side effects on
the value of MU.

I have also tried two other alternatives to the LIMIT keyword, but neither
works yet:
1) SQL instruction: RANK() OVER (ORDER BY get_mu()) AS sqlf_rank;
2) C instruction: SPI_exec(query, K), which leads to a segmentation fault.

> In general, I think you'd be better off not relying on C global
> variables either,

I don't understand how I could avoid using global variables in some cases.
For instance, I must store the float value $1, corresponding to a fuzzy
predicate degree, in the following fuzzy2bool cast operation:

CREATE OR REPLACE FUNCTION fuzzy2bool(FLOAT)
RETURNS BOOLEAN LANGUAGE SQL AS
'SELECT set_mu($1);        -- here $1 must be stored in MU for each record observed in the view
 SELECT $1 > get_alpha()'; -- then $1 is converted to Boolean according to the ALPHA global value


> and instead passing the values you need as function
> arguments.
Do you mean that I should define a function filter(table_name TEXT, k
INTEGER, alpha FLOAT)?

Thanks again for your help,

Thomas Girault


Re: [HACKERS] LIMITing number of results in a VIEW with global variables

2011-10-15 Thread Florian Pflug
On Oct14, 2011, at 16:43 , Thomas Girault wrote:
>  CREATE OR REPLACE FUNCTION fuzzy2bool(FLOAT)
>  RETURNS BOOLEAN LANGUAGE SQL AS 'SELECT set_mu($1);SELECT $1 > get_alpha()';

It seems dangerous for a cast to modify global state such as MU. The evaluation
order of functions during query execution isn't always easy to guess, and may
change depending on the execution plan.

> With this implicit cast, the query
>  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE young(age);
> is equivalent to
>  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE
> fuzzy2bool(young(age));

Those set_alpha() calls seem equally dangerous. If this alpha is supposed
to be a global parameter, why not set it *before* issuing the query?

> I can sort the results in the view 'sorted_employees' according to
> the value MU of a fuzzy predicate thanks to the fuzzy2bool cast function.
> 
> CREATE OR REPLACE VIEW sorted_employees AS
>SELECT *, get_mu() as mu
>FROM employees
>ORDER BY mu DESC;

Are you aware that an ORDER BY clause in a VIEW is only going to work
if you do "SELECT ... FROM view"? If the outer query is more complex
than that, I wouldn't bet on the results coming back in the expected order.

Usually, you'd attach ORDER BY to the outermost query (or to subqueries
containing a LIMIT clause).

> The following query
>  SELECT age, mu FROM set_k(5), set_alpha(0.1), filtered_employees

Again, you seem to rely on these set_k(), set_alpha() calls being
done before the query is executed. But postgres doesn't guarantee
that. 

> WHERE young(age);
> gives the results :
>  age  mu
>  24   1
>  16   1
> instead of :
>  age  mu
>  16   1
>  21   0.89976158142
>  24   0.60023841858
>  26   0.40005960464
>  26   0.40005960464
> 
> It seems that the 'LIMIT K' instruction has side effects on the MU value.

The execution plan may very well vary depending on the limit. Off-hand,
I'd guess that with a small K, one of these set_whatever() calls in one
of your FROM clauses gets executed after the computation it's supposed
to affect has already happened.

> Why is it not working ? How to fix this issue ?

Don't rely on the execution order of function calls in a SELECT statement.
Divide your functions into two classes.

The ones which have side-effects (i.e. change global state). These should
*never* be called from SQL statements, except in the trivial case of
"SELECT my_func(...);". Also, they should be marked with VOLATILE.

And the ones without side effects. Those should be marked with IMMUTABLE,
STABLE, or VOLATILE, depending on how they're influenced by global state
changes. Read the documentation on these flags.
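Applied to the functions from this thread, the division could look like this (a sketch; the shared-library name 'fuzzy' and the C symbol names are assumptions, not something stated above):

```sql
-- Side-effecting setters: VOLATILE, and only ever called standalone
-- as "SELECT set_alpha(0.1);", never from inside a larger query.
CREATE OR REPLACE FUNCTION set_alpha(float) RETURNS void
    AS 'fuzzy', 'set_alpha' LANGUAGE C VOLATILE;

-- Pure readers of global state: STABLE, so they are evaluated during
-- the query but not folded to a constant at plan time, and not assumed
-- to change between rows.
CREATE OR REPLACE FUNCTION get_alpha() RETURNS float
    AS 'fuzzy', 'get_alpha' LANGUAGE C STABLE;
```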

Finally, don't assume that ORDER BY inside a view influences the output
order of queries using the view. (Except in the trivial case of
"SELECT * FROM view"). As a rule of thumb, ORDER BY in a view only makes
sense if there's also a LIMIT clause. You are then guaranteed that the
view returns the first LIMIT rows according to the specified order.
Don't assume they're necessarily returned in ascending order, though.

best regards,
Florian Pflug




Re: [HACKERS] Call stacks and RAISE INFO

2011-10-15 Thread Pavel Stehule
2011/10/15 Robert Haas :
> On Sat, Oct 15, 2011 at 12:24 AM, Pavel Stehule  
> wrote:
>> I have no problem with this.  The context could default to false for INFO
>> and true for the others. Please use a different identifier than
>> "context", which could be used for reading the context in the future -
>> maybe "attach_context" or something similar.
>
> error_context?

what about show_context, hide_context, hold_context, use_context ??


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



Re: [GENERAL][HACKERS] register creation date of table

2011-10-15 Thread Robert Haas
On Fri, Oct 14, 2011 at 6:20 AM, Willy-Bas Loos  wrote:
> 1. I think that there is no such information in the system tables. is
> that correct?

Yes.  It's been discussed before, but some people (particularly Tom,
IIRC) are not convinced that it's useful enough to justify its
existence.

> 2. i would like to go back in time. I think that i will just look up
> the creation date for the files in the data directory and translate
> their oid's to the object names and then update their dates. This
> would of course only work from the last restore. Is that a good way to
> do it?

Well, that timestamp will get bumped on TRUNCATE, CLUSTER, VACUUM
FULL, and rewriting versions of ALTER TABLE.
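As a sketch of the oid-to-name step: file names under base/<dboid>/ are relfilenodes, which usually (but, per the caveat above, not always after rewrites) equal the relation's oid, and can be mapped back via pg_class:

```sql
-- Translate a data-directory file name, e.g. base/16384/16385,
-- back to a relation name. 16385 here is a made-up relfilenode.
SELECT relname, relkind
FROM pg_class
WHERE relfilenode = 16385;
```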

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Call stacks and RAISE INFO

2011-10-15 Thread Robert Haas
On Sat, Oct 15, 2011 at 12:24 AM, Pavel Stehule  wrote:
> I have no problem with this.  The context could default to false for INFO
> and true for the others. Please use a different identifier than
> "context", which could be used for reading the context in the future -
> maybe "attach_context" or something similar.

error_context?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] LIMITing number of results in a VIEW with global variables

2011-10-15 Thread Robert Haas
On Fri, Oct 14, 2011 at 10:43 AM, Thomas Girault  wrote:
> I am now trying to limit the number of results in the view according
> to the global value K :
>
> CREATE OR REPLACE VIEW filtered_employees AS
>    SELECT *, get_mu() as mu
>    FROM employees
>    ORDER BY mu DESC
>    LIMIT K;

Well, SQL, or our dialect of it anyway, doesn't have global
variables.  So I think the above is going to throw a syntax error.
You may have global variables in your C code, but those won't be
visible from the SQL level.

In general, I think you'd be better off not relying on C global
variables either, and instead passing the values you need as function
arguments.
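For the view in question, that could mean folding k and alpha into a set-returning SQL function instead of C globals. A sketch, assuming the employees table and young() function from this thread (column names in RETURNS TABLE are illustrative, and positional $n references keep it working on older releases):

```sql
-- Pass k and alpha explicitly instead of reading C globals.
CREATE OR REPLACE FUNCTION filtered_employees(k integer, alpha float)
RETURNS TABLE(emp_age integer, emp_mu float) AS $$
    SELECT age, young(age)
    FROM employees
    WHERE young(age) > $2   -- alpha
    ORDER BY 2 DESC         -- by the computed membership degree
    LIMIT $1                -- k
$$ LANGUAGE sql STABLE;

-- Usage: SELECT * FROM filtered_employees(5, 0.1);
```

This removes any dependence on the evaluation order of set_k()/set_alpha() relative to the rest of the query.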

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-10-15 Thread Nathan Boley
> Looking now, I see that Alexander wasn't Cc'd on the review, so it's
> possible he missed the message?
>

We've corresponded off list and have discussed my review at some length.

Alex submitted an updated patch on Sep 22 to me personally (although
not to the list? Alex?), with the promise of a new version with
improved comments.

Best,
Nathan



Re: [HACKERS] Bug in walsender when calling out to do_pg_stop_backup (and others?)

2011-10-15 Thread Florian Pflug
On Oct11, 2011, at 09:21 , Magnus Hagander wrote:
> On Tue, Oct 11, 2011 at 03:29, Florian Pflug  wrote:
>> On Oct10, 2011, at 21:25 , Magnus Hagander wrote:
>>> On Thu, Oct 6, 2011 at 23:46, Florian Pflug  wrote:
>>>> It'd be nice to generally terminate a backend if the client vanishes, but so
>>>> far I haven't had any bright ideas. Using FASYNC and F_SETOWN unfortunately
>>>> sends a signal *every time* the fd becomes readable or writeable, not only on
>>>> EOF. Doing select() in CHECK_FOR_INTERRUPTS seems far too expensive. We could
>>>> make the postmaster keep the fds around even after forking a backend, and
>>>> make it watch for broken connections using select(). But with a large
>>>> max_backends setting, we'd risk running out of fds in the postmaster...
>>> 
>>> Ugh. Yeah. But at least catching it and terminating it when we *do*
>>> notice it's down would certainly make sense...
>> 
>> I'll try to put together a patch that sets a flag if we discover a broken
>> connection in pq_flush, and tests that flag in CHECK_FOR_INTERRUPTS. Unless 
>> you
>> wanna, of course.
> 
> Please do, I won't have time to even think about it until after
> pgconf.eu anyway ;)

Ok, here's a first cut.

I've based this on how query cancellation due to recovery conflicts works -
internal_flush() sets QueryCancelPending and ClientConnectionLostPending.

If QueryCancelPending is set, CHECK_FOR_INTERRUPTS checks 
ClientConnectionLostPending, and if it's set it does ereport(FATAL).

I've only done light testing so far - basically the only case I've tested is
killing pg_basebackup while it's waiting for all required WAL to be archived.

best regards,
Florian Pflug



pg.discon_cancel.v1.patch
Description: Binary data
