Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Andrea Suisani

On 01/18/2011 06:44 PM, Robert Haas wrote:
 On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
  a few weeks back I hacked an experimental patch that removed the hint
  bit action completely.  the results were very premature and/or
  incorrect, but my initial findings suggested that hint bits might not
  be worth the cost from performance standpoint.  i'd like to see some
  more investigation in this direction before going with a complex
  application mechanism (although that would be beneficial vs the status
  quo).

 I think it's not very responsible to allege that hint bits aren't
 providing a benefit without providing the patch that you used and the
 tests that you ran.

maybe I'm wrong but it seems it did post an experimental patch and also
a tests used, see:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg01897.php

 This is a topic that needs careful analysis, and
 I think that saying hint bits don't provide a benefit... maybe...
 doesn't do anything but confuse the issue.  How about doing some tests
 with the patch from my OP and posting the results?  If removing hint
 bits entirely doesn't degrade performance, then surely the
 less-drastic approach I've taken here ought to be OK too.  But in my
 testing, it didn't look too good.



Andrea





Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Andrea Suisani

On 01/19/2011 09:03 AM, Andrea Suisani wrote:
 On 01/18/2011 06:44 PM, Robert Haas wrote:
  On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
   a few weeks back I hacked an experimental patch that removed the hint
   bit action completely. the results were very premature and/or
   incorrect, but my initial findings suggested that hint bits might not
   be worth the cost from performance standpoint. i'd like to see some
   more investigation in this direction before going with a complex
   application mechanism (although that would be beneficial vs the status
   quo).

  I think it's not very responsible to allege that hint bits aren't
  providing a benefit without providing the patch that you used and the
  tests that you ran.

 maybe I'm wrong but it seems it did post an experimental patch and also
                              ^^
                              he

 a tests used, see:
 ^
 the

sorry for the typos (not enough caffeine I suppose :)



Andrea



Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-19 Thread Martijn van Oosterhout
On Tue, Jan 18, 2011 at 10:03:01AM +0200, Heikki Linnakangas wrote:
 That isn't ever going to happen, unless you'd like to give up hash joins
 and hash aggregation on text values.

 You could canonicalize the string first in the hash function. I'm not  
 sure if we have all the necessary information at hand there, but at  
 least with some encoding/locale-specific support functions it'd be 
 possible.

This is what strxfrm() was created for.

strcoll(a,b) == strcmp(strxfrm(a),strxfrm(b))

Sure there's a cost; the question is only how much, and whether it makes
hash join infeasible. I doubt it, since by definition it must be faster
than strcoll(). I suppose a test would be interesting.
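
For instance, a hash function could run over the strxfrm() image, so that
any two strings comparing equal under strcoll() hash equally. A minimal
sketch (the djb2 hash and the lack of error handling are illustrative
choices, not anything from this thread):

    #include <stdlib.h>
    #include <string.h>

    /* Hash a string by its collation-canonical form, so that any two
     * strings comparing equal under strcoll() get the same hash. */
    static unsigned int
    hash_collated(const char *s)
    {
        size_t       need = strxfrm(NULL, s, 0) + 1;  /* transformed size incl. NUL */
        char        *buf = malloc(need);
        unsigned int h = 5381;

        strxfrm(buf, s, need);
        for (const char *p = buf; *p != '\0'; p++)    /* djb2 over the canonical form */
            h = h * 33 + (unsigned char) *p;

        free(buf);
        return h;
    }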

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Oh, wait a minute: there's a bad restriction there, namely that a
 contrib module could only add loose operators that had different
 declared input types from the ones known to the core opclass.  Otherwise
 there'd be a conflict with the contrib module and core needing to insert
 similarly-keyed support functions.  This would actually be enough for
 contrib/intarray (because the core operator entries are for anyarray
 not for integer[]) but it is easy to foresee cases where that wouldn't
 be good enough.  Seems like we'd need an additional key column in
 pg_amproc to really make this cover all cases.

I would have thought that such contrib modules would then need to offer
their own opfamily and opclasses, and users would have to use the specific
opclass manually, as they do e.g. for text_pattern_ops.  Can't it work
that way?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] pl/python refactoring

2011-01-19 Thread Jan Urbański
On 18/01/11 23:22, Peter Eisentraut wrote:
 On mån, 2011-01-17 at 21:49 +0200, Peter Eisentraut wrote:
 On sön, 2011-01-02 at 12:41 +0100, Jan Urbański wrote:
 Here they are. There are 16 patches in total. They amount to what's
 currently in my refactor branch (and almost to what I've sent as the
 complete refactor patch, while doing the splitting I discovered a few
 useless hunks that I've discarded).

 Committed 0001, rest later.
 
 Today committed: 3, 5, 10, 11, 12, 13

Thanks,

 #2: It looks like this loses some information/formatting in the error
 message.  Should we keep the pointing arrow there?
 
 --- a/src/pl/plpython/expected/plpython_error.out
 +++ b/src/pl/plpython/expected/plpython_error.out
 @@ -10,10 +10,7 @@ CREATE FUNCTION sql_syntax_error() RETURNS text
  SELECT sql_syntax_error();
  WARNING:  PL/Python: plpy.SPIError: unrecognized error in PLy_spi_execute_query
  CONTEXT:  PL/Python function sql_syntax_error
 -ERROR:  syntax error at or near "syntax"
 -LINE 1: syntax error
 -        ^
 -QUERY:  syntax error
 +ERROR:  PL/Python: plpy.SPIError: syntax error at or near "syntax"
  CONTEXT:  PL/Python function sql_syntax_error
  /* check the handling of uncaught python exceptions
   */

Yes, the message is less informative, because the error is reported by
PL/Python, by wrapping the SPI message. I guess I could try to extract
more info from the caught ErrorData and put it in the new error that
PL/Python throws. But I'd like to do that as a refinement of the
spi-in-subxacts patch, because the current behaviour is broken anyway -
to catch the SPI error safely you need to wrap the whole thing in a
subtransaction. This patch only gets rid of the global error state, and
to do that I needed to raise some exception from the try/catch block
around SPI calls.

 #7: This is unnecessary because the remaining fields are automatically
 initialized with NULL.  The Python documentation uses initializations
 like that.  The way I understand it, newer Python versions might add
 more fields at the end, and they will rely on the fact that they get
 automatically initialized even if the source code was for an older
 version.  So I would rather not touch this, as it doesn't buy anything.

OK.

 #16: This is probably pointless because pgindent will reformat this.

If it does, then it's a shame. Maybe the comment should be moved above
the if(), then.

Cheers,
Jan



Re: [HACKERS] Confusing comment in TransactionIdIsInProgress

2011-01-19 Thread Simon Riggs
On Tue, 2011-01-18 at 10:51 +0200, Heikki Linnakangas wrote:
 On 18.01.2011 07:15, Jim Nasby wrote:
  Shouldn't the comment read "If first time through"?

  /*
   * If not first time through, get workspace to remember main XIDs in. We
   * malloc it permanently to avoid repeated palloc/pfree overhead.
   */
  if (xids == NULL)
  {
  ...
  xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));

 Huh, yes, I'm amazed that no-one has noticed. I must've read that piece
 of code dozens of times in the last couple of years myself, and that
 sentence was even copy-pasted to GetConflictingVirtualXIDs() later in
 that file, including that thinko.

Yes, I laughed when I saw that, having camped out in that code many
nights. 

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] pl/python refactoring

2011-01-19 Thread Jan Urbański
On 19/01/11 02:06, Hitoshi Harada wrote:
 2011/1/19 Peter Eisentraut pete...@gmx.net:
 On mån, 2011-01-17 at 21:49 +0200, Peter Eisentraut wrote:
 On sön, 2011-01-02 at 12:41 +0100, Jan Urbański wrote:
 Here they are. There are 16 patches in total. They amount to what's
 currently in my refactor branch (and almost to what I've sent as the
 complete refactor patch, while doing the splitting I discovered a few
 useless hunks that I've discarded).

 Committed 0001, rest later.

 Today committed: 3, 5, 10, 11, 12, 13
 
 I have reviewed this original patches for myself as the fundamental of
 subsequent work, and have comments.
 
 - This is not in the patch, but around line 184 vis versa in comment
 seems like typo.

Right, should certainly be vice versa.

 - -1 is used as the initial value of PLyTypeInfo.is_rowtype. Why not 0?

See the comments in struct PLyTypeInfo:

is_rowtype can be: -1 = not known yet (initial state); 0 = scalar
datatype; 1 = rowtype; 2 = rowtype, but I/O functions not set up yet
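
Spelled out as an enum for readability (illustrative only - the real field
is a plain int, and these names don't exist in the source):

    enum
    {
        RT_UNKNOWN         = -1,    /* not known yet (initial state) */
        RT_SCALAR          = 0,     /* scalar datatype */
        RT_ROWTYPE         = 1,     /* rowtype */
        RT_ROWTYPE_PENDING = 2      /* rowtype, but I/O functions not set up yet */
    };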

 - PLy_(input|output)_tuple_funcs() in PLy_trigger_handler() is added.
 The comment says it should check for the possibility that the
 relation's tupdesc changed since last call. Is it true that tupdesc
 may change even in one statement? And it also says the two functions
 are responsible  for not doing repetitive work, but ISTM they are not
 doing something to stop redundant work. The function doesn't seem like
 lightweight enough to be called on each row.

Hm, you may be right. I haven't touched that part of the code, but now
it seems to me that for triggers we do the I/O funcs lookup for every
row. I could try to check that and fix it, but not sure if I'll have the
time, and it's been that way before. Also, the CF is already closed in
theory...

 - There are elog() and PLy_elog() overall, but it looks to me like
 the usage is inconsistent. Moreover, elog(ERROR, "PLyTypeInfo
 struct is initialized for a Datum"); in
 PLy_(input|output)_tuple_funcs() should be Assert() not elog()?

Well in theory PLy_elog should be used if there's a current Python
exception set that you'd like to forward to Postgres, making it an
elog(ERROR). It's true that the usage is sometimes a bit inconsistent;
some of these inconsistencies are cleaned up in the next patches, but
probably not all of them. As for the Assert/elog, I would prefer an
elog, because if there are bugs in that code, using the wrong I/O
functions could lead to unexpected results in production (where an Assert
would not be present).

 - A line break should be added before PLy_add_exception() after static void

Oops, you're right.

 - This is also not in the patch, but the comment
 /* these should only be called once at the first call
  * of plpython_call_handler.  initialize the python interpreter
  * and global data.
  */
 is bogus. PLy_init_interp() is called in _PG_init().

Yep, that comment looks misguided.

 That's all for now. Some of them must be my misunderstanding or
 trivial, but I post them for the record.

Thanks, your feedback is very valuable!

Cheers,
Jan



Re: [HACKERS] Replication logging

2011-01-19 Thread Simon Riggs
On Tue, 2011-01-18 at 10:57 -0500, Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  Is there *any* usecase for setting them differently though?
 
 I can't believe we're still engaged in painting this bikeshed.  Let's
 just control it off log_connections and have done.

Yes, this is a waste of time. Leave it as is because it's there for a
very good reason, as Robert already explained.

The code was put in explicitly because debugging replication connections
is quite important and prior to that addition, wasn't easy. It's a very
separate thing from logging the hundreds/thousands of other connections
on the system.

I don't really care about neatness of code, or neatness of parameters. I
want to be able to confirm the details of the connection.
pg_stat_replication is dynamic, not a historical record of connections
and reconnections.

How else will you diagnose an erratic network, or an accidental change
of authority? Replication is so important it isn't worth tidying away a
couple of lines of log. My objective is to make replication work, not to
reduce the size of the average log file by 1 or 2 lines.

I'm particularly concerned that people make such changes too quickly.
There are many things in this area of code that need changing, but also
many more that do not. If we are to move forwards we need to avoid going
one step forwards, one step back.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




[HACKERS] ToDo: support for parameters in EXECUTE statement

2011-01-19 Thread Pavel Stehule
Hello

The EXECUTE statement doesn't support parametrization via an
SPI_execute_with_args call or via PQexecParams. This can be a security
issue: if somebody uses a prepared statement as protection against SQL
injection, that protection is lost, because EXECUTE has to be called
without parameters.
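
For comparison, a parametrized query run through SPI from C looks roughly
like this (a hedged sketch: SPI_connect() and error handling are omitted,
and the accounts table and run_parametrized() name are made up):

    #include "postgres.h"
    #include "executor/spi.h"
    #include "catalog/pg_type.h"
    #include "utils/builtins.h"

    static void
    run_parametrized(const char *owner)
    {
        Oid   argtypes[1] = { TEXTOID };
        Datum values[1];

        values[0] = CStringGetTextDatum(owner);

        /* The parameter travels out of band, so it cannot inject SQL. */
        SPI_execute_with_args("SELECT * FROM accounts WHERE owner = $1",
                              1, argtypes, values,
                              NULL,     /* no NULL flags */
                              true,     /* read-only */
                              0);       /* no row limit */
    }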

Regards

Pavel Stehule



Re: [HACKERS] pl/python refactoring

2011-01-19 Thread Hitoshi Harada
2011/1/19 Jan Urbański wulc...@wulczer.org:
 On 19/01/11 02:06, Hitoshi Harada wrote:
 - -1 is used as the initial value of PLyTypeInfo.is_rowtype. Why not 0?

 See the comments in struct PLyTypeInfo:

 is_rowtype can be: -1 = not known yet (initial state); 0 = scalar
 datatype; 1 = rowtype; 2 = rowtype, but I/O functions not set up yet

Well, I mean that 0 = initial, 1 = scalar, 2 = rowtype, 3 = rowtype but
not set up sounds more intuitive to me. Of course this is only what I
feel.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo: support for parameters in EXECUTE statement

2011-01-19 Thread Heikki Linnakangas

On 19.01.2011 12:53, Pavel Stehule wrote:

 The EXECUTE statement doesn't support parametrization via an
 SPI_execute_with_args call or via PQexecParams. This can be a security
 issue: if somebody uses a prepared statement as protection against SQL
 injection, that protection is lost, because EXECUTE has to be called
 without parameters.


Why don't you use SPI_prepare and SPI_open_query ?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] ToDo: support for parameters in EXECUTE statement

2011-01-19 Thread Pavel Stehule
2011/1/19 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 19.01.2011 12:53, Pavel Stehule wrote:

  The EXECUTE statement doesn't support parametrization via an
  SPI_execute_with_args call or via PQexecParams. This can be a security
  issue: if somebody uses a prepared statement as protection against SQL
  injection, that protection is lost, because EXECUTE has to be called
  without parameters.

 Why don't you use SPI_prepare and SPI_open_query ?

I have to execute a session's prepared statement - one created with the
PREPARE statement.

Pavel




 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com




[HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Simon Riggs
On Tue, 2011-01-18 at 19:04 +, Magnus Hagander wrote:
 Log replication connections only when log_connections is on
 
 Previously we'd always log replication connections, with no
 way to turn them off.

You noted that the code was there intentionally, yet you also couldn't
see the reason. That is not a great reason to change it. It's especially
not a great reason to make the change quickly.

The log entry served a very specific purpose, which was helping people
to diagnose problems with replication connectivity. It isn't practical
or sensible to force people to use log_connections to help with that;
that is a sledgehammer to crack a nut. Few people have it turned on in
production, so turning it on after a problem happened doesn't help
diagnose things.

The negative impact of this was a couple of log lines. No bug. Plus I
don't see any reason to introduce an incompatibility with the log output
from 9.0.

So removing this has almost no positive benefit, yet a clear negative
one. We need to look at the negative aspects of changes before we make
them.

Let's concentrate on adding the major features, not rush through loads
of minor changes.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-19 Thread Magnus Hagander
On Wed, Jan 19, 2011 at 06:14, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Jan 19, 2011 at 4:12 AM, Magnus Hagander mag...@hagander.net wrote:
 Ah, ok. I've added the errorcode now, PFA. I also fixed an error in
 the change for result codes I broke in the last patch. github branch
 updated as usual.

 Great. Thanks for the quick update!

 Here are another comments:

 + * IDENTIFICATION
 + *               src/bin/pg_basebackup.c

 Typo: should be src/bin/pg_basebackup/pg_basebackup.c rather than src/bin/pg_basebackup.c

Oops.


 +       printf(_("  -c, --checkpoint=fast|slow\n"
 +                "                         set fast or slow checkpoinging\n"));

 Typo: s/checkpoinging/checkpointing

 The "fast or slow" wording seems to lead users to always choose fast. Instead,
 what about "fast or smooth", "fast or spread", or "immediate or delayed"?

Hmm. "fast or spread" seems reasonable to me. And I want to use "fast"
for the fast version, because that's what we call it when you use
pg_start_backup(). I'll go change it to "spread" for now - it's the one
I can find used in the docs.


 You seem to have forgotten to support --checkpoint long option.
 The struct long_options needs to be updated.

Wow, that clearly went too fast. Fixed as well.
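
(For reference, the kind of entry meant here is something like the
following sketch - illustrative, not the actual patch:)

    #include <getopt.h>

    static struct option long_options[] = {
        /* map --checkpoint=... onto the existing -c option */
        {"checkpoint", required_argument, NULL, 'c'},
        {NULL, 0, NULL, 0}
    };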


 What if pg_basebackup receives a signal while doing a backup?
 For example, users might do Ctrl-C to cancel the long-running backup.
 We should define a signal handler and send a Terminate message
 to the server to cancel the backup?

Nah, we'll just disconnect and it'll deal with things that way. Just
like we do with e.g. pg_dump. I don't see the need to complicate it
with that.

(new patch on github in 5 minutes)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Magnus Hagander
On Wed, Jan 19, 2011 at 13:36, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2011-01-18 at 19:04 +, Magnus Hagander wrote:
 Log replication connections only when log_connections is on

 Previously we'd always log replication connections, with no
 way to turn them off.

 You noted that the code was there intentionally, yet you also couldn't
 see the reason. That is not a great reason to change it. It's especially
 not a great reason to make the change quickly.

Yes. And I brought it up in discussion, and the consensus was to
change it to be consistent.

And per that thread, there was also no consensus on ever *adding* it
that way - if I'd realized it was that way back then, I would've
objected at the time. I didn't realize it until I started seeing the
systems in production more.


 The log entry served a very specific purpose, which was helping people
 to diagnose problems with replication connectivity. It isn't practical
 or sensible to force people to use log_connections to help with that;
 that is a sledgehammer to crack a nut. Few people have it turned on in
 production, so turning it on after a problem happened doesn't help
 diagnose things.

Again, if you read the thread, the consensus was to do it the simple
way and make it configurable by that one. The other suggestion was to
turn log_connections into an enum, which was considered unnecessarily
complicated.


 The negative impact of this was a couple of log lines. No bug. Plus I
 don't see any reason to introduce an incompatibility with the log output
 from 9.0.

The inability to remove it has certainly annoyed me, and I know a few
others who have commented on it. And it's inconsistent behavior,
something we in general try to avoid.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_dump directory archive format / parallel pg_dump

2011-01-19 Thread Heikki Linnakangas

On 19.01.2011 07:45, Joachim Wieland wrote:
 On Mon, Jan 17, 2011 at 5:38 PM, Jaime Casanova <ja...@2ndquadrant.com> wrote:
  This one is the last version of this patch? if so, commitfest app
  should be updated to reflect that

 Here are the latest patches all of them also rebased to current HEAD.
 Will update the commitfest app as well.

What's the idea of storing the file sizes in the toc file? It looks like
it's not used for anything.

It would be nice to have this format match the tar format. At the
moment, there's a couple of cosmetic differences:

* TOC file is called TOC, instead of toc.dat

* blobs TOC file is called BLOBS.TOC instead of blobs.toc

* each blob is stored as blobs/oid.dat, instead of blob_oid.dat

The only significant difference is that in the directory archive format,
each data file has a header in the beginning.

What are the benefits of the data file header? Would it be better to
leave it out, so that the format would be identical to the tar format?
You could then just tar up the directory to get a tar archive, or vice
versa.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Simon Riggs
On Wed, 2011-01-19 at 13:44 +0100, Magnus Hagander wrote:
 On Wed, Jan 19, 2011 at 13:36, Simon Riggs si...@2ndquadrant.com wrote:
  On Tue, 2011-01-18 at 19:04 +, Magnus Hagander wrote:
  Log replication connections only when log_connections is on
 
  Previously we'd always log replication connections, with no
  way to turn them off.
 
  You noted that the code was there intentionally, yet you also couldn't
  see the reason. That is not a great reason to change it. It's especially
  not a great reason to make the change quickly.
 
 Yes. And I brought it up in discussion, and the consensus was to
 change it to be consistent.

No it wasn't. Robert explained the reason it was there.

Why make the change? Why make it quickly? Why avoid and ignore the CF?

  The negative impact of this was a couple of log lines. No bug. Plus I
  don't see any reason to introduce an incompatibility with the log output
  from 9.0.
 
 The inability to remove it has certainly annoyed me, and I know a few
 others who have commented on it. And it's inconsistent behavior,
 something we in general try to avoid.

I care about diagnosing problems on production systems. There will be
many more people annoyed by the inability to diagnose issues than there
will be people who care lots about a couple of log lines, or who care
about a few people's views on inconsistency.

Is your annoyance worth more than causing newbies problems and being
unable to diagnose production systems?

How will we diagnose erratic connection problems now?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Merlin Moncure
On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote:
 a few weeks back I hacked an experimental patch that removed the hint
 bit action completely.  the results were very premature and/or
 incorrect, but my initial findings suggested that hint bits might not
 be worth the cost from performance standpoint.  i'd like to see some
 more investigation in this direction before going with a complex
 application mechanism (although that would be beneficial vs the status
 quo).

 I think it's not very responsible to allege that hint bits aren't
 providing a benefit without providing the patch that you used and the
 tests that you ran.  This is a topic that needs careful analysis, and
 I think that saying hint bits don't provide a benefit... maybe...
 doesn't do anything but confuse the issue.  How about doing some tests
 with the patch from my OP and posting the results?  If removing hint
 bits entirely doesn't degrade performance, then surely the
 less-drastic approach I've taken here ought to be OK too.  But in my
 testing, it didn't look too good.

hm. well, I would have to agree on the performance hit -- I figure 5%
scan penalty should be about the maximum you'd want to pay to get the
i/o reduction.  Odds are you're correct and I blew something...I'd be
happy to test your patch.

merlin



[HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Simon Riggs
On Wed, 2011-01-19 at 12:55 +, Simon Riggs wrote:

 How will we diagnose erratic connection problems now?

The point here is that your effort to *remove* pointless log lines now
means that we cannot diagnose production problems with replication
unless we now *add* hundreds of pointless log lines.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Merlin Moncure
On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure mmonc...@gmail.com wrote:
 a few weeks back I hacked an experimental patch that removed the hint
 bit action completely.  the results were very premature and/or
 incorrect, but my initial findings suggested that hint bits might not
 be worth the cost from performance standpoint.  i'd like to see some
 more investigation in this direction before going with a complex
 application mechanism (although that would be beneficial vs the status
 quo).

 I think it's not very responsible to allege that hint bits aren't
 providing a benefit without providing the patch that you used and the
 tests that you ran.  This is a topic that needs careful analysis, and
 I think that saying hint bits don't provide a benefit... maybe...
 doesn't do anything but confuse the issue.  How about doing some tests
 with the patch from my OP and posting the results?  If removing hint
 bits entirely doesn't degrade performance, then surely the
 less-drastic approach I've taken here ought to be OK too.  But in my
 testing, it didn't look too good.

 hm. well, I would have to agree on the performance hit -- I figure 5%
 scan penalty should be about the maximum you'd want to pay to get the
 i/o reduction.  Odds are you're correct and I blew something...I'd be
 happy to test your patch.

Ah, I tested your patch vs stock postgres vs my patch, basically your
results are unhappily correct (mine was just a hair faster than yours
which you'd expect).  The differential was even wider on my laptop
class hardware, maybe 26%.  I also agree that even if the penalty was
reduced or determined to be worth it anyways, your approach to move
the setting/i/o around to appropriate places is the way to go vs
wholesale removal, unless some way is found to reduce clog lookup
penalty to a fraction of what it is now (not likely, I didn't profile
but I bet a lot of the problem is the lw lock).  Interesting I didn't
notice this on my original test :(.

merlin



Re: [HACKERS] pg_dump directory archive format / parallel pg_dump

2011-01-19 Thread Joachim Wieland
On Wed, Jan 19, 2011 at 7:47 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
  Here are the latest patches all of them also rebased to current HEAD.
  Will update the commitfest app as well.

 What's the idea of storing the file sizes in the toc file? It looks like
 it's not used for anything.

It's part of the overall idea to make sure files are not inadvertently
exchanged between different backups and that a file is not truncated.
In the future I'd also like to add a checksum to the TOC so that a
backup can be checked for integrity. This will cost performance but
with the parallel backup it can be distributed to several processors.

 It would be nice to have this format match the tar format. At the moment,
 there's a couple of cosmetic differences:

 * TOC file is called TOC, instead of toc.dat

 * blobs TOC file is called BLOBS.TOC instead of blobs.toc

 * each blob is stored as blobs/oid.dat, instead of blob_oid.dat

That can be done easily...

 The only significant difference is that in the directory archive format,
 each data file has a header in the beginning.

 What are the benefits of the data file header? Would it be better to leave
 it out, so that the format would be identical to the tar format? You could
 then just tar up the directory to get a tar archive, or vice versa.

The header is there to identify a file, it contains the header that
every other pgdump file contains, including the internal version
number and the unique backup id.

The tar format doesn't support compression so going from one to the
other would only work for an uncompressed archive and special care
must be taken to get the order of the tar file right.

If you want to drop the header altogether, fine with me, but if it's
just for the tar <-> directory conversion, then I am failing to see
what the use case of that would be.

A tar archive has the advantage that you can postprocess the dump data
with other tools  but for this we could also add an option that gives
you only the data part of a dump file (and uncompresses it at the same
time if compressed). Once we have that however, the question is what
anybody would then still want to use the tar format for...


Joachim



Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 How will we diagnose erratic connection problems now?

As Heikki pointed out, the slave still logs this information, so we
can look there.  If that's not enough, yeah, you'll have to turn
log_connections on on the master, but I don't think that will be very
frequent, and it's not an unmanageable burden anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Heikki Linnakangas

On 19.01.2011 15:56, Merlin Moncure wrote:
 On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
  On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas <robertmh...@gmail.com> wrote:
   On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
    a few weeks back I hacked an experimental patch that removed the hint
    bit action completely.  the results were very premature and/or
    incorrect, but my initial findings suggested that hint bits might not
    be worth the cost from performance standpoint.  i'd like to see some
    more investigation in this direction before going with a complex
    application mechanism (although that would be beneficial vs the status
    quo).

   I think it's not very responsible to allege that hint bits aren't
   providing a benefit without providing the patch that you used and the
   tests that you ran.  This is a topic that needs careful analysis, and
   I think that saying hint bits don't provide a benefit... maybe...
   doesn't do anything but confuse the issue.  How about doing some tests
   with the patch from my OP and posting the results?  If removing hint
   bits entirely doesn't degrade performance, then surely the
   less-drastic approach I've taken here ought to be OK too.  But in my
   testing, it didn't look too good.

  hm. well, I would have to agree on the performance hit -- I figure 5%
  scan penalty should be about the maximum you'd want to pay to get the
  i/o reduction.  Odds are you're correct and I blew something...I'd be
  happy to test your patch.

 Ah, I tested your patch vs stock postgres vs my patch, basically your
 results are unhappily correct (mine was just a hair faster than yours
 which you'd expect).  The differential was even wider on my laptop
 class hardware, maybe 26%.  I also agree that even if the penalty was
 reduced or determined to be worth it anyways, your approach to move
 the setting/i/o around to appropriate places is the way to go vs
 wholesale removal, unless some way is found to reduce clog lookup
 penalty to a fraction of what it is now (not likely, I didn't profile
 but I bet a lot of the problem is the lw lock).  Interesting I didn't
 notice this on my original test :(.


One thing to note is that the current visibility-checking code is
optimized for the case that the hint bit is set, and the codepath where
it's not is not particularly fast. HeapTupleSatisfiesMVCC does a lot of
things besides checking the clog. For xmin:

1. Check HEAP_MOVED_OFF / HEAP_MOVED_IN
2. Check if xmin is the current transaction with TransactionIdIsCurrentTransactionId()
3. Check if xmin is still in progress with TransactionIdIsInProgress()
4. And finally, check the clog with TransactionIdDidCommit()

It would be nice to profile the code to see where the time really is
spent. Most of it is probably in the clog access, but the
TransactionIdIsInProgress() call can be quite expensive too if there's a
lot of concurrent backends.
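
For illustration, here is a minimal, self-contained sketch of that check
order in C (the Transaction* functions below are stubs standing in for
the real backend calls, and the HEAP_MOVED handling is omitted; only the
ordering and the relative costs are the point):

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned int TransactionId;

    /* Stubs with arbitrary sample behaviour; the real ones live in the backend. */
    static bool TransactionIdIsCurrentTransactionId(TransactionId x) { return x == 100; }
    static bool TransactionIdIsInProgress(TransactionId x) { return x == 200; }
    static bool TransactionIdDidCommit(TransactionId x) { return x != 0; }

    /* The xmin checks in the order HeapTupleSatisfiesMVCC performs them. */
    static bool
    xmin_visible(TransactionId xmin, bool hint_committed)
    {
        if (hint_committed)
            return true;                        /* hint bit set: no lookups at all */
        if (TransactionIdIsCurrentTransactionId(xmin))
            return true;                        /* backend-local check, cheap */
        if (TransactionIdIsInProgress(xmin))
            return false;                       /* shared-memory proc array scan */
        return TransactionIdDidCommit(xmin);    /* clog access, including lock traffic */
    }

    int
    main(void)
    {
        /* xid 42: not ours, not in progress, so we fall all the way to the clog */
        printf("visible: %d\n", (int) xmin_visible(42, false));
        return 0;
    }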


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Simon Riggs
On Wed, 2011-01-19 at 09:08 -0500, Robert Haas wrote:
 On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
  How will we diagnose erratic connection problems now?
 
 As Heikki pointed out, the slave still logs this information, so we
 can look there.  If that's enough, yeah, you'll have to turn
 log_connections on on the master, but I don't think that will be very
 frequent, and it's not an unmanageable burden anyway.

So now we have to check *all* of the standby logs in order to check that
replication on the master is working without problems. There will be no
default capability to tie up events on the master with failures of
replication. Events occurring on multiple standbys will not be picked up
as a correlated set of events.

It's possible, but it's not a realistic alternative.

We've lost something and gained almost nothing. We need to avoid making
this kind of improvement.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] pl/python refactoring

2011-01-19 Thread David Fetter
On Wed, Jan 19, 2011 at 11:09:56AM +0100, Jan Urbański wrote:
 On 19/01/11 02:06, Hitoshi Harada wrote:
  2011/1/19 Peter Eisentraut pete...@gmx.net:
  On mån, 2011-01-17 at 21:49 +0200, Peter Eisentraut wrote:
  On sön, 2011-01-02 at 12:41 +0100, Jan Urbański wrote:
  Here they are. There are 16 patches in total. They amount to what's
  currently in my refactor branch (and almost to what I've sent as the
  complete refactor patch, while doing the splitting I discovered a few
  useless hunks that I've discarded).
 
  Committed 0001, rest later.
 
  Today committed: 3, 5, 10, 11, 12, 13
  
  I have reviewed this original patches for myself as the fundamental of
  subsequent work, and have comments.
  
  - This is not in the patch, but around line 184 vis versa in comment
  seems like typo.
 
 Right, should certainly be vice versa.
 
  - -1 is used as the initial value of PLyTypeInfo.is_rowtype. Why not 0?
 
 See the comments in struct PLyTypeInfo:
 
 is_rowtype can be: -1 = not known yet (initial state); 0 = scalar
 datatype; 1 = rowtype; 2 = rowtype, but I/O functions not set up yet
 
  - PLy_(input|output)_tuple_funcs() in PLy_trigger_handler() is added.
  The comment says it should check for the possibility that the
  relation's tupdesc changed since last call. Is it true that tupdesc
  may change even in one statement? And it also says the two functions
  are responsible  for not doing repetitive work, but ISTM they are not
  doing something to stop redundant work. The function doesn't seem like
  lightweight enough to be called on each row.
 
 Hm, you may be right. I haven't touched that part of the code, but now
 it seems to me that for triggers we do the I/O funcs lookup for every
 row. I could try to check that and fix it, but not sure if I'll have the
 time, and it's been that way before. Also, the CF is already closed in
 theory...

If you're fixing things in PL/PythonU, such a change would certainly
be in scope as an update to your patch, i.e. don't let the fact that
the CF has started stop you from fixing it.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Florian Pflug
On Jan19, 2011, at 16:16 , Simon Riggs wrote:
 On Wed, 2011-01-19 at 09:08 -0500, Robert Haas wrote:
 On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 How will we diagnose erratic connection problems now?
 
 As Heikki pointed out, the slave still logs this information, so we
 can look there.  If that's enough, yeah, you'll have to turn
 log_connections on on the master, but I don't think that will be very
 frequent, and it's not an unmanageable burden anyway.
 
 So now we have to check *all* of the standby logs in order to check that
 replication on the master is working without problems. There will be no
 default capability to tie up events on the master with failures of
 replication. Events occurring on multiple standbys will not be picked up
 as a correlated set of events.

I don't see for what kind of problems logging replication connections is
the right way to monitor replication.

To check that all slaves are connected and are streaming WAL, one ought to
monitor pg_stat_replication, no? Once a slave vanishes from there (or falls
back too far) you'd inspect the *slave's* log to find the reason. Inspecting
the master's log instead will always be a poor replacement, since some failure
conditions won't leave a trace there (Connectivity problems, for example).

Could you explain the failure condition you do have in mind where logging
replication connections unconditionally is beneficial? 

best regards,
Florian Pflug




Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 8:56 AM, Merlin Moncure mmonc...@gmail.com wrote:
 Ah, I tested your patch vs stock postgres vs my patch, basically your
 results are unhappily correct (mine was just a hair faster than yours
 which you'd expect).  The differential was even wider on my laptop
 class hardware, maybe 26%.  I also agree that even if the penalty was
 reduced or determined to be worth it anyways, your approach to move
 the setting/i/o around to appropriate places is the way to go vs
 wholesale removal, unless some way is found to reduce clog lookup
 penalty to a fraction of what it is now (not likely, I didn't profile
 but I bet a lot of the problem is the lw lock).  Interesting I didn't
 notice this on my original test :(.

OK.  My apologies for the email yesterday in which I forgot that you
actually HAD posted a patch, but thanks for testing mine and posting
your results (and thanks also to Andrea for pointing out the oversight
to me).

Here's a new version of the patch based on some experimentation with
ideas I posted yesterday.  At least on my Mac laptop, this is pretty
effective at blunting the response time spike for the first table
scan, and it converges to steady-state after about 20 tables scans.
Rather than write every 20th page, what I've done here is make every
2000'th buffer allocation grant an allowance of 100 hint bit only
writes.  All dirty pages and the next 100 pages that are
dirty-only-for-hint-bits get written out.  Then we stop writing the
dirty-only-for-hint-bits-pages until we get our next allowance of
writes.  The idea is to try to avoid creating a lot of random writes
on each scan through the table.  At least here, that seems to work
pretty well - the initial scan is only about 25% slower than the
steady state (rather than 6x or more slower).
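
A minimal sketch of that allowance logic as described (the names and the
placement are illustrative, not the actual patch; pages that are dirty
for real are always written):

    #include <stdbool.h>

    #define ALLOWANCE_INTERVAL 2000     /* every 2000th buffer allocation ... */
    #define ALLOWANCE_SIZE      100     /* ... grants 100 hint-bit-only writes */

    static int alloc_count = 0;
    static int hint_write_allowance = 0;

    /* Called once per buffer allocation. */
    static void
    note_buffer_allocation(void)
    {
        if (++alloc_count % ALLOWANCE_INTERVAL == 0)
            hint_write_allowance = ALLOWANCE_SIZE;  /* refill the allowance */
    }

    /* Asked when a page is dirty only because of hint bits: may we write it? */
    static bool
    may_write_hint_only_page(void)
    {
        if (hint_write_allowance > 0)
        {
            hint_write_allowance--;     /* spend one write from the allowance */
            return true;
        }
        return false;                   /* skip the write; hint bits stay unflushed */
    }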

I am seeing occasional latency spikes that appear to be the result of
the OS write cache filling up and deciding that it has to flush
everything to disk before writing anything more.  I'm not too
concerned about that because this is a fairly artificial test case
(one doesn't usually sit around doing consecutive SELECT sum(1) FROM s
commands) but it seems like pretty odd behavior.  The system sits
there doing no writes at all as I'm sending more and more dirty pages
into the system buffer cache and then, boom, write storm.  I haven't
yet tested to see if the same behavior occurs on Linux.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


bm-hint-bits-v2.patch
Description: Binary data



Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 So now we have to check *all* of the standby logs in order to check that
 replication on the master is working without problems. There will be no
 default capability to tie up events on the master with failures of
 replication. Events occurring on multiple standbys will not be picked up
 as a correlated set of events.

This is pure FUD.  If you suspect such a problem, turn on
log_connections.  If you aren't suspecting such a problem, you likely
aren't looking in the postmaster log anyway.

regards, tom lane



Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Simon Riggs
On Wed, 2011-01-19 at 16:42 +0100, Florian Pflug wrote:
 
 Could you explain the failure condition you do have in mind where
 logging replication connections unconditionally is beneficial? 

Sure. 

Replication drops and immediately reconnects during night.
When did that happen? How many times did it happen? For how long did the
disconnection last? Why did it happen? How does that correlate with
other situations? That info is not available easily.

pg_stat_replication is necessary, and is separate from pg_stat_activity
because replication connections are not the same as normal connections.
We even have a separate permission for replication, to further
demonstrate that.

Replication connections being logged differently from normal connections
was not an anomaly, nor was it wasteful. By treating them the same we
are forced to log too much, instead of just enough.

Replication connections are not even logged unconditionally. You need to
enable replication before they appear in the log. And that is exactly
the time the log entries are helpful.

The question we should have asked is "Why is removing those log entries
helpful?". I shouldn't have to justify putting something back, when the
good reason for its existence was previously explained and there was no
consensus to remove it in the first place.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] pl/python refactoring

2011-01-19 Thread Jan Urbański
On 19/01/11 16:35, David Fetter wrote:
 On Wed, Jan 19, 2011 at 11:09:56AM +0100, Jan Urbański wrote:
 On 19/01/11 02:06, Hitoshi Harada wrote:
 - PLy_(input|output)_tuple_funcs() in PLy_trigger_handler() is added.
 The comment says it should check for the possibility that the
 relation's tupdesc changed since last call. Is it true that tupdesc
 may change even in one statement? And it also says the two functions
 are responsible  for not doing repetitive work, but ISTM they are not
 doing something to stop redundant work. The function doesn't seem like
 lightweight enough to be called on each row.

 Hm, you may be right. I haven't touched that part of the code, but now
 it seems to me that for triggers we do the I/O funcs lookup for every
 row. I could try to check that and fix it, but not sure if I'll have the
 time, and it's been that way before. Also, the CF is already closed in
 theory...

I looked again and it seems that PLy_(input|output)_tuple_funcs does
indeed avoid repetitive work. Observe that in the loop going over the
TupleDesc attributes there's

if (arg->in.r.atts[i].typoid == desc->attrs[i]->atttypid)
    continue;           /* already set up this entry */

which avoids the syscache lookup and calling PLy_input_datum_func2.

Cheers,
Jan



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Merlin Moncure
On Wed, Jan 19, 2011 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote:
 Here's a new version of the patch based on some experimentation with
 ideas I posted yesterday.  At least on my Mac laptop, this is pretty
 effective at blunting the response time spike for the first table
 scan, and it converges to steady-state after about 20 tables scans.
 Rather than write every 20th page, what I've done here is make every
 2000'th buffer allocation grant an allowance of 100 hint bit only
 writes.  All dirty pages and the next 100 pages that are
 dirty-only-for-hint-bits get written out.  Then we stop writing the
 dirty-only-for-hint-bits-pages until we get our next allowance of
 writes.  The idea is to try to avoid creating a lot of random writes
 on each scan through the table.  At least here, that seems to work
 pretty well - the initial scan is only about 25% slower than the
 steady state (rather than 6x or more slower).

does this only impact the scan case?  in oltp scenarios you want to
write out the bits asap, i would imagine.   what about time based
flushing, so that only x dirty hint bit pages can be written out per
time unit y?

merlin



Re: [HACKERS] Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 11:05 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The question we should have asked is Why is removing those log entries
 helpful?. I shouldn't have to justify putting something back, when the
 good reason for its existence was previously explained and there was no
 consensus to remove in the first place.

This is simply revisionist history.  It was discussed for several
days, and consensus was eventually reached.  The fact that you chose
not to participate in that discussion until after it was over doesn't
mean we didn't have it, and the fact that you don't agree with the
outcome doesn't mean there wasn't consensus.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Simone Aiken

 Robert wrote:

  I think the first
  thing to do would be to try to come up with a reproducible test case
  where clustering the tables improves performance.


On that note, is there any standard way you guys do benchmarks?


 Bruce wrote:

  I think CLUSTER is a win when you are looking up multiple rows in the same
  table, either using a non-unique index or a range search.  What places do
  such lookups?  Having them all in adjacent pages would be a win ---
  single-row lookups are usually not.


Mostly the tables that track column level data.  Typically you will want to
grab rows for multiple columns for a given table at once so it would be
helpful to have them be contiguous on disk. 

I could design a benchmark to display this by building a thousand tables one
column at a time, using 'alter add column' to scatter the catalog rows for
the tables across many blocks.  So there will be a range with column 1 for each
table, and column 2 for each table, and column 3 for each table.  Then
fill a couple of data tables with a lot of data and set some noise makers to
loop through them over and over with full table scans ... filling up cache
with unrelated data and hopefully ageing out the cache of the catalog tables.
Then do some benchmark index lookup queries to see the retrieval time before
and after clustering the pg_catalog tables to record a difference.

If the criterion is "doesn't hurt anything and helps a little", I think this
passes.  Especially since clusters aren't maintained automatically, adding them
has no negative impact on insert or update.  It'd just be a nice thing to do
if you know it can be done, one that doesn't harm anyone who doesn't know.


-Simone Aiken







Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Merlin Moncure
On Wed, Jan 19, 2011 at 9:13 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 19.01.2011 15:56, Merlin Moncure wrote:

 On Wed, Jan 19, 2011 at 7:57 AM, Merlin Moncure <mmonc...@gmail.com> wrote:

 On Tue, Jan 18, 2011 at 12:44 PM, Robert Haas <robertmh...@gmail.com> wrote:

 On Tue, Jan 18, 2011 at 9:24 AM, Merlin Moncure <mmonc...@gmail.com> wrote:

 a few weeks back I hacked an experimental patch that removed the hint
 bit action completely.  the results were very premature and/or
 incorrect, but my initial findings suggested that hint bits might not
 be worth the cost from performance standpoint.  i'd like to see some
 more investigation in this direction before going with a complex
 application mechanism (although that would be beneficial vs the status
 quo).

 I think it's not very responsible to allege that hint bits aren't
 providing a benefit without providing the patch that you used and the
 tests that you ran.  This is a topic that needs careful analysis, and
 I think that saying hint bits don't provide a benefit... maybe...
 doesn't do anything but confuse the issue.  How about doing some tests
 with the patch from my OP and posting the results?  If removing hint
 bits entirely doesn't degrade performance, then surely the
 less-drastic approach I've taken here ought to be OK too.  But in my
 testing, it didn't look too good.

 hm. well, I would have to agree on the performance hit -- I figure 5%
 scan penalty should be about the maximum you'd want to pay to get the
 i/o reduction.  Odds are you're correct and I blew something...I'd be
 happy to test your patch.

 Ah, I tested your patch vs stock postgres vs my patch, basically your
 results are unhappily correct (mine was just a hair faster than yours
 which you'd expect).  The differential was even wider on my laptop
 class hardware, maybe 26%.  I also agree that even if the penalty was
 reduced or determined to be worth it anyways, your approach to move
 the setting/i/o around to appropriate places is the way to go vs
 wholesale removal, unless some way is found to reduce clog lookup
 penalty to a fraction of what it is now (not likely, I didn't profile
 but I bet a lot of the problem is the lw lock).  Interesting I didn't
 notice this on my original test :(.

 One thing to note is that the current visibility-checking code is optimized
 for the case that the hint bit is set, and the codepath where it's not is
 not particularly fast. HeapTupleSatisfiesMVCC does a lot of things besides
 checking the clog. For xmin:

 1. Check HEAP_MOVED_OFF / HEAP_MOVED_IN
 2. Check if xmin is the current transaction with
 TransactionIdIsCurrentTransactionId()
 3. Check if xmin is still in progress with TransactionIdIsInProgress()
 4. And finally, check the clog with TransactionIdDidCommit()

 It would be nice to profile the code to see where the time really is spent.
 Most of it is probably in the clog access, but the TransactionIdIsInProgress()
 call can be quite expensive too if there's a lot of concurrent backends.

Nice thought -- it's worth checking out. I'll play around with it some
more -- I think you're right and the first step is to profile.  If the
bottleneck is in fact the lock there's not much that can be done
afaict.

merlin



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 11:18 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Wed, Jan 19, 2011 at 10:44 AM, Robert Haas robertmh...@gmail.com wrote:
 Here's a new version of the patch based on some experimentation with
 ideas I posted yesterday.  At least on my Mac laptop, this is pretty
 effective at blunting the response time spike for the first table
 scan, and it converges to steady-state after about 20 tables scans.
 Rather than write every 20th page, what I've done here is make every
2000th buffer allocation grant an allowance of 100 hint-bit-only
 writes.  All dirty pages and the next 100 pages that are
 dirty-only-for-hint-bits get written out.  Then we stop writing the
 dirty-only-for-hint-bits-pages until we get our next allowance of
 writes.  The idea is to try to avoid creating a lot of random writes
 on each scan through the table.  At least here, that seems to work
 pretty well - the initial scan is only about 25% slower than the
 steady state (rather than 6x or more slower).

 Does this only impact the scan case?  In OLTP scenarios you want to
 write out the bits ASAP, I would imagine.  What about time-based
 flushing, so that only x dirty hint bit pages can be written out per
 time unit y?

No, it doesn't only affect the scan case.  But I don't think that's
bad.  The goal is for the background writer to provide enough clean
pages that backends don't have to write anything at all.  If that's
not happening, the backends will be slowed by the need to write out
pages themselves in order to create a sufficient supply of clean pages
to satisfy their allocation needs.  The easiest way for that situation
to occur is if the backend is doing a large sequential scan of a table
- in that case, it's by definition cycling through pages at top speed,
and the fact that it's cycling through them in a ring buffer rather
than using all of shared_buffers makes the loop even tighter.  But if
it's possible under some other set of circumstances, the behavior is
still reasonable.  This behavior kicks in if more than 100 out of some
set of 2000 page allocations would require a write only for the
purpose of flushing hint bits.

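A minimal sketch of the allowance scheme described above, assuming it can be reduced to two counters consulted at each buffer allocation (the names and the free-standing shape are invented; the real patch hooks into the buffer manager):

#include <stdbool.h>

#define ALLOC_CYCLE     2000    /* window of buffer allocations */
#define HINT_ALLOWANCE   100    /* hint-bit-only writes allowed per window */

static int allocs_in_cycle;
static int hint_writes_left = HINT_ALLOWANCE;

/* Called at each buffer allocation: may a page that is dirty only
 * for hint bits be written out right now? */
static bool
may_write_hint_only_page(void)
{
    if (++allocs_in_cycle >= ALLOC_CYCLE)
    {
        allocs_in_cycle = 0;
        hint_writes_left = HINT_ALLOWANCE;  /* grant a fresh allowance */
    }
    if (hint_writes_left > 0)
    {
        hint_writes_left--;
        return true;        /* within the allowance: write the page */
    }
    return false;           /* over the allowance: evict without writing */
}
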
Time-based flushing would be problematic in several respects.  First,
it would require a kernel call, which would be vastly more expensive
than what I'm doing now, and might have undesirable performance
implications for that reason.  Second, I don't think it would be the
right way to tune it even if that were not an issue.  It doesn't
really matter whether the system takes a millisecond or a microsecond
or a nanosecond to write each buffer - what matters is that writing
all the buffers is a lot slower than writing none of them.  So what we
want to do is write a percentage of them, in a way that guarantees
that they'll all eventually get written if people continue to access
the same data.  This does that, and a time-based setting would not; it
would also almost certainly require tuning based on the I/O capacities
of the system it's running on, which isn't necessary with this
approach.

Before we get too deeply involved in theory, can you give this a test
drive on your system and see how it looks?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2011-01-19 Thread Robert Haas
On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 My opinion isn't strong on this topic. One or twenty useless detoastings
 aren't really significant in most use cases (the problem is thousands of
 detoastings).

Yeah.  Many-times-repeated detoasting is really bad, and this is not
the only place in the backend where we have this problem.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... So what we
 want to do is write a percentage of them, in a way that guarantees
 that they'll all eventually get written if people continue to access
 the same data.

The word "guarantee" seems quite inappropriate here, since as far as I
can see this approach provides no such guarantee --- even after many
cycles you'd never be really certain all the bits were set.

What I asked for upthread was that we continue to have some
deterministic, practical way to force all hint bits in a table to be
set.  This is not *remotely* responding to that request.  It's still not
deterministic, and even if it were, vacuuming a large table 20 times
isn't a very practical solution.

regards, tom lane



Re: [HACKERS] Replication logging

2011-01-19 Thread Bruce Momjian
Simon Riggs wrote:
 On Tue, 2011-01-18 at 10:57 -0500, Tom Lane wrote:
  Magnus Hagander mag...@hagander.net writes:
   Is there *any* usecase for setting them differently though?
  
  I can't believe we're still engaged in painting this bikeshed.  Let's
  just control it off log_connections and have done.
 
 Yes, this is a waste of time. Leave it as is because it's there for a
 very good reason, as Robert already explained.
 
 The code was put in explicitly because debugging replication connections
 is quite important and prior to that addition, wasn't easy. It's a very
 separate thing from logging the hundreds/thousands of other connections
 on the system.
 
 I don't really care about neatness of code, or neatness of parameters. I
 want to be able to confirm the details of the connection.
 pg_stat_replication is dynamic, not a historical record of connections
 and reconnections.
 
 How else will you diagnose an erratic network, or an accidental change
 of authority? Replication is so important it isn't worth tidying away a
 couple of lines of log. My objective is to make replication work, not to
 reduce the size of the average log file by 1 or 2 lines.
 
 I'm particularly concerned that people make such changes too quickly.
 There are many things in this area of code that need changing, but also
 many more that do not. If we are to move forwards we need to avoid going
 one step forwards, one step back.

There were enough people who wanted a change that we went ahead and did
it --- if there was lack of agreement, we would have delayed longer.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2011-01-19 Thread Pavel Stehule
2011/1/19 Robert Haas robertmh...@gmail.com:
 On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 My opinion isn't strong on this topic. One or twenty useless detoastings
 aren't really significant in most use cases (the problem is thousands of
 detoastings).

 Yeah.  Many-times-repeated detoasting is really bad, and this is not
 the only place in the backend where we have this problem.  :-(


???

A general solution could be pretty hardcore - something like handles
instead of pointers to varlena values :(

I know mainly about the plpgsql performance problems - that is my
area of interest. Plpgsql can be fixed simply - maybe 10 lines of
code, maybe fewer.

Pavel

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] Replication logging

2011-01-19 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Simon Riggs wrote:
 I'm particularly concerned that people make such changes too quickly.
 There are many things in this area of code that need changing, but also
 many more that do not. If we are to move forwards we need to avoid going
 one step forwards, one step back.

 There were enough people who wanted a change that we went ahead and did
 it --- if there was lack of agreement, we would have delayed longer.

The real reason why we changed this is that nobody (except Simon) sees
a situation where unconditional logging of successful replication
connections is especially helpful.  If you were trying to diagnose a
problem you would more likely need to know about *failed* connections,
but the code that was in there didn't provide that.  At least not unless
you turned on log_connections ...

regards, tom lane



Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 My opinion isn't strong on this topic. One or twenty useless detoastings
 aren't really significant in most use cases (the problem is thousands of
 detoastings).

 Yeah.  Many-times-repeated detoasting is really bad, and this is not
 the only place in the backend where we have this problem.  :-(

Yeah, there's been some discussion of a more general solution, and I
think I even had a trial patch at one point (which turned out not to
work terribly well, but maybe somebody will have a better idea someday).
In the meantime, the proposal at hand seems like a bit of a stop-gap,
which is why I'd prefer to see something with a very minimal code
footprint.  Detoast at assignment would likely need only a few lines
of code added in a single place.

regards, tom lane



Re: [HACKERS] Replication logging

2011-01-19 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Simon Riggs wrote:
  I'm particularly concerned that people make such changes too quickly.
  There are many things in this area of code that need changing, but also
  many more that do not. If we are to move forwards we need to avoid going
  one step forwards, one step back.
 
  There were enough people who wanted a change that we went ahead and did
  it --- if there was lack of agreement, we would have delayed longer.
 
 The real reason why we changed this is that nobody (except Simon) sees
 a situation where unconditional logging of successful replication
 connections is especially helpful.  If you were trying to diagnose a
 problem you would more likely need to know about *failed* connections,
 but the code that was in there didn't provide that.  At least not unless
 you turned on log_connections ...

Yep.  

Simon, I understand you being disappointed you could not champion your
cause during the discussion, but you can still try to sway people that
the original code was appropriate.  

Based on the feedback from the group, it seemed unlikely you would be
able to sway a significant number of people so we just went ahead and
made the change.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pl/python refactoring

2011-01-19 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Peter Eisentraut's message of Tue Jan 18 19:22:50 -0300 2011:
 #16: This is probably pointless because pgindent will reformat this.

 pgindent used to remove useless braces around single-statement blocks,
 but this behavior was removed years ago because it'd break formatting
 around PG_TRY blocks.

Yeah.  FWIW, I concur with Jan that this is a readability improvement.
A comment and a statement always look like two statements to me ---
and the fact that pgindent insists on inserting a blank line in front
of the comment in this scenario does even more to mangle the visual
impression of what the if controls.  +1 for the braces.

regards, tom lane



Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-19 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 What I'm worried about is the case where a tablespace is created
 under the $PGDATA directory.

What would be the sense of that?  If you're concerned about whether the
code handles it correctly, maybe the right solution is to add code to
CREATE TABLESPACE to disallow it.

regards, tom lane



Re: [HACKERS] limiting hint bit I/O

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... So what we
 want to do is write a percentage of them, in a way that guarantees
 that they'll all eventually get written if people continue to access
 the same data.

 The word "guarantee" seems quite inappropriate here, since as far as I
 can see this approach provides no such guarantee --- even after many
 cycles you'd never be really certain all the bits were set.

 What I asked for upthread was that we continue to have some
 deterministic, practical way to force all hint bits in a table to be
 set.  This is not *remotely* responding to that request.  It's still not
 deterministic, and even if it were, vacuuming a large table 20 times
 isn't a very practical solution.

I get the impression you haven't spent as much time reading my email
as I spent writing it.  Perhaps I'm wrong, but in any case the code
doesn't do what you're suggesting.  In the most recently posted
version of this patch, which is v2, if VACUUM hits a page that is
hint-bit-dirty, it always writes it.  Full stop.  The "20 times" bit
applies to a "SELECT * FROM table", which is a rather different case.

As I write this, I realize that there is a small fly in the ointment
here, which is that neither VACUUM nor SELECT force out all the pages
they modify to disk.  So there is some small amount of remaining
nondeterminism, even if you VACUUM, because VACUUM will leave the last
few pages it dirties in shared_buffers, and whether those hint bits
hit the disk will depend on a decision made at the time they're
evicted, not at the time they were dirtied.  Possibly I could fix that
by making SetBufferCommitInfoNeedsSave() set BM_DIRTY during vacuum
and BM_HINT_BITS at other times.  That would nail the lid shut pretty
tight.

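In miniature, that flag split might look like the sketch below; BufDesc, the flag values, and the function shape are invented for illustration, and the real SetBufferCommitInfoNeedsSave() works on buffer headers under the proper locks:

#include <stdbool.h>
#include <stdint.h>

#define BM_DIRTY     0x0001     /* page must be written before eviction */
#define BM_HINT_BITS 0x0002     /* hypothetical: write only if the allowance permits */

typedef struct { uint32_t flags; } BufDesc;     /* stand-in for the buffer header */

static void
set_commit_info_needs_save(BufDesc *buf, bool in_vacuum)
{
    if (in_vacuum)
        buf->flags |= BM_DIRTY;         /* VACUUM's hint-bit changes always reach disk */
    else
        buf->flags |= BM_HINT_BITS;     /* ordinary scans stay subject to the allowance */
}
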
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Oh, wait a minute: there's a bad restriction there, namely that a
 contrib module could only add loose operators that had different
 declared input types from the ones known to the core opclass.

 I would have thought that such a contrib module would then need to offer its own
 opfamily and opclasses, and users would have to use the specific opclass
 manually like they do e.g. for text_pattern_ops.  Can't it work that way?

I think you missed the point: right now, to use both the core and
intarray operators on an integer[] column, you have to create *two*
GIN indexes, which will have exactly identical contents.  I'm looking
for a way to let intarray extend the core opfamily definition so that
one index can serve.

regards, tom lane



Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 My opinion isn't strong on this topic. One or twenty useless detoastings
 aren't really significant in most use cases (the problem is thousands of
 detoastings).

 Yeah.  Many-times-repeated detoasting is really bad, and this is not
 the only place in the backend where we have this problem.  :-(

 Yeah, there's been some discussion of a more general solution, and I
 think I even had a trial patch at one point (which turned out not to
 work terribly well, but maybe somebody will have a better idea someday).

I'm pretty doubtful that there's going to be a general solution to
this problem - I think it's going to require gradual refactoring of
problem spots.

 In the meantime, the proposal at hand seems like a bit of a stop-gap,
 which is why I'd prefer to see something with a very minimal code
 footprint.  Detoast at assignment would likely need only a few lines
 of code added in a single place.

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Oh, wait a minute: there's a bad restriction there, namely that a
 contrib module could only add loose operators that had different
 declared input types from the ones known to the core opclass.

 I would have thought that such a contrib module would then need to offer its own
 opfamily and opclasses, and users would have to use the specific opclass
 manually like they do e.g. for text_pattern_ops.  Can't it work that way?

 I think you missed the point: right now, to use both the core and
 intarray operators on an integer[] column, you have to create *two*
 GIN indexes, which will have exactly identical contents.  I'm looking
 for a way to let intarray extend the core opfamily definition so that
 one index can serve.

Maybe this is a dumb question, but why not just put whatever stuff
intarray adds directly into the core opfamily?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2011-01-19 Thread Noah Misch
On Wed, Jan 19, 2011 at 12:10:16PM -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, Jan 18, 2011 at 5:22 PM, Pavel Stehule pavel.steh...@gmail.com 
  wrote:
  My opinion isn't strong on this topic. One or twenty useless detoastings
  aren't really significant in most use cases (the problem is thousands of
  detoastings).
 
  Yeah.  Many-times-repeated detoasting is really bad, and this is not
  the only place in the backend where we have this problem.  :-(
 
 Yeah, there's been some discussion of a more general solution, and I
 think I even had a trial patch at one point (which turned out not to
 work terribly well, but maybe somebody will have a better idea someday).
 In the meantime, the proposal at hand seems like a bit of a stop-gap,
 which is why I'd prefer to see something with a very minimal code
 footprint.  Detoast at assignment would likely need only a few lines
 of code added in a single place.

What qualifies a patch as "stop-gap" scale?  Pavel's patch is ~60 lines.

If adding PLpgSQL_var.should_be_detoasted is your main pain point, testing
VARATT_IS_EXTENDED there might be the least-harmful way to avoid it.  Saving a
few more lines by moving the work to exec_assign_value probably does not justify
the associated performance regressions Pavel has cited.

nm



[HACKERS] Couple document fixes

2011-01-19 Thread Thom Brown
Hi,

I've attached a couple minor fixes to the docs.  One relating to
SECURITY LABEL and the other for pg_class.relpersistence

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


doc_fixes.patch
Description: Binary data



Re: [HACKERS] Couple document fixes

2011-01-19 Thread Kevin Grittner
Thom Brown t...@linux.com wrote:
 
 I've attached a couple minor fixes to the docs.  One relating to
 SECURITY LABEL and the other for pg_class.relpersistence
 
relpersistence should be <type>"char"</type>, not <type>char</type>.
Oddly enough, there is a difference.
 
-Kevin



Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Robert Haas
On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken
sai...@quietlycompetent.com wrote:
 Pages like this one have column comments for the system tables:

 http://www.psql.it/manuale/8.3/catalog-pg-attribute.html

Oh, I see.  I don't think we want to go there.  We'd need some kind of
system for keeping the two places in sync.  And there'd be no easy way
to upgrade the in-database descriptions when we upgraded to a newer
minor release, supposing they'd changed in the meantime.  And some of
the descriptions are quite long, so they wouldn't fit nicely in the
amount of space you typically have available when you run \d+.  And it
would enlarge the size of an empty database by however much was
required to store all those comments, which could be an issue for
PostgreSQL instances that have many small databases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Couple document fixes

2011-01-19 Thread Thom Brown
On 19 January 2011 18:11, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Thom Brown t...@linux.com wrote:

 I've attached a couple minor fixes to the docs.  One relating to
 SECURITY LABEL and the other for pg_class.relpersistence

 relpersistence should be <type>"char"</type>, not <type>char</type>.
 Oddly enough, there is a difference.

 -Kevin

relkind in the same table is the same type, but isn't displayed as
"char" in the docs, and the same applies to many other system tables.
They would need changing too then.

Examples are:

pg_type.typtype
pg_proc.provolatile
pg_attribute.attstorage

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think you missed the point: right now, to use both the core and
 intarray operators on an integer[] column, you have to create *two*
 GIN indexes, which will have exactly identical contents. I'm looking
 for a way to let intarray extend the core opfamily definition so that
 one index can serve.

 Maybe this is a dumb question, but why not just put whatever stuff
 intarray adds directly into the core opfamily?

AFAICS that means integrating contrib/intarray into core.  Independently
of whether that's a good idea or not, PG is supposed to be an extensible
system, so it would be nice to have a solution that supported add-on
extensions.

The subtext here is that GIN, unlike the other index AMs, uses a
representation that seems pretty amenable to supporting a wide variety
of query types with a single index.  contrib/intarray's query_int
operators are not at all like the subset-inclusion-testing operators
that the core opclass supports, and it's not very hard to think of
additional cases that could be of interest to somebody (example: find
all arrays that contain some/all entries within a given integer range).
I think we're going to come up against similar situations over and over
until we find a solution.

regards, tom lane



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think you missed the point: right now, to use both the core and
 intarray operators on an integer[] column, you have to create *two*
 GIN indexes, which will have exactly identical contents. I'm looking
 for a way to let intarray extend the core opfamily definition so that
 one index can serve.

 Maybe this is a dumb question, but why not just put whatever stuff
 intarray adds directly into the core opfamily?

 AFAICS that means integrating contrib/intarray into core.  Independently
 of whether that's a good idea or not, PG is supposed to be an extensible
 system, so it would be nice to have a solution that supported add-on
 extensions.

Yeah, I'm just wondering if it's worth the effort, especially in view
of a rather large patch queue we seem to have outstanding at the
moment.

 The subtext here is that GIN, unlike the other index AMs, uses a
 representation that seems pretty amenable to supporting a wide variety
 of query types with a single index.  contrib/intarray's query_int
 operators are not at all like the subset-inclusion-testing operators
 that the core opclass supports, and it's not very hard to think of
 additional cases that could be of interest to somebody (example: find
 all arrays that contain some/all entries within a given integer range).
 I think we're going to come up against similar situations over and over
 until we find a solution.

Interesting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2011-01-19 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Wed, Jan 19, 2011 at 12:10:16PM -0500, Tom Lane wrote:
 In the meantime, the proposal at hand seems like a bit of a stop-gap,
 which is why I'd prefer to see something with a very minimal code
 footprint.  Detoast at assignment would likely need only a few lines
 of code added in a single place.

 What qualifies a patch as stop-gap scale?  Pavel's patch is ~60 lines.

Yeah, but they're spread out all over plpgsql, and seem likely to
metastasize to other places --- the additional field that needs to be
initialized is the main culprit.  I'd like a one-spot patch that will
be easy to remove when/if it's no longer needed.

 If adding PLpgSQL_var.should_be_detoasted is your main pain point, testing
 VARATT_IS_EXTENDED there might be the least-harmful way to avoid it.

I thought about that too, but adding an additional set of tests into
exec_eval_datum isn't free --- that's a hot spot for plpgsql execution.
Doing it in exec_assign_value would be significantly cheaper, first
because it's reasonable to assume that assignments are less frequent
than reads, and second because there's already a test there to detect
pass-by-ref datatypes, as well as a datumCopy() step that could be
skipped altogether when we detoast.

regards, tom lane



[HACKERS] Use of O_DIRECT only for open_* sync options

2011-01-19 Thread Bruce Momjian
Is there a reason we only use O_DIRECT with open_* sync options?
xlogdefs.h says:

/*
 *  Because O_DIRECT bypasses the kernel buffers, and because we never
 *  read those buffers except during crash recovery, it is a win to use
 *  it in all cases where we sync on each write().  We could allow O_DIRECT
 *  with fsync(), but because skipping the kernel buffer forces writes out
 *  quickly, it seems best just to use it for O_SYNC.  It is hard to imagine
 *  how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
 *  Also, O_DIRECT is never enough to force data to the drives, it merely
 *  tries to bypass the kernel cache, so we still need O_SYNC or fsync().
 */

This seems wrong because fsync() can win if there are two writes before
the sync call.  Can kernels not issue fsync() if the write was O_DIRECT?
If that is the cause, we should document it.

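For illustration, the two styles in question as plain POSIX calls (error handling omitted; O_DIRECT itself is left out of the sketch because it additionally requires block-aligned buffers and lengths):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[8192];
    memset(buf, 'x', sizeof(buf));

    /* open_sync style: each write() waits until the data is stable */
    int fd = open("demo_sync", O_WRONLY | O_CREAT | O_SYNC, 0600);
    write(fd, buf, sizeof(buf));        /* synced */
    write(fd, buf, sizeof(buf));        /* synced again: two waits */
    close(fd);

    /* fsync style: two writes, one flush -- the case where fsync() can win */
    fd = open("demo_fsync", O_WRONLY | O_CREAT, 0600);
    write(fd, buf, sizeof(buf));
    write(fd, buf, sizeof(buf));
    fsync(fd);                          /* a single wait covers both writes */
    close(fd);
    return 0;
}
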
-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Couple document fixes

2011-01-19 Thread Kevin Grittner
Thom Brown t...@linux.com wrote:
 
 relkind in the same table is the same type, but isn't displayed as
 "char" in the docs, and the same applies to many other system
tables.
 They would need changing too then.
 
 Examples are:
 
 pg_type.typtype
 pg_proc.provolatile
 pg_attribute.attstorage
 
That's a good point.  Consistency would trump getting a single entry
right, for sure.  I wonder, though, whether we shouldn't
consistently distinguish them.  For one thing, I've seen multiple
posts where people were reporting bugs because of having confused
"char" with char.
 
FWIW, \d shows:
 
          Table "pg_catalog.pg_class"
     Column      |   Type    | Modifiers
-----------------+-----------+-----------
 relname         | name      | not null
 relnamespace    | oid       | not null
 reltype         | oid       | not null
 reloftype       | oid       | not null
 relowner        | oid       | not null
 relam           | oid       | not null
 relfilenode     | oid       | not null
 reltablespace   | oid       | not null
 relpages        | integer   | not null
 reltuples       | real      | not null
 reltoastrelid   | oid       | not null
 reltoastidxid   | oid       | not null
 relhasindex     | boolean   | not null
 relisshared     | boolean   | not null
 relpersistence  | "char"    | not null
 relkind         | "char"    | not null
 relnatts        | smallint  | not null
 relchecks       | smallint  | not null
 relhasoids      | boolean   | not null
 relhaspkey      | boolean   | not null
 relhasexclusion | boolean   | not null
 relhasrules     | boolean   | not null
 relhastriggers  | boolean   | not null
 relhassubclass  | boolean   | not null
 relfrozenxid    | xid       | not null
 relacl          | aclitem[] |
 reloptions      | text[]    |
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
 
Currently we don't seem to distinguish them in very many places in
the docs:
 
$ find -name '*.sgml' | xargs egrep -n '\"char\"'
./doc/src/sgml/textsearch.sgml:1271:setweight(<replaceable class="PARAMETER">vector</replaceable> <type>tsvector</>, <replaceable class="PARAMETER">weight</replaceable> <type>"char"</>) returns <type>tsvector</>
./doc/src/sgml/datatype.sgml:1116:length might change in a future release. The type <type>"char"</type>
./doc/src/sgml/datatype.sgml:1134:   <entry><type>"char"</type></entry>
./doc/src/sgml/release-old.sgml:4406:Add routines for single-byte "char" type(Thomas)
./doc/src/sgml/release-old.sgml:4747:Make "char" type a synonym for char(1) (actually implemented as bpchar)(Thomas)
./doc/src/sgml/xfunc.sgml:1794:<entry><type>"char"</type></entry>
./doc/src/sgml/release-8.0.sgml:3389:  <type>"char"</> data type have been removed.
./doc/src/sgml/release-8.0.sgml:4460:<type>"char"</> data type have been removed.
./doc/src/sgml/release-8.0.sgml:4466:to do arithmetic on a <type>"char"</> column, you can cast it to
./doc/src/sgml/func.sgml:8462:<literal><function>setweight(<type>tsvector</>, <type>"char"</>)</function></literal>
./doc/src/sgml/btree-gin.sgml:17:  <type>oid</>, <type>money</>, <type>"char"</>,
 
-Kevin



Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 AFAICS that means integrating contrib/intarray into core.  Independently
 of whether that's a good idea or not, PG is supposed to be an extensible
 system, so it would be nice to have a solution that supported add-on
 extensions.

 Yeah, I'm just wondering if it's worth the effort, especially in view
 of a rather large patch queue we seem to have outstanding at the
 moment.

Oh, maybe we're not on the same page here: I wasn't really proposing
to do this right now, it's more of a TODO item.

Offhand the only reason to do it now would be if we settled on something
that required a layout change in pg_amop/pg_amproc.  Since we already
have one such change in 9.1, getting the additional change done in the
same release would be valuable to reduce the number of distinct cases
for pg_dump and other clients to support.

regards, tom lane



Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Robert Haas
On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Yes, I was aware of this problem (amount of memory consumed with lots of
 updates), and I kind of hoped someone would come up with a reasonable
 solution.

As far as I can see, periodically sampling some or all of the table is
really the only thing that has a chance of working OK.  You could try
to track changes only when they're not too big, but I think that's
still going to have nasty performance consequences.

 I was thinking about it and I believe at least one of the algorithms
 (the adaptive sampling) might handle merging, i.e. the backends would
 not need to store the list of values, just a private copy of the
 estimator and update it. And at the end (after commit), the estimator
 would be merged back into the actual one.

 So the algorithm would be something like this:

 1. create copy for all distinct estimators influenced by the INSERT
 2. update the local copy
 3. commit and merge the local copies back into the original estimator

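As a toy illustration of why step 3 can work at all: when the per-backend state is a mergeable summary, folding a private copy back into the shared one is just a union. The sketch below uses linear counting rather than the adaptive-sampling estimator from the thread, and elides locking and persistence:

#include <math.h>
#include <stdint.h>

#define SKETCH_BITS 1024

typedef struct { uint8_t bit[SKETCH_BITS / 8]; } Sketch;

static void
sketch_add(Sketch *s, uint64_t value_hash)          /* per-INSERT update */
{
    uint32_t b = (uint32_t) (value_hash % SKETCH_BITS);
    s->bit[b / 8] |= (uint8_t) (1 << (b % 8));
}

static void
sketch_merge(Sketch *shared, const Sketch *local)   /* at commit */
{
    for (int i = 0; i < SKETCH_BITS / 8; i++)
        shared->bit[i] |= local->bit[i];            /* union == merge */
}

static double
sketch_estimate(const Sketch *s)                    /* linear-counting formula */
{
    int zeros = 0;
    for (int i = 0; i < SKETCH_BITS; i++)
        if (!(s->bit[i / 8] & (1 << (i % 8))))
            zeros++;
    if (zeros == 0)
        zeros = 1;      /* saturated sketch: the estimate becomes a lower bound */
    return -SKETCH_BITS * log((double) zeros / SKETCH_BITS);
}
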
Well, maybe.  But DELETEs seem like a problem.  And it's still adding
foreground work to every query, which I just have a hard time
believing is ever going to work out.

 Regarding the crash scenario - if the commit fails, just throw away the
 local estimator copy, it's not needed. I'm not sure how to take care of
 the case when commit succeeds and the write of the merged estimator
 fails, but I think it might be possible to write the estimator to xlog
 or something like that. So it would be replayed during recovery etc. Or
 is it a stupid idea?

It's not stupid, in the sense that that is what you'd need to do if
you want to avoid ever having to rescan the table.  But it is another
thing that I think is going to be way too expensive.

 Yes, as I've mentioned above, the multi-column stats are the reason why
 I started working on this. And yes, it really should work like this:

 1. user realizes there's something wrong as the plans are bad
 2. after analysis, the user realizes the reason is an imprecise
   estimate due to dependence between columns
 3. the user enables cross-column stats on the columns and checks
   if it actually solved the issue
 4. if the cost outweighs the gains, the user drops the stats

 Does that sound reasonable?

Yes.  The only caveat I'm trying to insert is that it's hard for me to
see how the proposed approach could ever be cheap enough to be a win.
If we need some kind of more expensive kind of ANALYZE that scans the
whole table, and it's optional, sure, why not?  But that's a one-time
hit.  You run it, it sucks, it's over, and now your queries work.
Odds are good you may never need to touch it again.  Now, if we can
convince ourselves that multi-column stats are likely to require
constant updating, then maybe there would be more to recommend the
stream-based approach, although even then it seems dreadfully complex
and expensive to me.  But I bet these things are pretty static.  If
the city and state tend to imply the zip code when there are 10K rows
in the table, they probably still will when there are 100K rows in the
table.  If users with org_id = 1 have a higher karma score on average
than users with org_id != 1, that'll probably still be true when there
are more users in both classes.  Whether the database can understand
that without rescanning the table depends on the data representation,
of course, but I guess I'm just saying I'd think really, really hard
before abandoning the idea of periodic sampling.  You have to get an
awful lot of benefit out of those cross-column stats to make it worth
paying a run-time cost for them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken
 sai...@quietlycompetent.com wrote:
 Pages like this one have column comments for the system tables:
 
 http://www.psql.it/manuale/8.3/catalog-pg-attribute.html

 Oh, I see.  I don't think we want to go there.  We'd need some kind of
 system for keeping the two places in sync.

I seem to recall some muttering about teaching genbki to extract such
comments from the SGML sources or perhaps the C header files.  I tend to
agree though that it would be a lot more work than it's worth.  And as
you say, pg_description entries aren't free.

Which brings up another point though.  I have a personal TODO item to
make the comments for operator support functions more consistent:
http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us
Should we consider removing those comments altogether, instead?

regards, tom lane



Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Jan 19 15:25:00 -0300 2011:

 Oh, I see.  I don't think we want to go there.  We'd need some kind of
 system for keeping the two places in sync.

Maybe autogenerate both the .sgml and the postgres.description files
from a single source.

 And there'd be no easy way
 to upgrade the in-database descriptions when we upgraded to a newer
 minor release, supposing they'd changed in the meantime.

I wouldn't worry about this issue.  We don't do many catalog changes in
minor releases anyway.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] REVIEW: patch: remove redundant code from pl_exec.c

2011-01-19 Thread Stephen Frost
Greetings,

* Pavel Stehule (pavel.steh...@gmail.com) wrote:
 This patch removes redundant code from the PL/pgSQL executor (-89 lines).

While I can certainly appreciate wanting to remove redundant code, I
don't think this change actually improves the situation.  The problem is
more than just that we might want to make a change to 'while' but not
'for'; it's also that it makes it very difficult to follow the code
flow.

If you could have found a way to make it work for all calls to
exec_stmts(), it might be a bit better, but you can't without kludging
it up further.  If you could, then it might be possible to move some of
this logic *into* exec_stmts(), but I don't see that being reasonable
either.

In the end, I'd recommend cleaning up the handling of the exec_stmts()
return code so that all of these pieces follow the same style and look
similar (I'd go with the switch-based approach and remove the if/else
branches).  That'll make it easier for anyone coming along later who
does end up needing to change all three.

 It doesn't change any functionality.

I'm not convinced of this either, to be honest.  In exec_stmt_while(),
there is:

for(;;)
{
[...]
if (isnull || !value)
break;

rc = exec_stmts(estate, stmt->body);
[...]
}
return PLPGSQL_RC_OK;

You replaced the last return with:

return rc;

Which could mean returning an uninitialized rc if the above break;
happens.

In the end, I guess it's up to the committers on how they feel about
this, so here's an updated patch which fixes the above issue and
improves the comments/grammar.  Passes all regressions and works in my
limited testing.

commit e6639d83db5b301e184bf158b1591007a3f79526
Author: Stephen Frost sfr...@snowman.net
Date:   Wed Jan 19 14:28:36 2011 -0500

Improve comments in pl_exec.c wrt can_leave_loop()

This patch improves some of the comments around can_leave_loop(), and
fixes a couple of fall-through cases to make sure they behave the same
as before the code de-duplication patch that introduced
can_leave_loop().

commit f8262adcba662164dbc24d289cb036b3f0aee582
Author: Stephen Frost sfr...@snowman.net
Date:   Wed Jan 19 14:26:27 2011 -0500

remove redundant code from pl_exec.c

This patch removes redundant handling of exec_stmts()'s return code
by creating a new function to be called after exec_stmts() is run.
This new function will then handle the return code, update it if
necessary, and return if the loop should continue or not.

Patch by Pavel Stehule.

Thanks,

Stephen
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
***
*** 204,209  static Portal exec_dynquery_with_params(PLpgSQL_execstate *estate,
--- 204,211 
  		  PLpgSQL_expr *dynquery, List *params,
  		  const char *portalname, int cursorOptions);
  
+ static bool can_leave_loop(PLpgSQL_execstate *estate,
+ PLpgSQL_any_loop *stmt, int *rc);
  
  /* --
   * plpgsql_exec_function	Called by the call handler for
***
*** 1566,1571  exec_stmt_case(PLpgSQL_execstate *estate, PLpgSQL_stmt_case *stmt)
--- 1568,1650 
  	return exec_stmts(estate, stmt->else_stmts);
  }
  
+ /*
+  * can_leave_loop
+  *
+  * Check the result of exec_stmts for the various exec_stmt_loop
+  * functions (unconditional loop, while loop, for-over-integer loop,
+  * for-over-portal loop).
+  *
+  * Returns true when the outer cycle should be left, otherwise false.
+  * Will also update the return code (rc) as necessary.
+  */
+ static bool
+ can_leave_loop(PLpgSQL_execstate *estate, PLpgSQL_any_loop *stmt, int *rc)
+ {
+ 	/*
+ 	 * Check which return code exec_stmts() returned and handle it
+ 	 * accordingly.
+ 	 */
+ 	switch (*rc)
+ 	{
+ 		case PLPGSQL_RC_OK:
+ 			/* Just continue the outer loop on PLPGSQL_RC_OK */
+ 			return false;
+ 
+ 		case PLPGSQL_RC_RETURN:
+ 			/* Time to break out of the outer loop */
+ 			return true;
+ 
+ 		case PLPGSQL_RC_EXIT:
+ 			if (estate->exitlabel == NULL)
+ 			{
+ /* unlabelled exit, finish the current loop */
+ *rc = PLPGSQL_RC_OK;
+ 			}
+ 			if (stmt->label != NULL && strcmp(stmt->label, estate->exitlabel) == 0)
+ 			{
+ /* labelled exit, matches the current stmt's label */
+ estate->exitlabel = NULL;
+ *rc = PLPGSQL_RC_OK;
+ 			}
+ 
+ 			/*
+ 			 * otherwise, this is a labelled exit that does not match the
+ 			 * current statement's label, if any: return RC_EXIT so that the
+ 			 * EXIT continues to propagate up the stack.
+ 			 */
+ 			return true;
+ 
+ 		case PLPGSQL_RC_CONTINUE:
+ 			if (estate->exitlabel == NULL)
+ 			{
+ /* anonymous continue, so re-run the loop */
+ *rc = PLPGSQL_RC_OK;
+ 			}
+ 			else if (stmt->label != NULL &&
+ 	 strcmp(stmt->label, estate->exitlabel) == 0)
+ 			{
+ /* label matches named continue, so re-run loop */
+ estate->exitlabel = NULL;
+ *rc = 

Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken
 sai...@quietlycompetent.com wrote:
 Pages like this one have column comments for the system tables:

 http://www.psql.it/manuale/8.3/catalog-pg-attribute.html

 Oh, I see.  I don't think we want to go there.  We'd need some kind of
 system for keeping the two places in sync.

 I seem to recall some muttering about teaching genbki to extract such
 comments from the SGML sources or perhaps the C header files.  I tend to
 agree though that it would be a lot more work than it's worth.  And as
 you say, pg_description entries aren't free.

 Which brings up another point though.  I have a personal TODO item to
 make the comments for operator support functions more consistent:
 http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us
 Should we consider removing those comments altogether, instead?

I could go either way on that.  Most of those comments are pretty
short, aren't they?  How much storage are they really costing us?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... I guess I'm just saying I'd think really, really hard
 before abandoning the idea of periodic sampling.  You have to get an
 awful lot of benefit out of those cross-column stats to make it worth
 paying a run-time cost for them.

I've been trying to not discourage Tomas from blue-sky speculation,
but I have to say I agree with Robert that the chance of any usable
result from this approach is very very small.  Feel free to do some
experimentation to prove us wrong --- but don't put a lot of effort
into it before that.

regards, tom lane



Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Yeah.  Many-times-repeated detoasting is really bad, and this is not
 the only place in the backend where we have this problem.  :-(

 Yeah, there's been some discussion of a more general solution, and I
 think I even had a trial patch at one point (which turned out not to
 work terribly well, but maybe somebody will have a better idea someday).

 I'm pretty doubtful that there's going to be a general solution to
 this problem - I think it's going to require gradual refactoring of
 problem spots.

Do you remember the previous discussion?  One idea that was on the table
was to make the TOAST code maintain a cache of detoasted values, which
could be indexed by the toast pointer OIDs (toast rel OID + value OID),
and then PG_DETOAST_DATUM might give back a pointer into the cache
instead of a fresh value.  In principle that could be done in a fairly
centralized way.  The hard part is to know when a cache entry is not
actively referenced anymore ...

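A bare-bones sketch of that idea with invented names, the unsolved lifetime question reduced to a reference count that nothing here ever decrements:

#include <stdint.h>
#include <stdlib.h>

typedef struct DetoastEntry
{
    uint32_t toastrelid;    /* toast rel OID */
    uint32_t valueid;       /* value OID */
    int      refcount;      /* the hard part: when may this drop to zero? */
    char    *data;          /* detoasted bytes */
    struct DetoastEntry *next;
} DetoastEntry;

static DetoastEntry *cache_head;

static char *
detoast_cached(uint32_t toastrelid, uint32_t valueid,
               char *(*detoast) (uint32_t, uint32_t))
{
    DetoastEntry *e;

    for (e = cache_head; e != NULL; e = e->next)
    {
        if (e->toastrelid == toastrelid && e->valueid == valueid)
        {
            e->refcount++;
            return e->data;     /* a pointer into the cache, no fresh copy */
        }
    }

    e = malloc(sizeof(DetoastEntry));   /* error handling omitted */
    e->toastrelid = toastrelid;
    e->valueid = valueid;
    e->refcount = 1;
    e->data = detoast(toastrelid, valueid);     /* the expensive path, done once */
    e->next = cache_head;
    cache_head = e;
    return e->data;
}
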
regards, tom lane



Re: [HACKERS] pl/python refactoring

2011-01-19 Thread Peter Eisentraut
On Wed, 2011-01-19 at 00:52 -0300, Alvaro Herrera wrote:
 Excerpts from Peter Eisentraut's message of Tue Jan 18 19:22:50 -0300 2011:
 
  #16: This is probably pointless because pgindent will reformat this.
 
 pgindent used to remove useless braces around single-statement blocks,
 but this behavior was removed years ago because it'd break formatting
 around PG_TRY blocks.

Good to know.  Committed then.




Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I think you missed the point: right now, to use both the core and
 intarray operators on an integer[] column, you have to create *two*
 GIN indexes, which will have exactly identical contents.  I'm looking
 for a way to let intarray extend the core opfamily definition so that
 one index can serve.

That I think I understood, but then I mixed opfamily and opclasses
badly.  Let's try again.

For the GIN indexes, we have 2 methods for building the index and 3
others to search it to solve the query.  You're proposing that the 2
former methods would be in the opfamily and the 3 later in the opclass.

We'd like to be able to use the same index (which building depends on
the opfamily) for solving different kind of queries, for which we can
use different traversal and search algorithms, that's the opclass.

So we would want the planner to know that in the GIN case an index built
with any opclass of a given opfamily can help answer a query that would
need any opclass of the opfamily.  Right?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Which brings up another point though. I have a personal TODO item to
 make the comments for operator support functions more consistent:
 http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us
 Should we consider removing those comments altogether, instead?

 I could go either way on that.  Most of those comments are pretty
 short, aren't they?  How much storage are they really costing us?

Well, on my machine pg_description is about 210K (per database) as of
HEAD.  90% of its contents are pg_proc entries, though I have no good
fix on how much of that is for internal-use-only functions.  A very
rough estimate from counting pg_proc and pg_operator entries suggests
that the answer might be about a third.  So if we do what was said in
the above-cited thread, ie move existing comments to pg_operator and
add boilerplate ones to pg_proc, we probably would pay 100K for it.

regards, tom lane



Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 2:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Yeah.  Many-times-repeated detoasting is really bad, and this is not
 the only place in the backend where we have this problem.  :-(

 Yeah, there's been some discussion of a more general solution, and I
 think I even had a trial patch at one point (which turned out not to
 work terribly well, but maybe somebody will have a better idea someday).

 I'm pretty doubtful that there's going to be a general solution to
 this problem - I think it's going to require gradual refactoring of
 problem spots.

 Do you remember the previous discussion?  One idea that was on the table
 was to make the TOAST code maintain a cache of detoasted values, which
 could be indexed by the toast pointer OIDs (toast rel OID + value OID),
 and then PG_DETOAST_DATUM might give back a pointer into the cache
 instead of a fresh value.  In principle that could be done in a fairly
 centralized way.  The hard part is to know when a cache entry is not
 actively referenced anymore ...

I do remember that discussion.  Aside from the problem you mention, it
also seems that maintaining the hash table and doing lookups into it
would have some intrinsic cost.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 3:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Which brings up another point though. I have a personal TODO item to
 make the comments for operator support functions more consistent:
 http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us
 Should we consider removing those comments altogether, instead?

 I could go either way on that.  Most of those comments are pretty
 short, aren't they?  How much storage are they really costing us?

 Well, on my machine pg_description is about 210K (per database) as of
 HEAD.  90% of its contents are pg_proc entries, though I have no good
 fix on how much of that is for internal-use-only functions.  A very
 rough estimate from counting pg_proc and pg_operator entries suggests
 that the answer might be about a third.  So if we do what was said in
 the above-cited thread, ie move existing comments to pg_operator and
 add boilerplate ones to pg_proc, we probably would pay 100K for it.

I guess that's not enormously expensive, but it's not insignificant
either.  On my machine, a template database is 5.5MB.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] REVIEW: patch: remove redundant code from pl_exec.c

2011-01-19 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 While I can certainly appreciate wanting to remove redundant code, I
 don't think this change actually improves the situation.  The problem is
 more than just that we might want to make a change to 'while' but not
 'for', it's also that it makes it very difficult to follow the code
 flow.

That was my reaction as well; and I was also concerned that this could
have a non-negligible performance price.  (At the very least it's adding
an additional function call per loop execution, and there could also be
a penalty from forcing rc to be in memory rather than a register.)

I think we should reject this one.

 In the end, I'd recommend cleaning up the handling of the exec_stmts()
 return code so that all of these pieces follow the same style and look
 similar (I'd go with the switch-based approach and remove the if/else
 branches).  That'll make it easier for anyone coming along later who
 does end up needing to change all three.

Using a switch there is a bit problematic since in some cases you want
to use break to exit the loop.  We could replace such breaks by gotos,
but that would be another strike against the argument that you're making
things more readable.  I think the switch in exec_stmt_loop is only
workable because it has no cleanup to do, so it can just return in
places where a loop break would otherwise be needed.  In short: if you
want to make these all look alike, better to go the other way.
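
To see the trap in miniature (a standalone C illustration, nothing PL/pgSQL-specific):

#include <stdio.h>

int
main(void)
{
	for (int rc = 0; rc < 3; rc++)
	{
		switch (rc)
		{
			case 1:
				break;		/* exits the switch only -- NOT the for loop */
			case 2:
				goto done;	/* the only way out, short of "return" */
		}
		printf("iteration %d\n", rc);	/* still reached when rc == 1 */
	}
done:
	return 0;
}

This prints "iteration 0" and "iteration 1" -- the break never leaves the loop.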

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: patch: remove redundant code from pl_exec.c

2011-01-19 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 I think we should reject this one.

Works for me.

 Using a switch there is a bit problematic since in some cases you want
 to use break to exit the loop.  We could replace such breaks by gotos,
 but that would be another strike against the argument that you're making
 things more readable.  I think the switch in exec_stmt_loop is only
 workable because it has no cleanup to do, so it can just return in
 places where a loop break would otherwise be needed.  In short: if you
 want to make these all look alike, better to go the other way.

Ah, yeah, good point.  We do use gotos elsewhere for this reason; we might
consider revisiting those also if we're trying to do a 'clean-up' patch.
In any case, I'll mark this as rejected.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Extending opfamilies for GIN indexes

2011-01-19 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 For the GIN indexes, we have 2 methods for building the index and 3
 others to search it to solve the query.  You're proposing that the 2
 former methods would be in the opfamily and the 3 latter in the opclass.

Actually the other way around.  An opclass is the subset of an opfamily
that is tightly bound to an index.  The build methods have to be
associatable with an index, so they're part of the index's opclass.
The query methods could be loose in the opfamily.

 So we would want the planner to know that in the GIN case an index built
 with any opclass of a given opfamily can help answer a query that would
 need any opclass of the opfamily.  Right?

The planner's not the problem here --- what's missing is the rule for
the index AM to look up the right support functions to call at runtime.

The trick is to associate the proper query support methods with any
given query operator (which'd also be loose in the family, probably).
The existing schema for pg_amop and pg_amproc is built on the assumption
that the amoplefttype/amoprighttype are sufficient for making this
association; but that seems to fall down if we would like to allow
contrib modules to add new query operators that coincidentally take the
same input types as an existing opfamily member.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: patch: fix performance problems with repeated decompression of varlena values in plpgsql

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I do remember that discussion.  Aside from the problem you mention, it
 also seems that maintaining the hash table and doing lookups into it
 would have some intrinsic cost.

Well, sure, but it's still far cheaper than going out to the toast table
(which will require multiple hashtable lookups, not to mention I/O and
possible lock blocking).  If we could solve the refcounting problem I
think this would be a very significant win.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: patch: fix performance problems with repeated decompression of varlena values in plpgsql

2011-01-19 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 If we could solve the refcounting problem I think this would be a
 very significant win.
 
Instead of trying to keep a refcount, how about just evicting from
the buffer as needed based on LRU?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Couple document fixes

2011-01-19 Thread Tom Lane
Thom Brown t...@linux.com writes:
 I've attached a couple minor fixes to the docs.  One relating to
 SECURITY LABEL and the other for pg_class.relpersistence

Applied, thanks.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Couple document fixes

2011-01-19 Thread Thom Brown
On 19 January 2011 21:10, Tom Lane t...@sss.pgh.pa.us wrote:
 Thom Brown t...@linux.com writes:
 I've attached a couple minor fixes to the docs.  One relating to
 SECURITY LABEL and the other for pg_class.relpersistence

 Applied, thanks.

Cheers Mr Lane.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: patch: fix performance problems with repeated decompression of varlena values in plpgsql

2011-01-19 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 If we could solve the refcounting problem I think this would be a
 very significant win.
 
 Instead of trying to keep a refcount, how about just evicting from
 the buffer as needed based on LRU?

Well, unless you know for certain that an item is no longer used, you
can't evict it.  There are ways you could finesse that --- for instance,
you might try to only flush between statements, when certainly no user
functions are running --- but the problem is that the cache is likely
to contain some large values that you can't adopt such a laissez faire
approach with, or you'll run out of memory.

One idea that I think we discussed was to tie cache entries to the
memory context they were demanded in, and mark them unused at the next
context reset/delete.  That way they'd be considered unused at the same
points where the current implementation would certainly have discarded
the value.  This isn't perfect (because of pfree) but might be good
enough.
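
To make the idea concrete, here is a minimal sketch -- all names are
hypothetical, not existing backend API, and in particular the reset hook
on the demanding context is assumed:

#include "postgres.h"
#include "utils/memutils.h"

typedef struct DetoastCacheEntry
{
	Oid			toastrelid;		/* toast table OID (hash key, part 1) */
	Oid			valueid;		/* value OID within it (hash key, part 2) */
	struct varlena *value;		/* the cached detoasted value */
	MemoryContext demand_cxt;	/* context the value was demanded in */
	bool		in_use;			/* cleared once demand_cxt goes away */
} DetoastCacheEntry;

/* assumed to be hooked into the reset/delete path of demand_cxt */
static void
detoast_cache_entry_release(void *arg)
{
	DetoastCacheEntry *entry = (DetoastCacheEntry *) arg;

	/*
	 * The demanding context is gone, so (modulo pfree, as noted above)
	 * no live pointer to the cached value remains; an LRU sweep may
	 * now evict the entry.
	 */
	entry->in_use = false;
}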

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: patch: fix performance problems with repeated decompression of varlena values in plpgsql

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 One idea that I think we discussed was to tie cache entries to the
 memory context they were demanded in, and mark them unused at the next
 context reset/delete.  That way they'd be considered unused at the same
 points where the current implementation would certainly have discarded
 the value.  This isn't perfect (because of pfree) but might be good
 enough.

Yeah, I was thinking that's probably what would have to be done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/python refactoring

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 2:59 PM, Peter Eisentraut pete...@gmx.net wrote:
 On ons, 2011-01-19 at 00:52 -0300, Alvaro Herrera wrote:
 Excerpts from Peter Eisentraut's message of mar ene 18 19:22:50 -0300 2011:

  #16: This is probably pointless because pgindent will reformat this.

 pgindent used to remove useless braces around single-statement blocks,
 but this behavior was removed years ago because it'd break formatting
 around PG_TRY blocks.

 Good to know.  Committed then.

I cracked up upon reading your commit message.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Tomas Vondra
On 19.1.2011 20:25, Robert Haas wrote:
 On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Yes, I was aware of this problem (amount of memory consumed with lots of
 updates), and I kind of hoped someone will come up with a reasonable
 solution.
 
 As far as I can see, periodically sampling some or all of the table is
 really the only thing that has a chance of working OK.  You could try
 to track changes only when they're not too big, but I think that's
 still going to have nasty performance consequences.

Yes, sampling the whole table is the only way to get reliable ndistinct
estimates. I'm not sure what you mean by 'tracking changes' - as I've
mentioned in the previous post, this might be solved by updating a local
copy. Which requires constant space (a few kB, see the previous post).
Is that acceptable? I don't know; that's up to the user if he wants to
pay this price.

 So the algorithm would be something like this:

 1. create copy for all distinct estimators influenced by the INSERT
 2. update the local copy
 3. commit and merge the local copies back into the original estimator
 
 Well, maybe.  But DELETEs seem like a problem.  And it's still adding
 foreground work to every query, which I just have a hard time
 believing is ever going to work out

Yes, deletes are difficult to handle. My idea was to compute something
like ((tuples changed + tuples deleted) / tuples total), and indicate
that a rebuild of the estimator is needed if this coefficient changes
too much - e.g. log a message or something like that.
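
In code form, the check could be as simple as this sketch (names are
assumed, not existing code; the counters would come from pgstat and the
threshold from a GUC):

static bool
ndistinct_estimator_stale(double tuples_changed, double tuples_deleted,
						  double tuples_total, double threshold)
{
	double		drift;

	if (tuples_total <= 0)
		return true;			/* nothing known yet -- needs a build */

	drift = (tuples_changed + tuples_deleted) / tuples_total;

	return drift > threshold;	/* caller logs a rebuild hint when this trips */
}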

 Regarding the crash scenario - if the commit fails, just throw away the
 local estimator copy, it's not needed. I'm not sure how to take care of
 the case when commit succeeds and the write of the merged estimator
 fails, but I think it might be possible to write the estimator to xlog
 or something like that. So it would be replayed during recovery etc. Or
 is it a stupid idea?
 
 It's not stupid, in the sense that that is what you'd need to do if
 you want to avoid ever having to rescan the table.  But it is another
 thing that I think is going to be way too expensive.

Way too expensive? All you need to put into the logfile is a copy of the
estimator, which is a few kB. How is that 'way too expensive'? Sure, it
might be expensive when the user makes a lot of small changes in separate
transactions, that's true, but I guess we could minimize the amount of
data written to the xlog by doing a diff of the estimators or something
like that.

 Yes, as I've mentioned above, the multi-column stats are the reason why
 I started working on this. And yes, it really should work like this:

 1. user realizes there's something wrong as the plans are bad
 2. after analysis, the user realizes the reason is an imprecise
   estimate due to dependence between columns
 3. the user enables cross-column stats on the columns and checks
   if it actually solved the issue
 4. if the cost outweighs the gains, the user drops the stats

 Does that sound reasonable?
 
 Yes.  The only caveat I'm trying to insert is that it's hard for me to
 see how the proposed approach could ever be cheap enough to be a win.

IMHO the question is not 'How expensive is that?' but rather 'How
expensive is it compared to the gains?' If the user gets much better
estimates and then a much better plan, this may be a completely
acceptable price.

 If we need some kind of more expensive kind of ANALYZE that scans the
 whole table, and it's optional, sure, why not?  But that's a one-time
 hit.  You run it, it sucks, it's over, and now your queries work.
 Odds are good you may never need to touch it again.  Now, if we can
 convince ourselves that multi-column stats are likely to require
 constant updating, then maybe there would be more to recommend the
 stream-based approach, although even then it seems dreadfully complex
 and expensive to me.  But I bet these things are pretty static.  If

No, the multi-column statistics do not require constant updating. There
are cases where sampling is perfectly fine, although you may need a
bit larger sample. Generally if you can use a multi-dimensional
histogram, you don't need to scan the whole table.

But multi-dimensional histograms are not applicable in some cases,
especially the ZIP code failure case that was repeatedly discussed. So
in the case of discrete data we need a different approach - the solution
I've proposed is based on using ndistinct estimates to derive the
selectivity estimates (it's based on one of the papers listed on the wiki).

 and expensive to me.  But I bet these things are pretty static.  If
 the city and state tend to imply the zip code when there are 10K rows
 in the table, they probably still will when there are 100K rows in the
 table.  If users with org_id = 1 have a higher karma score on average

OK, how will you measure the implicativeness?

We have discussed this in another thread and there are a lot of gotchas
although it seems like a really 

Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Tomas Vondra
On 19.1.2011 20:46, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... I guess I'm just saying I'd think really, really hard
 before abandoning the idea of periodic sampling.  You have to get an
 awful lot of benefit out of those cross-column stats to make it worth
 paying a run-time cost for them.
 
 I've been trying to not discourage Tomas from blue-sky speculation,
 but I have to say I agree with Robert that the chance of any usable
 result from this approach is very very small.  Feel free to do some
 experimentation to prove us wrong --- but don't put a lot of effort
 into it before that.
 
   regards, tom lane

Discourage? Not really - I have not expected this to be a simple thing
to implement. And yes, it might turn out to be a dead end eventually.

OTOH the approach of "it seems really expensive, so let's not try to
build it" is a certain dead end, so I'm not going to surrender due to
such speculations (although those are valid concerns and I'll have to
address them in the future).

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 3:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, on my machine pg_description is about 210K (per database) as of
 HEAD.  90% of its contents are pg_proc entries, though I have no good
 fix on how much of that is for internal-use-only functions.  A very
 rough estimate from counting pg_proc and pg_operator entries suggests
 that the answer might be about a third.  So if we do what was said in
 the above-cited thread, ie move existing comments to pg_operator and
 add boilerplate ones to pg_proc, we probably would pay 100K for it.

 I guess that's not enormously expensive, but it's not insignificant
 either.  On my machine, a template database is 5.5MB.

The implementation I was thinking about was to have initdb run a SQL
command that would do something like

INSERT INTO pg_description
  SELECT oprcode, 'pg_proc'::regclass, 0, 'implementation of ' || oprname
  FROM pg_operator
  WHERE theres-not-already-a-description-of-the-oprcode-function

So it would be minimal work to either provide or omit the boilerplate
descriptions.  I think we can postpone the decision till we have a
closer fix on the number of entries we're talking about.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Nathan Boley
On Wed, Jan 19, 2011 at 2:13 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 19.1.2011 20:25, Robert Haas wrote:
 On Tue, Jan 18, 2011 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Yes, I was aware of this problem (amount of memory consumed with lots of
 updates), and I kind of hoped someone will come up with a reasonable
 solution.

 As far as I can see, periodically sampling some or all of the table is
 really the only thing that has a chance of working OK.  You could try
 to track changes only when they're not too big, but I think that's
 still going to have nasty performance consequences.

 Yes, sampling all the table is the only way to get reliable ndistinct
 estimates.

IMHO continuing to focus on worst case results is a dead end. Yes, for
some distributions, ndistinct is very hard to estimate well. However,
let us not forget that the current ndistinct estimator is very bad but
the postgres planner is, on the whole, very good.  As far as I can see
this is due to two facts:

1) The distribution of values in a table is rarely pathological, and
usually follows one of several common patterns. ( IOW, we have good
heuristics )

2) The choice of plan is fairly robust to mis-estimation of ndistinct,
because there are only a few things the executor can choose. It
doesn't usually matter if a value composes 5% or 20% ( or 99% ) of
the table; we still usually need to scan the entire table.

Again, in my *very* humble opinion, if the ultimate goal is to improve
the planner, we should try to cut down on the number of cases in which
a poor estimate of ndistinct gives a really bad plan instead of
chasing after marginal gains from a better ndistinct estimator. Maybe
( as I've advocated for before ) this means parameterizing the
distribution of values ( ie, find that the values are roughly uniform
), maybe this means estimating the error of our statistics and using
the most robust rather than the best plan, or maybe it means something
totally different. But: ( and the literature seems to support me )
pounding away at the ndistinct estimator seems like a dead end. If you
think about it, it's a bit ridiculous to look at the whole table
*just* to estimate ndistinct - if we go that far why don't we just
store the full distribution and be done with it?

Best,
Nathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-19 Thread Noah Misch
Hi Simon,

I'm reviewing this patch for CommitFest 2011-01.

On Sat, Jan 15, 2011 at 10:02:03PM +, Simon Riggs wrote:
 On Tue, 2010-12-14 at 19:48 +, Simon Riggs wrote:
  REPLACE TABLE ying WITH yang
 Patch. Needs work.

First, I'd like to note that the thread for this patch had *four* me-too
responses to the use case.  That's extremely unusual; the subject is definitely
compelling to people.  It addresses the bad behavior of natural attempts to
atomically swap two tables in the namespace:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1                               # give prev time to take AccessShareLock

# Do it this way, and the next SELECT gets data from the old table.
#psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' &

# Do it this way, and get: ERROR:  could not open relation with OID 41380
psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' &

psql -c 'SELECT * FROM t'   # I get 'old' or an error, never 'new'.
psql -c 'DROP TABLE IF EXISTS t, old_t, new_t'

by letting you do this instead:

psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')"
psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly
sleep 1                               # give prev time to take AccessShareLock

psql -c 'EXCHANGE TABLE new_t TO t' &

psql -c 'SELECT * FROM t'   # I get 'new', finally!
psql -c 'DROP TABLE IF EXISTS t, new_t'

I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread
interesting: can we just make the first example work?  Even granting that the
second syntax may be a useful addition, the existing behavior of the first
example is surely worthless, even actively harmful.  I tossed together a
proof-of-concept patch, attached, that makes the first example DTRT.  Do you see
any value in going down that road?

As for your patch itself:

 + /*
 +  * Exchange table contents
 +  *
 +  * Swap heaps, toast tables, toast indexes
 +  * all forks
 +  * all indexes

For indexes, would you basically swap yin-yang in pg_index.indrelid, update
pg_class.relnamespace as needed, and check for naming conflicts (when yin and
yang differ in schema)?  Is there more?

 +  *
 +  * Checks:
 +  * * table definitions must match

Is there a particular reason to require this, or is it just a simplification to
avoid updating things to match?

 +  * * constraints must match

Wouldn't CHECK constraints have no need to match?

 +  * * indexes need not match
 +  * * outbound FKs don't need to match
 +  * * inbound FKs will be set to not validated

We would need to ensure that inbound FOREIGN KEY constraints still have indexes
suitable to implement them.  The easiest thing there might be to internally drop
and recreate the constraint, so we get all that verification.

 +  *
 +  * No Trigger behaviour
 +  *
 +  * How is it WAL logged? By locks and underlying catalog updates
 +  */

I see that the meat of the patch is yet to be written, so this ended up as more
of a design review based on your in-patch comments.  Hopefully it's of some
value.  I'll go ahead and mark the patch Returned with Feedback.

Thanks,
nm
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 970,995  relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
  {
Oid relOid;
  
!   /*
!    * Check for shared-cache-inval messages before trying to open the
!    * relation.  This is needed to cover the case where the name identifies a
!    * rel that has been dropped and recreated since the start of our
!    * transaction: if we don't flush the old syscache entry then we'll latch
!    * onto that entry and suffer an error when we do RelationIdGetRelation.
!    * Note that relation_open does not need to do this, since a relation's
!    * OID never changes.
!    *
!    * We skip this if asked for NoLock, on the assumption that the caller has
!    * already ensured some appropriate lock is held.
!    */
!   if (lockmode != NoLock)
!       AcceptInvalidationMessages();
! 
!   /* Look up the appropriate relation using namespace search */
!   relOid = RangeVarGetRelid(relation, false);
  
/* Let relation_open do the rest */
!   return relation_open(relOid, lockmode);
  }
  
  /* 
--- 970,980 
  {
Oid relOid;
  
!   /* Look up and lock the appropriate relation using namespace search */
!   relOid = RangeVarLockRelid(relation, lockmode, false);
  
/* Let relation_open do the rest */
!   

Re: [HACKERS] psql: Add \dL to show languages

2011-01-19 Thread Andreas Karlsson
On Tue, 2011-01-18 at 19:34 -0500, Josh Kupershmidt wrote:
 Got that now too. I lost my ~/.emacs file recently, which is mostly
 why I'm making whitespace mistakes. Rebuilding slowly though;
 (setq-default show-trailing-whitespace t) is what I needed.

Aha, I see.

 I left the Call Handler and Validator columns in the verbose
 output since I haven't heard otherwise.
 
 Josh

I do not really care either way. The columns are not terribly useful but
not pointless either.

The patch looks alright now, so I will mark it as ready for committer.

Andreas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] REVIEW: WIP: plpgsql - foreach in

2011-01-19 Thread Stephen Frost
Greetings,

* Pavel Stehule (pavel.steh...@gmail.com) wrote:
 attached patch contains an implementation of iteration over an array:

I've gone through this patch and, in general, it looks pretty reasonable
to me.  There are a number of places where I think additional comments
would be good and maybe some variable name improvements.  Also, my
changes should be reviewed to make sure they make sense.

Attached is a patch against master which includes my changes, and a
patch against Pavel's patch, so he can more easily see my changes and
include them if he'd like.

I'm going to mark this returned to author with feedback.

commit 30295015739930e68c33b29da4f7ef535bc293ea
Author: Stephen Frost sfr...@snowman.net
Date:   Wed Jan 19 17:58:24 2011 -0500

Clean up foreach-in-array PL/PgSQL code/comments

Minor clean-up of the PL/PgSQL foreach-in-array patch, includes
some white-space cleanup, grammar fixes, additional errhint where
it makes sense, etc.

Also added a number of 'XXX' comments asking for clarification
and additional comments on what's happening in the code.

commit f1a02fe3a8fa84217dae32d5ba74e9764c77431c
Author: Stephen Frost sfr...@snowman.net
Date:   Wed Jan 19 15:11:53 2011 -0500

PL/PgSQL - Add iterate-over-array support

This patch adds support for iterating over an array in PL/PgSQL.

Patch Author: Pavel Stehule

Thanks,

Stephen
*** a/doc/src/sgml/plpgsql.sgml
--- b/doc/src/sgml/plpgsql.sgml
***
*** 2238,2243  END LOOP <optional> <replaceable>label</replaceable> </optional>;
--- 2238,2268 
  </para>
 </sect2>
 
+<sect2 id="plpgsql-array-iterating">
+ <title>Looping Through Array</title>
+ <synopsis>
+ <optional> &lt;&lt;<replaceable>label</replaceable>&gt;&gt; </optional>
+ FOREACH <replaceable>target</replaceable> <optional> SCALE <replaceable>number</replaceable> </optional> IN <replaceable>expression</replaceable> LOOP
+     <replaceable>statements</replaceable>
+ END LOOP <optional> <replaceable>label</replaceable> </optional>;
+ </synopsis>
+ 
+ <programlisting>
+ CREATE FUNCTION sum(VARIADIC int[]) RETURNS int8 AS $$
+ DECLARE
+   s int8; x int;
+ BEGIN
+   FOREACH x IN $1
+   LOOP
+     s := s + x;
+   END LOOP;
+   RETURN s;
+ END;
+ $$ LANGUAGE plpgsql;
+ </programlisting>
+ 
+</sect2>
+ 
 <sect2 id="plpgsql-error-trapping">
  <title>Trapping Errors</title>
  
*** a/src/pl/plpgsql/src/gram.y
--- b/src/pl/plpgsql/src/gram.y
***
*** 21,26 
--- 21,27 
  #include "parser/parse_type.h"
  #include "parser/scanner.h"
  #include "parser/scansup.h"
+ #include "utils/array.h"
  
  
  /* Location tracking support --- simpler than bison's default */
***
*** 134,139  static	List			*read_raise_options(void);
--- 135,141 
  			PLpgSQL_datum   *scalar;
  			PLpgSQL_rec *rec;
  			PLpgSQL_row *row;
+ 			int	slice;
  		}		forvariable;
  		struct
  		{
***
*** 178,184  static	List			*read_raise_options(void);
  %type <ival>	assign_var
  %type <var>		cursor_variable
  %type <datum>	decl_cursor_arg
! %type <forvariable>	for_variable
  %type <stmt>	for_control
  
  %type <str>		any_identifier opt_block_label opt_label
--- 180,186 
  %type <ival>	assign_var
  %type <var>		cursor_variable
  %type <datum>	decl_cursor_arg
! %type <forvariable>	for_variable foreach_control
  %type <stmt>	for_control
  
  %type <str>		any_identifier opt_block_label opt_label
***
*** 190,196  static	List			*read_raise_options(void);
  %type <stmt>	stmt_return stmt_raise stmt_execsql
  %type <stmt>	stmt_dynexecute stmt_for stmt_perform stmt_getdiag
  %type <stmt>	stmt_open stmt_fetch stmt_move stmt_close stmt_null
! %type <stmt>	stmt_case
  
  %type <list>	proc_exceptions
  %type <exception_block> exception_sect
--- 192,198 
  %type <stmt>	stmt_return stmt_raise stmt_execsql
  %type <stmt>	stmt_dynexecute stmt_for stmt_perform stmt_getdiag
  %type <stmt>	stmt_open stmt_fetch stmt_move stmt_close stmt_null
! %type <stmt>	stmt_case stmt_foreach_a
  
  %type <list>	proc_exceptions
  %type <exception_block> exception_sect
***
*** 264,269  static	List			*read_raise_options(void);
--- 266,272 
  %token <keyword>	K_FETCH
  %token <keyword>	K_FIRST
  %token <keyword>	K_FOR
+ %token <keyword>	K_FOREACH
  %token <keyword>	K_FORWARD
  %token <keyword>	K_FROM
  %token <keyword>	K_GET
***
*** 298,303  static	List			*read_raise_options(void);
--- 301,307 
  %token <keyword>	K_ROWTYPE
  %token <keyword>	K_ROW_COUNT
  %token <keyword>	K_SCROLL
+ %token <keyword>	K_SLICE
  %token <keyword>	K_SQLSTATE
  %token <keyword>	K_STRICT
  %token <keyword>	K_THEN
***
*** 739,744  proc_stmt		: pl_block ';'
--- 743,750 
  		{ $$ = $1; }
  | stmt_for
  		{ $$ = $1; }
+ | stmt_foreach_a
+ 		{ $$ = $1; }
  | stmt_exit
  		{ $$ = $1; }
  | stmt_return
***
*** 1386,1391  for_variable	: T_DATUM
--- 1392,1455 
  	}
  ;
  
+ stmt_foreach_a		: opt_block_label K_FOREACH foreach_control K_IN 

Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Tomas Vondra
On 19.1.2011 23:44, Nathan Boley wrote:
 1) The distribution of values in a table is rarely pathological, and
 usually follows one of several common patterns. ( IOW, we have good
 heuristics )

Not true. You're missing the goal of this effort - to get an ndistinct
estimate for a combination of multiple columns, which is usually
pathological in the case of dependent columns. And that is exactly the
case when the user will explicitly enable this feature to get better
estimates (and thus better plans).

 2) The choice of plan is fairly robust to mis-estimation of ndistinct,
 because there are only a few things the executer can choose. It
 doesn't usually matter if a value composes 5% or 20%  ( or 99% ) of
 the table, we still usually need to scan the entire table.

Again, don't think about a single column (although even in that case
there are known fail cases). Think about multiple columns and the number
of distinct combinations. With the attribute value independence
assumption, this is usually significantly underestimated. And that's a
problem, as it often leads to an index scan instead of a sequential scan
(or something like that).
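
To make the failure mode concrete, this is roughly what the independence
assumption amounts to (a simplified sketch, not the actual planner code):

/*
 * The number of distinct (a,b) combinations is taken to be the product
 * of the per-column ndistinct values, clamped to the row count.  For
 * strongly dependent columns (say city -> ZIP code) the true value is
 * close to Max(d_a, d_b), so the product grossly overestimates the
 * number of combinations -- and thus underestimates the rows matched
 * per combination, which is what pushes the planner toward an index scan.
 */
static double
independent_ndistinct(double d_a, double d_b, double ntuples)
{
	double		estimate = d_a * d_b;

	return Min(estimate, ntuples);
}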

 Again, in my *very* humble opinion, If the ultimate goal is to improve
 the planner, we should try to cut down on the number of cases in which
 a poor estimate of ndistinct gives a really bad plan instead of
 chasing after marginal gains from a better ndistinct estimator. Maybe

What is a marginal gain? The ultimate goal is to build cross-column
stats, which in the case of dependent columns usually results in poor
plans. How is fixing that a marginal gain?

Yes, it may turn out it's not worth it, but it's a bit early to say that
without an implementation and some hard data.

 ( as I've advocated for before ) this means parameterizing the
 distribution of values ( ie, find that the values are roughly uniform
 ), maybe this means estimating the error of our statistics and using
 the most robust rather than the best plan, or maybe it means something
 totally different. But: ( and the literature seems to support me )

Which literature? I've read an awful lot of papers on this topic lately,
and I don't remember anything like that. If there's an interesting
paper, I'd like to read it. Yes, all the papers state that estimating
ndistinct is a particularly tricky task, but that's somewhat expected, no?

I've even checked how other databases do this - e.g. Oracle does it very
similarly to what I proposed (the user has to enable the feature, it has
to be rebuilt from time to time, etc.). I'm not saying we should do
everything like Oracle, but they are not clueless idiots.

 pounding away at the ndistinct estimator seems like a dead end. If you
 think about it, it's a bit ridiculous to look at the whole table
 *just* to estimate ndistinct - if we go that far why dont we just
 store the full distribution and be done with it?

No, I'm not trying to do this just to improve the ndistinct estimate.
The goal is to improve the cardinality estimate in case of dependent
columns, and one of the papers depends on good ndistinct estimates.

And actually it does not depend on ndistinct for the columns only, it
depends on ndistinct estimates for the combination of columns. So
improving the ndistinct estimates for columns is just a necessary first
step (and only if it works reasonably well, we can do the next step).

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Florian Pflug
On Jan19, 2011, at 23:44 , Nathan Boley wrote:
 If you think about it, it's a bit ridiculous to look at the whole table
 *just* to estimate ndistinct - if we go that far why dont we just
 store the full distribution and be done with it?

The crucial point that you're missing here is that ndistinct provides an
estimate even if you *don't* have a specific value to search for at hand.
This is way more common than you may think; it happens, e.g., every time
you PREPARE a statement with parameters. Even knowing the full distribution
has no advantage in this case - the best you could do is to average the
individual probabilities which gives ... well, 1/ndistinct.
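
Spelling that last step out (a small sketch of the arithmetic, assuming
the parameter is equally likely to be any of the d distinct values, with
value i occurring with frequency p_i):

E[\mathrm{sel}] \;=\; \sum_{i=1}^{d} \frac{1}{d}\, p_i
              \;=\; \frac{1}{d} \sum_{i=1}^{d} p_i
              \;=\; \frac{1}{d},
\qquad \text{since} \quad \sum_{i=1}^{d} p_i = 1.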

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/python refactoring

2011-01-19 Thread Jan Urbański
On 19/01/11 10:57, Jan Urbański wrote:
 On 18/01/11 23:22, Peter Eisentraut wrote:
 #2: It looks like this loses some information/formatting in the error
 message.  Should we keep the pointing arrow there?

  CONTEXT:  PL/Python function sql_syntax_error
 -ERROR:  syntax error at or near "syntax"
 -LINE 1: syntax error
 -        ^
 -QUERY:  syntax error
 +ERROR:  PL/Python: plpy.SPIError: syntax error at or near "syntax"
  CONTEXT:  PL/Python function sql_syntax_error

 Yes, the message is less informative, because the error is reported by
 PL/Python, by wrapping the SPI message. I guess I could try to extract
 more info from the caught ErrorData and put it in the new error that
 PL/Python throws.

All right, I found a way to shoehorn the extra information into
SPIException, I'll post a new patch series with what's left of the
general refactoring patch soon.

Shortly after, I'll post updated patches for doing SPI in subxacts,
explicit subxact starting and generated SPI exceptions, as they will
surely be broken by these changes.

Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-19 Thread Simon Riggs
On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:

 I'll go ahead and mark the patch Returned with Feedback.

My understanding of the meaning of that is polite rejection. If you do
that there is no further author comment and we move to July 2011. That
then also rejects your own patch with what you say is an alternative
implementation...

Is that what you wish? That isn't what I wish, either way. I suggest you
mark it Waiting on Author, so we can discuss it further.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SSI and Hot Standby

2011-01-19 Thread Kevin Grittner
Here's an issue for feedback from the community -- do we want to
support truly serializable transactions on hot standby machines?
 
The best way Dan and I have been able to think to do this is to
build on the SERIALIZABLE READ ONLY DEFERRABLE behavior.  We are
able to obtain a snapshot and then check to see if it is at a place
in the transaction processing that it would be guaranteed to be
serializable without participating in predicate locking, rw-conflict
detection, etc.  If it's not, we block until a READ WRITE
transaction completes, and then check again.  Repeat.  We may reach
a point where we determine that the snapshot can't work, and we get
a new one and start over.  Due to the somewhat complex rules for
this, you are likely to see a safe snapshot fairly quickly even in a
mix which always has short-lived READ WRITE transactions running,
although a single long-running READ WRITE transaction can block
things until it completes.
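 
In code form, the loop just described might look like this sketch (every
helper name here is hypothetical):
 
Snapshot
GetSafeReadOnlySnapshot(void)
{
	for (;;)
	{
		Snapshot	snap = ObtainSnapshot();	/* candidate snapshot */
 
		/* re-check each time a READ WRITE transaction completes */
		while (!SnapshotProvenUnsafe(snap))
		{
			if (SnapshotProvenSafe(snap))
				return snap;	/* safe: no predicate locking needed */
			WaitForReadWriteXactCompletion();
		}
 
		DiscardSnapshot(snap);	/* can't ever be safe -- start over */
	}
}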
 
The idea is that whenever we see a valid snapshot which would yield
a truly serializable view of the data for a READ ONLY transaction,
we add a WAL record with that snapshot information.  Of course, we
might want some limit of how often they are sent, to avoid WAL
bloat.  A hot standby could just keep the most recently received of
these and use it when a SERIALIZABLE transaction is requested. 
Perhaps DEFERRABLE in this context could mean that it waits for the
*next* one and uses it, to assure freshness.
 
Actually, we could try to get tricky to avoid sending a complete
snapshot by having two WAL messages with no payload -- one would
mean "the snapshot you would get now is being tested for
serializability".  If it failed to reach that state we would send
another when we started working on a new snapshot.  The other type of
message would mean "the snapshot you built when we last told you we
were starting to test one is good".  I *think* that can work, and it
may require less WAL space.
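 
And a matching sketch for the standby's redo side (again, all names are
hypothetical, and snapshot_built_now stands for whatever snapshot the
standby can construct at this point in recovery):
 
#define XLOG_SSI_SNAPSHOT_CANDIDATE	0x10	/* "snapshot built now is under test" */
#define XLOG_SSI_SNAPSHOT_SAFE		0x20	/* "last announced candidate proved safe" */
 
static Snapshot candidate_snapshot = NULL;		/* built at CANDIDATE redo */
static Snapshot latest_safe_snapshot = NULL;	/* handed to new SERIALIZABLE xacts */
 
static void
ssi_snapshot_redo(uint8 info, Snapshot snapshot_built_now)
{
	if (info == XLOG_SSI_SNAPSHOT_CANDIDATE)
		candidate_snapshot = snapshot_built_now;
	else if (info == XLOG_SSI_SNAPSHOT_SAFE && candidate_snapshot != NULL)
		latest_safe_snapshot = candidate_snapshot;
}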
 
If we don't do something like this, do we just provide REPEATABLE
READ on the standby as the strictest level of transaction isolation?
If so, do we generate an error on a request for SERIALIZABLE, warn
and provide degraded behavior, or just quietly give them REPEATABLE
READ behavior?
 
Thoughts?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-19 Thread Simon Riggs
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:

 Here's an issue for feedback from the community -- do we want to
 support truly serializable transactions on hot standby machines?

In this release? Maybe? In later releases? Yes.

If it blocks your excellent contribution in this release, then from me,
no. If you can achieve this in this release, yes. However, if this is
difficult or complex, then I would rather say "not yet" quickly now
than spend months working out the weirdnesses and possibly still get
them wrong.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-19 Thread Fujii Masao
On Wed, Jan 19, 2011 at 9:37 PM, Magnus Hagander mag...@hagander.net wrote:
 The "fast" or "slow" seems to lead users to always choose "fast". Instead,
 what about "fast" or "smooth", "fast" or "spread", or "immediate" or "delayed"?

 Hmm. "fast" or "spread" seems reasonable to me. And I want to use "fast"
 for the fast version, because that's what we call it when you use
 pg_start_backup(). I'll go change it to "spread" for now - it's the one
 I can find used in the docs.

Fair enough.

 What if pg_basebackup receives a signal while doing a backup?
 For example, users might do Ctrl-C to cancel the long-running backup.
 We should define a signal handler and send a Terminate message
 to the server to cancel the backup?

 Nah, we'll just disconnect and it'll deal with things that way. Just
 like we do with e.g. pg_dump. I don't see the need to complicate it
 with that.

Okay.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and Hot Standby

2011-01-19 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 In this release? Maybe? In later releases? Yes.
 
 If it blocks your excellent contribution in this release, then
 from me, no. If you can achieve this in this release, yes.
 However, if this is difficult or complex, then I would rather say
 not yet quickly now, than spend months working out the
 weirdnesses and possibly still get them wrong.
 
We already have a mechanism for generating a good snapshot, the hard
part (for me at least) would be to get that snapshot over to the hot
standby and have it use the latest one on a request for a
serializable transaction.  I have no experience with WAL file
output, and don't know what it would take for hot standby to use it
as I describe.
 
I agree it's pretty late in the cycle, but I'm going through all the
loose ends and found this one -- which has been hanging out on the
Wiki page as an R&D item for over a full year without discussion. 
:-(  If we provide the snapshots (which we can safely and easily
do), can someone else who knows what they're doing with WAL and HS
get the rest of it safely into the release?  That seems to me to be
the only way it can still happen for 9.1.
 
If not, I agree this can be 9.2 material.  We just have to decide
how to document it and answer the questions near the bottom of my
initial post of the thread.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2011-01-19 Thread Nathan Boley
 If you think about it, it's a bit ridiculous to look at the whole table
 *just* to estimate ndistinct - if we go that far why dont we just
 store the full distribution and be done with it?

 - the best you could do is to average the
 individual probabilities which gives ... well, 1/ndistinct.


That is certainly *not* the best you could do in every case. The mean
is only the best estimate in L2, which is definitely not the case
here.

Consider a table with 10K values, 9,990 of which are 1 and the rest of
which are 2, 3, ..., 10, versus a table that has the same 10 distinct
values evenly distributed. For a simple equality query, in the first
case, a bitmap scan might be best. In the second case, a sequential
scan would always be best.

This is precisely the point I was trying to make in my email: the loss
function is very complicated. Improving the ndistinct estimator could
significantly improve the estimates of ndistinct ( in the quadratic
loss sense ) while only marginally improving the plans.

-Nathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-19 Thread Fujii Masao
On Thu, Jan 20, 2011 at 2:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 What I'm worried about is the case where a tablespace is created
 under the $PGDATA directory.

 What would be the sense of that?  If you're concerned about whether the
 code handles it correctly, maybe the right solution is to add code to
 CREATE TABLESPACE to disallow it.

I'm not sure why that's the right solution. Why do you think that we should
not create the tablespace under the $PGDATA directory? I'm not surprised
that people mount the filesystem on $PGDATA/mnt and create the
tablespace on it.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2011-01-19 Thread Noah Misch
On Thu, Jan 20, 2011 at 12:57:23AM +, Simon Riggs wrote:
 On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote:
 
  I'll go ahead and mark the patch Returned with Feedback.
 
 My understanding of the meaning of that is polite rejection. If you do
 that there is no further author comment and we move to July 2011. That
 then also rejects your own patch with what you say is an alternative
 implementation...
 
 Is that what you wish? That isn't what I wish, either way. I suggest you
 mark it Waiting on Author, so we can discuss it further.

Before I consider my wishes too carefully, I've moved the patch back to Waiting
on Author, for the reason that it seems wrong to force it elsewhere today as
long as the author (you) would like it there.  Not that I have some kind of
authority over patch disposition in any case.

I had put it straight to RWF because it seemed clearly not intended to be
applied.  No political statement or harm intended, and maybe that determination
was not even correct.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup for streaming base backups

2011-01-19 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Thu, Jan 20, 2011 at 2:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 What I'm worried about is the case where a tablespace is created
 under the $PGDATA directory.

 What would be the sense of that?  If you're concerned about whether the
 code handles it correctly, maybe the right solution is to add code to
 CREATE TABLESPACE to disallow it.

 I'm not sure why that's the right solution. Why do you think that we should
 not create the tablespace under the $PGDATA directory? I'm not surprised
 that people mounts the filesystem on $PGDATA/mnt and creates the
 tablespace on it.

No?  Usually, having a mount point in a non-root-owned directory is
considered a Bad Thing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

