Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-23 Thread Amit kapila
On Friday, November 23, 2012 10:10 PM Fujii Masao wrote:
On Fri, Nov 23, 2012 at 6:56 PM, Amit Kapila  wrote:
>>> When I remove postgresql.auto.conf, SET PERSISTENT failed.
>>>
>>> =# SET PERSISTENT synchronous_commit = 'local';
>>> ERROR:  failed to open "postgresql.auto.conf" file
>
>> There can be 2 ways to handle this: either we recreate the
>> "postgresql.auto.conf" file or give an error.
>> I am not sure what the exact behavior should be if a user tries to
>> delete internal files.
>> Any suggestion?

> I prefer to recreate it.

> $PGDATA/config_dir is specified in include_dir by default. Users might
> create and remove the configuration files in that directory many times.
> So I'm not surprised even if a user accidentally removes
> postgresql.auto.conf in that directory. Also users might purposely remove
> that file to reset all the settings by SET PERSISTENT.

One of the suggestions in this mail chain was that if the above happens,
a message should be displayed/logged at restart.
However, I think that if such a situation occurs we should provide some
mechanism so that settings made through the SET PERSISTENT command still
work.

> So I think that SET
> PERSISTENT should handle the case where postgresql.auto.conf doesn't
> exist.

If we silently recreate the file, the user might not be aware that the
settings made by previous commands are gone.
So how about also emitting a NOTICE when SET PERSISTENT has to create a
new file internally?
The NOTICE should state that the settings made by previous commands were
lost due to manual deletion of the internal file.


> We might be able to expect that postgresql.auto.conf is not deleted by
> a user if it's in $PGDATA/global or base directory.

So do you mean that if we don't find the file in config_dir, we should
search in $PGDATA/global or the base directory?
Can't we mandate that, even if it is deleted, the file is only ever
expected in config_dir?


>>> We should implement "RESET PERSISTENT"? Otherwise, there is no way
>>> to get rid of the parameter setting from postgresql.auto.conf, via SQL.
>>> Also we should implement "SET PERSISTENT name TO DEFAULT"?
>
>> Till now, I have not implemented this in the patch, thinking that it can
>> be done as a 2nd part once the basic stuff is ready.
>> However I think you are right: without one of "RESET PERSISTENT" or "SET
>> PERSISTENT name TO DEFAULT", it is difficult for the user
>> to get rid of a parameter.
>> Will "SET PERSISTENT name TO DEFAULT" be sufficient, or do you think both
>> are necessary, because RESET PERSISTENT also internally might need
>> to behave similarly?
>
>> For implementation of "SET PERSISTENT name TO DEFAULT", there can be 2 ways
>> 1) Delete the entry from postgresql.auto.conf
>> 2) Update the entry value in postgresql.auto.conf to default value

> Both seem to be useful. I think that "SET ... TO DEFAULT" is suitable for
> 2), and "RESET PERSISTENT ..." is suitable for 1).

For other commands, the behavior of "SET ... TO DEFAULT" and "RESET ..."
is the same.
The docs say "RESET" is an alternative spelling for "SET ... TO DEFAULT".

However, it can still be done the way you suggest.
Others, any objection? Otherwise I will go ahead with it.

> Another comment is:

> What happens if the server crashes while SET PERSISTENT is writing the
> setting to the file? A partial write occurs and restart of the server
> would fail because of corrupted postgresql.auto.conf?

This situation will not happen, as the SET PERSISTENT command first
writes to a ".lock" file and then, at commit time,
renames it to ".auto.conf".
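
For reference, the write-then-rename pattern looks roughly like this (a
minimal standalone sketch; the file names and the persist_settings()
helper are illustrative, not the patch's actual code):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void
persist_settings(const char *contents)
{
    const char *tmpfile  = "postgresql.auto.conf.lock";
    const char *conffile = "postgresql.auto.conf";
    FILE       *fp;

    fp = fopen(tmpfile, "w");
    if (fp == NULL)
        exit(1);                /* the real code reports an ERROR */

    fputs(contents, fp);

    /* Make sure the new contents reach disk before the rename. */
    fflush(fp);
    fsync(fileno(fp));
    fclose(fp);

    /*
     * rename() is atomic on POSIX file systems, so a crash leaves either
     * the old file or the complete new one, never a partial mix.
     */
    if (rename(tmpfile, conffile) != 0)
        exit(1);
}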

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

2012-11-23 Thread Bruce Momjian
On Wed, Nov 14, 2012 at 06:28:41PM -0500, Bruce Momjian wrote:
> On Wed, Nov 14, 2012 at 06:15:30PM -0500, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Wed, Nov 14, 2012 at 05:39:29PM -0500, Tom Lane wrote:
> > >> You would have been better off keeping the array and sorting it so you
> > >> could use binary search, instead of passing the problem off to the
> > >> filesystem.
> > 
> > > Well, testing showed using open() was a big win.
> > 
> > ... on the filesystem you tested on.  I'm concerned that it might not
> > look so good on other platforms.
> 
> True. I am on ext3.  So I need to generate a proof-of-concept patch and
> have others test it?

OK, test patch attached.  The patch will only work if your database uses
a single tablespace.  Doing multiple tablespaces seemed too complex for
the test patch.

Here are my results:

# tables       git    bsearch patch
       1     11.16            10.99
    1000     19.13            19.26
    2000     26.78            27.11
    4000     43.81            42.15
    8000     79.96            77.38
   16000    165.26           162.24
   32000    378.18           368.49
   64000   1083.35          1086.77
 
As you can see, the bsearch code doesn't seem to make much difference.
Sometimes it is faster, other times slower --- it seems to be just
measurement noise.  This code uses the same method of file lookup as git
head, meaning it looks for specific files rather than patterns --- this
simplified the bsearch code.

Can anyone see a consistent improvement with the bsearch patch? 
Attached is my test shell script.  There is no reason we can't use
bsearch(), except not using it allows us to remove the pg_upgrade
directory listing function.
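
For reference, the bsearch() approach amounts to something like this (a
standalone sketch, not the patch itself; the sample file names are made
up):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Comparator over an array of "char *" names, usable by both qsort()
 * and bsearch(). */
static int
name_cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

int
main(void)
{
    /* In pg_upgrade this array would come from load_directory(). */
    char       *namelist[] = {"16385", "16384_vm", "16384", "16384_fsm"};
    int         n = sizeof(namelist) / sizeof(namelist[0]);
    const char *target = "16384_fsm";
    char      **hit;

    /* Sort once; then each lookup is O(log n) instead of an open(). */
    qsort(namelist, n, sizeof(char *), name_cmp);

    hit = bsearch(&target, namelist, n, sizeof(char *), name_cmp);
    printf("%s %s\n", target, hit ? "exists" : "is missing");
    return 0;
}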
 
-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/file.c b/contrib/pg_upgrade/file.c
new file mode 100644
index d8cd8f5..a5d92c6
*** a/contrib/pg_upgrade/file.c
--- b/contrib/pg_upgrade/file.c
*** copy_file(const char *srcfile, const cha
*** 221,226 
--- 221,281 
  #endif
  
  
+ /*
+  * load_directory()
+  *
+  * Read all the file names in the specified directory, and return them as
+  * an array of "char *" pointers.  The array address is returned in
+  * *namelist, and the function result is the count of file names.
+  *
+  * To free the result data, free each (char *) array member, then free the
+  * namelist array itself.
+  */
+ int
+ load_directory(const char *dirname, char ***namelist)
+ {
+ 	DIR		   *dirdesc;
+ 	struct dirent *direntry;
+ 	int			count = 0;
+ 	int			allocsize = 64;		/* initial array size */
+ 
+ 	*namelist = (char **) pg_malloc(allocsize * sizeof(char *));
+ 
+ 	if ((dirdesc = opendir(dirname)) == NULL)
+ 		pg_log(PG_FATAL, "could not open directory \"%s\": %s\n",
+ 			   dirname, getErrorText(errno));
+ 
+ 	while (errno = 0, (direntry = readdir(dirdesc)) != NULL)
+ 	{
+ 		if (count >= allocsize)
+ 		{
+ 			allocsize *= 2;
+ 			*namelist = (char **)
+ 		pg_realloc(*namelist, allocsize * sizeof(char *));
+ 		}
+ 
+ 		(*namelist)[count++] = pg_strdup(direntry->d_name);
+ 	}
+ 
+ #ifdef WIN32
+ 	/*
+ 	 * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
+ 	 * released version
+ 	 */
+ 	if (GetLastError() == ERROR_NO_MORE_FILES)
+ 		errno = 0;
+ #endif
+ 
+ 	if (errno)
+ 		pg_log(PG_FATAL, "could not read directory \"%s\": %s\n",
+ 			   dirname, getErrorText(errno));
+ 
+ 	closedir(dirdesc);
+ 
+ 	return count;
+ }
+ 
+ 
  void
  check_hard_link(void)
  {
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index ace56e5..0248d40
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***
*** 7,12 
--- 7,13 
  
  #include 
  #include 
+ #include 
  #include 
  #include 
  
*** const char *setupPageConverter(pageCnvCt
*** 365,370 
--- 366,372 
  typedef void *pageCnvCtx;
  #endif
  
+ int			load_directory(const char *dirname, char ***namelist);
  const char *copyAndUpdateFile(pageCnvCtx *pageConverter, const char *src,
    const char *dst, bool force);
  const char *linkAndUpdateFile(pageCnvCtx *pageConverter, const char *src,
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index 7dbaac9..b5b7380
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
***
*** 18,24 
  static void transfer_single_new_db(pageCnvCtx *pageConverter,
  	   FileNameMap *maps, int size);
  static void transfer_relfile(pageCnvCtx *pageConverter, FileNameMap *map,
! 			 const char *suffix);
  
  
  /*
--- 18,24 
  static void transfer_single_new_db(pageCnvCtx *pageConverter,
  	   FileNameMap *maps, int size);
  static void transfer_relfile(pageCnvCtx *pageConverter, FileNameMap *map,
! 			 const char *suffix, 

Use of fsync; was Re: [HACKERS] Pg_upgrade speed for many tables

2012-11-23 Thread Bruce Momjian
 On Mon, Nov 19, 2012 at 12:11:26PM -0800, Jeff Janes wrote:

[ Sorry for the delay in replying.]

> On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian  wrote:
> > On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:
> >>
> >> Is turning off synchronous_commit enough?  What about turning off fsync?
> >
> > I did some testing with the attached patch on a magnetic disk with no
> > BBU that turns off fsync;
> 
> With which file system? I wouldn't expect you to see a benefit with
> ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits
> "group fsync" of new file creations and instead does each one serially.
> Whether it is worth applying a fix that is only needed for that one
> file system, I don't know.  The trade-offs are not all that clear to
> me yet.

It seems possible that only ext4 shows the difference.

> >  I got these results
> >
> > # tables   sync_com=off   fsync=off
> >        1          15.90       13.51
> >     1000          26.09       24.56
> >     2000          33.41       31.20
> >     4000          57.39       57.74
> >     8000         102.84      116.28
> >    16000         189.43      207.84
> >
> > It shows fsync=off faster for < 4k, and slower for > 4k.  Not sure why
> > this is, but perhaps the buffering of the fsync is actually faster
> > than doing a no-op fsync.
> 
> synchronous-commit=off turns off not only the fsync at each commit,
> but also the write-to-kernel at each commit; so it is not surprising
> that it is faster at large scale.  I would specify both
> synchronous-commit=off and fsync=off.

I would like to see actual numbers showing synchronous-commit=off is
also useful if we use fsync=off.

> >> When I'm doing a pg_upgrade with thousands of tables, the shutdown
> >> checkpoint after restoring the dump to the new cluster takes a very
> >> long time, as the writer drains its operation table by opening and
> >> individually fsync-ing thousands of files.  This takes about 40 ms per
> >> file, which I assume is a combination of slow lap-top disk drive, and
> >> a strange deal with ext4 which makes fsyncing a recently created file
> >> very slow.   But even with faster hdd, this would still be a problem
> >> if it works the same way, with every file needing 4 rotations to be
> >> fsynced and this happens in serial.
> >
> > Is this with the current code that does synchronous_commit=off?  If not,
> > can you test to see if this is still a problem?
> 
> Yes, it is with synchronous_commit=off. (or if it wasn't originally,
> it is now, with the same result)
> 
> Applying your fsync patch does solve the problem for me on ext4.
> Having the new cluster be on ext3 rather than ext4 also solves the
> problem, without the need for a patch; but it would be nice to be more
> friendly to ext4, which is popular even though not recommended.

Do you have numbers with synchronous-commit=off, fsync=off, and both, on
ext4?

> >> Anyway, the reason I think turning fsync off might be reasonable is
> >> that as soon as the new cluster is shut down, pg_upgrade starts
> >> overwriting most of those just-fsynced file with other files from the
> >> old cluster, and AFAICT makes no effort to fsync them.  So until there
> >> is a system-wide sync after the pg_upgrade finishes, your new cluster
> >> is already in mortal danger anyway.
> >
> > pg_upgrade does a cluster shutdown before overwriting those files.
> 
> Right.  So as far as the cluster is concerned, those files have been
> fsynced.  But then the next step is to go behind the cluster's back and
> replace those fsynced files with different files, which may or may not
> have been fsynced.  This is what makes me think the new cluster is in
> mortal danger.  Not only have the new files perhaps not been fsynced,
> but the cluster is not even aware of this fact, so you can start it
> up, and then shut it down, and it still won't bother to fsync them,
> because as far as it is concerned they already have been.
> 
> Given that, how much extra danger would be added by having the new
> cluster schema restore run with fsync=off?
> 
> In any event, I think the documentation should caution that the
> upgrade should not be deemed to be a success until after a system-wide
> sync has been done.  Even if we use the link rather than copy method,
> are we sure that that is safe if the directories recording those links
> have not been fsynced?

OK, the above is something I have been thinking about, and obviously you
have too.  If you change fsync from off to on in a cluster, and restart
it, there is no guarantee that the dirty pages you read from the kernel
are actually on disk, because Postgres doesn't know they are dirty. 
They probably will be pushed to disk by the kernel in less than one
minute, but still, it doesn't seem reliable. Should this be documented
in the fsync section?

Again, another reason not to use fsync=off, though your example of the
file copy is a good one.  As you stated, this is a problem with the file
copy/link, independent of the fsync setting.

Re: [HACKERS] splitting *_desc routines

2012-11-23 Thread Tom Lane
Alvaro Herrera  writes:
> So incorporating these ideas, the layout now looks like this:

> ./commands/seq_desc.c
> ./commands/dbase_desc.c
> ./commands/tablespace_desc.c
> ./catalog/storage_desc.c
> ./utils/cache/relmap_desc.c
> ./access/rmgrdesc/mxact_desc.c
> ./access/rmgrdesc/spgdesc.c
> ./access/rmgrdesc/xact_desc.c
> ./access/rmgrdesc/heapdesc.c
> ./access/rmgrdesc/tupdesc.c
> ./access/rmgrdesc/xlog_desc.c
> ./access/rmgrdesc/gistdesc.c
> ./access/rmgrdesc/clog_desc.c
> ./access/rmgrdesc/hashdesc.c
> ./access/rmgrdesc/gindesc.c
> ./access/rmgrdesc/nbtdesc.c
> ./storage/ipc/standby_desc.c

FWIW, I'd vote for dumping all of these into *one* rmgrdesc directory
(which may as well be under access/ since that's where the xlog code is),
regardless of where the corresponding replay code is in the source tree.
I don't think splitting them up into half a dozen directories adds
anything except confusion.  If you want, the header comment for each
file could mention where the corresponding replay code lives.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] splitting *_desc routines

2012-11-23 Thread Simon Riggs
On 23 November 2012 22:52, Alvaro Herrera  wrote:

> ./access/rmgrdesc/xact_desc.c
> ./access/rmgrdesc/heapdesc.c

Could we have either all underscores _ or none at all for the naming?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further pg_upgrade analysis for many tables

2012-11-23 Thread Jeff Janes
On Thu, Nov 15, 2012 at 7:05 PM, Jeff Janes  wrote:
> On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane  wrote:
>> Jeff Janes  writes:
>>> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane  wrote:
>>>> There are at least three ways we could whack that mole: ...
>>>>
>>>> * Keep a separate list (or data structure of your choice) so that
>>>> relcache entries created in the current xact could be found directly
>>>> rather than having to scan the whole relcache.  That'd add complexity
>>>> though, and could perhaps be a net loss for cases where the relcache
>>>> isn't so bloated.
>>
>>> Maybe a static list that can overflow, like the ResourceOwner/Lock
>>> table one recently added.  The overhead of that should be very low.
>>
>>> Are the three places where "need_eoxact_work = true;" the only places
>>> where things need to be added to the new structure?
>>
>> Yeah.  The problem is not so much the number of places that do that,
>> as that places that flush entries from the relcache would need to know
>> to remove them from the separate list, else you'd have dangling
>> pointers.
>
> If the list is of hash-tags rather than pointers, all we would have to
> do is ignore entries that are not still in the hash table, right?
>

I've attached a proof-of-concept patch to implement this.

I got rid of need_eoxact_work entirely and replaced it with a short
list that fulfills the functions of indicating that work is needed,
and suggesting which rels might need that work.  There is no attempt
to prevent duplicates, nor to remove invalidated entries from the
list.   Invalid entries are skipped when the hash entry is not found,
and processing is idempotent so duplicates are not a problem.

Formally speaking, if MAX_EOXACT_LIST were 0, so that the list
overflowed the first time it was accessed, then it would be identical
to the current behavior of having only a flag.  So formally all I did
was increase the max from 0 to 10.
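
In outline, the mechanism is simply this (a sketch with illustrative
names, not necessarily the patch's actual identifiers):

#define MAX_EOXACT_LIST 10

static unsigned int eoxact_list[MAX_EOXACT_LIST];   /* rel OIDs */
static int          eoxact_list_len = 0;
static int          eoxact_list_overflowed = 0;

/* Called at each place that used to set need_eoxact_work = true. */
static void
EOXactListAdd(unsigned int relid)
{
    if (eoxact_list_len < MAX_EOXACT_LIST)
        eoxact_list[eoxact_list_len++] = relid; /* duplicates are fine */
    else
        eoxact_list_overflowed = 1;
}

/*
 * At end of transaction: if the list overflowed, fall back to scanning
 * the whole relcache hash table as before; otherwise look up only the
 * listed OIDs, silently skipping any no longer present in the table.
 */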

I wasn't so sure about the idempotent nature of subtransaction
processing, so I chickened out and left that part alone.  I know of no
workflow for which that was a bottleneck.

AtEOXact_release is oddly indented because that makes the patch
smaller and easier to read.

This makes the non "-1" restore of large dumps very much faster (and
makes them faster than "-1" restores, as well).

I added a "create temp table foo (x integer) on commit drop;" line to
the default pgbench transaction and tested that.  I was hoping to see
a performance improvement there as well (the transaction has ~110
entries in the RelationIdCache at eoxact each time), but the
performance was too variable (probably due to the intense IO it
causes) to detect any changes.  At least it is not noticeably slower.
If I hack pgbench to bloat the RelationIdCache by touching 20,000
useless tables as part of the connection start-up process, then this
patch does show a win.

It is not obvious what value to set the MAX list size to.  Since this
array is only allocated once per backend, and since it is not groveled
through to invalidate relations at each invalidation, there is no
particular reason it must be small.  But if the same table is assigned
new filenodes (or forced index lists, whatever those are) repeatedly
within a transaction, the list could become bloated with duplicate
entries, potentially becoming even larger than the hash table whose
scan it is intended to short-cut.

In any event, 10 seems to be large enough to overcome the currently
known bottle-neck.  Maybe 100 would be a more principled number, as
that is about where the list could start to become as big as the basal
size of the RelationIdCache table.

I don't think this patch replaces having some mechanism for
restricting how large RelationIdCache can get or how LRU entries in it
can get as Robert suggested.  But this approach seems like it is
easier to implement and agree upon; and doesn't preclude doing other
optimizations later.

Cheers,

Jeff


relcache_list_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] splitting *_desc routines

2012-11-23 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Oct 25, 2012 at 4:25 AM, Simon Riggs  wrote:
> > I'd put these in a separate directory to avoid annoyance. Transam is
> > already too large.
> 
> +1.  A further advantage of that is that it means that that everything
> in that new directory will be intended to end up as
> all-separately-compilable, which should make it easier to remember
> what the rules are, and easier to design an automated framework that
> continuously verifies that it still works that way.

So incorporating these ideas, the layout now looks like this:

./commands/seq_desc.c
./commands/dbase_desc.c
./commands/tablespace_desc.c
./catalog/storage_desc.c
./utils/cache/relmap_desc.c
./access/rmgrdesc/mxact_desc.c
./access/rmgrdesc/spgdesc.c
./access/rmgrdesc/xact_desc.c
./access/rmgrdesc/heapdesc.c
./access/rmgrdesc/tupdesc.c
./access/rmgrdesc/xlog_desc.c
./access/rmgrdesc/gistdesc.c
./access/rmgrdesc/clog_desc.c
./access/rmgrdesc/hashdesc.c
./access/rmgrdesc/gindesc.c
./access/rmgrdesc/nbtdesc.c
./storage/ipc/standby_desc.c

There are two files that are two subdirs down from the backend
directory:

./storage/ipc/standby_desc.c
./utils/cache/relmap_desc.c

We could keep them in there, or we could alternatively create more
"rmgrdesc" subdirs for them, so

./storage/rmgrdesc/standby_desc.c
./utils/rmgrdesc/relmap_desc.c

This approach appears more future-proof if we ever want to create desc
routines in other subdirs in storage/ and utils/, though it's a bit
annoying to have subdirs that will only hold a single file each (and a
very tiny file to boot).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-23 Thread Kevin Grittner
Alvaro Herrera wrote:

> Are you posting an updated patch?

Well, there wasn't exactly a consensus on what should change, so I'll
throw some initial review comments out even though I still need to
check some things in the code and do more testing.

The patch applied cleanly, compiled without warnings, and passed all
tests in `make check-world`.

The previous discussion seemed to me to achieve consensus on the need
for the feature, but a general dislike for adding so many new GUC
settings to configure it. I concur on that. I agree with the feelings
of others that just using deadlock_timeout / 10 (probably clamped to
a minimum of 10ms) would be good instead of
autovacuum_truncate_lock_check. I agree that the values of the other
two settings probably aren't too critical as long as they are fairly
reasonable, which I would define as being 20ms to 100ms with
retries lasting no more than 2 seconds.  I'm inclined to suggest a
total time of deadlock_timeout * 2 clamped to 2 seconds, but maybe
that is over-engineering it.
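
In code form, the timing rule I'm suggesting is just this (a sketch in
milliseconds, with illustrative variable names; deadlock_timeout is the
existing GUC):

int check_interval_ms = deadlock_timeout / 10;
if (check_interval_ms < 10)
    check_interval_ms = 10;         /* clamp to a 10ms minimum */

int total_retry_ms = deadlock_timeout * 2;
if (total_retry_ms > 2000)
    total_retry_ms = 2000;          /* never retry longer than 2s */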

There was also a mention of possibly inserting a vacuum_delay_point()
call outside of the AccessExclusiveLock. I don't feel strongly about
it, but if the page scanning cost can be tracked easily, I guess it
is better to do that.

Other than simplifying the code based on eliminating the new GUCs,
the coding issues I found were:

TRUE and FALSE were used as literals.  Since true and false are used
about 10 times as often and current code in the modified files is
mixed, I would recommend the lowercase form. We decided it wasn't
worth the trouble to convert the minority usage over, but I don't see
any reason to add new instances of the minority case.

In LockHasWaiters() the partition lock is acquired using
LW_EXCLUSIVE. I don't see why that can't be LW_SHARED. Was there a
reason for this, or was it just not changed after copy/paste?

I still need to review the timing calls, since I'm not familiar with
them so it wasn't immediately obvious to me whether they were being
used correctly. I have no reason to believe that they aren't, but
feel I should check.

Also, I want to do another pass looking just for off-by-one errors on
blkno. Again, I have no reason to believe that there is a problem; it
just seems like it would be a particularly easy type of mistake to
make and miss when a key structure has this field:

  BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
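
For instance, a correct truncation call under that convention looks like
this (a sketch; RelationTruncate is the existing backend call, and the
variable names are illustrative):

/* Keep blocks [0, nonempty_pages), drop [nonempty_pages, nblocks);
 * an off-by-one in either bound silently keeps or loses one page. */
RelationTruncate(onerel, vacrelstats->nonempty_pages);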

And I want to test more.

But if a new version could be created based on the above, that would
be great. Chances seem relatively slim at this point that there will
be much else to change, if anything.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add -Wlogical-op to standard compiler options?

2012-11-23 Thread Peter Eisentraut
On Wed, 2012-11-21 at 11:46 -0800, Jeff Janes wrote:
> Using gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4),  I get dozens of
> warnings, all apparently coming from somewhere in the MemSet macro. 

OK, reverted.

It looks like they dialed it down to a more useful volume in GCC 4.5,
but that doesn't really help us.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: Deparsing DDL command strings

2012-11-23 Thread Dimitri Fontaine
Pavel Stehule  writes:
> * some wrong in deparsing - doesn't support constraints
>
> postgres=# alter table a add column bbbsss int check (bbbsss > 0);
> NOTICE:  event: ddl_command_start, context: TOPLEVEL, tag: ALTER
> TABLE, operation: ALTER, type: TABLE, schema: , name: 
> NOTICE:  command: 
> NOTICE:  event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE,
> operation: ALTER, type: TABLE, schema: public, name: a
> NOTICE:  command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4, ;
> ALTER TABLE

So apparently, to be able to decipher what ALTER TABLE actually did,
you need more than just hooking yourself in after transformAlterTableStmt()
has been done, because the interesting stuff happens in the alter
table "work queue"; see ATController in src/backend/commands/tablecmds.c
for details.

Exporting that data, I'm now able to implement constraint rewriting this
way:

alter table a add column bbbsss int check (bbbsss > 0);
NOTICE:  AT_AddConstraint: a_bbbsss_check
NOTICE:  event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE, 
operation: ALTER, type: TABLE, schema: public, name: a
NOTICE:  command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4, ADD 
CONSTRAINT a_bbbsss_check CHECK ((bbbsss > 0));
ALTER TABLE

I did publish that work on the github repository (that I rebase every
time I'm pulling from master, beware):

  https://github.com/dimitri/postgres/compare/evt_add_info

This implementation also shows me that it's not possible to get the
command string directly from the parse tree, nor after the DDL transform
step, if you want such things as the constraint name that's been produced
automatically by the backend. And I don't see any way to implement that
from an extension without first patching the backend.

As that's the kind of code we want to be able to break at will in
between releases (or to fix an important bug in a minor update), I think
we need to have the facility to provide users with the normalized
command string in core.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-23 Thread Josh Kupershmidt
On Wed, Nov 21, 2012 at 5:48 AM, Karl O. Pinc  wrote:

>> On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc  wrote:
>> > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

> OT:
> After looking at the code I found a number of  "conflicting"
> option combinations are not tested for or rejected.   I can't
> recall what they are now.  The right way to fix these would be
> to send in a separate patch for each "conflict" all attached
> to one email/commitfest item?  Or one patch that just gets
> adjusted until it's right?

Typically I think it's easiest for the reviewer+committer to consider
a bunch of such similar changes altogether in one patch, rather than
in a handful of separate patches, though that's just my own
preference.

>> > 
>> >
>> > More serious objections were raised regarding semantics.
>> >
>> > What if, instead, the initial structure looked like:
>> >
>> > CREATE TABLE schemaA.foo
>> >   (id PRIMARY KEY, data INT);
>> >
>> > CREATE TABLE schemaB.bar
>> >   (id INT CONSTRAINT "bar_on_foo" REFERENCES foo
>> >  , moredata INT);
>> >
>> > With a case like this, in most real-world situations, you'd
>> > have to use pg_restore with --disable-triggers if you wanted
>> > to use --data-only and --truncate-tables.  The possibility of
>> > foreign key referential integrity corruption is obvious.
>>
>> Why would --disable-triggers be used in this example? I don't think
>> you could use --truncate-tables to restore only table "foo", because
>> you would get:
>>
>>   ERROR:  cannot truncate a table referenced in a foreign key
>> constraint
>>
>> (At least, I think so, not having tried with the actual patch.) You
>> could use --disable-triggers when restoring "bar".
>
> I tried it and you're quite right.  (I thought I'd tried this
> before, but clearly I did not -- proving the utility of the review
> process.  :-)  My assumption was that since triggers
> were turned off, constraints, being triggers, would be off as
> well, and therefore tables with foreign keys could be truncated.
> Obviously not, since I get the above error.
>
> I just tried it.  --disable-triggers does not disable constraints.

Just to be clear, I believe the problem in this example is that
--disable-triggers does not disable any foreign key constraints of
other tables when you are restoring a single table. So with:

  pg_restore -1 --truncate-tables --disable-triggers --table=foo \
  --data-only my.dump ...

you would get the complaint

  ERROR:  cannot truncate a table referenced in a foreign key constraint

which is talking about bar's referencing foo, and there was no

  ALTER TABLE bar DISABLE TRIGGER ALL

done, since "bar" isn't being restored. I don't have a quibble with
this existing behavior -- you are instructing pg_restore to only mess
with "bar", after all. See below, though, for a comparison of --clean
and --truncate-tables when restoring "foo" and "bar" together.

>> For your first example, --truncate-tables seems to have some use, so
>> that the admin isn't forced to recreate various views which may be
>> dependent on the table. (Though it's not too difficult to work around
>> this case today.)
>
> As an aside: I never have an issue with this, having planned ahead.
> I'm curious what the not-too-difficult work-around is that you have
> in mind.  I'm not coming up with a tidy way to, e.g, pull a lot
> of views out of a dump.

Well, for the first example, there was one table and only one view
which depended on the table, so manually editing the list file like
so:

  pg_restore --list --file=my.dump > output.list
  # manually edit file "output.list", select only entries pertaining
  # to the table and dependent view
  pg_restore -1 --clean --use-list=output.list ...

isn't too arduous, though it would become so if you had more dependent
views to worry about.

I'm willing to count this use-case as a usability win for
--truncate-tables, since with that option the restore can be boiled
down to a short and sweet:

  pg_restore -1 --data-only --truncate-tables --table=mytable my.dump ...

Though note this may not prove practical for large tables, since
--data-only leaves constraints and indexes intact during the restore,
and can be massively slower compared to adding the constraints only
after the data has been COPYed in, as pg_restore otherwise does.

>> For the second example involving FKs, I'm a little bit more hesitant
>> about  endorsing the use of --truncate-tables combined with
>> --disable-triggers to potentially allow bogus FKs. I know this is
>> possible to some extent today using the --disable-triggers option,
>> but
>> it makes me nervous to be adding a mode to pg_restore if it's
>> primarily designed to work together with --disable-triggers as a
>> larger foot-gun.
>
> This is the crux of the issue, and where I was thinking of
> taking this patch.  I very often am of the mindset that
> foreign keys are no more or less special than other
> more complex data integrity rules enforced with triggers.
> (I suppose others 

Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-23 Thread Amit kapila
On Friday, November 23, 2012 11:15 AM Pavan Deolasee wrote:
On Thu, Nov 22, 2012 at 2:05 PM, Amit Kapila  wrote:



>>> Sorry, I haven't followed this thread at all, but the numbers (43171 and
>>> 57920) in the last two runs of @mv-free-list for 32 clients look like
>>> aberrations, no?  I wonder if
>>> that's skewing the average.

>> Yes, that is one of the main reasons, but it is consistent across all runs
>> that for 32 clients or above these kinds of numbers are observed.
>> Even Jeff pointed out something similar in one of his mails and suggested
>> running the tests such that the first test runs "with patch" and then
>> "without patch".
>> After doing what he suggested, the observations are still similar.


> Are we convinced that the jump that we are seeing is a real one then?

  Still not convinced, as the data has been collected only in my setup.

> I'm a bit surprised because it happens only with the patch and only for 32
> clients. How would you explain that?

The reason this patch can improve performance is reduced contention on
BufFreeListLock and the partition locks (which BufferAlloc takes a. to remove
an old page from a buffer or b. to see if the block is already in the buffer
pool) in backends. As the number of backends increases, the chances of
improved performance are much better. In particular, for 32 clients, when
tests run for a longer time the results are not that skewed.

For 32 clients, as mentioned in the previous mail, when the test ran for 1
hour the difference is not very skewed.

32 clients / 32 threads for 1 hour
               @mv-free-list    @9.3devel
Single-run:      9842.019229  8050.357981

>>> I also looked at the Results.htm file down thread. There seems to be a
>>> steep degradation when the shared buffers are increased from 5GB to 10GB,
>>> both with and
>>> without the patch. Is that expected? If so, isn't that worth investigating
>>> and possibly even fixing before we do anything else?

>> The reason for the decrease in performance is that when shared buffers are
>> increased from 5GB to 10GB, I/O starts, as after the increase the OS
>> buffers can no longer hold all the data.


> Shouldn't that data be in the shared buffers if not the OS cache, and hence
> approximately the same IO will be required?

I don't think so, as the data in the OS cache and in PG shared buffers has no
direct relation; the OS can flush its buffers based on its scheduling
algorithm.

Let us try to see by example:
Total RAM - 22G
Database size - 16G

Case 1 (Shared Buffers - 5G)
a. Load all the files into OS buffers. Chances are good that all 16G of data
will be there in OS buffers, as the OS still has 17G of memory available.
b. Try to load it all into shared buffers. The last 5G will be in shared
buffers.
c. Chances are high that accessing the remaining 11G of buffers will not lead
to IO, as they will be in OS buffers.

Case 2 (Shared Buffers - 10G)
a. Load all the files into OS buffers. In the best case the OS buffers can
contain 10-12G of data, as the OS has 12G of memory available.
b. Try to load it all into shared buffers. The last 10G will be in shared
buffers.
c. Now, as there is no direct correlation of data between shared buffers and
OS buffers, whenever PG has to access any data
   that is not in shared buffers, there is a good chance it will lead to IO.


> Again, the drop in the performance is so severe that it seems worth 
> investigating that further, especially because you can reproduce it reliably.

   Yes, I agree that it is worth investigating, but IMO this is a different
problem which might not be addressed by the patch under discussion.
The 2 reasons I can think of for the dip in performance when shared buffers
increase beyond a certain threshold percentage of RAM are:
   a. either the algorithm of buffer management has some bottleneck
   b. or the way data is managed in shared buffers and the OS buffer cache

Any Suggestion/Comments?

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state

2012-11-23 Thread Heikki Linnakangas

On 23.11.2012 19:55, Tom Lane wrote:

Heikki Linnakangas  writes:

Committed, thanks.


Doesn't seem to have been pushed to master?


On purpose. Per commit message:


9.3 doesn't have this problem because XLogRecPtr is now a single 64-bit
integer, so XLogRecPtrIsInvalid() does the right thing. Apply to 9.2, and
9.1 where pg_stat_replication view was introduced.


I considered applying it to master anyway, just to keep the branches in 
sync. But XLogRecPtrIsInvalid(var) seems slightly more readable than 
XLByteEQ(var, InvalidXLogRecPtr), so I decided not to.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state

2012-11-23 Thread Tom Lane
Heikki Linnakangas  writes:
> Committed, thanks.

Doesn't seem to have been pushed to master?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: Deparsing DDL command strings

2012-11-23 Thread Pavel Stehule
Hello

2012/11/22 Dimitri Fontaine :
> Hi,
>
> Thank you Pavel for reviewing my patch! :)
>
> Pavel Stehule  writes:
>> * issues
>>
>> ** missing doc
>
> Yes. I wanted to reach an agreement on what we need the deparsing for.
> You can read the last comment we made about it at the following link; I
> didn't get any answer to my proposal.
>
>   http://archives.postgresql.org/message-id/m2391fu79m@2ndquadrant.fr
>
> In that email, I said:
>> Also, I'm thinking that publishing the normalized command string is
>> something that must be maintained in the core code, whereas the external
>> stable format might be done as a contrib extension, coded in C, working
>> with the Node *parsetree. Because it needs to ouput a compatible format
>> when applied to different major versions of PostgreSQL, I think it suits
>> quite well the model of C coded extensions.
>
> I'd like to hear what people think about that position. I could be
> working on the docs meanwhile I guess, but a non trivial amount of what
> I'm going to document is to be thrown away if we end up deciding that we
> don't want the normalized command string…
>

I agree with you: for PL languages just plain text is the best format -
it is enough for all the tasks I can imagine being solved in the
supported PLs. And for complex tasks, an exported parse tree is perfect.

My starting point is strategy: everything should be possible from
PL/pgSQL, which is our most used PL - and there any complex format is
bad; most of the work is doable via plain text and regular expressions.

There is a precedent - EXPLAIN - where we support more formats, but in
PL we usually use just the plain format.



>> ** statements are not deparsed for the ddl_command_start event
>
> Yes, that's by design, and that's why we have the ddl_command_trace
> event alias too. Tom is right in saying that until the catalogs are
> fully populated we can not rewrite most of the command string if we want
> normalized output (types, constraints, namespaces, all the jazz).
>

ok

>> ** CREATE FUNCTION is not supported
>
> Not yet. The goal of this patch in this CF is to get a green light on
> providing a full implementation of what information to add to event
> triggers and in particular about rewriting command strings.

I don't have an objection. Any others ???

Regards

Pavel

>
> That said, if you think it would help you as a reviewer, I may attack
> the CREATE FUNCTION support right now.
>
>> * something is wrong in deparsing - it doesn't support constraints
>>
>> postgres=# alter table a add column bbbsss int check (bbbsss > 0);
>> NOTICE:  event: ddl_command_start, context: TOPLEVEL, tag: ALTER
>> TABLE, operation: ALTER, type: TABLE, schema: , name: 
>> NOTICE:  command: 
>> NOTICE:  event: ddl_command_end, context: TOPLEVEL, tag: ALTER TABLE,
>> operation: ALTER, type: TABLE, schema: public, name: a
>> NOTICE:  command: ALTER TABLE public.a ADD COLUMN bbbsss pg_catalog.int4, ;
>> ALTER TABLE
>
> Took me some time to figure out how the constraint processing works in
> CREATE TABLE, and apparently I didn't finish ALTER TABLE support yet.
> Working on that, thanks.
>



> Regards,
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] False indication in pg_stat_replication.sync_state

2012-11-23 Thread Heikki Linnakangas

On 09.11.2012 15:28, Fujii Masao wrote:

On Fri, Nov 9, 2012 at 4:06 AM, Alvaro Herrera  wrote:

Fujii Masao escribió:

On Fri, Oct 19, 2012 at 10:29 PM, Fujii Masao  wrote:



However, I forgot to treat three other portions in
walsender.c and syncrep.c that also do XLogRecPtrIsInvalid().  This new
patch includes the changes for them.


Good catch.


Does any commiter pick up this?


If not, please add to next commitfest so that we don't forget.


Yep, I added this to the next CF. This is just a bug fix, so please feel free
to pick this up even before the CF.


Committed, thanks.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Further pg_upgrade analysis for many tables

2012-11-23 Thread Bruce Momjian
On Thu, Nov 15, 2012 at 07:05:00PM -0800, Jeff Janes wrote:
> On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane  wrote:
> > Jeff Janes  writes:
> >> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane  wrote:
> >>> There are at least three ways we could whack that mole: ...
> >>>
> >>> * Keep a separate list (or data structure of your choice) so that
> >>> relcache entries created in the current xact could be found directly
> >>> rather than having to scan the whole relcache.  That'd add complexity
> >>> though, and could perhaps be a net loss for cases where the relcache
> >>> isn't so bloated.
> >
> >> Maybe a static list that can overflow, like the ResourceOwner/Lock
> >> table one recently added.  The overhead of that should be very low.
> >
> >> Are the three places where "need_eoxact_work = true;" the only places
> >> where things need to be added to the new structure?
> >
> > Yeah.  The problem is not so much the number of places that do that,
> > as that places that flush entries from the relcache would need to know
> > to remove them from the separate list, else you'd have dangling
> > pointers.
> 
> If the list is of hash-tags rather than pointers, all we would have to
> do is ignore entries that are not still in the hash table, right?
> 
> 
> On a related thought, it is a shame that "create temp table on commit
> drop" sets "need_eoxact_work", because by the time we get to
> AtEOXact_RelationCache upon commit, the entry is already gone and so
> there is no actual work to do (unless a non-temp table was also
> created).  But on abort, the entry is still there.  I don't know if
> there is an opportunity for optimization there for people who use temp
> tables a lot.  If we go with a caching list, that would render it moot
> unless they use so many as to routinely overflow the cache.

I added the attached C comment last year to mention why temp tables are
not as isolated as we think, and can't be optimized as much as you would
think.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
commit f458c90bff45ecae91fb55ef2b938af37d977af3
Author: Bruce Momjian 
Date:   Mon Sep 5 22:08:14 2011 -0400

Add C comment about why we send cache invalidation messages for
session-local objects.

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
new file mode 100644
index 337fe64..98dc3ad
*** a/src/backend/utils/cache/inval.c
--- b/src/backend/utils/cache/inval.c
*** ProcessCommittedInvalidationMessages(Sha
*** 812,817 
--- 812,821 
   * about CurrentCmdInvalidMsgs too, since those changes haven't touched
   * the caches yet.
   *
+  * We still send invalidation messages for session-local objects to other
+  * backends because, while other backends cannot see any tuples, they can
+  * drop tables that are session-local to another session.
+  * 
   * In any case, reset the various lists to empty.  We need not physically
   * free memory here, since TopTransactionContext is about to be emptied
   * anyway.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-23 Thread Fujii Masao
On Fri, Nov 23, 2012 at 6:56 PM, Amit Kapila  wrote:
>> When I remove postgresql.auto.conf, SET PERSISTENT failed.
>>
>> =# SET PERSISTENT synchronous_commit = 'local';
>> ERROR:  failed to open "postgresql.auto.conf" file
>
> There can be 2 ways to handle this: either we recreate the
> "postgresql.auto.conf" file or give an error.
> I am not sure what the exact behavior should be if a user tries to
> delete internal files.
> Any suggestion?

I prefer to recreate it.

$PGDATA/config_dir is specified in include_dir by default. Users might
create and remove the configuration files in that directory many times.
So I'm not surprised even if a user accidentally removes
postgresql.auto.conf in that directory. Also users might purposely remove
that file to reset all the settings by SET PERSISTENT. So I think that SET
PERSISTENT should handle the case where postgresql.auto.conf doesn't
exist.

We might be able to expect that postgresql.auto.conf is not deleted by
a user if it's in $PGDATA/global or base directory.

>> We should implement "RESET PERSISTENT"? Otherwise, there is no way
>> to get rid of the parameter setting from postgresql.auto.conf, via SQL.
>> Also we should implement "SET PERSISTENT name TO DEFAULT"?
>
> Till now, I have not implemented this in the patch, thinking that it can
> be done as a 2nd part once the basic stuff is ready.
> However I think you are right: without one of "RESET PERSISTENT" or "SET
> PERSISTENT name TO DEFAULT", it is difficult for the user
> to get rid of a parameter.
> Will "SET PERSISTENT name TO DEFAULT" be sufficient, or do you think both
> are necessary, because RESET PERSISTENT also internally might need
> to behave similarly?
>
> For implementation of "SET PERSISTENT name TO DEFAULT", there can be 2 ways
> 1) Delete the entry from postgresql.auto.conf
> 2) Update the entry value in postgresql.auto.conf to default value

Both seem to be useful. I think that "SET ... TO DEFAULT" is suitable for
2), and "RESET PERSISTENT ..." is suitable for 1).


Another comment is:

What happens if the server crashes while SET PERSISTENT is writing the
setting to the file? A partial write occurs and restart of the server would fail
because of corrupted postgresql.auto.conf?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Re] [Re] Re: [HACKERS] PANIC: could not write to log file

2012-11-23 Thread Tom Lane
"Cyril VELTER"  writes:
> I follow up on my previous message. I just got two more crashes today, very
> similar to the first ones:

> PANIC:  could not write to log file 118, segment 237 at offset 2686976, 
> length 5578752: Invalid argument
> STATEMENT:  COMMIT
> PANIC:  could not write to log file 118, segment 227 at offset 9764864, 
> length 57344: Invalid argument
> STATEMENT:  COMMIT

> for reference, the first ones are

> PANIC:  could not write to log file 117, segment 117 at offset 5660672, 
> length 4096000: Invalid argument
> STATEMENT:  COMMIT
> PANIC:  could not write to log file 118, segment 74 at offset 12189696, 
> length 475136: Invalid argument
> STATEMENT:  COMMIT

Hm, those write lengths seem rather large.  What have you got
wal_buffers set to?  Does it help if you back it off to whatever you
were using in the 8.2 installation?  (The default back in 8.2 days
seems to have been 64kB, if that helps.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)

2012-11-23 Thread Tom Lane
Alvaro Herrera  writes:
> If the bgworker developer gets really tense about this stuff (or
> anything at all, really), they can create a completely new sigmask and
> do sigaddset() etc.  Since this is all C code, we cannot keep them from
> doing anything, really; I think what we need to provide here is just a
> framework to ease development of simple cases.

An important point here is that if a bgworker does need to do its own
signal manipulation --- for example, installing custom signal handlers
--- it would be absolutely catastrophic for us to unblock signals before
reaching worker-specific code; signals might arrive before the process
had a chance to fix their handling.  So I'm against Heikki's auto-unblock
proposal.
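
The safe ordering looks roughly like this in a worker's main function (a
sketch using plain POSIX calls; the handler and worker_main names are
illustrative):

#include <signal.h>

static void
worker_sigterm_handler(int signum)
{
    /* set a flag, wake the main loop, etc. */
}

static void
worker_main(void)
{
    sigset_t unblock;

    /* Signals are still blocked here, so none can be delivered yet. */

    /* 1. Install the worker-specific handlers first... */
    signal(SIGTERM, worker_sigterm_handler);

    /* 2. ...and only then unblock signal delivery. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGTERM);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    /* main loop follows */
}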

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-11-23 Thread Alvaro Herrera
Jan,

Are you posting an updated patch?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [pgsql-cluster-hackers] Question: Can i cut NON-HOT chain Pointers if there are no concurrent updates?

2012-11-23 Thread Markus Wanner
Henning,

On 11/23/2012 03:17 PM, "Henning Mälzer" wrote:
> Can somebody help me?

Sure, but you might get better answers on the -hackers mailing list. I'm
redirecting there. The cluster-hackers one is pretty low volume and has
few subscribers, I think.

> Question:
> What would be the loss if i cut NON-HOT chain Pointers, meaning i set 
> t_ctid=t_self in the case where it points to a tuple on another page?

READ COMMITTED would stop working correctly in the face of concurrent
transactions, I guess. See the fine manual:

http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-READ-COMMITTED

The problem essentially boils down to: READ COMMITTED transactions need
to learn about tuples *newer* than what their snapshot would see.
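
To see why t_ctid matters for that, consider how an update chain is
chased (an illustrative fragment, not the real code, which lives around
the EvalPlanQual machinery):

/*
 * "tuple" is a HeapTupleData: t_self is its own location, and
 * t_data->t_ctid is the location of the next-newer version.
 */
while (!ItemPointerEquals(&tuple.t_self, &tuple.t_data->t_ctid))
{
    /*
     * Fetch the tuple t_ctid points to (possibly on another page) and
     * continue.  If t_ctid were forced equal to t_self, newer versions
     * on other pages would become unreachable here.
     */
}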

> I am working on a project based on "postgres (PostgreSQL) 8.5devel" with the
> code from several master's theses before mine.

Care to elaborate a bit? Can (part of) that code be released under an
open license?

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-11-23 Thread Kevin Grittner
Merlin Moncure wrote:
> Kevin Grittner  wrote:
>> Merlin Moncure  wrote:
>>
>>> Hm, I bet it's possible (although probably not easy) to deduce
>>> volatility from the function body...maybe through the validator.
>>> If you could do that (perhaps warning in cases where you can't),
>>> then the performance regression-inducing-argument (which I agree
>>> with) becomes greatly ameliorated.
>>
>> For about the last 10 years the Wisconsin Courts have been parsing
>> SQL code to generate Java query classes, including "stored
>> procedures", and determining information like this. For example,
>> we set a readOnly property for the query class by examining the
>> statements in the procedure and the readOnly status of each called
>> procedure. It wasn't that hard for us, and I'm not sure what would
>> make it much harder in PostgreSQL, if we can do it where a parse
>> tree for the function is handy.
> 
> hm, what do you do about 'after the fact' changes to things the
> procedure body is pointing to? what would the server have to do?

We did a regeneration of the whole set near the end of each release
cycle, and that or smaller changes as needed during the release
cycle. Of course, we didn't have any equivalent of pg_depend.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] pg_ping utility

2012-11-23 Thread Michael Paquier
On Sat, Nov 17, 2012 at 2:48 AM, Phil Sorber  wrote:

> On Thu, Nov 15, 2012 at 10:55 PM, Michael Paquier
>  wrote:
> > On Fri, Nov 16, 2012 at 12:34 PM, Phil Sorber  wrote:
> >> On Thu, Nov 15, 2012 at 9:23 PM, Michael Paquier
> >>  wrote:
> >> > 3) Having an output close to what ping actually does would also be nice,
> >> > the current output like Accepting/Rejecting Connections are not that
> >>
> >> Could you be more specific? Are you saying you don't want to see
> >> accepting/rejecting info output?
> >
> > OK sorry.
> >
> > I meant something like that for an accessible server:
> > $ pg_ping -c 3 -h server.com
> > PING server.com (192.168.1.3)
> > accept from 192.168.1.3: icmp_seq=0 time=0.241 ms
> > accept from 192.168.1.3: icmp_seq=0 time=0.240 ms
> > accept from 192.168.1.3: icmp_seq=0 time=0.242 ms
> >
> > Like that for a rejected connection:
> > reject from 192.168.1.3: icmp_seq=0 time=0.241 ms
> >
> > Like that for a timeout:
> > timeout from 192.168.1.3: icmp_seq=0
> > Then print 1 line for each ping taken to stdout.
>
> How does icmp_seq fit into this? Or was that an oversight?
>
> Also, in standard ping if you don't pass -c it will continue to loop
> until interrupted. Would you suggest that pg_ping mimic that, or that
> we add an additional flag for that behavior?
>
> FWIW, I would use 'watch' with the existing output for cases that I
> would need something like that.
>
We waited a couple of days for feedback on this feature. So based on all
the comments provided by everybody on this thread, perhaps we should move
on and implement the options that would make pg_ping a better wrapper for
PQping. Comments?
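
As a reference point, the core of such a wrapper is tiny; a minimal
sketch using libpq's existing PQping() (the conninfo string and output
wording are illustrative only; compile with -lpq):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    switch (PQping("host=localhost port=5432"))
    {
        case PQPING_OK:
            printf("accepting connections\n");
            return 0;
        case PQPING_REJECT:
            printf("rejecting connections\n");
            return 1;
        case PQPING_NO_RESPONSE:
            printf("no response\n");
            return 2;
        default:                    /* PQPING_NO_ATTEMPT */
            printf("no attempt made\n");
            return 3;
    }
}
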
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] fix ecpg core dump when there's a very long struct variable name in .pgc file

2012-11-23 Thread Michael Meskes
On Thu, Nov 22, 2012 at 06:09:20PM +0800, Chen Huajun wrote:
> When using a struct variable whose name is very, very long (such as 12KB)
> in a .pgc source file,
> ecpg will core dump because of a buffer overflow when precompiling the .pgc file.

How on earth did you run into this? :)

I absolutely agree that this had better be fixed, and I just committed the
second version of your patch.

Thanks.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)

2012-11-23 Thread Heikki Linnakangas

On 15.11.2012 17:16, Heikki Linnakangas wrote:

On 15.11.2012 16:55, Tom Lane wrote:

Heikki Linnakangas writes:

This is a fairly general issue, actually. Looking around, I can see at
least two similar cases in existing code, with BasicOpenFile, where we
will leak file descriptors on error:


Um, don't we automatically clean those up during transaction abort?


Not the ones allocated with PathNameOpenFile or BasicOpenFile. Files
allocated with AllocateFile() and OpenTemporaryFile() are cleaned up at
abort.


If we don't, we ought to think about that, not about cluttering calling
code with certain-to-be-inadequate cleanup in error cases.


Agreed. Cleaning up at end-of-xact won't help walsender or other
non-backend processes, though, because they don't do transactions. But a
top-level ResourceOwner that's reset in the sigsetjmp() cleanup routine
would work.


This is what I came up with. It adds a new function, OpenFile, that 
returns a raw file descriptor like BasicOpenFile, but the file 
descriptor is associated with the current subtransaction and 
automatically closed at end-of-xact, in the same way that AllocateFile 
and AllocateDir work. In other words, OpenFile is to open() what 
AllocateFile is to fopen(). BasicOpenFile is unchanged: it returns a raw 
fd and it's solely the caller's responsibility to close it, but many of 
the places that used to call BasicOpenFile now use the safer OpenFile 
function instead.
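
As a usage sketch (my illustration, not code from the patch; it assumes
OpenFile keeps BasicOpenFile's argument list, as the hunks below suggest,
and uses a hypothetical caller for backend context):

/* If ereport(ERROR) fires between OpenFile and CloseFile, the fd is
 * released by the resource-owner cleanup at abort instead of leaking. */
static void
read_one_block(const char *path, char *buf)
{
	int			fd;

	fd = OpenFile(path, O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
	if (fd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open file \"%s\": %m", path)));

	if (read(fd, buf, BLCKSZ) != BLCKSZ)
		ereport(ERROR,			/* no explicit close needed on this path */
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\": %m", path)));

	CloseFile(fd);				/* the normal path still closes explicitly */
}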


This patch plugs three existing fd (or virtual fd) leaks:

1. copy_file() - fixed by using OpenFile instead of BasicOpenFile
2. XLogFileInit() - fixed by adding close() calls to the error cases. 
Can't use OpenFile here because the fd is supposed to persist over 
transaction boundaries.
3. lo_import/lo_export - fixed by using OpenFile instead of 
PathNameOpenFile.


In addition, this replaces many BasicOpenFile() calls with OpenFile() 
that were not leaking, because the code meticulously closed the file on 
error. That wasn't strictly necessary, but IMHO it's good for robustness.


One thing I'm not too fond of is the naming. I'm calling the new 
functions OpenFile and CloseFile. There's some danger of confusion 
there, as the function to close a virtual file opened with 
PathNameOpenFile is called FileClose. OpenFile is really the same kind 
of operation as AllocateFile and AllocateDir, but returns an unbuffered 
fd. So it would be nice if it was called AllocateSomething, too. But 
AllocateFile is already taken. And I don't much like the Allocate* 
naming for these anyway; you really would expect the name to contain "open".


Do we want to backpatch this? We've had zero complaints, but this seems 
fairly safe to backpatch, and at least the leak in copy_file() can be 
quite annoying. If you run out of disk space in CREATE DATABASE, the 
target file is kept open even though it's deleted, so the space isn't 
reclaimed until you disconnect.


- Heikki
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index dd69c23..cd60dd8 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -531,7 +531,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruFlush fdata)
 		int			i;
 
 		for (i = 0; i < fdata->num_files; i++)
-			close(fdata->fd[i]);
+			CloseFile(fdata->fd[i]);
 	}
 
 	/* Re-acquire control lock and update page state */
@@ -593,7 +593,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	 * SlruPhysicalWritePage).	Hence, if we are InRecovery, allow the case
 	 * where the file doesn't exist, and return zeroes instead.
 	 */
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
+	fd = OpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
 	if (fd < 0)
 	{
 		if (errno != ENOENT || !InRecovery)
@@ -614,7 +614,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	{
 		slru_errcause = SLRU_SEEK_FAILED;
 		slru_errno = errno;
-		close(fd);
+		CloseFile(fd);
 		return false;
 	}
 
@@ -623,11 +623,11 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	{
 		slru_errcause = SLRU_READ_FAILED;
 		slru_errno = errno;
-		close(fd);
+		CloseFile(fd);
 		return false;
 	}
 
-	if (close(fd))
+	if (CloseFile(fd))
 	{
 		slru_errcause = SLRU_CLOSE_FAILED;
 		slru_errno = errno;
@@ -740,7 +740,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		 * don't use O_EXCL or O_TRUNC or anything like that.
 		 */
 		SlruFileName(ctl, path, segno);
-		fd = BasicOpenFile(path, O_RDWR | O_CREAT | PG_BINARY,
+		fd = OpenFile(path, O_RDWR | O_CREAT | PG_BINARY,
 		   S_IRUSR | S_IWUSR);
 		if (fd < 0)
 		{
@@ -773,7 +773,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		slru_errcause = SLRU_SEEK_FAILED;
 		slru_errno = errno;
 		if (!fdata)
-			close(fd);
+			CloseFile(fd);
 		return false;
 	}
 
@@ -786,7 +786,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		slru_errcause = SLRU_WRITE_FAILED;
 		slru_e

Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-23 Thread Amit Kapila
>> Shouldn't that data be in the shared buffers if not the OS cache, and
>> hence approximately the same IO will be required?

> I don't think so, as the data in the OS cache and in PG shared buffers
> have no direct relation; the OS can flush its buffers based on its own
> scheduling algorithm.

> Let us try to see by example:
> Total RAM - 22G
> Database size - 16G

> Case 1 (Shared Buffers - 5G)
> a. Load all the files into OS buffers. Chances are good that all 16G of
>    data will be there in OS buffers, as the OS still has 17G of memory
>    available.
> b. Try to load it all into shared buffers. The last 5G will be there in
>    shared buffers.
> c. Chances are high that accessing the remaining 11G of buffers will not
>    lead to IO, as they will be in OS buffers.

> Case 2 (Shared Buffers - 10G)
> a. Load all the files into OS buffers. In the best case the OS buffers
>    can contain 10-12G of data, as the OS has 12G of memory available.
> b. Try to load it all into shared buffers. The last 10G will be there in
>    shared buffers.
> c. Now, as there is no direct correlation of data between shared buffers
>    and OS buffers, whenever PG has to access any data which is not in
>    shared buffers, good chances are that it will lead to IO.

>> Again, the drop in the performance is so severe that it seems worth
>> investigating that further, especially because you can reproduce it
>> reliably.

> Yes, I agree that it is worth investigating, but IMO this is a different
> problem which might not be addressed by the patch in discussion.
> The 2 reasons I can think of for the dip in performance when shared
> buffers increase beyond a certain threshold percentage of RAM are:
> a. either the algorithm of buffer management has some bottleneck
> b. or it is due to the way data is managed in shared buffers and the OS
>    buffer cache

The point I want to make is explained at the link below as well:
http://blog.kimiensoftware.com/2011/05/postgresql-vs-oracle-differences-4-shared-memory-usage-257

So if the above is true, I think the performance will recover if shared
buffers are set to 16G in the test. I shall try that setting for a test
run.

With Regards,
Amit Kapila.



Re: [HACKERS] PQconninfo function for libpq

2012-11-23 Thread Magnus Hagander
On Fri, Nov 23, 2012 at 6:30 AM, Fujii Masao  wrote:
> On Thu, Nov 22, 2012 at 10:05 PM, Boszormenyi Zoltan  wrote:
>> 2012-11-22 12:44 keltezéssel, Magnus Hagander írta:
 Also, a question was buried in the other review which is - are we OK
 to remove the requiressl parameter. Both these patches do so, because
 the code becomes much simpler if we can do that. It has been
 deprecated since 7.2. Is it OK to remove it, or do we need to put
 back
 in the more complex code to deal with both?
>>
>> Just going to highlight that we're looking for at least one third
>> party to comment on this :)
>
>
> Yes, me too. A +1 for removing wouldn't count from me. ;-)
>
> +1

Actually, with the cleaner code that resulted from the rewrite,
reintroducing it turned out to be pretty darn simple - if I'm not
missing something. PFA a patch that comes *with* requiressl=
support, without making the code that much more complex.
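
For reference, the compatibility rule itself is tiny: requiressl=1 has
long been documented as equivalent to sslmode=require. A standalone
illustration of the mapping (my sketch, not the patch's code):

#include <string.h>

/* Illustrative only -- not libpq's code.  Fold the deprecated
 * requiressl boolean into the modern sslmode keyword. */
static const char *
sslmode_from_requiressl(const char *requiressl)
{
	if (requiressl != NULL && strcmp(requiressl, "1") == 0)
		return "require";		/* requiressl=1 ~ sslmode=require */
	return "prefer";			/* libpq's default sslmode */
}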

It also fixes the fact that pg_service.conf was broken by the previous one.

It also includes the small fixes from Zoltan's latest round (the one
that was for libpq, not the one for pg_receivexlog that turned out to
be wrong).

And a pgindent run :)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


PQconninfo-mha-2.patch
Description: Binary data



Re: [HACKERS] review: plpgsql return a row-expression

2012-11-23 Thread Pavel Stehule
Hello

2012/11/23 Asif Rehman :
> Hi,
>
> I forgot to add documentation changes in the earlier patch. In the attached
> patch, I have added documentation as well as fixed other issues you raised.
>

ok

so

* applied and compiled cleanly
* All 133 tests passed.
* there are docs
* there are the necessary regression tests
* automatic conversion works as a plpgsql user expects
* there is no performance issue now
* the code respects our coding standards
+ the patch removes redundant lines

I have no objections

this patch is ready to commit!

Regards

Pavel




> Regards,
> --Asif
>
>
> On Thu, Nov 22, 2012 at 10:47 PM, Asif Rehman 
> wrote:
>>
>> Thanks for the review Pavel. I have taken care of the points you raised.
>> Please see the updated patch.
>>
>> Regards,
>> --Asif
>>
>>
>> On Wed, Nov 21, 2012 at 1:47 AM, Pavel Stehule 
>> wrote:
>>>
>>> related to
>>> http://archives.postgresql.org/message-id/caauglxwpedfwae6dajmf7sxewfusa0f68p07retbbpf_fsa...@mail.gmail.com
>>>
>>> * patched and compiled without warnings
>>>
>>> * All 133 tests passed.
>>>
>>>
>>> but
>>>
>>> I don't like
>>>
>>> * calling an invalid function from an anonymous block - it is messy
>>> (regress tests) - there is no reason to do it
>>>
>>> +create or replace function foo() returns footype as $$
>>> +declare
>>> +  v record;
>>> +  v2 record;
>>> +begin
>>> +  v := (1, 'hello');
>>> +  v2 := (1, 'hello');
>>> +  return (v || v2);
>>> +end;
>>> +$$ language plpgsql;
>>> +DO $$
>>> +declare
>>> +  v footype;
>>> +begin
>>> +  v := foo();
>>> +  raise info 'x = %', v.x;
>>> +  raise info 'y = %', v.y;
>>> +end; $$;
>>> +ERROR:  operator does not exist: record || record
>>> +LINE 1: SELECT (v || v2)
>>> +  ^
>>>
>>> * there is some performance issue
>>>
>>> create or replace function fx2(a int)
>>> returns footype as $$
>>> declare x footype;
>>> begin
>>>   x = (10,20);
>>>   return x;
>>> end;
>>> $$ language plpgsql;
>>>
>>> postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
>>> lateral fx2(i);
>>>sum
>>> -
>>>  100
>>> (1 row)
>>>
>>> Time: 497.129 ms
>>>
>>> returns footype as $$
>>> begin
>>>   return (10,20);
>>> end;
>>> $$ language plpgsql;
>>>
>>> postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
>>> lateral fx2(i);
>>>sum
>>> -
>>>  100
>>> (1 row)
>>>
>>> Time: 941.192 ms
>>>
>>> following code has same functionality and it is faster
>>>
>>> if (stmt->expr != NULL)
>>> {
>>> if (estate->retistuple)
>>> {
>>> TupleDesc   tupdesc;
>>> Datum   retval;
>>> Oid rettype;
>>> boolisnull;
>>> int32   tupTypmod;
>>> Oid tupType;
>>> HeapTupleHeader td;
>>> HeapTupleData   tmptup;
>>>
>>> retval = exec_eval_expr(estate,
>>>
>>> stmt->expr,
>>> &isnull,
>>>
>>> &rettype);
>>>
>>> /* Source must be of RECORD or composite type */
>>> if (!type_is_rowtype(rettype))
>>> ereport(ERROR,
>>>
>>> (errcode(ERRCODE_DATATYPE_MISMATCH),
>>>  errmsg("cannot return
>>> non-composite value from composite type returning function")));
>>>
>>> if (!isnull)
>>> {
>>> /* Source is a tuple Datum, so safe to
>>> do this: */
>>> td = DatumGetHeapTupleHeader(retval);
>>> /* Extract rowtype info and find a
>>> tupdesc */
>>> tupType = HeapTupleHeaderGetTypeId(td);
>>> tupTypmod = HeapTupleHeaderGetTypMod(td);
>>> tupdesc =
>>> lookup_rowtype_tupdesc(tupType, tupTypmod);
>>>
>>> /* Build a temporary HeapTuple control
>>> structure */
>>> tmptup.t_len =
>>> HeapTupleHeaderGetDatumLength(td);
>>> ItemPointerSetInvalid(&(tmptup.t_self));
>>> tmptup.t_tableOid = InvalidOid;
>>> tmptup.t_data = td;
>>>
>>> estate->retval =
>>> PointerGetDatum(heap_copytuple(&tmptup));
>>> estate->rettupdesc =
>>> CreateTupleDescCopy(tupdesc);
>>> ReleaseTupleDesc(tupdesc);
>>> }
>>>
>>> estate->retisnull = isnull;
>>> }
>>>
>>>
>>>
>>> * it is too restrictive (maybe) - almost all plpgsql statements do
>>> automatic conversions (I/O conversions when necessary)
>>>

Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-23 Thread Amit Kapila
On Thursday, November 22, 2012 10:09 PM Fujii Masao wrote:
> On Thu, Nov 22, 2012 at 9:38 PM, Amit kapila 
> wrote:
> > Patch to implement SET PERSISTENT command is attached with this mail.
> > Now it can be reviewed.
> 
> I got the following compile warnings:
> xact.c:1847: warning: implicit declaration of function
> 'AtEOXact_UpdateAutoConfFiles'
> guc.c:5134: warning: enumeration value 'PGC_ENUM' not handled in switch
> 
> "SET PERSISTENT name TO value" failed, though "SET PERSISTENT name =
> value"
> succeeded.
> 
> =# SET PERSISTENT synchronous_commit TO 'local';
> ERROR:  syntax error at or near "TO"
> LINE 1: SET PERSISTENT synchronous_commit TO 'local';
> =# SET PERSISTENT synchronous_commit = 'local';
> SET
> 
> SET PERSISTENT succeeded even if invalid setting value was set.
> 
> =# SET PERSISTENT synchronous_commit = 'hoge';
> SET
> 
> SET PERSISTENT syntax should be explained by "\help SET" psql command.

Thank you. I shall take care of the above in the next patch and do more testing.
> 
> When I remove postgresql.auto.conf, SET PERSISTENT failed.
> 
> =# SET PERSISTENT synchronous_commit = 'local';
> ERROR:  failed to open "postgresql.auto.conf" file
 
There can be 2 ways to handle this: either we recreate the
"postgresql.auto.conf" file, or we give an error.
I am not sure what the exact behavior should be if a user tries to delete
internal files. Any suggestions?

> We should implement "RESET PERSISTENT"? Otherwise, there is no way
> to get rid of the parameter setting from postgresql.auto.conf, via SQL.
> Also
> We should implement "SET PERSISTENT name TO DEFAULT"?

Till now I have not implemented this in the patch, thinking that it can be
done as a second part once the basic stuff is ready.
However, I think you are right: without one of "RESET PERSISTENT" or "SET
PERSISTENT name TO DEFAULT", it is difficult for the user
to get rid of a parameter.
Will "SET PERSISTENT name TO DEFAULT" be sufficient, or do you think both
are necessary, given that RESET PERSISTENT might internally need
to behave the same way?

For implementation of "SET PERSISTENT name TO DEFAULT", there can be 2 ways
1) Delete the entry from postgresql.auto.conf
2) Update the entry value in postgresql.auto.conf to default value

In case we just update the value, the user might not be able to override
it by modifying postgresql.conf manually once the command has been
executed.
So I think delete is the better option. Let me know if you think otherwise
or if you have some other idea to achieve it.
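
Whichever way we go, the rewrite of postgresql.auto.conf should stay
atomic. A minimal sketch of the delete-then-rename pattern (my
illustration under assumptions -- a plain "name = value" line format and
hypothetical helper and paths -- not the patch's code):

#include <stdio.h>
#include <string.h>

/* Copy the file, dropping the "name = value" entry being reset, then
 * rename the temp file into place so a crash mid-write cannot leave a
 * corrupt file.  (A production version would also fsync the file and
 * its directory before the rename.) */
static int
remove_auto_conf_entry(const char *path, const char *tmppath,
					   const char *name)
{
	FILE	   *in = fopen(path, "r");
	FILE	   *out = fopen(tmppath, "w");
	char		line[1024];
	size_t		namelen = strlen(name);

	if (in == NULL || out == NULL)
	{
		if (in)
			fclose(in);
		if (out)
			fclose(out);
		return -1;
	}

	while (fgets(line, sizeof(line), in))
	{
		/* skip the entry being removed; keep everything else */
		if (strncmp(line, name, namelen) == 0 && line[namelen] == ' ')
			continue;
		fputs(line, out);
	}

	fclose(in);
	if (fclose(out) != 0)
		return -1;
	return rename(tmppath, path);	/* atomic replace on POSIX */
}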

> Is it helpful to output the notice message like 'Run pg_reload_conf() or
> restart the server if you want new settings to take effect' always after
> SET PERSISTENT command?

Not sure, because if someone uses the SET PERSISTENT command, he should
know its effect and behavior.
I think it's good to put this in the user manual along with the
explanation of the "PERSISTENT" option.

With Regards,
Amit Kapila.





Re: [HACKERS] PQconninfo function for libpq

2012-11-23 Thread Boszormenyi Zoltan

2012-11-23 06:30 keltezéssel, Fujii Masao írta:

On Thu, Nov 22, 2012 at 10:05 PM, Boszormenyi Zoltan  wrote:

2012-11-22 12:44 keltezéssel, Magnus Hagander írta:

Also, a question was buried in the other review which is - are we OK
to remove the requiressl parameter. Both these patches do so, because
the code becomes much simpler if we can do that. It has been
deprecated since 7.2. Is it OK to remove it, or do we need to put
back
in the more complex code to deal with both?

Just going to highlight that we're looking for at least one third
party to comment on this :)


Yes, me too. A +1 for removing wouldn't count from me. ;-)

+1


Thanks. :-)




The second one is the product of what caught my attention while
I was looking at pg_receivexlog. The current coding may write
beyond the end of the allocated arrays and the password may
overwrite a previously set keyword/value pair.

ISTM that such a problem doesn't happen at all, because argcount is
incremented as follows:

if (dbhost)
argcount++;
if (dbuser)
argcount++;
if (dbport)
argcount++;


Right, forget about the second patch.



Regards,




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [HACKERS] review: plpgsql return a row-expression

2012-11-23 Thread Asif Rehman
Hi,

I forgot to add documentation changes in the earlier patch. In the attached
patch, I have added documentation as well as fixed other issues you raised.

Regards,
--Asif

On Thu, Nov 22, 2012 at 10:47 PM, Asif Rehman wrote:

> Thanks for the review Pavel. I have taken care of the points you raised.
> Please see the updated patch.
>
> Regards,
> --Asif
>
>
> On Wed, Nov 21, 2012 at 1:47 AM, Pavel Stehule wrote:
>
>> related to
>> http://archives.postgresql.org/message-id/caauglxwpedfwae6dajmf7sxewfusa0f68p07retbbpf_fsa...@mail.gmail.com
>>
>> * patched and compiled without warnings
>>
>> * All 133 tests passed.
>>
>>
>> but
>>
>> I don't like
>>
>> * calling an invalid function from an anonymous block - it is messy
>> (regress tests) - there is no reason to do it
>>
>> +create or replace function foo() returns footype as $$
>> +declare
>> +  v record;
>> +  v2 record;
>> +begin
>> +  v := (1, 'hello');
>> +  v2 := (1, 'hello');
>> +  return (v || v2);
>> +end;
>> +$$ language plpgsql;
>> +DO $$
>> +declare
>> +  v footype;
>> +begin
>> +  v := foo();
>> +  raise info 'x = %', v.x;
>> +  raise info 'y = %', v.y;
>> +end; $$;
>> +ERROR:  operator does not exist: record || record
>> +LINE 1: SELECT (v || v2)
>> +  ^
>>
>> * there is some performance issue
>>
>> create or replace function fx2(a int)
>> returns footype as $$
>> declare x footype;
>> begin
>>   x = (10,20);
>>   return x;
>> end;
>> $$ language plpgsql;
>>
>> postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
>> lateral fx2(i);
>>sum
>> -
>>  100
>> (1 row)
>>
>> Time: 497.129 ms
>>
>> returns footype as $$
>> begin
>>   return (10,20);
>> end;
>> $$ language plpgsql;
>>
>> postgres=# select sum(fx2.x) from generate_series(1,10) g(i),
>> lateral fx2(i);
>>sum
>> -
>>  100
>> (1 row)
>>
>> Time: 941.192 ms
>>
>> following code has same functionality and it is faster
>>
>> if (stmt->expr != NULL)
>> {
>> if (estate->retistuple)
>> {
>> TupleDesc   tupdesc;
>> Datum   retval;
>> Oid rettype;
>> boolisnull;
>> int32   tupTypmod;
>> Oid tupType;
>> HeapTupleHeader td;
>> HeapTupleData   tmptup;
>>
>> retval = exec_eval_expr(estate,
>>
>> stmt->expr,
>> &isnull,
>> &rettype);
>>
>> /* Source must be of RECORD or composite type */
>> if (!type_is_rowtype(rettype))
>> ereport(ERROR,
>>
>> (errcode(ERRCODE_DATATYPE_MISMATCH),
>>  errmsg("cannot return
>> non-composite value from composite type returning function")));
>>
>> if (!isnull)
>> {
>> /* Source is a tuple Datum, so safe to
>> do this: */
>> td = DatumGetHeapTupleHeader(retval);
>> /* Extract rowtype info and find a
>> tupdesc */
>> tupType = HeapTupleHeaderGetTypeId(td);
>> tupTypmod = HeapTupleHeaderGetTypMod(td);
>> tupdesc =
>> lookup_rowtype_tupdesc(tupType, tupTypmod);
>>
>> /* Build a temporary HeapTuple control
>> structure */
>> tmptup.t_len =
>> HeapTupleHeaderGetDatumLength(td);
>> ItemPointerSetInvalid(&(tmptup.t_self));
>> tmptup.t_tableOid = InvalidOid;
>> tmptup.t_data = td;
>>
>> estate->retval =
>> PointerGetDatum(heap_copytuple(&tmptup));
>> estate->rettupdesc =
>> CreateTupleDescCopy(tupdesc);
>> ReleaseTupleDesc(tupdesc);
>> }
>>
>> estate->retisnull = isnull;
>> }
>>
>>
>>
>> * it is too restrictive (maybe) - almost all plpgsql statements do
>> automatic conversions (I/O conversions when necessary)
>>
>> create type footype2 as (a numeric, b varchar)
>>
>> postgres=# create or replace function fx3(a int)
>> returns footype2 as
>> $$
>> begin
>>   return (1000.22234213412342143,'ewqerwqre');
>> end;
>> $$ language plpgsql;
>> CREATE FUNCTION
>> postgres=# select fx3(10);
>> ERROR:  returned record type does not match expected record type
>> DETAIL:  Returned type unknown does not match expected type character
>> varying in column 2.
>> CONTEXT:  PL/pgSQL function fx3(integer) while c