Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites

2011-02-05 Thread Noah Misch
Robert,

Thanks for the obviously thought-out review.

On Sat, Feb 05, 2011 at 01:29:35AM -0500, Robert Haas wrote:
 On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote:
  Done as attached.  This preserves compatibility with our current handling of
  composite type dependencies.  The rest you've seen before.
 
 OK, so I took a look at this in more detail today.  The current logic
 for table rewrites looks like this:
 
 1. If we're changing the data type of a column, or adding a column
 with a default, or adding/dropping OIDs, rewrite the table.  Stop.
 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table
 and check constraints.
 3. If we're changing tablespaces, copy the table block-by-block.

Correct.  It's perhaps obvious, but rewriting the table will always reindex it.

 It seems to me that the revised logic needs to look like this:
 
 1. If we're changing the data type of a column and the existing
 contents are not binary coercible to the new contents, or if we're
 adding a column with a default or adding/dropping OIDs, rewrite the
 table.  Stop.
 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table
 and check constraints.

With this patch, step 2 changes to: Otherwise, if we're adding a
constraint or NOT NULL, or changing a column to a binary-compatible domain with
a domain CHECK constraint, scan the table and check constraints.
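
As a minimal sketch of the kind of statement step 2 now covers (the domain
and table names here are mine, purely for illustration):

create domain short_text as text check (length(value) <= 100);
create table t1 (col text);
-- text to short_text is binary coercible, but the domain CHECK
-- constraint must be verified against every existing row, so the
-- table is scanned without being rewritten
alter table t1 alter column col type short_text;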

 3. If we're changing the data type of a column in the table, reindex the 
 table.

Rebuild indexes that depend on a changing column.  Other indexes can stay.

 4. If we're changing tablespaces, copy the table block-by-block.
 
 I might be missing something, but I don't see that the patch includes
 step #3, which I think is necessary.  For example, citext is binary
 coercible to text, but you can't reuse the index because the
 comparison function is different.  Of course, if you're changing the
 type of a column to its already-current type, you can skip #3, but if
 that's the only case we can optimize, it's not much of an
 accomplishment.  I guess this gets back to the ALTER TYPE 7 patch,
 which I haven't looked at in detail, but I have a feeling it may be
 controversial.

It's there, but it's happening rather implicitly.  ATExecAlterColumnType builds
lists of indexes and constraints that depend on changing columns.  Specifically,
it stashes their OIDs and the SQL to recreate them.  ATPostAlterTypeCleanup
drops those objects by OID, then parses the SQL statements, now based on the
updated table definition.  ATExecAddIndex and ATExecAddConstraint use those
parsed statements to recreate the objects.  The key is the skip_build logic in
ATExecAddIndex: if ATRewriteTables will rewrite the table (and therefore *all*
indexes), we skip the build at that earlier stage to avoid building the same
index twice.  The only thing I had to do was update the skip_build condition so
it continues to mirror the corresponding test in ATRewriteTables.

Originally I had this patch doing a full reindex, with an eye to having the next
patch reduce the scope to dependent indexes.  However, all the infrastructure
was already there, and it actually made this patch smaller to skip directly to
what it does today.

ALTER TYPE 7 additionally skips builds of indexes that depend on a changing
column but can be proven compatible.  So it's in the business of, for example,
figuring out that text and varchar are compatible but text and citext are not.
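
For illustration, assuming the citext contrib module is installed (the
object names are mine):

create table t2 (v varchar(20));
create index t2_v_idx on t2 (v);
-- varchar to text: same binary format and same btree semantics, so
-- this is the sort of case ALTER TYPE 7 aims to prove index-safe
alter table t2 alter column v type text;
-- text to citext: same binary format, but a different (case-folding)
-- comparison function, so the index must be rebuilt
alter table t2 alter column v type citext;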

 Another problem here is that if you have to do both #2 and #3, you
 might have been better off (or just as well off) doing #1, unless you
 can somehow jigger things so that the same scan does both the
 constraint checks and the index rebuild.  That doesn't look simple.

We have no such optimization during #1, either, so #2+#3 is never worse.  In
particular, #1 scans the table (# indexes total) + 1 times, while #2+#3 scans it
(# of indexes depending on changed columns) + 1 times.
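
To make the arithmetic concrete: with four indexes, only one of them on the
changed column, #1 reads the heap 4 + 1 = 5 times (the rewrite pass plus
four index builds), while #2+#3 reads it 1 + 1 = 2 times (one
constraint-check scan plus one index rebuild).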

There are some nice optimization opportunities here, to be sure.  As a specific
first step, teach index_build to create multiple indexes with a single scan,
then have reindex_relation use that.  Probably not simple.  Combining that with
the ATRewriteTable scan would be less simple still.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [pgsql-general 2011-1-21:] Are there any projects interested in object functionality? (+ rule bases)

2011-02-05 Thread Nick Rudnick

May I sum up?

o   currently, no efforts are known that experiment with
reference types, methods, or rule inference on top of PostgreSQL --
available advice mostly points to the documented functionality


o   inside the PostgreSQL community, there is little knowledge in
circulation regarding the performance effects of using deeply nested data
structures (with the possible exception of XML handling) or doing rule
inference on top of PostgreSQL -- but at least, there are also no
substantial contraindications


o   extensions of PostgreSQL to support this kind of usage must be
expected to be rejected from integration into the core code base --
i.e., if they are done, students have to be told «you can't
expect this to become a part of PostgreSQL»


Is this understood correctly, especially the last point, or did
Robert/Tom just specifically address syntactical conflicts (between
schema and object semantics) with the dot notation?


If not, it might be discouraging for teaching, as there might be interest
in presenting something which could at least be imagined to one day
become a standard tool.


Otherwise, the striking lack of academic initiatives in the area of OO
and rule inference on top of PostgreSQL appears to me as a demand to


a. check academic sources as to whether fundamental efficiency issues of
backend design discourage it so obviously that people do not even try it
out


b. if that is not the case, propose that this professor try to fill the
gap... ;-) In this case, regarding method-semantics extensions, avoiding
conflicts with existing language constructs will certainly be
preferable, as these will be small projects.


Cheers, Nick


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 3:05 AM, Noah Misch n...@leadboat.com wrote:
 Correct.  It's perhaps obvious, but rewriting the table will always reindex 
 it.

Right.

 3. If we're changing the data type of a column in the table, reindex the 
 table.

 Rebuild indexes that depend on a changing column.  Other indexes can stay.

Good point.

 4. If we're changing tablespaces, copy the table block-by-block.

 I might be missing something, but I don't see that the patch includes
 step #3, which I think is necessary.  For example, citext is binary
 coercible to text, but you can't reuse the index because the
 comparison function is different.  Of course, if you're changing the
 type of a column to its already-current type, you can skip #3, but if
 that's the only case we can optimize, it's not much of an
 accomplishment.  I guess this gets back to the ALTER TYPE 7 patch,
 which I haven't looked at in detail, but I have a feeling it may be
 controversial.

 It's there, but it's happening rather implicitly.

I see now.  So you're actually not really making any change to that
machinery.  It's sufficient to just skip the rewrite of the heap when
it isn't needed, and without any particular code change the indexes
will sort themselves out.

 We have no such optimization during #1, either, so #2+#3 is never worse.  In
 particular, #1 scans the table (# indexes total) + 1 times, while #2+#3 scans 
 it
 (# of indexes depending on changed columns) + 1 times.

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites

2011-02-05 Thread Robert Haas
On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote:
 Done as attached.

Looking at this still more, it appears that independent of any change
this patch may wish to make, there's a live bug here related to the
foreign table patch I committed back in December.  Creating a foreign
table creates an eponymous rowtype, which can be used as a column in a
regular table.  You can then change the data type of a column in the
foreign table, read from the regular table, and crash the server.
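
A sketch of that failure mode (the names are mine, and assume a foreign
server s already exists):

create foreign table ft (a int) server s;
create table plain (c ft);                -- column of ft's rowtype
insert into plain values (row(1)::ft);    -- datum stored with a as int
alter foreign table ft alter column a type text;  -- no dependency check
select (c).a from plain;                  -- int bytes read as text: crash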

The simple fix for this is to just change the code in
ATPrepAlterColumnType to handle the foreign table case also:

if (tab->relkind == RELKIND_COMPOSITE_TYPE)
{
    /*
     * For composite types, do this check now.  Tables will check
     * it later when the table is being rewritten.
     */
    find_composite_type_dependencies(rel->rd_rel->reltype,
                                     NULL,
                                     RelationGetRelationName(rel));
}

But this is a little unsatisfying, for two reasons.  First, the error
message will be subtly wrong: we can make it complain about a table or
a type, but not a foreign table.  At a quick look, it looks like the right
fix might be to replace the second and third arguments to
find_composite_type_dependencies() with a Relation.  Second, I wonder
if we shouldn't refactor things so that all the checks fire in
ATRewriteTables() rather than doing them in different places.  Seems
like that might be cleaner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Cédric Villemain
2011/1/19 Robert Haas robertmh...@gmail.com:
 On Wed, Jan 19, 2011 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... So what we
 want to do is write a percentage of them, in a way that guarantees
 that they'll all eventually get written if people continue to access
 the same data.

 The word guarantee seems quite inappropriate here, since as far as I
 can see this approach provides no such guarantee --- even after many
 cycles you'd never be really certain all the bits were set.

 What I asked for upthread was that we continue to have some
 deterministic, practical way to force all hint bits in a table to be
 set.  This is not *remotely* responding to that request.  It's still not
 deterministic, and even if it were, vacuuming a large table 20 times
 isn't a very practical solution.

 I get the impression you haven't spent as much time reading my email
 as I spent writing it.  Perhaps I'm wrong, but in any case the code
 doesn't do what you're suggesting.  In the most recently posted
 version of this patch, which is v2, if VACUUM hits a page that is

Please update the commitfest with the current patch; there is only
the old immature v1 of the patch in it.
I was about to review it...

https://commitfest.postgresql.org/action/patch_view?id=500

 hint-bit-dirty, it always writes it.  Full stop.  The 20 times bit
 applies to a SELECT * FROM table, which is a rather different case.

 As I write this, I realize that there is a small fly in the ointment
 here, which is that neither VACUUM nor SELECT force out all the pages
 they modify to disk.  So there is some small amount of remaining
 nondeterminism, even if you VACUUM, because VACUUM will leave the last
 few pages it dirties in shared_buffers, and whether those hint bits
 hit the disk will depend on a decision made at the time they're
 evicted, not at the time they were dirtied.  Possibly I could fix that
 by making SetBufferCommitInfoNeedsSave() set BM_DIRTY during vacuum
 and BM_HINT_BITS at other times.  That would nail the lid shut pretty
 tight.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OCLASS_FOREIGN_TABLE support is incomplete

2011-02-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Err... wait.  Actually, I think the right thing to do might be to
 remove OCLASS_FOREIGN_TABLE altogether.  I don't think it's actually
 used for anything.  For a foreign table, we use OCLASS_CLASS, just as
 we do for indexes and sequences.  I think that's just leftover crap
 that I failed to rip out.

OK.  I'll fix it as part of the extensions patch, since it's touching
all those same places anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] We need to log aborted autovacuums

2011-02-05 Thread Cédric Villemain
2011/2/4 Robert Haas robertmh...@gmail.com:
 On Sun, Jan 30, 2011 at 10:26 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 30, 2011 at 10:03 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of Sun Jan 30 23:37:51 -0300 2011:

 Unless I'm missing something, making autovacuum.c call
 ConditionalLockRelationOid() is not going to work, because the vacuum
 transaction isn't started until we get all the way down to
 vacuum_rel().

 Maybe we need ConditionalLockRelationOidForSession or something like
 that?

 That'd be another way to go, if there are objections to what I've
 implemented here.

 Seeing as how there seem to be neither objections nor endorsements,
 I'm inclined to commit what I proposed


What do you implement exactly?
* The original request from Josh to get a LOG message when autovac cannot
run because of locks
* VACOPT_NOWAIT -- what is it?


more or less as-is.  There
 remains the issue of what do about the log spam.  Josh Berkus
 suggested logging it when log_autovacuum_min_duration != -1, which
 seems like a bit of an abuse of that setting, but it's certainly not
 worth adding another setting for, and the alternative of logging it
 at, say, DEBUG2 seems unappealing because you'll then have to turn on
 logging for a lot of unrelated crap to get this information.  So on
 balance I think that proposal is perhaps the least of evils.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 14

2011-02-05 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 On 02.02.2011 19:36, Kevin Grittner wrote:
  
 Then there's one still lurking outside the new predicate* files,
 in heapam.c. It's about 475 lines into the heap_update function
 (25 lines from the bottom). In reviewing where we needed to
 acquire predicate locks, this struck me as a place where we might
 need to duplicate the predicate lock from one tuple to another,
 but I wasn't entirely sure. I tried for a few hours one day to
 construct a scenario which would show a serialization anomaly if I
didn't do this, and failed to create one. If someone who understands
 both the patch and heapam.c wants to look at this and offer an
 opinion, I'd be most grateful. I'll take another look and see if I
 can get my head around it better, but failing that, I'm
 disinclined to either add locking I'm not sure we need or to
 remove the comment that says we *might* need it there.

 Have you convinced yourself that there's nothing to do? If so, we
 should replace the todo comment with a comment explaining why.
 
It turns out that nagging doubt from my right-brain was on target.
Here's the simplest example I was able to construct of a false
negative due to the lack of some code there:
 
-- setup
create table t (id int not null, txt text) with (fillfactor=50);
insert into t (id)
  select x from (select * from generate_series(1, 100)) a(x);
alter table t add primary key (id);

-- session 1
-- T1
begin transaction isolation level serializable;
select * from t where id = 100;

-- session 2
-- T2
begin transaction isolation level serializable;
update t set txt = 'x' where id = 100;
commit;
-- T3
begin transaction isolation level serializable;
update t set txt = 'y' where id = 100;
select * from t where id = 50;

-- session 3
-- T4
begin transaction isolation level serializable;
update t set txt = 'z' where id = 50;
select * from t where id = 1;
commit;

-- session 2
commit;

-- session 1
update t set txt = 'a' where id = 1;
commit;

Based on visibility of work, there's a cycle in the apparent order of
execution (T1 read id = 100 before T2 updated it; T2's update of id = 100
committed before T3 updated the same row; T3 read id = 50 before T4
updated it; and T4 read id = 1 before T1 updated it):
 
T1 -> T2 -> T3 -> T4 -> T1

So now that I'm sure we actually do need code there, I'll add it. 
And I'll add the new test to the isolation suite.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OCLASS_FOREIGN_TABLE support is incomplete

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Err... wait.  Actually, I think the right thing to do might be to
 remove OCLASS_FOREIGN_TABLE altogether.  I don't think it's actually
 used for anything.  For a foreign table, we use OCLASS_CLASS, just as
 we do for indexes and sequences.  I think that's just leftover crap
 that I failed to rip out.

 OK.  I'll fix it as part of the extensions patch, since it's touching
 all those same places anyway.

Excellent, thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] is_absolute_path incorrect on Windows

2011-02-05 Thread Bruce Momjian
Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
   I have reviewed is_absolute_path() and have implemented
   path_is_relative_and_below_cwd() to cleanly handle cases like 'E:abc' on
   Win32;  patch attached.
   
   This patch appears to remove some security-critical restrictions.
   Why did you delete the path_contains_parent_reference calls?
  
   They are now in path_is_relative_and_below_cwd(),
  
  ... and thus not invoked in the absolute-path case.  This is a security
  hole.
  
I don't see a general reason to prevent
   ".." in absolute paths, only relative ones.
  
  load '/path/to/database/../../../path/to/anywhere'
 
 Ah, good point. I was thinking about someone using .. in the part of
 the path that is compared to /data or /log, but using it after that
 would clearly be a security problem.
 
 I have attached an updated patch that restructures the code to be
 clearer and adds the needed checks.

I decided that my convert_and_check_filename() usage was too intertwined
so I have developed a simplified version that is easier to understand; 
patch attached.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/adminpack/adminpack.c b/contrib/adminpack/adminpack.c
index 381554d..c149dd6 100644
*** a/contrib/adminpack/adminpack.c
--- b/contrib/adminpack/adminpack.c
*** convert_and_check_filename(text *arg, bo
*** 73,104 
  
  	canonicalize_path(filename);	/* filename can change length here */
  
- 	/* Disallow ".." in the path */
- 	if (path_contains_parent_reference(filename))
- 		ereport(ERROR,
- 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- 			(errmsg("reference to parent directory (\"..\") not allowed"))));
- 
  	if (is_absolute_path(filename))
  	{
! 		/* Allow absolute references within DataDir */
! 		if (path_is_prefix_of_path(DataDir, filename))
! 			return filename;
! 		/* The log directory might be outside our datadir, but allow it */
! 		if (logAllowed &&
! 			is_absolute_path(Log_directory) &&
! 			path_is_prefix_of_path(Log_directory, filename))
! 			return filename;
! 
! 		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 (errmsg("absolute path not allowed"))));
- 		return NULL;			/* keep compiler quiet */
- 	}
- 	else
- 	{
- 		return filename;
  	}
  }
  
  
--- 73,102 
  
  	canonicalize_path(filename);	/* filename can change length here */
  
  	if (is_absolute_path(filename))
  	{
! 		/* Disallow '/a/b/data/..' */
! 		if (path_contains_parent_reference(filename))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 					(errmsg("reference to parent directory (\"..\") not allowed"))));
! 		/*
! 		 *	Allow absolute paths if within DataDir or Log_directory, even
! 		 *	though Log_directory might be outside DataDir.
! 		 */
! 		if (!path_is_prefix_of_path(DataDir, filename) &&
! 			(!logAllowed || !is_absolute_path(Log_directory) ||
! 			 !path_is_prefix_of_path(Log_directory, filename)))
! 			ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 (errmsg("absolute path not allowed"))));
  	}
+ 	else if (!path_is_relative_and_below_cwd(filename))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ 				 (errmsg("path must be in or below the current directory"))));
+ 
+ 	return filename;
  }
  
  
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 93bc401..bbcddfb 100644
*** a/src/backend/utils/adt/genfile.c
--- b/src/backend/utils/adt/genfile.c
*** convert_and_check_filename(text *arg)
*** 51,81 
  	filename = text_to_cstring(arg);
  	canonicalize_path(filename);	/* filename can change length here */
  
- 	/* Disallow ".." in the path */
- 	if (path_contains_parent_reference(filename))
- 		ereport(ERROR,
- 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- 			(errmsg("reference to parent directory (\"..\") not allowed"))));
- 
  	if (is_absolute_path(filename))
  	{
! 		/* Allow absolute references within DataDir */
! 		if (path_is_prefix_of_path(DataDir, filename))
! 			return filename;
! 		/* The log directory might be outside our datadir, but allow it */
! 		if (is_absolute_path(Log_directory) &&
! 			path_is_prefix_of_path(Log_directory, filename))
! 			return filename;
! 
! 		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 (errmsg("absolute path not allowed"))));
- 		return NULL;			/* keep compiler quiet */
- 	}
- 	else
- 	{
- 		return filename;
  	}
  }
  
  
--- 51,80 
  	filename = text_to_cstring(arg);
  	canonicalize_path(filename);	/* filename can change length here */
  
  	if (is_absolute_path(filename))
  	{
! 		/* Disallow '/a/b/data/..' */
! 		if (path_contains_parent_reference(filename))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 					(errmsg("reference to parent directory 

Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 Please update the commitfest with the current patch; there is only
 the old immature v1 of the patch in it.
 I was about to review it...

 https://commitfest.postgresql.org/action/patch_view?id=500

Woops, sorry about that.  Here's an updated version, which I will also
add to the CommitFest application.

The need for this patch has been somewhat ameliorated by the fsync
queue compaction patch.  I tested with:

create table s as select g,
random()::text||random()::text||random()::text||random()::text from
generate_series(1,100) g;
checkpoint;

The table was large enough not to fit in shared_buffers.  Then, repeatedly:

select sum(1) from s;

At the time I first posted this patch, running against git master, the
first run took about 1600 ms vs. ~207-216 ms for subsequent runs.  But
that was actually running up against the fsync queue problem.
Retesting today, the first run took 360 ms, and subsequent runs took
197-206 ms.  I doubt that the difference in the steady-state is
significant, since the tests were done on different days and not
controlled all that carefully, but clearly the response time spike for
the first scan is far lower than previously.  Setting the log level to
DEBUG1 revealed that the first scan did two fsync queue compactions.

The patch still does help to smooth things out, though.  Here are the
times for one series of selects, with the patch applied, after setting
up as described above:

257.108
259.245
249.181
245.896
250.161
241.559
240.538
241.091
232.727
232.779
232.543
226.265
225.029
222.015
217.106
216.426
217.724
210.604
209.630
203.507
197.521
204.448
196.809

Without the patch, as seen above, the first run is about 80% slower.
With the patch applied, the first run is about 25% slower than the
steady state, and subsequent scans decline steadily from there.  Runs
21 and following flush no further data and run at full speed.  These
numbers aren't representative of all real-world scenarios, though.
On a system with many concurrent clients, CLOG contention might be an
issue; on the flip side, if this table were larger than RAM (not just
larger than shared_buffers) the decrease in write traffic as we scan
through the table might actually be a more significant benefit than it
is here, where it's mostly a question of kernel time; the I/O system
isn't actually taxed.  So I think this probably needs more testing
before we decide whether or not it's a good idea.

I adopted a few suggestions made previously in this version of the
patch.  Tom Lane recommended not messing with BM_JUST_DIRTY and
leaving that for another day.  I did that.  Also, per my previous
musings, I've adjusted this version so that vacuum behaves differently
when dirtying pages rather than when flushing them.  In versions 1 and
2, vacuum would always write pages that were dirty-only-for-hint-bits
when allocating a new buffer; in this version the buffer allocation
logic is the same for vacuum, but it marks pages dirty even when only
hint bits have changed.  The result is that VACUUM followed by
CHECKPOINT is enough to make sure all hint bits are set on disk, just
as is the case today.
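
So, continuing the test case above, something along these lines should
leave the table fully hinted on disk (a sketch):

vacuum s;      -- with v3, hint-bit-only changes mark pages fully dirty
checkpoint;    -- flushes every dirty page, hint bits included
select sum(1) from s;  -- should now run at steady-state speed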

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5663711..e8f8781 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -209,11 +209,12 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 		CommitTransactionCommand();
 	}
 
-	/* Turn vacuum cost accounting on or off */
+	/* Adjust vacuum cost accounting state and update VacuumActive flag */
 	PG_TRY();
 	{
 		ListCell   *cur;
 
+		VacuumActive = true;
 		VacuumCostActive = (VacuumCostDelay > 0);
 		VacuumCostBalance = 0;
 
@@ -254,8 +255,9 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 	}
 	PG_CATCH();
 	{
-		/* Make sure cost accounting is turned off after error */
+		/* Make sure vacuum state variables are fixed up on error */
 		VacuumCostActive = false;
+		VacuumActive = false;
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1f89e52..1146dee 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -81,6 +81,10 @@ static bool IsForInput;
 /* local state for LockBufferForCleanup */
 static volatile BufferDesc *PinCountWaitBuf = NULL;
 
+/* local state for BufferAlloc */
+static int hint_bit_write_allowance;
+static int buffer_allocation_count;
+
 
 static Buffer ReadBuffer_common(SMgrRelation reln, char relpersistence,
   ForkNumber forkNum, BlockNumber blockNum,
@@ -578,6 +582,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	for (;;)
 	{
 		bool		lock_held;
+		bool		write_buffer = false;
 
 		/*
 		 * Select a victim buffer.	The buffer is returned with its header
@@ -600,13 

Re: [HACKERS] We need to log aborted autovacuums

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 12:54 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 What do you implement exactly?
 * The original request from Josh to get a LOG message when autovac cannot
 run because of locks
 * VACOPT_NOWAIT -- what is it?

What the patch implements is:

If autovacuum can't get the table lock immediately, it skips the
table.  This leaves the table still appearing to need autovacuuming,
so autovacuum will keep retrying (every 1 minute, or whatever you have
autovacuum_naptime set to) until it gets the lock.  This seems clearly
better than the historical behavior of blocking on the lock, which can
potentially keep other tables in the database from getting vacuumed.
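
A quick way to see the new behavior (a sketch; the table name is mine):

-- session 1: hold a lock that conflicts with autovacuum's
-- ShareUpdateExclusiveLock
begin;
lock table busy_table in share update exclusive mode;
-- autovacuum now skips busy_table on each naptime cycle, logging a
-- message instead of queueing behind this lock, so other tables in the
-- database still get vacuumed; once this transaction ends, the next
-- cycle picks busy_table up again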

In the case where a table is skipped for this reason, we log a message
at log level LOG.  The version of the patch I posted does that
unconditionally, but my intention was to change it before commit so
that it only logs the message if log_autovacuum_min_duration is
something other than -1.

Regular vacuum behaves as before - it waits for each table lock
individually.  We could expose a NOWAIT option at the SQL level as
well, so that someone could do VACUUM (NOWAIT), if that's something
people want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Cédric Villemain
2011/2/5 Robert Haas robertmh...@gmail.com:
 On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 Please update the commitfest with the current patch; there is only
 the old immature v1 of the patch in it.
 I was about to review it...

 https://commitfest.postgresql.org/action/patch_view?id=500

 Woops, sorry about that.  Here's an updated version, which I will also
 add to the CommitFest application.

 The need for this patch has been somewhat ameliorated by the fsync
 queue compaction patch.  I tested with:

 create table s as select g,
 random()::text||random()::text||random()::text||random()::text from
 generate_series(1,100) g;
 checkpoint;

 The table was large enough not to fit in shared_buffers.  Then, repeatedly:

 select sum(1) from s;

 At the time I first posted this patch, running against git master, the
 first run took about 1600 ms vs. ~207-216 ms for subsequent runs.  But
 that was actually running up against the fsync queue problem.
 Retesting today, the first run took 360 ms, and subsequent runs took
 197-206 ms.  I doubt that the difference in the steady-state is
 significant, since the tests were done on different days and not
 controlled all that carefully, but clearly the response time spike for
 the first scan is far lower than previously.  Setting the log level to
 DEBUG1 revealed that the first scan did two fsync queue compactions.

 The patch still does help to smooth things out, though.  Here are the
 times for one series of selects, with the patch applied, after setting
 up as described above:

 257.108
 259.245
 249.181
 245.896
 250.161
 241.559
 240.538
 241.091
 232.727
 232.779
 232.543
 226.265
 225.029
 222.015
 217.106
 216.426
 217.724
 210.604
 209.630
 203.507
 197.521
 204.448
 196.809

 Without the patch, as seen above, the first run is about 80% slower.
 With the patch applied, the first run is about 25% slower than the
 steady state, and subsequent scans decline steadily from there.  Runs
 21 and following flush no further data and run at full speed.  These
 numbers aren't representative of all real-world scenarios, though.
 On a system with many concurrent clients, CLOG contention might be an
 issue; on the flip side, if this table were larger than RAM (not just
 larger than shared_buffers) the decrease in write traffic as we scan
 through the table might actually be a more significant benefit than it
 is here, where it's mostly a question of kernel time; the I/O system
 isn't actually taxed.  So I think this probably needs more testing
 before we decide whether or not it's a good idea.

I *may* have an opportunity to test that in a real-world application
where hint bits were an issue.


 I adopted a few suggestions made previously in this version of the
 patch.  Tom Lane recommended not messing with BM_JUST_DIRTY and
 leaving that for another day.

yes, good.

 I did that.  Also, per my previous
 musings, I've adjusted this version so that vacuum behaves differently
 when dirtying pages rather than when flushing them.  In versions 1 and
 2, vacuum would always write pages that were dirty-only-for-hint-bits
 when allocating a new buffer; in this version the buffer allocation
 logic is the same for vacuum, but it marks pages dirty even when only
 hint bits have changed.  The result is that VACUUM followed by
 CHECKPOINT is enough to make sure all hint bits are set on disk, just
 as is the case today.

For now it looks better to reduce this impact, yes...
Keeping the logic from v1 or v2 implies VACUUM FREEZE to 'fix' the hint
bits, right?

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 3:07 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 So I think this probably needs more testing
 before we decide whether or not it's a good idea.

 I *may* have an opportunity to test that in a real-world application
 where hint bits were an issue.

That would be great.  But note that you'll also need to compare it
against an unpatched 9.1devel; otherwise we won't be able to tell
whether it's this helping, or some other 9.1 patch (particularly, the
fsync compaction patch).

 I did that.  Also, per my previous
 musings, I've adjusted this version so that vacuum behaves differently
 when dirtying pages rather than when flushing them.  In versions 1 and
 2, vacuum would always write pages that were dirty-only-for-hint-bits
 when allocating a new buffer; in this version the buffer allocation
 logic is the same for vacuum, but it marks pages dirty even when only
 hint bits have changed.  The result is that VACUUM followed by
 CHECKPOINT is enough to make sure all hint bits are set on disk, just
 as is the case today.

 For now it looks better to reduce this impact, yes...
 Keeping the logic from v1 or v2 implies VACUUM FREEZE to 'fix' the hint
 bits, right?

In v1, you'd need to actually dirty the pages, so yeah, VACUUM
(FREEZE) would be pretty much the only way.  In v2, regular VACUUM
would mostly work, except it might miss a smattering of hint bits at
the very end of its scan.  In this version (v3), that's been fixed as
well and now just plain VACUUM should be entirely sufficient.  (The
last few pages examined might not get evicted to disk right away, just
as in the current code, but they're guaranteed to be written
eventually unless a system crash intervenes, again just as in the
current code.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] We need to log aborted autovacuums

2011-02-05 Thread Cédric Villemain
2011/2/5 Robert Haas robertmh...@gmail.com:
 On Sat, Feb 5, 2011 at 12:54 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 What do you implement exactly?
 * The original request from Josh to get a LOG message when autovac cannot
 run because of locks
 * VACOPT_NOWAIT -- what is it?

 What the patch implements is:

 If autovacuum can't get the table lock immediately, it skips the
 table.  This leaves the table still appearing to need autovacuuming,
 so autovacuum will keep retrying (every 1 minute, or whatever you have
 autovacuum_naptime set to) until it gets the lock.  This seems clearly
 better than the historical behavior of blocking on the lock, which can
 potentially keep other tables in the database from getting vacuumed.

great :)


 In the case where a table is skipped for this reason, we log a message
 at log level LOG.  The version of the patch I posted does that
 unconditionally, but my intention was to change it before commit so
 that it only logs the message if log_autovacuum_min_duration is
 something other than -1.

Yes, that looks better. I also note that someone suggested not adding a GUC
for that. I am unsure about that; adding a parameter here would not hurt me
at all. Reducing the number of parameters for memory/performance/... is fine
(indeed, very good when we can), but to improve the log output I think it is
OK to add more without much trouble.


 Regular vacuum behaves as before - it waits for each table lock
 individually.  We could expose a NOWAIT option at the SQL level as
 well, so that someone could do VACOPT (NOWAIT), if that's something
 people want.

Might be useful, yes.

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] We need to log aborted autovacuums

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 3:20 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 In the case where a table is skipped for this reason, we log a message
 at log level LOG.  The version of the patch I posted does that
 unconditionally, but my intention was to change it before commit so
 that it only logs the message if log_autovacuum_min_duration is
 something other than -1.

 yes looks better, I also note that someone suggest to not add a GUC for that.

I think a GUC just to control this one message is overkill.  Chances
are that if you're logging some or all of your autovacuum runs you'll
want to have this too, or at least it won't be a major problem to get
it anyway.  If for some reason you want ONLY this message, you can
effectively get that behavior too, by setting
log_autovacuum_min_duration to an enormous value.  So I can't really
imagine why you'd need a separate knob just for this one thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] We need to log aborted autovacuums

2011-02-05 Thread Cédric Villemain
2011/2/5 Robert Haas robertmh...@gmail.com:
 On Sat, Feb 5, 2011 at 3:20 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 In the case where a table is skipped for this reason, we log a message
 at log level LOG.  The version of the patch I posted does that
 unconditionally, but my intention was to change it before commit so
 that it only logs the message if log_autovacuum_min_duration is
 something other than -1.

 yes looks better, I also note that someone suggest to not add a GUC for that.

 I think a GUC just to control this one message is overkill.  Chances
 are that if you're logging some or all of your autovacuum runs you'll
 want to have this too, or at least it won't be a major problem to get
 it anyway.  If for some reason you want ONLY this message, you can
 effectively get that behavior too, by setting
 log_autovacuum_min_duration to an enormous value.  So I can't really
 imagine why you'd need a separate knob just for this one thing.

To not output it when I want log_autovacuum_min_duration = 0 but I know
that some tables have lock contention and may lose maintenance
for an hour or so.

Anyway, no GUC is fine too, as it won't fill /var/log by itself!
I am just not opposed to having new GUCs in those areas (log & debug).

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 14

2011-02-05 Thread Kevin Grittner
Kevin Grittner  wrote:
 
 So now that I'm sure we actually do need code there, I'll add it.
 
In working on this I noticed the apparent need to move two calls to
PredicateLockTuple a little bit to keep them inside the buffer lock. 
Without at least a share lock on the buffer, it seems that there is a
window where a read could miss the MVCC changes from a write and the write
could fail to see the predicate lock.  Please see whether this seems
reasonable:
 
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=7841a22648c3f4ae46f674d7cf4a7c2673cf9ed2
 
 And I'll add the new test to the isolation suite.
 
We don't need all permutations for this test, which is a good thing
since it has such a long setup time.  Is there an easy way to just
run the one schedule of statements on three connections?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Cédric Villemain
2011/2/5 Robert Haas robertmh...@gmail.com:
 On Sat, Feb 5, 2011 at 3:07 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 So I think this probably needs more testing
 before we decide whether or not it's a good idea.

 I *may* have an opportunity to test that in a real-world application
 where hint bits were an issue.

 That would be great.  But note that you'll also need to compare it
 against an unpatched 9.1devel; otherwise we won't be able to tell
 whether it's this helping, or some other 9.1 patch (particularly, the
 fsync compaction patch).

mmhh, sure.


 I did that.  Also, per my previous
 musings, I've adjusted this version so that vacuum behaves differently
 when dirtying pages rather than when flushing them.  In versions 1 and
 2, vacuum would always write pages that were dirty-only-for-hint-bits
 when allocating a new buffer; in this version the buffer allocation
 logic is the same for vacuum, but it marks pages dirty even when only
 hint bits have changed.  The result is that VACUUM followed by
 CHECKPOINT is enough to make sure all hint bits are set on disk, just
 as is the case today.

 For now it looks better to reduce this impact, yes...
 Keeping the logic from v1 or v2 implies VACUUM FREEZE to 'fix' the hint
 bits, right?

 In v1, you'd need to actually dirty the pages, so yeah, VACUUM
 (FREEZE) would be pretty much the only way.  In v2, regular VACUUM
 would mostly work, except it might miss a smattering of hint bits at
 the very end of its scan.  In this version (v3), that's been fixed as
 well and now just plain VACUUM should be entirely sufficient.  (The
 last few pages examined might not get evicted to disk right away, just
 as in the current code, but they're guaranteed to be written
 eventually unless a system crash intervenes, again just as in the
 current code.)


just reading the patch...
I understand the idea of the 5% flush.
*maybe* it makes sense to use the effective_io_concurrency GUC here to
improve the ratio, but it might be perceived as bad usage...
currently effective_io_concurrency is for planning purposes.


-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Bruce Momjian
Robert Haas wrote:
 On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
  Please update the commitfest with the current patch; there is only
  the old immature v1 of the patch in it.
  I was about to review it...
 
  https://commitfest.postgresql.org/action/patch_view?id=500
 
 Woops, sorry about that.  Here's an updated version, which I will also
 add to the CommitFest application.
 
 The need for this patch has been somewhat ameliorated by the fsync
 queue compaction patch.  I tested with:

Uh, in this C comment:

+* or not we want to take the time to write it.  We allow up to 5% of
+* otherwise-not-dirty pages to be written due to hint bit changes,

5% of what?  5% of all buffers?  5% of all hint-bit-dirty ones?  Can you
clarify this in the patch?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 4:19 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 just reading the patch...
 I understand the idea of the 5% flush.
 *maybe* it makes sense to use the effective_io_concurrency GUC here to
 improve the ratio, but it might be perceived as bad usage...
 currently effective_io_concurrency is for planning purposes.

effective_io_concurrency is supposed to be set based on how many
spindles your RAID array has.  There's no reason to think that the
correct flush percentage is in any way related to that value. The
reason why we might not want backends to write out too many
dirty-only-for-hint-bits buffers during a large sequential scan are
that (a) the actual write() system calls take time to copy the buffers
into kernel space, slowing the scan, and (b) flushing too many buffers
this way could lead to I/O spikes.  Increasing the flush percentage
slows down the first few scans, but takes fewer scans to reach optimal
performance (all hint bits set on disk).  Decreasing the flush
percentage speeds up the first few scans, but is overall less
efficient.

We could make this a tunable, but I'm not clear that there is much
point.  If writing 100% of the pages that have only hint-bit updates
slows the scan by 80% and writing 5% of the pages slows the scan by
25%, then dropping below 5% doesn't seem likely to buy much further
improvement.  You could argue for raising the flush percentage above
5%, but if you go too much higher then it's not clear that you're
gaining anything over just flushing them all.  I don't think we
necessarily have enough experience to know whether this is a good idea
at all, so worrying about whether different people need different
percentages seems a bit premature.

Another point here is that no matter how many times you
sequential-scan the table, you never get performance as good as what
you would get if you vacuumed it, even if the table contains no dead
tuples.  I believe this is because VACUUM will not only set the
HEAP_XMIN_COMMITTED hint bits; it'll also set PD_ALL_VISIBLE on the
page.  I wonder if we shouldn't be autovacuuming even tables that are
insert-only for precisely this reason, as well as to prevent the case
where someone inserts small batches of records for a long time and
then finally deletes some stuff.  There are no visibility map bits set
so, boom, you get this huge, expensive vacuum.  This will, of course,
be even more of an issue when we get index-only scans.
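
The effect is easy to see with the test table from upthread (a sketch):

select sum(1) from s;  -- repeat until the timings level off: hint bits set
vacuum s;              -- additionally sets PD_ALL_VISIBLE on each heap page
select sum(1) from s;  -- now beats the seqscan-only steady state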

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 4:31 PM, Bruce Momjian br...@momjian.us wrote:
 Uh, in this C comment:

 +        * or not we want to take the time to write it.  We allow up to 5% of
 +        * otherwise-not-dirty pages to be written due to hint bit changes,

 5% of what?  5% of all buffers?  5% of all hint-bit-dirty ones?  Can you
 clarify this in the patch?

5% of buffers that are hint-bit-dirty but not otherwise dirty.  ISTM
that's exactly what the comment you just quoted says on its face, but
I'm open to some other wording you want to propose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Varchar and binary protocol

2011-02-05 Thread Radosław Smogura
Hi,

I am running performance tests comparing the original JDBC driver with my
version in binary and in text mode. I saw strange results when reading
varchar values. Here is some output from a simple benchmark:

Plain strings speed   Execution: 8316582, local: 2116608, all: 
10433190
Binary strings speed  Execution: 9354613, local: 2755949, all: 
12110562
Text NG strings speed Execution: 8346902, local: 2704242, all: 
11051144

Plain is the standard JDBC driver, Binary is my version with binary transfer,
and Text is my version with normal transfer. The first column, Execution, is
the time spent on query execution; this includes sending and receiving
protocol messages, storing them, etc., with no conversion to the output
format. Values are in nanoseconds.

In the new version I added some functionality, but the routines that read
the parts measured under Execution are almost the same for binary and text.

But as you can see, the binary version is 10-20% slower than the original
and than my text version, and if I increase the number of records read this
proportion does not change. I did many checks, even against a driver that
skips the protocol message content entirely; the end result was the same
10-20% slowdown.

Regards,
Radek.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Cédric Villemain
2011/2/5 Bruce Momjian br...@momjian.us:
 Robert Haas wrote:
  On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
  Please update the commitfest with the current patch; there is only
  the old immature v1 of the patch in it.
  I was about to review it...
 
  https://commitfest.postgresql.org/action/patch_view?id=500

 Woops, sorry about that.  Here's an updated version, which I will also
 add to the CommitFest application.

 The need for this patch has been somewhat ameliorated by the fsync
 queue compaction patch.  I tested with:

 Uh, in this C comment:

 +        * or not we want to take the time to write it.  We allow up to 5% of
 +        * otherwise-not-dirty pages to be written due to hint bit changes,

 5% of what?  5% of all buffers?  5% of all hint-bit-dirty ones?  Can you
 clarify this in the patch?


The patch currently allows 100 buffers to be written consecutively per
2000 BufferAllocs.
mmmhhh

Robert, I am unsure about the hint_bit_write_allowance counter. It
looks a bit fragile, because nothing prevents the hint_bit_write_allowance
counter from increasing a lot, so that it is not 100 but X*100 hint-bit
pages that get written next. Isn't it?

Also, won't buffer_allocation_count hit the INT limit?

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Foreign servers and user mappings versus the extensions patch

2011-02-05 Thread Tom Lane
Currently, the extensions patch considers that foreign data wrappers,
foreign servers, and user mapping objects can all be parts of extensions.
This is slightly problematic for pg_dump, where somebody decided to take
a shortcut and not implement user mappings using the full DumpableObject
infrastructure.  That could be fixed, but I'm wondering whether it's
worth the trouble.  I can see the point of writing an FDW as an
extension but it's a lot harder to see why either foreign server or user
mapping objects would ever be part of an extension.  So it might just be
best to remove those two object types from the set that can be managed
by an extension.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - file_fdw

2011-02-05 Thread Noah Misch
On Mon, Jan 17, 2011 at 06:33:08PM -0600, Kevin Grittner wrote:
 Itagaki Takahiro  wrote:
  Shigeru HANADA  wrote:
  
  Attached patch would avoid this leak by adding per-copy context to
  CopyState. This would be overkill, and ResetCopyFrom() might be
  reasonable though.
 
  Good catch. I merged your fix into the attached patch.
  
 Review for CF:
...
 I tried to read through the code to look for problems, but there are
 so many structures and techniques involved that I haven't had to deal
 with yet, that it would take me days to get up to speed enough to
 desk check this adequately.  Since this API is a prerequisite for
 other patches, and already being used by them, I figured I should do
 what I could quickly and then worry about how to cover that.

I've studied the patch from this angle.  My two substantive questions concern
CopyFromErrorCallback() error strings and the use of the per-tuple memory
context; see below.  The rest either entail trivial fixes, or they do not
indicate anything that clearly ought to change.


The patch mostly moves code around; it's a bit difficult to grasp what's really
changing by looking at the diff (not a criticism of the approach -- I see no way
to avoid that).  The final code structure is at least as easy to follow as the
structure it replaces.  Here is a summary of the changes:

  file_fdw wishes to borrow the COPY FROM code for parsing structured text and
  building tuples fitting a given relation.  file_fdw must bypass the parts that
  insert those tuples.  Until now, copy.c has exposed a single API, DoCopy().
  To meet file_fdw's needs, the patch adds four new APIs for use as a set:

BeginCopyFrom() - once-per-COPY initialization and validation
NextCopyFrom() - call N times to get N values/null arrays
EndCopyFrom() - once-per-COPY cleanup
CopyFromErrorCallback() - for caller use in ErrorContextCallback.callback

  Implementing these entails breaking apart the existing code structure.  For
  one, the patch separates from DoCopy the once-per-COPY-statement code, both
  initialization and cleanup.  Secondly, it separates the CopyFrom code for
  assembling a tuple based on COPY input from the code that actually stores said
  tuples in the target relation.  To avoid duplicating code, the patch then
  calls these new functions from DoCopy and CopyFrom.  To further avoid
  duplicating code and to retain symmetry, it refactors COPY TO setup and
  teardown along similar lines.  We have four new static functions:

BeginCopyTo() - parallel to BeginCopyFrom(), for DoCopy() alone
BeginCopy() - shared bits between BeginCopyTo() and BeginCopyFrom()
EndCopyTo() - parallel to EndCopyFrom(), for DoCopy() alone
EndCopy() - shared bits between EndCopyTo() and EndCopyFrom()

  Most parse analysis-type bits of DoCopy() move to BeginCopy().  Several
  checks remain in DoCopy(): superuser for reading a server-local file,
  permission on the relation, and a read-only transaction check.  The first
  stays where it does because a superuser may define a file_fdw foreign table
  and then grant access to others.  The other two remain in DoCopy() because the
  new API uses the relation as a template without reading or writing it.

  The catalog work (looking up defaults, I/O functions, etc) in CopyFrom() moves
  to BeginCopyFrom(), and applicable local variables become CopyState members.

  copy_in_error_callback changes name to CopyFromErrorCallback() to better fit
  with the rest of the new API.  Its implementation does not change at all.

  Since it's now possible for one SQL statement to call into the COPY machinery
  an arbitrary number of times, the patch introduces a per-COPY memory context.

The patch includes additional refactorings that make sense but are peripheral to
the API change at hand.  CopyState.fe_copy is gone, now calculated locally in
DoCopyTo().  CopyState.processed is gone, with the row count now bubbling up
through return values.  We now initialize the CopyState buffers for COPY FROM
only; they are unused in COPY TO.  The purest of patches would defer all these,
but I don't object to them tagging along.

For COPY TO, the relkind checks move from DoCopyTo() to BeginCopyTo().  I'm
skeptical about this one.  It's not required for correctness, and the relkind
checks for COPY FROM remain in CopyFrom().

file_fdw uses CopyFromErrorCallback() to give errors the proper context.  The
function uses template strings like "COPY %s, line %d", where %s is the name of
the relation being copied.  Presumably file_fdw and other features using this
API would wish to customize that error message prefix, and the relation name
might not be apropos at all.  How about another argument to BeginCopyFrom,
specifying an error prefix to be stashed in the CopyState?
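
Concretely, I'm imagining something like the following (hypothetical; the
parameter name and the example prefix are mine, not the patch's):

  CopyState
  BeginCopyFrom(Relation rel,
                const char *filename,
                List *attnamelist,
                List *options,
                const char *errcontext_prefix);  /* e.g. "foreign table scan" */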

We could easily regret requiring a Relation in BeginCopyFrom(); another user may
wish to use a fabricated TupleDesc.  Code paths in the new API use the Relation
for a TupleDesc, relhasoids, column defaults, 

Re: [HACKERS] Foreign servers and user mappings versus the extensions patch

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 5:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Currently, the extensions patch considers that foreign data wrappers,
 foreign servers, and user mapping objects can all be parts of extensions.
 This is slightly problematic for pg_dump, where somebody decided to take
 a shortcut and not implement user mappings using the full DumpableObject
 infrastructure.  That could be fixed, but I'm wondering whether it's
 worth the trouble.  I can see the point of writing an FDW as an
 extension but it's a lot harder to see why either foreign server or user
 mapping objects would ever be part of an extension.  So it might just be
 best to remove those two object types from the set that can be managed
 by an extension.

 Comments?

I agree it's probably not that useful to make a foreign server or
foreign user mapping part of an extension, but I'd rather not have us
fail to support it just because we can't think of a use case right
now.  So my vote would be to fix it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] limiting hint bit I/O

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 5:04 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 Robert, I am unsure about the hint_bit_write_allowance counter. It
 looks a bit fragile because
 nothing prevents the hint_bit_write_allowance counter from increasing a lot,
 so that it is not 100 but X*100 hint bits that will be written next. Isn't it?

hint_bit_write_allowance can never be more than 100.  The only things
we ever do are set it to exactly 100, and decrease it by 1 if it's
positive.

 Also, won't buffer_allocation_count hit the INT limit?

Sure, if the backend sticks around long enough, but it's no big deal
if it overflows.
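
Roughly, the invariant is this (a sketch: the function name and the replenish
trigger are assumptions on my part, not code from the patch):

  static int  hint_bit_write_allowance;         /* never exceeds 100 */
  static unsigned int buffer_allocation_count;  /* int wraparound is harmless */

  static bool
  hint_bit_write_permitted(void)
  {
      /* assumed trigger: grant a fresh budget every 100th allocation */
      if (++buffer_allocation_count % 100 == 0)
          hint_bit_write_allowance = 100;       /* set to exactly 100, never more */

      if (hint_bit_write_allowance > 0)
      {
          hint_bit_write_allowance--;           /* decrement only when positive */
          return true;
      }
      return false;                             /* skip the hint-bit-only write */
  }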

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites

2011-02-05 Thread Noah Misch
On Sat, Feb 05, 2011 at 10:03:59AM -0500, Robert Haas wrote:
 Looking at this still more, it appears that independent of any change
 this patch may wish to make, there's a live bug here related to the
 foreign table patch I committed back in December.  Creating a foreign
 table creates an eponymous rowtype, which can be used as a column in a
 regular table.  You can then change the data type of a column in the
 foreign table, read from the regular table, and crash the server.
 
 The simple fix for this is to just change the code in
 ATPrepAlterColumnType to handle the foreign table case also:
 
 if (tab->relkind == RELKIND_COMPOSITE_TYPE)
 {
     /*
      * For composite types, do this check now.  Tables will check
      * it later when the table is being rewritten.
      */
     find_composite_type_dependencies(rel->rd_rel->reltype,
                                      NULL,
                                      RelationGetRelationName(rel));
 }
 
 But this is a little unsatisfying, for two reasons.  First, the error
 message will be subtly wrong: we can make it complain about a table or
 a type, but not a foreign table.  At a quick look, it likes the right
 fix might be to replace the second and third arguments to
 find_composite_type_dependencies() with a Relation.

Seems like a clear improvement.

 Second, I wonder
 if we shouldn't refactor things so that all the checks fire in
 ATRewriteTables() rather than doing them in different places.  Seems
 like that might be cleaner.

Offhand, this seems reasonable, too.  I assumed there was some good reason it
couldn't be done there for non-tables, but nothing comes to mind.



Re: [HACKERS] [COMMITTERS] pgsql: Include more status information in walsender results

2011-02-05 Thread Robert Haas
On Thu, Feb 3, 2011 at 7:56 AM, Magnus Hagander mag...@hagander.net wrote:
 Include more status information in walsender results

 Add the current xlog insert location to the response of
 IDENTIFY_SYSTEM, and add result sets containing start
 and stop location of backups to BASE_BACKUP responses.

Does this have anything to do with the following error message I'm now
getting when trying to use streaming replication?

FATAL:  invalid response from primary server
DETAIL:  Expected 1 tuple with 2 fields, got 1 tuples with 3 fields.
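
For context, the check that fires is of this general shape (a sketch from
memory, not the actual walreceiver code): the receiver validates the result
shape of IDENTIFY_SYSTEM, so a primary that now sends three fields trips it
until the receiver side is updated to match.

if (PQntuples(res) != 1 || PQnfields(res) != 2)
    ereport(ERROR,
            (errmsg("invalid response from primary server"),
             errdetail("Expected 1 tuple with 2 fields, got %d tuples with %d fields.",
                       PQntuples(res), PQnfields(res))));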

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SSI patch version 14

2011-02-05 Thread Kevin Grittner
Kevin Grittner  wrote:
 
 So now that I'm sure we actually do need code there, I'll add it.
 
Hmmm...  First cut at this here:
 
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=3ccedeb451ee74ee78ef70735782f3550b621eeb
 
It passes all the usual regression tests, the new isolation tests,
and the example posted earlier in the thread of a test case which was
allowing an anomaly.  (That is to say, the anomaly is now prevented.)
 
I didn't get timings, but it *seems* noticeably slower; hopefully
that's either subjective or fixable.  Any feedback on whether this
seems a sane approach to the issue is welcome.
 
-Kevin



Re: [HACKERS] [COMMITTERS] pgsql: Include more status information in walsender results

2011-02-05 Thread Magnus Hagander
On Sun, Feb 6, 2011 at 01:54, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Feb 3, 2011 at 7:56 AM, Magnus Hagander mag...@hagander.net wrote:
 Include more status information in walsender results

 Add the current xlog insert location to the response of
 IDENTIFY_SYSTEM, and add result sets containing start
 and stop location of backups to BASE_BACKUP responses.

 Does this have anything to do with the following error message I'm now
 getting when trying to use streaming replication?

 FATAL:  invalid response from primary server
 DETAIL:  Expected 1 tuple with 2 fields, got 1 tuples with 3 fields.

Yes. I believe I forgot to git add that one file that was off hiding
in another directory.

Thanks, fix applied.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] Range Types

2011-02-05 Thread Jeff Davis
New patch. All known TODO items are closed, although I should do a
cleanup pass over the code and docs.

Fixed in this patch:

  * Many documentation improvements
  * Added INT8RANGE
  * Renamed PERIOD[TZ] -> TS[TZ]RANGE
  * Renamed INTRANGE -> INT4RANGE
  * Improved parser's handling of whitespace and quotes
  * Support for PL/pgSQL functions with ANYRANGE arguments/returns
  * Made the subtype_float function no longer a requirement for GiST,
but it should still be supplied for the penalty function to be
useful.

There are some open issues remaining, however:

  * Is typmod worth doing? I could complete it pretty quickly, but it
involves introducing a new Node type, which seems a little messy for the
benefit.

  * Should pg_range reference the btree opclass or the compare function
directly?

  * Should subtype_cmp default to the default btree opclass's compare
function?

  * Right now only superusers can define a range type. Should we open it
up to normal users?

  * Should the SQL (inlinable) function length, which relies on the
polymorphic - operator, be immutable, stable, or volatile?

  * Later we might consider whether we should include btree_gist in
core, to make range types more useful with exclusion constraints
out-of-the-box. This should be left for later, I'm just including this
for the archives so it doesn't get lost.

Regards,
Jeff Davis


rangetypes-20110205.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites

2011-02-05 Thread Robert Haas
On Sat, Feb 5, 2011 at 7:44 PM, Noah Misch n...@leadboat.com wrote:
 But this is a little unsatisfying, for two reasons.  First, the error
 message will be subtly wrong: we can make it complain about a table or
 a type, but not a foreign table.  At a quick look, it looks like the right
 fix might be to replace the second and third arguments to
 find_composite_type_dependencies() with a Relation.

 Seems like a clear improvement.

That didn't quite work, because there's a caller in typecmds.c that
doesn't have the relation handy.  So I made it take a relkind and a
name, which works fine.
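
For reference, the resulting call shape, matching the calls visible in the
attached diff (the parameter names here are mine):

  void
  find_composite_type_dependencies(Oid typeOid,
                                   char origRelkind,
                                   const char *origRelname);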

 Second, I wonder
 if we shouldn't refactor things so that all the checks fire in
 ATRewriteTables() rather than doing them in different places.  Seems
 like that might be cleaner.

 Offhand, this seems reasonable, too.  I assumed there was some good reason it
 couldn't be done there for non-tables, but nothing comes to mind.

Actually, thinking about this more, I'm thinking if we're going to
change anything, it seems we ought to go the other way.  If we ever
actually did support recursing into wherever the composite type
dependencies take us, we'd want to detect that before phase 3 and add
work-queue items for each table that we needed to futz with.

The attached patch takes this approach.  It's very slightly more code,
but it reduces the amount of spooky action at a distance.  The
existing coding is basically relying on the assumption that operations
which require finding composite type dependencies also require a table
rewrite.  That was probably true at one point in time, but it's not
really quite right.  It already requires compensating code for foreign
tables and composite types (which can require this treatment even
though there's nothing to rewrite), and your proposed changes to avoid
table rewrites in cases where a type is changed to a compatible type
would break it in the opposite direction (the check would still be
needed even if the parent table doesn't need a rewrite, because the
rowtype could be indexed in some fashion that depends on the type of
the column being changed).

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6a17399..83f8be0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3378,23 +3378,6 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	}
 
 	/*
-	 * If we change column data types or add/remove OIDs, the operation has to
-	 * be propagated to tables that use this table's rowtype as a column type.
-	 * newrel will also be non-NULL in the case where we're adding a column
-	 * with a default.  We choose to forbid that case as well, since composite
-	 * types might eventually support defaults.
-	 *
-	 * (Eventually we'll probably need to check for composite type
-	 * dependencies even when we're just scanning the table without a rewrite,
-	 * but at the moment a composite type does not enforce any constraints,
-	 * so it's not necessary/appropriate to enforce them just during ALTER.)
-	 */
-	if (newrel)
-		find_composite_type_dependencies(oldrel->rd_rel->reltype,
-		 oldrel->rd_rel->relkind,
-		 RelationGetRelationName(oldrel));
-
-	/*
 	 * Generate the constraint and default execution states
 	 */
 
@@ -4055,7 +4038,7 @@ ATExecAddColumn(AlteredTableInfo *tab, Relation rel,
 	Oid			typeOid;
 	int32		typmod;
 	Form_pg_type tform;
-	Expr	   *defval;
+	Expr	   *defval = NULL;
 
 	attrdesc = heap_open(AttributeRelationId, RowExclusiveLock);
 
@@ -4304,6 +4287,20 @@ ATExecAddColumn(AlteredTableInfo *tab, Relation rel,
 		tab->new_changeoids = true;
 
 	/*
+	 * If we're adding an OID column, the operation has to be propagated to
+	 * tables that use this table's rowtype as a column type.  But we don't
+	 * currently have the infrastructure for that, so just throw an error.
+	 * We also forbid the case where we're adding a column with a default, since
+	 * composite types might eventually support defaults, and ALTER TABLE ..
+	 * ADD COLUMN .. DEFAULT would be expected to initialize the newly added
+	 * column to the default in each instantiation of the rowtype.
+	 */
+	if (isOid || defval)
+		find_composite_type_dependencies(rel->rd_rel->reltype,
+		 rel->rd_rel->relkind,
+		 RelationGetRelationName(rel));
+
+	/*
 	 * Add needed dependency entries for the new column.
 	 */
 	add_column_datatype_dependency(myrelid, newattnum, attribute.atttypid);
@@ -4975,6 +4972,16 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
 
 		/* Tell Phase 3 to physically remove the OID column */
 		tab->new_changeoids = true;
+
+		/*
+		 * The OID removal operation needs to be propagated to tables that use
+		 * this table's rowtype as a column type.  But we don't currently have
+		 * the infrastructure for that, so just throw an error if it would be
+		 * required.
+		 */
+		

Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)

2011-02-05 Thread Greg Smith

Tomas Vondra wrote:
 Because when I create a database, the field is
 NULL - that's true. But once I connect to the database, the stats are
 updated and the field is set (thanks to the logic in pgstat.c).

OK--so it does what I was hoping for, I just didn't test it the right
way.  Let's call that a documentation issue and move on.

Attached is an updated patch that fixes the docs and some other random
bits.  Looks ready for committer to me now.  Make sure to adjust
PGSTAT_FILE_FORMAT_ID, do a cat version bump, and set final OIDs for the
new functions.

Below is what changed since the last posted version, mainly as feedback
for Tomas:

-Explained more clearly that pg_stat_reset and
pg_stat_reset_single_counters will both touch the database reset time,
and that it's initialized upon first connection to the database.

-Added the reset time to the list of fields in pg_stat_database and
pg_stat_bgwriter.

-Fixed some tab/whitespace issues.  It looks like you had tab stops set
at 8 characters at some points when you were editing non-code files.
Also, there were a couple of spots where you used a tab while the
surrounding text used spaces.  You can normally see both types of errors
if you read a patch; they show up as misaligned things in the context
diff.

-Removed some extra blank lines that didn't fit the style of the
surrounding code.

Basically, all the formatting bits I'm nitpicking about I found just by
reading the patch itself; they all stuck right out.  I'd recommend a
pass of that before submitting things if you want to try and avoid those
in the future.

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ca83421..100f938 100644
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*** postgres: <replaceable>user</replaceable> <replacea
*** 267,273 ****
    by backends (that is, not by the background writer), how many times
    those backends had to execute their own fsync calls (normally the
    background writer handles those even when the backend does its own
!   write), and total buffers allocated.
   </entry>
  </row>
 
--- 267,273 ----
    by backends (that is, not by the background writer), how many times
    those backends had to execute their own fsync calls (normally the
    background writer handles those even when the backend does its own
!   write), total buffers allocated, and time of last statistics reset.
   </entry>
  </row>
 
*** postgres: <replaceable>user</replaceable> <replacea
*** 278,286 ****
    number of transactions committed and rolled back in that database,
    total disk blocks read, total buffer hits (i.e., block
    read requests avoided by finding the block already in buffer cache),
!   number of rows returned, fetched, inserted, updated and deleted, and
    total number of queries cancelled due to conflict with recovery (on
!   standby servers).
   </entry>
  </row>
 
--- 278,286 ----
    number of transactions committed and rolled back in that database,
    total disk blocks read, total buffer hits (i.e., block
    read requests avoided by finding the block already in buffer cache),
!   number of rows returned, fetched, inserted, updated and deleted, the
    total number of queries cancelled due to conflict with recovery (on
!   standby servers), and time of last statistics reset.
   </entry>
  </row>
 
*** postgres: <replaceable>user</replaceable> <replacea
*** 663,668 ****
--- 663,681 ----
   </row>
 
   <row>
+   <entry><literal><function>pg_stat_get_db_stat_reset_time</function>(<type>oid</type>)</literal></entry>
+   <entry><type>timestamptz</type></entry>
+   <entry>
+    Time of the last statistics reset for the database.  Initialized to the
+    system time during the first connection to each database.  The reset time
+    is updated when you call <function>pg_stat_reset</function> on the
+    database, as well as upon execution of
+    <function>pg_stat_reset_single_table_counters</function> against any
+    table or index in it.
+   </entry>
+  </row>
+ 
+  <row>
   <entry><literal><function>pg_stat_get_numscans</function>(<type>oid</type>)</literal></entry>
   <entry><type>bigint</type></entry>
   <entry>
*** postgres: <replaceable>user</replaceable> <replacea
*** 1125,1130 ****
--- 1138,1153 ----
   <varname>bgwriter_lru_maxpages</varname> parameter
  </entry>
  </row>
+ 
+  <row>
+   <entry><literal><function>pg_stat_get_bgwriter_stat_reset_time()</function></literal></entry>
+   <entry><type>timestamptz</type></entry>
+   <entry>
+    Time of the last statistics reset for the background writer, updated
+    when executing <function>pg_stat_reset_shared('bgwriter')</function>
+ 

Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-02-05 Thread Alex Hunsaker
On Thu, Feb 3, 2011 at 18:29, Alex Hunsaker bada...@gmail.com wrote:
 On Mon, Jan 31, 2011 at 01:34, Alexey Klyukin al...@commandprompt.com wrote:
 I've looked at the patch and added a test for arrays exceeding or equal to
 the maximum number of dimensions, to check whether the recursive function
 won't bring surprises there. I've also added check_stack_depth calls to both
 split_array and plperl_hash_from_tuple. Note that the regression test
 currently fails due to the incorrect error reporting in PostgreSQL, per
 http://archives.postgresql.org/pgsql-hackers/2011-01/msg02888.php.

 The v5 is attached.

 One thing I find odd is we allow for nested arrays, but not nested
 composites.  For example:
...
 On the other end, the same is true when returning. If you try to
 return a nested composite or array, the child had better be a string, as it
 won't let you pass a hash:

So here is a v6 that does the above. It does so by providing a generic
plperl_sv_to_datum() function and converting the various places to use
it. One cool thing is you can now use the composite types or arrays
passed in (or returned from another spi!) as arguments for
spi_exec_prepared(). Before, you would have had to convert the hashes
to strings. It also provides a real plperl_convert_to_pg_array (now
plperl_array_to_datum) that can handle composite and other random
datatypes. This also means we don't have to convert arrays to a string
representation first. (which removes part of the argument for #5 @
http://archives.postgresql.org/pgsql-hackers/2011-02/msg00197.php)

I have attached a standalone version of the above so it's easier to
look over. The diffstat is fairly small (ignoring added regression
tests)
 1 files changed, 259 insertions(+), 165 deletions(-)

Comments?


plperl_uniform_inout.patch.gz
Description: GNU Zip compressed data


pg_to_perl_arrays_v6.patch.gz
Description: GNU Zip compressed data
