Re: [HACKERS] JSON for PG 9.2

2012-01-22 Thread Andrew Dunstan



On 01/21/2012 11:40 PM, Jeff Janes wrote:

On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan and...@dunslane.net wrote:


On 01/14/2012 03:06 PM, Andrew Dunstan wrote:




OK, here's a patch that does both query_to_json and array_to_json, along
with docs and regression tests. It includes Robert's original patch, although
I can produce a differential patch if required. It can also be pulled from
https://bitbucket.org/adunstan/pgdevel




Here's an update that adds row_to_json, plus a bit more cleanup.

This is bit-rotted such that initdb fails

creating template1 database in
/tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL:  could
not create unique index pg_proc_oid_index
DETAIL:  Key (oid)=(3145) is duplicated.

I bumped up those OIDs in the patch, and it passes make check once I
figured out how to get the tests to run under UTF-8.  Is it supposed to
pass under other encodings?  I can't tell from the rest of the thread
whether it is supposed to pass in other encodings or not.



Yeah, regression tests generally are supposed to run in all encodings. 
Either we could knock out the offending test, or we could supply an 
alternative result file. If we do the latter, maybe we should modify the 
query slightly, so it reads


   SELECT getdatabaseencoding() = 'UTF8' as is_utf8, '"\uaBcD"'::json;

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Fix Leaky View Problem

2012-01-22 Thread Jeff Janes
On Tue, Jan 17, 2012 at 7:08 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 8, 2012 at 10:32 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I guess you were concerned that expected/select_views_1.out is
 patched, not expected/select_views.out.
 I'm not sure why the regression test script tries to diff
 results/select_views against expected/select_views_1.out.

 select_views.out and select_views_1.out are alternate expected output
 files.  The regression tests pass if the actual output matches either
 one.  Thus, you have to patch both.

 That was new to me. The attached patch updates both of the expected
 files; however, I'm not certain whether select_views.out is suitable,
 because my results/select_views.out matched
 expected/select_views_1.out.

 Committed.  We'll see what the buildfarm thinks.

This passes installcheck initially.  Then upon second invocation of
installcheck, it fails.

It creates the role alice, and doesn't clean it up.  On the next
invocation alice already exists and causes a failure in the
select_views test.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Fix Leaky View Problem

2012-01-22 Thread Kohei KaiGai
2012/1/21 Jeff Janes jeff.ja...@gmail.com:
 On Tue, Jan 17, 2012 at 7:08 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 8, 2012 at 10:32 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I guess you were concerned that expected/select_views_1.out is
 patched, not expected/select_views.out.
 I'm not sure why the regression test script tries to diff
 results/select_views against expected/select_views_1.out.

 select_views.out and select_views_1.out are alternate expected output
 files.  The regression tests pass if the actual output matches either
 one.  Thus, you have to patch both.

 That was new to me. The attached patch updates both of the expected
 files; however, I'm not certain whether select_views.out is suitable,
 because my results/select_views.out matched
 expected/select_views_1.out.

 Committed.  We'll see what the buildfarm thinks.

 This passes installcheck initially.  Then upon second invocation of
 installcheck, it fails.

 It creates the role alice, and doesn't clean it up.  On the next
 invocation alice already exists and causes a failure in the
 select_views test.

Thanks for pointing this out.

The attached patch adds a cleanup step for the objects defined
within this test, including the role alice.
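
For reference, the kind of cleanup needed at the end of the test looks
roughly like this (the role name comes from the test; the other object
names are only illustrative):

    -- drop the test's objects first, then the role, so a second
    -- installcheck run starts from a clean slate
    DROP VIEW IF EXISTS some_test_view;    -- hypothetical name
    DROP TABLE IF EXISTS some_test_table;  -- hypothetical name
    DROP ROLE IF EXISTS alice;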

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


pgsql-v9.2-fix-regtest-select-views-cleanup.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PG-Strom - A GPU optimized asynchronous executor module

2012-01-22 Thread Kohei KaiGai
Hi,

I tried to implement an FDW module that is designed to utilize GPU
devices to execute the qualifiers of sequential scans on foreign tables
managed by this module.

It was named PG-Strom, and the following wikipage gives a brief
overview of this module.
http://wiki.postgresql.org/wiki/PGStrom

In our measurements, it achieves roughly a 10x speedup on sequential
scans with complex qualifiers, though of course that depends heavily on
the type of workload.

Example: a query counts the number of records with (x,y) located within a
particular range. A regular table 'rtbl' and a foreign table 'ftbl' contain
the same contents, with 10 million records each.

postgres=# SELECT count(*) FROM rtbl WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 51.2;
 count
---
 43134
(1 row)

Time: 10537.069 ms

postgres=# SELECT count(*) FROM ftbl WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 51.2;
 count
---
 43134
(1 row)

Time: 744.252 ms

(*) See the "How to use" section of the wiki page to reproduce my test case.

This seems to me to be quite a good result. However, I wonder whether the
sequential-scan case on the regular table was tuned appropriately.
Could you give me some hints for tuning sequential scans on large tables?
All I did in the test case was to increase shared_buffers to 1024MB, which is
enough to hold the whole of the example tables in memory.
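
(For anyone who wants to try the comparison without the wiki recipe, a table
of the right shape can be generated like this; the value ranges are arbitrary
and this is only an illustrative loader, not the one described on the wiki
page:)

    CREATE TABLE rtbl (id int, x float8, y float8);
    INSERT INTO rtbl
        SELECT i, random() * 100, random() * 100
        FROM generate_series(1, 10000000) AS i;
    ANALYZE rtbl;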

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-22 Thread Andrew Dunstan



On 01/22/2012 04:28 AM, Andrew Dunstan wrote:



On 01/21/2012 11:40 PM, Jeff Janes wrote:
On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan and...@dunslane.net wrote:


On 01/14/2012 03:06 PM, Andrew Dunstan wrote:




OK, here's a patch that does both query_to_json and array_to_json, along
with docs and regression tests. It includes Robert's original patch, although
I can produce a differential patch if required. It can also be pulled from

https://bitbucket.org/adunstan/pgdevel




Here's an update that adds row_to_json, plus a bit more cleanup.

This is bit-rotted such that initdb fails

creating template1 database in
/tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL:  could
not create unique index pg_proc_oid_index
DETAIL:  Key (oid)=(3145) is duplicated.

I bumped up those OIDs in the patch, and it passes make check once I
figured out how to get the tests to run under UTF-8.  Is it supposed to
pass under other encodings?  I can't tell from the rest of the thread
whether it is supposed to pass in other encodings or not.



Yeah, regression tests generally are supposed to run in all encodings. 
Either we could knock out the offending test, or we could supply an 
alternative result file. If we do the latter, maybe we should modify 
the query slightly, so it reads


   SELECT getdatabaseencoding() = 'UTF8' as is_utf8, '"\uaBcD"'::json;




Actually, given recent discussion I think that test should just be 
removed from json.c. We don't actually have any test that the code point 
is valid (e.g. that it doesn't refer to an unallocated code point). We 
don't do that elsewhere either - the unicode_to_utf8() function the 
scanner uses to turn \u escapes into utf8 doesn't look for 
unallocated code points. I'm not sure how much other validation we 
should do - for example on correct use of surrogate pairs. I'd rather 
get this as right as possible now - every time we tighten encoding rules 
to make sure incorrectly encoded data doesn't get into the database it 
causes someone real pain.
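
For reference, the kind of input at issue looks like this: the first is a
syntactically well-formed surrogate pair (encoding U+1F604), the second an
unpaired high surrogate that stricter validation would reject.  Whether and
where to enforce that is the open question (examples are illustrative only):

    SELECT '"\uD83D\uDE04"'::json;  -- valid surrogate pair
    SELECT '"\uD83D"'::json;        -- lone high surrogate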


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_database deadlock counter

2012-01-22 Thread Jaime Casanova
On Mon, Jan 16, 2012 at 3:19 PM, Magnus Hagander mag...@hagander.net wrote:
 Attached patch adds a counter for number of deadlocks in a database to
 pg_stat_database.


A little review:

- it applies with a few hunks
- the OID you have chosen for the function pg_stat_get_db_deadlocks()
(3150) is already in use; please choose another one using the
unused_oids script
- pg_stat_reset() doesn't reset the deadlock counter (see
pgstat_recv_resetcounter())


everything else seems fine to me
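
For reference, once the patch applies, both points can be checked quickly
with something like the following (the deadlocks column is the one the patch
adds):

    -- after provoking a deadlock, the per-database counter should increase
    SELECT datname, deadlocks FROM pg_stat_database;

    -- ...and per the review, this should also zero the new counter
    SELECT pg_stat_reset();
    SELECT datname, deadlocks FROM pg_stat_database;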

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-22 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 The problem is that this one doesn't have the
 Content-Disposition: attachment
 line in the MIME header.  I don't know what we can do about it.

It's sent with an inline attachment AFAICT; some mail agents will make it
easy to process the attachment and others will just make the content
appear within the mail. It seems the vast majority falls into the
unhelpful second category.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-22 Thread Dimitri Fontaine
Alex Shulgin a...@commandprompt.com writes:
 Another idea: introduce some simple tag system in mails sent to -hackers
 to be treated specially, e.g:
[...]
 How does that sound?

Very much like what debbugs does already.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-22 Thread Daniel Farina
On Fri, Jan 20, 2012 at 3:33 PM, Daniel Farina dan...@heroku.com wrote:
 On Fri, Jan 20, 2012 at 2:48 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Even if you give the version number in the CREATE EXTENSION command, it's by
 convention that people actually maintain a sane versioning policy. If people
 don't take version management seriously, you will quickly end up with five
 different versions of an extension, all with version number 0.1.

 Projects are taking it seriously, and invest a lot of effort in it.
 There is no shortage of schema versioning frameworks, of varying
 levels of maturity, but some are quite complete by the standards of
 their users.  However, there is little knowledge shared between them,
 and no database gives them much support, so idiosyncrasy becomes
 inevitable.

Speak of the devil. Someone just posted use of extension versioning to
manage schemas (using the existing plain-old-files mechanism):

http://philsorber.blogspot.com/2012/01/deploy-schemata-like-boss.html

He also links to a -hackers post Dimitri wrote last December.

A few anecdotes do not constitute evidence, but it does look like
some people pay attention to any additional versioning foothold they
can get.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remembering bug #6123

2012-01-22 Thread Tom Lane
OK, here's an updated version of the patch.  I changed the error message
texts as per discussion (except I decided to use one message string for
all the cases rather than saddle translators with several variants).
Also, I put in an error in GetTupleForTrigger, which fixes the
self-reference case I illustrated before (now added to the regression
test).  However, I found out that changing the other two callers of
heap_lock_tuple would open an entirely new can of worms, so for now
they still have the historical behavior of ignoring self-updated tuples.

The problem with changing ExecLockRows or EvalPlanQualFetch can be
illustrated by the regression test case it breaks, which basically is

BEGIN;
DECLARE c1 CURSOR FOR SELECT * FROM table FOR UPDATE;
UPDATE table SET ...;
FETCH ALL FROM c1;
COMMIT;

When the FETCH comes to a row that's been updated by the UPDATE, it sees
that row as HeapTupleSelfUpdated with a cmax greater than es_output_cid
(which is the CID assigned to the DECLARE).  So if we make these callers
throw an error for the case, coding like the above will fail, which
seems to me to be pretty darn hard to justify.  It is not a corner case
that could be caused only by questionable use of trigger side effects.
So that seems to leave us with two choices: (1) ignore the row, or
(2) attempt to lock the latest version instead of the visible version.
(1) is our historical behavior but seems arguably wrong.  I tried to
make the patch do (2) but it crashed and burned because heap_lock_tuple
spits up if asked to lock an invisible row.  We could possibly finesse
that by having EvalPlanQualFetch sometimes pass a CID later than
es_output_cid to heap_lock_tuple, but it seems ticklish.  More, I think
it would also take some adjustments to the behavior of
HeapTupleSatisfiesDirty, else we'll not see such tuples in the first
place.  So this looks messy, and also rather orthogonal to the current
goals of the patch.

Also, I'm not sure that your testing would exercise such cases at all,
as you have to be using SELECT FOR UPDATE and/or READ COMMITTED mode to
get to any of the relevant code.  I gather your software mostly relies
on SERIALIZABLE mode to avoid such issues.  So I stopped with this.

regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 99a431a95ff0662c82de5dfbd253f00777ca2c7c..25cbbd3ef283022d4fa7634dc79537efe9dfb735 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*** simple_heap_insert(Relation relation, He
*** 2317,2343 
   *
   *	relation - table to be modified (caller must hold suitable lock)
   *	tid - TID of tuple to be deleted
-  *	ctid - output parameter, used only for failure case (see below)
-  *	update_xmax - output parameter, used only for failure case (see below)
   *	cid - delete command ID (used for visibility test, and stored into
   *		cmax if successful)
   *	crosscheck - if not InvalidSnapshot, also check tuple against this
   *	wait - true if should wait for any conflicting update to commit/abort
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we did delete it.  Failure return codes are
   * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
   * (the last only possible if wait == false).
   *
!  * In the failure cases, the routine returns the tuple's t_ctid and t_xmax.
!  * If t_ctid is the same as tid, the tuple was deleted; if different, the
!  * tuple was updated, and t_ctid is the location of the replacement tuple.
!  * (t_xmax is needed to verify that the replacement tuple matches.)
   */
  HTSU_Result
  heap_delete(Relation relation, ItemPointer tid,
! 			ItemPointer ctid, TransactionId *update_xmax,
! 			CommandId cid, Snapshot crosscheck, bool wait)
  {
  	HTSU_Result result;
  	TransactionId xid = GetCurrentTransactionId();
--- 2317,2342 
   *
   *	relation - table to be modified (caller must hold suitable lock)
   *	tid - TID of tuple to be deleted
   *	cid - delete command ID (used for visibility test, and stored into
   *		cmax if successful)
   *	crosscheck - if not InvalidSnapshot, also check tuple against this
   *	wait - true if should wait for any conflicting update to commit/abort
+  *	hufd - output parameter, filled in failure cases (see below)
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we did delete it.  Failure return codes are
   * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
   * (the last only possible if wait == false).
   *
!  * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
!  * t_xmax, and t_cmax (the last only for HeapTupleSelfUpdated, since we
!  * cannot obtain cmax from a combocid generated by another transaction).
!  * See comments for struct HeapUpdateFailureData for additional info.
   */
  HTSU_Result
  heap_delete(Relation relation, ItemPointer tid,
! 			CommandId 

Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-01-22 Thread Simon Riggs
On Sat, Jan 21, 2012 at 8:42 PM, Noah Misch n...@leadboat.com wrote:

 You currently forbid multi-column EACH FKs.  I agree that we should allow only
 one array column per FK; with more, the set of required PK rows would be
 something like the Cartesian product of the elements of array columns.
 However, there are no definitional problems, at least for NO ACTION, around a
 FK constraint having one array column and N scalar columns.  Whether or not
 you implement that now, let's choose a table_constraint syntax leaving that
 opportunity open.  How about:
        FOREIGN KEY(col_a, EACH col_b, col_c) REFERENCES pktable (a, b, c)


I don't think we should be trying to cover every possible combination
of arrays, non-arrays and all the various options. The number of
combinations is making this patch larger than it needs to be and as a
result endangers its being committed in this release just on committer
time to cope with the complexity. We have a matter of weeks to get
this rock solid.

Yes, let's keep the syntax open for future additions, but let's please
focus/edit this down to a solid, useful patch for 9.2.

For me, one array column, no other non-array columns and delete
restrict would cover 90+% of use cases. Bearing in mind you can cover
other cases by writing your own triggers, I don't think solving every
problem makes sense in a single release. Once we have a solid base we
can fill in the rare cases later.
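
To illustrate that common case, here is a minimal sketch using the EACH
syntax quoted from Noah's proposal above (names are invented, and this only
shows the shape of the feature, not necessarily what the current patch
accepts):

    CREATE TABLE pktable (id int PRIMARY KEY);

    -- one array column referencing the PK, with restrictive delete behaviour
    CREATE TABLE fktable (
        item_ids int[],
        FOREIGN KEY (EACH item_ids) REFERENCES pktable (id) ON DELETE RESTRICT
    );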

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collect frequency statistics for arrays

2012-01-22 Thread Alexander Korotkov
Hi!

An updated patch is attached. I've updated the comment
on mcelem_array_contained_selec with a more detailed description of the
probability distribution assumption. Also, I found that the "rest"
behaviour is better described by a Poisson distribution; the relevant
changes were made.

On Tue, Jan 17, 2012 at 2:33 PM, Noah Misch n...@leadboat.com wrote:

 By "summary frequency of elements", do you mean literally P_0 + P_1 ... + P_N?
 If so, I can follow the above argument for column && const and
 column <@ const, but not for column @> const.  For column @> const,
 selectivity cannot exceed the smallest frequency among const elements.  A
 number of high-frequency elements will drive up the sum of the frequencies
 without changing the true selectivity much at all.

Referring to the summary frequency is not really correct. It would be more
correct to refer to the number of elements in const. When there are many
elements in const, column @> const selectivity tends to be close to 0
and column <@ const tends to be close to 1. Of course, this holds when the
elements have middling frequencies (not very close to
0 and not very close to 1). I've replaced the summary frequency of elements
with the number of elements.
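
For reference, the array operators discussed above, with their usual
semantics:

    SELECT ARRAY[1,2] && ARRAY[2,3];    -- overlap: share at least one element
    SELECT ARRAY[1,2] <@ ARRAY[1,2,3];  -- contained by: every element of the
                                        -- left side appears on the right
    SELECT ARRAY[1,2,3] @> ARRAY[2];    -- contains: the left side includes
                                        -- every element of the right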

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.12.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-22 Thread Mikko Tiihonen


Previous title was: Add minor version to v3 protocol to allow changes without 
breaking backwards compatibility

On 01/20/2012 04:45 AM, Noah Misch wrote:

On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote:

On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch n...@leadboat.com wrote:

I agree with Merlin; the frontend/backend protocol is logically distinct from
the binary send/recv formats of data types.  For one key point, the latter is
not exclusively core-defined; third-party extensions change their send/recv
formats on a different schedule.  They can add myext.binary_format_version
GUCs of their own to cope in a similar way.


I agree.  It occurs to me that we recently changed the default *text*
output format for bytea for reasons not dissimilar to those
contemplated here.  Presumably, that's a much more disruptive change,
and yet we've had minimal complaints because anyone who gets bitten
can easily set bytea_output='escape' and the problem goes away.  The
same thing seems like it would work here, only the number of people
needing to change the parameter will probably be even smaller, because
fewer people use binary than text.

Having said that, if we're to follow the precedent set by
bytea_format, maybe we ought to just add
binary_array_format={huge,ittybitty} and be done with it, rather than
inventing a minor protocol version GUC for something that isn't really
a protocol version change at all.  We could introduce a
differently-named general mechanism, but I guess I'm not seeing the
point of that either.  Just because someone has a
backward-compatibility issue with one change of this type doesn't mean
they have a similar issue with all of them.  So I think adding a
special-purpose GUC is more logical and more parallel to what we've
done in the past, and it doesn't saddle us with having to be certain
that we've designed the mechanism generally enough to handle all the
cases that may come later.


That makes sense.  An attraction of a single binary format version was avoiding
the "Is this worth a GUC?" conversation for each change.  However, adding a GUC
should be no more notable than bumping a binary format version.


I see the main difference between a GUC per feature and a minor version being
that with versioned changes old clients keep working, because they have to
explicitly request a specific version, whereas with separate GUC variables
each feature will be enabled by default and users have to either keep up with
new client versions or figure out how to explicitly disable the changes.

However, due to popular vote I removed the minor version proposal for now.


Here is a second version of the patch. The binary array encoding changes
stay the same but all code around was rewritten.

Changes from previous versions based on received comments:
* removed the minor protocol version concept
* introduced a new GUC variable array_output copying the current
  bytea_output type, with values full (old value) and
  smallfixed (new default)
* added documentation for the new GUC variable
* used constants for the array flags variable values
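
With the patch applied, opting back into the traditional encoding is a
one-line setting (a usage sketch based on the GUC as defined in this patch):

    SET array_output = 'full';        -- traditional binary array encoding
    SET array_output = 'smallfixed';  -- the proposed new default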

-Mikko
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index e55b503..179a081
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** COPY postgres_log FROM '/full/path/to/lo
*** 4888,4893 
--- 4888,4910 
/listitem
   /varlistentry
  
+  varlistentry id=guc-array-output xreflabel=array_output
+   termvarnamearray_output/varname (typeenum/type)/term
+   indexterm
+primaryvarnamearray_output/ configuration parameter/primary
+   /indexterm
+   listitem
+para
+ Sets the output format for binary encoding of values of
+ type typearray/type. Valid values are
+ literalsmallfixed/literal (the default)
+ and literalfull/literal (the traditional PostgreSQL
+ format).  The typearray/type type always
+ accepts both formats on input, regardless of this setting.
+/para
+   /listitem
+  /varlistentry
+ 
   varlistentry id=guc-xmlbinary xreflabel=xmlbinary
termvarnamexmlbinary/varname (typeenum/type)/term
indexterm
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
new file mode 100644
index 5582a06..6192a2e
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
***
*** 30,41 
--- 30,45 
   * GUC parameter
   */
  bool		Array_nulls = true;
+ int			array_output = ARRAY_OUTPUT_SMALLFIXED;
  
  /*
   * Local definitions
   */
  #define ASSGN	 =
  
+ #define FLAG_HASNULLS 1
+ #define FLAG_FIXEDLEN 2
+ 
  typedef enum
  {
  	ARRAY_NO_LEVEL,
*** static void ReadArrayBinary(StringInfo b
*** 86,92 
  FmgrInfo *receiveproc, Oid typioparam, int32 typmod,
  int typlen, bool typbyval, char typalign,
  Datum *values, bool *nulls,
! bool *hasnulls, int32 *nbytes);
  static void 

Re: [HACKERS] CLOG contention, part 2

2012-01-22 Thread Jeff Janes
On Fri, Jan 20, 2012 at 6:44 AM, Simon Riggs si...@2ndquadrant.com wrote:

 OT: It would save lots of time if we had 2 things for the CF app:

..
 2. Something that automatically tests patches. If you submit a patch
 we run up a blank VM and run patch applies on all patches. As soon as
 we get a fail, an email goes to patch author. That way authors know as
 soon as a recent commit invalidates something.

Well, first the CF app would need to reliably be able to find the
actual patch.  That is currently not a given.

Also, it seems that OID collisions are a dime a dozen, and I'm
starting to doubt that they are even worth reporting in the absence of
a more substantive review.  And in the patches I've looked at, it
seems like the OID is not even cross-referenced anywhere else in the
patch, the cross-references are all based on symbolic names.  I freely
admit I have no idea what I am talking about, but it seems like the
only purpose of OIDs is to create bit rot.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Next steps on pg_stat_statements normalisation

2012-01-22 Thread Simon Riggs
On Sun, Jan 22, 2012 at 5:30 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:

 So, having received feedback from Tom and others in relation to this
 patch, I would like to state how I think I should go about addressing
 various concerns to ensure that a revision of the patch gets into the
 9.2 release. As I've said time and again, I think that it is very
 important that we have this, sooner rather than later.

Nothing can be ensured completely, but I would add this is a very
important feature. Without it, large systems without prepared
statements are mostly untunable and therefore untuned, which is a bad
thing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-22 Thread Greg Smith

Jeff Janes wrote:

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine if the bgwriter is lagging
behind.  Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes by design due to the use of a buffer access strategy
during bulk inserts.

If I'm running a tightly controlled benchmark on an otherwise unused
server and I know that no BAS is being used, then I can meaningfully
use backend_writes.  That is a pretty limiting limit.
  


I don't think it's quite that bad in general; this presumes a moderate
amount of BAS writes relative to other activity.  But I understand your
concern better now.  I don't think the sorts of workloads you seem to
have a lot of were considered very carefully here before.



I think we should either create a separate column to segregate BAS
backend_writes, or just don't report them at all and report only the
non-BAS ones into pg_stat_bgwriter.
  


We can't not report them.  One of the goals of pg_stat_bgwriter is to 
account for all writes out of the buffer cache.  If there are enough BAS 
writes on your system that them being lumped together is a serious 
problem, having them go missing altogether would be even worse.  And any 
whacking around of pg_stat_bgwriter might as well fix that too.


Do you think you could put together a quick test case that's similar to 
the ones you're seeing unfair accounting for here?  This isn't quite 
buggy behavior, but a solid example I could test against showing it's a 
sketchy approach would be enough for me to incorporate a fix for it into 
this suggested redesign.
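
A rough sketch of the sort of test case being asked for, assuming a bulk load
(which goes through a buffer access strategy) is the workload of interest;
the table is invented and the sizes are arbitrary:

    -- snapshot the counter, run a bulk load through a BAS ring buffer,
    -- then compare: the strategy writes are counted in buffers_backend
    -- even though the bgwriter isn't lagging
    SELECT buffers_backend FROM pg_stat_bgwriter;

    CREATE TABLE bulk_target AS
        SELECT i, repeat('x', 200) AS filler
        FROM generate_series(1, 1000000) AS i;

    SELECT buffers_backend FROM pg_stat_bgwriter;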



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Jim Nasby
On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:
 On 04.01.2012 13:14, Simon Riggs wrote:
 On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby j...@nasby.net writes:
 On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.
 
 Couldn't we just leave the buffers alone? Once an index is dropped and 
 that's pushed out through the catalog then nothing should be trying to 
 access them and they'll eventually just get aged out.
 
 No, we can't, because if they're still dirty then the bgwriter would
 first try to write them to the no-longer-existing storage file.  It's
 important that we kill the buffers immediately during relation drop.
 
 I'm still thinking that it might be sufficient to mark the buffers
 invalid and let the clock sweep find them, thereby eliminating the need
 for a freelist.
 
 My patch puts things on the freelist only when it is free to do so.
 Not having a freelist at all is probably a simpler way of avoiding the
 lock contention, so I'll happily back that suggestion instead. Patch
 attached, previous patch revoked.
 
 So, you're proposing that we remove freelist altogether? Sounds reasonable, 
 but that needs to be performance tested somehow. I'm not sure what exactly 
 the test should look like, but it probably should involve an OLTP workload, 
 and large tables being created, populated to fill the cache with pages from 
 the table, and dropped, while the OLTP workload is running. I'd imagine that 
 the worst case scenario looks something like that.

We should also look at having the freelist do something useful, instead of just 
dropping it completely. Unfortunately that's probably more work...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-22 Thread Jim Nasby
On Jan 19, 2012, at 4:23 PM, Greg Smith wrote:
 On 1/18/12 4:18 PM, Jim Nasby wrote:
 What about doing away with all the arbitrary numbers completely, and just 
 state data rate limits for hit/miss/dirty?
 
 Since many workloads will have a mix of all three, it still seems like 
 there's some need for weighing these individually, even if they each got 
 their own rates.  If someone says read=8MB/s and write=4MB/s (the current 
 effective defaults), I doubt they would be happy with seeing 12MB/s happen.
 
 BTW, this is a case where it would be damn handy to know if the miss was 
 really a miss or not... in the case where we're already rate limiting 
 vacuum, could we afford the cost of get_time_of_day() to see if a miss 
 actually did have to come from disk?
 
 We certainly might if it's a system where timing information is reasonably 
 cheap, and measuring that exact area will be easy if the timing test contrib 
 module submitted into this CF gets committed.  I could see using that to 
 re-classify some misses as hits if the read returns fast enough.
 
 There's not an obvious way to draw that line though.  The fast=hit vs. 
 slow=miss transition happens at very different place on SSD vs. regular 
 disks, as the simplest example.  I don't see any way to wander down this path 
 that doesn't end up introducing multiple new GUCs, which is the opposite of 
 what I'd hoped to do--which was at worst to keep the same number, but reduce 
 how many were likely to be touched.

Your two comments together made me realize something... at the end of the day 
people don't care about MB/s. They care about impact to other read and write 
activity in the database.

What would be interesting is if we could monitor how long all *foreground* IO 
requests took. If they start exceeding some number, that means the system is at 
or near full capacity, and we'd like background stuff to slow down.

Dealing with SSDs vs real media would be a bit challenging... though, I think 
it would only be an issue if the two were randomly mixed together. Kept 
separately I would expect them to have distinct behavior patterns that could be 
measured and identified.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2012-01-22 Thread Jim Nasby
On Jan 4, 2012, at 5:59 PM, Gianni Ciolli wrote:
 On Wed, Jan 04, 2012 at 04:58:08PM -0600, Jim Nasby wrote:
 Except AFAIR Oracle uses the term to indicate something that is
 happening *outside* of your current transaction, which is definitely
 not what the proposal is talking about.
 
 That feature is commonly translated in PostgreSQL to a dblink-based
 solution, which itself is not distant from the current proposal, at
 least in terms of inside/outside (the biggest difference I can see is
 on sharing temporary tables).
 
 But I am not sure I understand your remark; it would be clearer to me
 if you could provide an example explaining the difference.

As I understand your proposal, you are doing everything in a single backend and 
a single transaction... you're just providing a means to split one transaction 
into smaller pieces.

Is that not the case?
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2012-01-22 Thread Jim Nasby
On Jan 10, 2012, at 3:07 AM, Simon Riggs wrote:
 I think we could add an option to check the checksum immediately after
 we pin a block for the first time but it would be very expensive and
 sounds like we're re-inventing hardware or OS features again. Work on
 50% performance drain, as an estimate.
 
 That is a level of protection no other DBMS offers, so that is either
 an advantage or a warning. Jim, if you want this, please do the
 research and work out what the probability of losing shared buffer
 data in your ECC RAM really is so we are doing it for quantifiable
 reasons (via old Google memory academic paper) and to verify that the
 cost/benefit means you would actually use it if we built it. Research
 into requirements is at least as important and time consuming as
 research on possible designs.

Maybe I'm just dense, but it wasn't clear to me how you could use the 
information in the google paper to extrapolate data corruption probability.

I can say this: we have seen corruption from bad memory, and our Postgres 
buffer pool (8G) is FAR smaller than available memory on all of our servers 
(192G or 512G). So at least in our case, CRCs that protect the filesystem cache 
would protect the vast majority of our memory (96% or 98.5%).
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-22 Thread Greg Smith

Jim Nasby wrote:

Your two comments together made me realize something... at the end of the day 
people don't care about MB/s. They care about impact to other read and write 
activity in the database.

What would be interesting is if we could monitor how long all *foreground* IO 
requests took. If they start exceeding some number, that means the system is at 
or near full capacity, and we'd like background stuff to slow down.
  


My hope for 9.2 was to get VACUUM moved over into some human-readable 
units.  Having the whole thing work only via these abstract cost units 
is driving most of my customers with larger databases crazy.  The patch 
I suggested was the easiest refactoring I thought moved in the right 
direction.  While it may not be the perfect thing to care about, the 
very positive reaction I've gotten to the already landed patch to log in 
MB/s has suggested to me people are a lot more comfortable with that 
than the cost limit numbers.


For 9.3, this whole mess needs to become integrated with a full-system 
monitoring approach, to really solve this well.  pg_stat_bgwriter knows 
how many writes are coming from the various parts of the system, the 
total amount of write I/O.  Given that, I can turn VACUUM completely 
dynamic based on what else is happening in many common situations.  The 
sort of end goal I was thinking about was being able to say something like 
"let VACUUM use up to 4MB/s on writes, but subtract off the average 
write level of everything else".  Now it's a background process running 
only when there's capacity to spare for it.  You could turn it up a lot 
higher, if you knew it was only going to run at that level when the 
system wasn't as busy.  That's one reason I started by suggesting a 
write-based limit; it fit into that longer-range plan better.  Maybe 
that idea is junk and focusing on actual read I/O is the real problem 
with VACUUM for most people.  I can tell you once I get more data out of 
systems that are logging in MB/s.


If instead or in addition we get some better field data on systems that 
can afford to time a lot more things, and then start building feedback 
limiters based on how long all sorts of operations take to occur, that's 
a whole different parallel approach for auto-tuning this.  I haven't 
thought about that as much simply because it only just became clear 
recently when the timing data is cheap to collect.  I need to get a lot 
more production server data about that overhead to work with here too.



Dealing with SSDs vs real media would be a bit challenging... though, I think 
it would only be an issue if the two were randomly mixed together. Kept 
separately I would expect them to have distinct behavior patterns that could be 
measured and identified


This might just turn into another one of those things where we will 
eventually need to have some more information on a per-tablespace 
basis.  I envision allowing the server to collect more timing data as 
being something you can turn on for a bit, let it populate statistics 
about just what fast or slow means for each tablespace.  Then you can 
keep those results around to guide future decisions even after timing is 
turned off.  Maybe toggle it back on a day a month to make sure the 
numbers are still sane, if it's too expensive to time things every day.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Robert Haas
On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby j...@nasby.net wrote:
 We should also look at having the freelist do something useful, instead of 
 just dropping it completely. Unfortunately that's probably more work...

That's kinda my feeling as well.  The free list in its current form is
pretty much useless, but I don't think we'll save much by getting rid
of it, because that's just a single test.  The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!).  So I guess
my vote would be to leave it alone pending further study, and maybe
remove it later if we can't find a way to rejigger it to be more
useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-22 Thread Robert Haas
On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I'm finding the backend_writes column pretty unfortunate.  The only
 use I know of for it is to determine if the bgwriter is lagging
 behind.  Yet it doesn't serve even this purpose because it lumps
 together the backend writes due to lagging background writes, and the
 backend writes by design due to the use of a buffer access strategy
 during bulk inserts.

+1 for separating those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Next steps on pg_stat_statements normalisation

2012-01-22 Thread Robert Haas
On Sun, Jan 22, 2012 at 5:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sun, Jan 22, 2012 at 5:30 AM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 So, having received feedback from Tom and others in relation to this
 patch, I would like to state how I think I should go about addressing
 various concerns to ensure that a revision of the patch gets into the
 9.2 release. As I've said time and again, I think that it is very
 important that we have this, sooner rather than later.

 Nothing can be ensured completely, but I would add this is a very
 important feature. Without it, large systems without prepared
 statements are mostly untunable and therefore untuned, which is a bad
 thing.

I, too, am pretty excited about the potential for this feature.
Having not read the patch I'm not able to comment on code quality or
design at this point, but I'm definitely +1 on the concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG-Strom - A GPU optimized asynchronous executor module

2012-01-22 Thread Robert Haas
On Sun, Jan 22, 2012 at 10:48 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I tried to implement an FDW module that is designed to utilize GPU
 devices to execute the qualifiers of sequential scans on foreign tables
 managed by this module.

 It was named PG-Strom, and the following wikipage gives a brief
 overview of this module.
    http://wiki.postgresql.org/wiki/PGStrom

 In our measurements, it achieves roughly a 10x speedup on sequential
 scans with complex qualifiers, though of course that depends heavily on
 the type of workload.

That's pretty neat.  In terms of tuning the non-GPU based
implementation, have you done any profiling?  Sometimes that leads to
an oh, woops moment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-01-22 Thread Noah Misch
On Sun, Jan 22, 2012 at 09:06:49PM +, Simon Riggs wrote:
 On Sat, Jan 21, 2012 at 8:42 PM, Noah Misch n...@leadboat.com wrote:
 
  You currently forbid multi-column EACH FKs.  I agree that we should allow only
  one array column per FK; with more, the set of required PK rows would be
  something like the Cartesian product of the elements of array columns.
  However, there are no definitional problems, at least for NO ACTION, around a
  FK constraint having one array column and N scalar columns.  Whether or not
  you implement that now, let's choose a table_constraint syntax leaving that
  opportunity open.  How about:
         FOREIGN KEY(col_a, EACH col_b, col_c) REFERENCES pktable (a, b, c)
 
 
 I don't think we should be trying to cover every possible combination
 of arrays, non-arrays and all the various options. The number of
 combinations is making this patch larger than it needs to be and as a
 result endangers its being committed in this release just on committer
 time to cope with the complexity. We have a matter of weeks to get
 this rock solid.
 
 Yes, let's keep the syntax open for future additions, but let's please
 focus/edit this down to a solid, useful patch for 9.2.

+1.  Let's change the syntax to leave that door open but not use the
flexibility at this time.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-22 Thread Noah Misch
On Sun, Jan 22, 2012 at 11:47:06PM +0200, Mikko Tiihonen wrote:
 On 01/20/2012 04:45 AM, Noah Misch wrote:
 On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote:
 Having said that, if we're to follow the precedent set by
 bytea_format, maybe we ought to just add
 binary_array_format={huge,ittybitty} and be done with it, rather than
 inventing a minor protocol version GUC for something that isn't really
 a protocol version change at all.  We could introduce a
 differently-named general mechanism, but I guess I'm not seeing the
 point of that either.  Just because someone has a
 backward-compatibility issue with one change of this type doesn't mean
 they have a similar issue with all of them.  So I think adding a
 special-purpose GUC is more logical and more parallel to what we've
 done in the past, and it doesn't saddle us with having to be certain
 that we've designed the mechanism generally enough to handle all the
 cases that may come later.

 That makes sense.  An attraction of a single binary format version was
 avoiding the "Is this worth a GUC?" conversation for each change.  However,
 adding a GUC should be no more notable than bumping a binary format version.

 I see the main difference between a GUC per feature and a minor version being
 that with versioned changes old clients keep working, because they have to
 explicitly request a specific version, whereas with separate GUC variables
 each feature will be enabled by default and users have to either keep up with
 new client versions or figure out how to explicitly disable the changes.

No, we can decide that anew for each GUC.  If you'd propose that array_output
= full be the default, that works for me.

 Here is a second version of the patch. The binary array encoding changes
 stay the same but all code around was rewritten.

 Changes from previous versions based on received comments:
 * removed the minor protocol version concept
 * introduced a new GUC variable array_output copying the current
   bytea_output type, with values full (old value) and
   smallfixed (new default)

How about the name array_binary_output?

 * added documentation for the new GUC variable
 * used constants for the array flags variable values

 *** a/doc/src/sgml/config.sgml
 --- b/doc/src/sgml/config.sgml
 *** COPY postgres_log FROM '/full/path/to/lo
 *** 4888,4893 
 --- 4888,4910 
 /listitem
/varlistentry
   
 +  varlistentry id=guc-array-output xreflabel=array_output
 +   termvarnamearray_output/varname (typeenum/type)/term
 +   indexterm
 +primaryvarnamearray_output/ configuration parameter/primary
 +   /indexterm
 +   listitem
 +para
 + Sets the output format for binary encoding of values of
 + type typearray/type. Valid values are
 + literalsmallfixed/literal (the default)
 + and literalfull/literal (the traditional PostgreSQL
 + format).  The typearray/type type always

It's not The array type but Array types, a class.

 + accepts both formats on input, regardless of this setting.
 +/para
 +   /listitem
 +  /varlistentry

The section Array Input and Output Syntax should reference this GUC.

 *** a/src/backend/utils/misc/guc.c
 --- b/src/backend/utils/misc/guc.c
 ***
 *** 64,69 
 --- 64,70 
   #include storage/predicate.h
   #include tcop/tcopprot.h
   #include tsearch/ts_cache.h
 + #include utils/array.h
   #include utils/builtins.h
   #include utils/bytea.h
   #include utils/guc_tables.h
 *** static const struct config_enum_entry by
 *** 225,230 
 --- 226,243 
   };
   
   /*
 +  * Options for enum values defined in this module.
 +  *
 +  * NOTE! Option values may not contain double quotes!
 +  */

Don't replicate this comment.

 + 
 + static const struct config_enum_entry array_output_options[] = {
 + {full, ARRAY_OUTPUT_FULL, false},
 + {smallfixed, ARRAY_OUTPUT_SMALLFIXED, false},
 + {NULL, 0, false}
 + };
 + 
 + /*
* We have different sets for client and server message level options 
 because
* they sort slightly different (see log level)
*/
 *** static struct config_enum ConfigureNames
 *** 3047,3052 
 --- 3060,3075 
   NULL, NULL, NULL
   },
   
 + {
 + {array_output, PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,

You've put the GUC in COMPAT_OPTIONS_PREVIOUS, but you've put its
documentation in the section for CLIENT_CONN_STATEMENT.  I don't have a strong
opinion on which one to use, but be consistent.

 + gettext_noop(Sets the binary output format for 
 arrays.),
 + NULL
 + },
 + bytea_output,

array_output

 + ARRAY_OUTPUT_SMALLFIXED, array_output_options,
 + NULL, NULL, NULL
 + },
 + 
   {
   {client_min_messages, PGC_USERSET, LOGGING_WHEN,
   gettext_noop(Sets the message levels 

Re: [HACKERS] Inline Extension

2012-01-22 Thread Robert Haas
On Sun, Jan 22, 2012 at 3:20 PM, Daniel Farina dan...@heroku.com wrote:
 A few anecdotes do not constitute evidence, but it does look like
 some people pay attention to any additional versioning foothold they
 can get.

Sure, but just because some people do it doesn't make it a good idea.
I can personally attest to having done many things over the years
which initially appeared to be good ideas, but later turned out to be
mind-bogglingly dumb.  It's true that if we make it easy for people to
use the extension mechanism as a way of storing versioning information
for user code, then people will use it for that purpose, but the
question is whether that's really what we want.  I don't see what
advantage it gives us.  Dimitri's proposal was to neuter the pg_dump
support that is the raison d'être of the extension mechanism.  That's
clearly necessary if you don't want to end up with an unreloadable
database, but it begs the question (which no one's really answered
AFAICT) of what good the extension mechanism is without that feature.
There are certainly easier ways to remember a version number than
building support for it into core.  If people create their own
versioning mechanisms, they can create something which is tailor-made
for their particular requirements, rather than relying on decisions
which we made in core that may or may not be right for them (e.g. the
lack of version ordering, or even that we have versions rather than
some more general type of control table).  I don't want to prejudge
the outcome and say, oh, we should never have support for this concept
in core - but neither do I want to embark on that project without a
detailed understanding of where and how it is adding value.  If people
are doing management via pure FEBE, good for them: but that doesn't
explain why it shouldn't be done all in userspace, with all of the
flexibility that gives.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby j...@nasby.net wrote:
 We should also look at having the freelist do something useful, instead of 
 just dropping it completely. Unfortunately that's probably more work...

 That's kinda my feeling as well.  The free list in its current form is
 pretty much useless, but I don't think we'll save much by getting rid
 of it, because that's just a single test.  The expensive part of what
 we do while holding BufFreelistLock is, I think, iterating through
 buffers taking and releasing a spinlock on each one (!).

Yeah ... spinlocks that, by definition, will be uncontested.  So I think
it would be advisable to prove rather than just assume that that's a
problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-22 Thread Christopher Browne
On Sat, Jan 21, 2012 at 5:54 PM, Jim Nasby j...@nasby.net wrote:
 What would be interesting is if we could monitor how long all *foreground* IO 
 requests took. If they start exceeding some number, that means the system is 
 at or near full capacity, and we'd like background stuff to slow down.

There's something to that...

On occasion, we've had a bit of hue and cry when sysadmins noticed
that nearly 100% of bandwidth was being chewed up by VACUUM.

A closer look showed that there wasn't a *true* problem, as there
wasn't anything else material looking for I/O.  Letting VACUUM have
all the bandwidth it wants when the system isn't otherwise busy is a
pretty fine idea.

At such times, limiting VACUUM is counterproductive; it would be
better to let it go through and clean up as many tables in as short a
time as possible.

However, as soon as the number of processes waiting for I/O starts to
grow, you want VACUUM to back off.

That's an entirely more dynamic kind of "dynamic" than the suggested
parameterization indicates.  There's enough there that I'm not yet sure
how to measure it, so it may be necessary to start with the more
static indicator of maximum acceptable bandwidth usage.

As long as the parameters can be changed without requiring a
postmaster restart, it should be possible to do something more
adaptive if and when measurements emerge.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... If people
 are doing management via pure FEBE, good for them: but that doesn't
 explain why it shouldn't be done all in userspace, with all of the
 flexibility that gives.

On reflection it seems like this patch is simply offering the wrong
solution for the problem.  I agree that it could be useful to install
extensions without having direct access to the server's filesystem,
but it doesn't seem to follow that we must lobotomize existing extension
features in order to have that.  I pointed out earlier that you could
get such functionality via contrib/adminpack, though people not
unreasonably complained that that was pretty ugly and low-level.
But couldn't we define some SQL-level operations to allow installing
extension control and script files?

Probably the worst issue with that is that in typical installations,
the share/extension/ directory would be read-only to the server, and a
lot of people might be uncomfortable with making it writable.  Not sure
whether we should consider inventing another place to keep
SQL-command-installed extensions, or just say if you want this
functionality you have to make share/extension/ writable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG-Strom - A GPU optimized asynchronous executor module

2012-01-22 Thread Kohei KaiGai
2012/1/23 Robert Haas robertmh...@gmail.com:
 On Sun, Jan 22, 2012 at 10:48 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I tried to implement an FDW module that is designed to utilize GPU
 devices to execute the qualifiers of sequential scans on foreign tables
 managed by this module.

 It was named PG-Strom, and the following wikipage gives a brief
 overview of this module.
    http://wiki.postgresql.org/wiki/PGStrom

 In our measurements, it achieves roughly a 10x speedup on sequential
 scans with complex qualifiers, though of course that depends heavily on
 the type of workload.

 That's pretty neat.  In terms of tuning the non-GPU based
 implementation, have you done any profiling?  Sometimes that leads to
 an oh, woops moment.

Not yet, except for \timing.

What options are available to see how the work is divided among components
within a particular query?
I tried googling some keywords, but nothing useful turned up.


As an aside, I also tried modifying is_device_executable_qual() to always
return false, to disable qualifier push-down.
In this case, 2100ms of the 7679ms was consumed within this module, so
I guess the remaining ~5500ms was mostly consumed by ExecQual(), although
that is just an estimate...

postgres=# SET pg_strom.exec_profile = on;
SET
Time: 1.075 ms
postgres=# SELECT count(*) FROM ftbl WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
INFO:  PG-Strom Exec Profile on ftbl
INFO:  Total PG-Strom consumed time: 2100.898 ms
INFO:  Time to JIT Compile GPU code: 0.000 ms
INFO:  Time to initialize devices:   0.000 ms
INFO:  Time to Load column-stores:   7.013 ms
INFO:  Time to Scan column-stores:   1219.746 ms
INFO:  Time to Fetch virtual tuples: 874.095 ms
INFO:  Time of GPU Synchronization:  0.000 ms
INFO:  Time of Async memcpy: 0.000 ms
INFO:  Time of Async kernel exec:0.000 ms
 count
---
  3159
(1 row)

Time: 7679.342 ms


Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New replication mode: write

2012-01-22 Thread Simon Riggs
On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Please add the Apply mode.

 OK, will do.

 Done. Attached is the updated version of the patch.

I notice that the Apply mode isn't fully implemented. I had in mind
that you would add the latch required to respond more quickly when
only the Apply pointer has changed.

Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
was there another reason for not implementing that?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers