[HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-02-09 Thread Christoph Berg
Hi,

we have a database that is storing strings in various encodings (and
non-encodings, namely the arbitrary byte soup that you might see in
email headers from the internet). For this reason, the database uses
sql_ascii encoding. The columns are text, as most characters are
ascii, so bytea didn't seem the right way to go.

Currently we are on 8.3 and are trying to upgrade to 9.1, but the plperlu
functions we have are acting up.

Old behavior on 8.3 .. 9.0:

sql_ascii =# create or replace function whitespace(text) returns text
language plperlu as $$ $a = shift; $a =~ s/[\t ]+/ /g; return $a; $$;
CREATE FUNCTION

sql_ascii =# select whitespace (E'\200'); -- 0x80 is not valid utf-8
 whitespace 
------------
 

sql_ascii =# select whitespace (E'\200')::bytea;
 whitespace 
------------
 \x80

New behavior on 9.1.2:

sql_ascii =# select whitespace (E'\200');
ERROR:  XX000: Malformed UTF-8 character (fatal) at line 1.
KONTEXT:  PL/Perl function whitespace
ORT:  plperl_call_perl_func, plperl.c:2037

A crude workaround is:

sql_ascii =# create or replace function whitespace_utf8_off(text)
returns text language plperlu as $$ use Encode; $a = shift;
Encode::_utf8_off($a); $a =~ s/[\t ]+/ /g; return $a; $$;
CREATE FUNCTION

sql_ascii =# select whitespace_utf8_off (E'\200');
 whitespace_utf8_off 
---------------------
 \u0080

sql_ascii =# select whitespace_utf8_off (E'\200')::bytea;
 whitespace_utf8_off 
---------------------
 \xc280

(Note that the workaround is not perfect as the resulting 0x80..0xff
bytes are still tagged to be utf8.)


I think the bug is in plperl_helpers.h:

/*
 * Create a new SV from a string assumed to be in the current database's
 * encoding.
 */

static inline SV *
cstr2sv(const char *str)
{
	SV		   *sv;
	char	   *utf8_str = utf_e2u(str);

	sv = newSVpv(utf8_str, 0);
	SvUTF8_on(sv);

	pfree(utf8_str);

	return sv;
}

In sql_ascii databases, utf_e2u does not do any recoding, but
SvUTF8_on still marks the string as utf-8, even though it isn't.

(Returned values might also need fixing.)

In my view, this is clearly a bug in pl/perl on sql_ascii databases.
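
A minimal sketch of the kind of fix I have in mind (untested, and assuming
GetDatabaseEncoding()/PG_SQL_ASCII can be used here the way they are used
elsewhere in the backend) would be to leave the bytes alone when the database
encoding is sql_ascii:

static inline SV *
cstr2sv(const char *str)
{
	SV		   *sv;
	char	   *utf8_str;

	/*
	 * In SQL_ASCII there is no known encoding to convert from, so pass the
	 * bytes through untouched and, crucially, don't claim they are UTF-8.
	 */
	if (GetDatabaseEncoding() == PG_SQL_ASCII)
		return newSVpv(str, 0);

	utf8_str = utf_e2u(str);
	sv = newSVpv(utf8_str, 0);
	SvUTF8_on(sv);
	pfree(utf8_str);

	return sv;
}

The return path (sv2cstr) would presumably need the mirror-image treatment.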

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


signature.asc
Description: Digital signature


Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-02-09 Thread Fujii Masao
On Thu, Feb 9, 2012 at 3:32 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Feb 1, 2012 at 11:46 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 31.01.2012 17:35, Fujii Masao wrote:

 On Fri, Jan 20, 2012 at 11:11 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 On 20.01.2012 15:32, Robert Haas wrote:


 On Sat, Jan 14, 2012 at 9:32 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com    wrote:


 Here's another version of the patch to make XLogInsert less of a
 bottleneck
 on multi-CPU systems. The basic idea is the same as before, but several
 bugs
 have been fixed, and lots of misc. clean up has been done.



 This seems to need a rebase.



 Here you go.


 The patch seems to need a rebase again.


 Here you go again. It conflicted with the group commit patch, and the patch
 to WAL-log and track changes to full_page_writes setting.


 After applying this patch and then forcing crashes, upon recovery the
 database is not correct.

 If I make a table with 10,000 rows and then after that intensively
 update it using a unique key:

 update foo set count=count+1 where foobar=?

 Then after the crash there are less than 10,000 visible rows:

 select count(*) from foo

 This is not a subtle thing; it happens every time.  I get counts of
 between 1973 and 8827.  Without this patch I always get exactly
 10,000.

 I don't really know where to start on tracking this down.

A similar problem happened in my test. When I executed CREATE TABLE and
shut down the server in immediate mode, after recovery I could not see the
created table. Here is the server log of recovery with wal_debug = on:

LOG:  database system was interrupted; last known up at 2012-02-09 19:18:50 JST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/179CC90
LOG:  REDO @ 0/179CC90; LSN 0/179CCB8: prev 0/179CC30; xid 0; len 4:
XLOG - nextOid: 24576
LOG:  REDO @ 0/179CCB8; LSN 0/179CCE8: prev 0/179CC90; xid 0; len 16:
Storage - file create: base/12277/16384
LOG:  REDO @ 0/179CCE8; LSN 0/179DDE0: prev 0/179CCB8; xid 998; len
21; bkpb1: Heap - insert: rel 1663/12277/12014; tid 7/22
LOG:  there is no contrecord flag in log file 0, segment 1, offset 7987200
LOG:  redo done at 0/179CCE8

According to the log, there is no contrecord flag; ISTM the patch treats the
contrecord of the backup block incorrectly, which causes the problem.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-02-09 Thread Fujii Masao
On Thu, Feb 9, 2012 at 7:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Feb 9, 2012 at 3:32 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Feb 1, 2012 at 11:46 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 31.01.2012 17:35, Fujii Masao wrote:

 On Fri, Jan 20, 2012 at 11:11 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 On 20.01.2012 15:32, Robert Haas wrote:


 On Sat, Jan 14, 2012 at 9:32 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com    wrote:


 Here's another version of the patch to make XLogInsert less of a
 bottleneck
 on multi-CPU systems. The basic idea is the same as before, but several
 bugs
 have been fixed, and lots of misc. clean up has been done.



 This seems to need a rebase.



 Here you go.


 The patch seems to need a rebase again.


 Here you go again. It conflicted with the group commit patch, and the patch
 to WAL-log and track changes to full_page_writes setting.


 After applying this patch and then forcing crashes, upon recovery the
 database is not correct.

 If I make a table with 10,000 rows and then after that intensively
 update it using a unique key:

 update foo set count=count+1 where foobar=?

 Then after the crash there are less than 10,000 visible rows:

 select count(*) from foo

 This is not a subtle thing; it happens every time.  I get counts of
 between 1973 and 8827.  Without this patch I always get exactly
 10,000.

 I don't really know where to start on tracking this down.

 A similar problem happened in my test. When I executed CREATE TABLE and
 shut down the server in immediate mode, after recovery I could not see the
 created table. Here is the server log of recovery with wal_debug = on:

 LOG:  database system was interrupted; last known up at 2012-02-09 19:18:50 JST
 LOG:  database system was not properly shut down; automatic recovery in progress
 LOG:  redo starts at 0/179CC90
 LOG:  REDO @ 0/179CC90; LSN 0/179CCB8: prev 0/179CC30; xid 0; len 4:
 XLOG - nextOid: 24576
 LOG:  REDO @ 0/179CCB8; LSN 0/179CCE8: prev 0/179CC90; xid 0; len 16:
 Storage - file create: base/12277/16384
 LOG:  REDO @ 0/179CCE8; LSN 0/179DDE0: prev 0/179CCB8; xid 998; len
 21; bkpb1: Heap - insert: rel 1663/12277/12014; tid 7/22
 LOG:  there is no contrecord flag in log file 0, segment 1, offset 7987200
 LOG:  redo done at 0/179CCE8

 According to the log, there is no contrecord flag; ISTM the patch treats the
 contrecord of the backup block incorrectly, which causes the problem.

Yep, as far as I read the patch, it seems to have forgotten to set the
XLP_FIRST_IS_CONTRECORD flag.
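
Something along these lines at the point where an insertion crosses onto a
freshly initialized page, I guess (a sketch of the idea only, not the actual
code; the variable names are made up):

/*
 * Sketch: when the tail of a WAL record spills onto the next page, the new
 * page header must advertise that, otherwise ReadRecord() gives up with
 * "there is no contrecord flag".
 */
if (write_len > freespace)		/* record continues on the next page */
{
	NewPage->xlp_info |= XLP_FIRST_IS_CONTRECORD;
	contrecord->xl_rem_len = write_len - freespace; /* bytes still to come */
}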

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add protransform for numeric, varbit, and temporal types

2012-02-09 Thread Noah Misch
On Wed, Feb 08, 2012 at 09:37:01AM -0500, Robert Haas wrote:
 On Tue, Feb 7, 2012 at 12:43 PM, Robert Haas robertmh...@gmail.com wrote:
  I've committed the numeric and varbit patches and will look at the
  temporal one next.
 
 Committed, after changing the OIDs so they don't conflict.
 
 Here's one more case for you to ponder:
 
 rhaas=# create table noah (i interval day);
 CREATE TABLE
 rhaas=# alter table noah alter column i set data type interval second(3);
 DEBUG:  rewriting table noah
 ALTER TABLE
 
 Do we really need a rewrite in that case?  The code acts like the
 interval range and precision are separate beasts, but is that really
 true?

The code has a thinko; a given interval typmod ultimately implies a single
point from which we truncate rightward.  The precision only matters if the
range covers SECOND.  Thanks; the attached patch improves this.
*** a/src/backend/utils/adt/timestamp.c
--- b/src/backend/utils/adt/timestamp.c
***************
*** 958,963 **** interval_transform(PG_FUNCTION_ARGS)
--- 958,964 ----
  		int			new_range = INTERVAL_RANGE(new_typmod);
  		int			new_precis = INTERVAL_PRECISION(new_typmod);
  		int			new_range_fls;
+ 		int			old_range_fls;
  
  		if (old_typmod == -1)
  		{
***************
*** 974,985 **** interval_transform(PG_FUNCTION_ARGS)
  		 * Temporally-smaller fields occupy higher positions in the range
  		 * bitmap.  Since only the temporally-smallest bit matters for length
  		 * coercion purposes, we compare the last-set bits in the ranges.
  		 */
  		new_range_fls = fls(new_range);
  		if (new_typmod == -1 ||
  			((new_range_fls >= SECOND ||
! 			  new_range_fls >= fls(old_range)) &&
! 			 (new_precis >= MAX_INTERVAL_PRECISION ||
  			  new_precis >= old_precis)))
  			ret = relabel_to_typmod(source, new_typmod);
  	}
--- 975,990 ----
  		 * Temporally-smaller fields occupy higher positions in the range
  		 * bitmap.  Since only the temporally-smallest bit matters for length
  		 * coercion purposes, we compare the last-set bits in the ranges.
+ 		 * Precision, which is to say, sub-second precision, only affects
+ 		 * ranges that include SECOND.
  		 */
  		new_range_fls = fls(new_range);
+ 		old_range_fls = fls(old_range);
  		if (new_typmod == -1 ||
  			((new_range_fls >= SECOND ||
! 			  new_range_fls >= old_range_fls) &&
! 			 (old_range_fls < SECOND ||
! 			  new_precis >= MAX_INTERVAL_PRECISION ||
  			  new_precis >= old_precis)))
  			ret = relabel_to_typmod(source, new_typmod);
  	}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-09 Thread Noah Misch
On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote:
 Second, there's a concern about binary bloat: duplicating lots of code
 with different comparators inlined generates, well, a lot of machine
 code.  Of course, an 0.8% increase in the size of the resulting binary
 is very unlikely to produce any measurable performance regression on
 its own, but that's not really the issue.  The issue is rather that we
 could easily go nuts applying this technique in lots of different
 places - or even just in one place to lots of different types.  Let's
 say that we find that even for types with very complex comparators, we
 can get a 5% speedup on quick-sorting speed using this technique.
 Should we, then, apply the technique to every type we have?  Perhaps
 the binary would grow by, I don't know, 10%.  Maybe that's still not
 enough to start hurting performance (or making packagers complain),
 but surely it's getting there, and what happens when we discover that
 similar speedups are possible using the same trick in five other
 scenarios?  I think it's a bit like going out to dinner and spending
 $50.  It's nicer than eating at home, and many people can afford to do
 it occasionally, but if you do it every night, it gets expensive (and
 possibly fattening).
 
 So we need some principled way of deciding how much inlining is
 reasonable, because I am 100% certain this is not going to be the last
 time someone discovers that a massive exercise in inlining can yield a
 nifty performance benefit in some case or another: index builds and
 COPY have already been mentioned on this thread, and I expect that
 people will find other cases as well.  I'm not really sure what the
 budget is - i.e. how much binary bloat we can afford to add - or how
 many cases there are that can benefit, but the first isn't infinite
 and the second is more than the first.

Having such a metric would resolve this discussion, but formulating it will be
all but impossible.  We don't know how much time PostgreSQL installations now
spend sorting or how much additional time they would spend on cache misses,
TLB misses, page faults, etc.  That data won't show up on this thread.

You posed[1] the straw man of a sin(numeric) optimization requiring a 40GB
lookup table.  I would not feel bad about rejecting that, because it can live
quite comfortably as an external module.  Indeed, I think the best thing we
can do to constrain long-term bloat in the postgres executable is to improve
pluggability.  Let a wider range of features live outside the postmaster
binary.  For example, if we had the infrastructure to let hash, GIN, GiST and
SP-GiST index access methods live in their own DSOs (like PL/pgSQL does
today), I would support doing that.  Extensibility is a hallmark of
PostgreSQL.  It's a bad day when we reject a minority-interest feature or
optimization that has no other way to exist.

Better pluggability can't ease the decision process for Peter's patch, because
its specializations essentially must live in the postgres executable to live
at all.  Nominally, we could let external modules inject sort specializations
for core types, and Peter could publish an all_fast_sorts extension.  That
would be useless punting: if we can't make a principled decision on whether
these accelerated sorts justify 85 KiB of binary, how will a DBA who discovers
the module make that decision?

This patch has gotten more than its fair share of attention for bloat, and I
think that's mostly because there's a dial-a-bloat-level knob sticking out and
begging to be frobbed.  On my system, fastpath_sort_2012_01_19.patch adds 85
KiB to the postgres binary.  A recent commit added 57 KiB and bloat never came
up in discussions thereof.  I appreciate your concern over a slippery slope of
inlining proposals, but I also don't wish to see the day when every patch
needs to declare and justify its binary byte count.  Unless, of course, we
discover that elusive metric for evaluating it fairly.

If we'd like to start taking interest in binary bloat, how about having the
buildfarm log capture an "ls -l" on the bin directory?  We'll then have data
to mine if we ever wish to really take action.


All that being said, I'd want to see a 15-30% (depending on how contrived)
improvement to a microbenchmark or a 5% improvement to a generic benchmark
(pgbench, DBT-N) before adopting an optimization of this complexity.  Based
on your measurements[2], per-type inlining gives us a 7% microbenchmark
improvement.  I'd therefore excise the per-type inlining.  (For the record,
given a 20% improvement to the same benchmark, I'd vote yea on the submission
and perhaps even suggest int2 and uuid support.)

nm

[1] 
http://archives.postgresql.org/message-id/ca+tgmozo1xsz+yiqz2mrokmcmqtb+jir0lz43cne6de7--q...@mail.gmail.com
[2] 
http://archives.postgresql.org/message-id/ca+tgmobfz8sqtgtaaryerywuzfodgetprbuygbhsplr4+li...@mail.gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: pg_receivexlog and sync rep Re: [HACKERS] Updated version of pg_receivexlog

2012-02-09 Thread Magnus Hagander
On Wed, Feb 8, 2012 at 19:39, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Oct 25, 2011 at 7:37 PM, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 14:40, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 How does this interact with synchronous replication? If a base backup that
 streams WAL is in progress, and you have synchronous_standby_names set to
 '*', I believe the in-progress backup will count as a standby for that
 purpose. That might give a false sense of security.

 Ah yes. Did not think of that. Yes, it will have this problem.

 Actually, thinking more, per other mail, it won't. Because it will
 never report that the data is synced to disk, so it will not be
 considered for sync standby.

 Now, a new replication mode (synchronous_commit = write) is supported.
 In this mode, the in-progress backup will be considered a sync
 standby because its periodic status report includes a valid write position.
 We should change the report so that it includes only invalid positions.
 Patch attached.

Agreed, applied.


 While I agree that the backup should not behave as a sync standby, ISTM
 that pg_receivexlog should, which would be very useful. If pg_receivexlog does
 so, we can write WAL synchronously both locally and remotely, which
 would increase the durability of the system. Thus, to allow pg_receivexlog
 to behave as a sync standby, should we change it so that its status report
 includes the write and flush positions?

Yes, that could be useful. I don't think it should do so by default,
though, but it could be useful to provide a command-line switch to
pg_receivexlog allowing it to do this.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new keywords SNAPSHOT and TYPES to the keyword list in gram.

2012-02-09 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of jue feb 09 06:42:12 -0300 2012:
 Add new keywords SNAPSHOT and TYPES to the keyword list in gram.y
 
 These were added to kwlist.h as unreserved keywords in separate patches,
 but authors forgot to add them to the corresponding list in gram.y.
 Because of that, even though they were supposed to be unreserved keywords,
 they could not be used as identifiers. src/tools/check_keywords.pl is your
 friend.

I wondered a couple of weeks ago if we could, instead, generate the
lists in gram.y from kwlist.h.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-09 Thread Robert Haas
On Thu, Feb 9, 2012 at 7:24 AM, Noah Misch n...@leadboat.com wrote:
 On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote:
 Second, there's a concern about binary bloat: duplicating lots of code
 with different comparators inlined generates, well, a lot of machine
 code.  Of course, an 0.8% increase in the size of the resulting binary
 is very unlikely to produce any measurable performance regression on
 its own, but that's not really the issue.  The issue is rather that we
 could easily go nuts applying this technique in lots of different
 places - or even just in one place to lots of different types.  Let's
 say that we find that even for types with very complex comparators, we
 can get a 5% speedup on quick-sorting speed using this technique.
 Should we, then, apply the technique to every type we have?  Perhaps
 the binary would grow by, I don't know, 10%.  Maybe that's still not
 enough to start hurting performance (or making packagers complain),
 but surely it's getting there, and what happens when we discover that
 similar speedups are possible using the same trick in five other
 scenarios?  I think it's a bit like going out to dinner and spending
 $50.  It's nicer than eating at home, and many people can afford to do
 it occasionally, but if you do it every night, it gets expensive (and
 possibly fattening).

 So we need some principled way of deciding how much inlining is
 reasonable, because I am 100% certain this is not going to be the last
 time someone discovers that a massive exercise in inlining can yield a
 nifty performance benefit in some case or another: index builds and
 COPY have already been mentioned on this thread, and I expect that
 people will find other cases as well.  I'm not really sure what the
 budget is - i.e. how much binary bloat we can afford to add - or how
 many cases there are that can benefit, but the first isn't infinite
 and the second is more than the first.

 Having such a metric would resolve this discussion, but formulating it will be
 all but impossible.  We don't know how much time PostgreSQL installations now
 spend sorting or how much additional time they would spend on cache misses,
 TLB misses, page faults, etc.  That data won't show up on this thread.

 You posed[1] the straw man of a sin(numeric) optimization requiring a 40GB
 lookup table.  I would not feel bad about rejecting that, because it can live
 quite comfortably as an external module.  Indeed, I think the best thing we
 can do to constrain long-term bloat in the postgres executable is to improve
 pluggability.  Let a wider range of features live outside the postmaster
 binary.  For example, if we had the infrastructure to let hash, GIN, GiST and
 SP-GiST index access methods live in their own DSOs (like PL/pgSQL does
 today), I would support doing that.  Extensibility is a hallmark of
 PostgreSQL.  It's a bad day when we reject a minority-interest feature or
 optimization that has no other way to exist.

 Better pluggability can't ease the decision process for Peter's patch, because
 its specializations essentially must live in the postgres executable to live
 at all.  Nominally, we could let external modules inject sort specializations
 for core types, and Peter could publish an all_fast_sorts extension.  That
 would be useless punting: if we can't make a principled decision on whether
 these accelerated sorts justify 85 KiB of binary, how will a DBA who discovers
 the module make that decision?

 This patch has gotten more than its fair share of attention for bloat, and I
 think that's mostly because there's a dial-a-bloat-level knob sticking out and
 begging to be frobbed.  On my system, fastpath_sort_2012_01_19.patch adds 85
 KiB to the postgres binary.  A recent commit added 57 KiB and bloat never came
 up in discussions thereof.  I appreciate your concern over a slippery slope of
 inlining proposals, but I also don't wish to see the day when every patch
 needs to declare and justify its binary byte count.  Unless, of course, we
 discover that elusive metric for evaluating it fairly.

 If we'd like to start taking interest in binary bloat, how about having the
 buildfarm log capture an "ls -l" on the bin directory?  We'll then have data
 to mine if we ever wish to really take action.


 All that being said, I'd want to see a 15-30% (depending on how contrived)
 improvement to a microbenchmark or a 5% improvement to a generic benchmark
 (pgbench, DBT-N) before adopting an optimization of this complexity.  Based
 on your measurements[2], per-type inlining gives us a 7% microbenchmark
 improvement.  I'd therefore excise the per-type inlining.  (For the record,
 given a 20% improvement to the same benchmark, I'd vote yea on the submission
 and perhaps even suggest int2 and uuid support.)

That's all well-said and I mostly agree with all of it, including your
suggested percentages in the last paragraph (but does anyone really
sort int2s or uuids?).  What I think is different about inlining 

Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-09 Thread Peter Geoghegan
On 9 February 2012 13:50, Robert Haas robertmh...@gmail.com wrote:
 I'm also more than slightly concerned that we are losing sight of the
 forest for the trees.  I have heard reports that sorting large amounts
 of data is many TIMES slower for us than for a certain other database
 product.  I frankly wonder if this entire patch is a distraction.
 When inlining is the tool of choice for further performance
 improvement, it suggests that we're pretty much out of ideas, and that
 whatever we get out of inlining - be it 6% or 30% - will be the end.
 I don't like that idea very much.

If you're talking about the product I think you're talking about,
well, their contracts forbid any actual benchmarks, so we won't have
any hard figures, but I find it intuitively difficult to believe that
the difference is that large, mostly because I've demonstrated that
the performance of qsort is a pretty good proxy for the performance of
a query like "select * from foo order by bar", and qsort can't be too
far from optimal. Besides, didn't you yourself say "I've been
skeptical of this patch for a number of reasons, the major one of
which is that most workloads spend only a very small amount of time in
doing quicksorts"?

I have a new plan. I think you'll like it:

* drop the type specific specialisations entirely.

* drop qsort_arg (the original, unspecialised version). We now have
exactly the same number of source code copies of qsort as before: 2.

* gut the comparison of the leading sort key only from
comparetup_heap, comparetup_index_btree, etc (I collectively refer to
these functions as tuple-class encapsulating comparators, distinct
from their comparator proper).

* Instantiate two specialisations of qsort_arg: single key and
multiple key (exactly the same number as has already been agreed to,
since we lost the generic version). However, there is one key
difference now: we call one of the tuple class encapsulating functions
through a function pointer if the first comparison gives equality.
Early indications are that this is almost as much of a win as the
single-key case. So the code effectively does this (but through a
function pointer):

#define MULTI_ADDITIONAL_CODE(COMPAR)										\
{																			\
	return comparetup_heap(aT, bT, state);									\
}

This works because most comparisons won't actually need to look at the
second key, and because the cut-down tuple-class encapsulating
comparator that is used in the last revision of the patch doesn't
actually make any assumption about the particular tuple-class
encapsulating comparator in use.

It may not even be worth-while having a separate copy of the qsort
function for the multiple key case, so the question of binary bloat
becomes entirely irrelevant, as there would be a net gain of zero
copies of qsort_arg.
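
To spell out the shape of that multi-key specialisation (this is only an
illustration of the idea, not the patch itself; it leans on the SortTuple and
SortSupport fields that tuplesort.c already has):

/*
 * Leading-key comparison is inlined into the specialised qsort; the
 * tuple-class encapsulating comparator is only reached through the
 * function pointer when the first keys compare equal.
 */
static inline int
multikey_compare(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
{
	int			compare;

	compare = ApplySortComparator(a->datum1, a->isnull1,
								  b->datum1, b->isnull1,
								  &state->sortKeys[0]);
	if (compare != 0)
		return compare;

	/* rare path: comparetup_heap, comparetup_index_btree, ... */
	return state->comparetup(a, b, state);
}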

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-02-09 Thread Bruce Momjian
On Wed, Feb 08, 2012 at 10:41:43PM -0500, Greg Smith wrote:
 Just trying to set the expectations bar realistically here.  I don't
 consider the easier problem of checkpoint smoothing a solved one,
 either.  Saying autovacuum needs to reach even that level of
 automation to be a useful improvement over now is a slippery goal.
 Regardless, the simple idea I submitted to this CF is clearly dead
 for now.  I'll take the feedback of "this level of change can live
 in a user-side tuning tool" and do that instead.  Since I was
 already thinking in the direction of background activity monitoring,
 I have a good idea how I'd need to approach this next, to be more
 likely to gain community buy-in as an automated improvement.  That's
 a longer term project though, which I'll hopefully be able to
 revisit for 9.3.

Totally agree.  If it is hard for us, it is super-hard for admins to set
this, so we had better give it serious thought.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-09 Thread Robert Haas
On Thu, Feb 9, 2012 at 9:37 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 9 February 2012 13:50, Robert Haas robertmh...@gmail.com wrote:
 I'm also more than slightly concerned that we are losing sight of the
 forest for the trees.  I have heard reports that sorting large amounts
 of data is many TIMES slower for us than for a certain other database
 product.  I frankly wonder if this entire patch is a distraction.
 When inlining is the tool of choice for further performance
 improvement, it suggests that we're pretty much out of ideas, and that
 whatever we get out of inlining - be it 6% or 30% - will be the end.
 I don't like that idea very much.

 If you're talking about the product I think you're talking about,
 well, their contracts forbid any actual benchmarks, so we won't have
 any hard figures, but I find it intuitively difficult to believe that
 the difference is that large, mostly because I've demonstrated that
 the performance of qsort is a pretty good proxy for the performance of
 a query like "select * from foo order by bar", and qsort can't be too
 far from optimal. Besides, didn't you yourself say "I've been
 skeptical of this patch for a number of reasons, the major one of
 which is that most workloads spend only a very small amount of time in
 doing quicksorts"?

Emphasis on "large amounts of data".

I've said from the beginning that the small-sort case is less
interesting to me: it's nice to make it faster, but nobody's going to
quit using PostgreSQL because a quicksort takes 8ms instead of 6ms.
It's when we start talking about operations that take minutes or hours
that the differences become material.

 I have a new plan. I think you'll like it:

 * drop the type specific specialisations entirely.

Check.

 * drop qsort_arg (the original, unspecialised version). We now have
 exactly the same number of source code copies of qsort as before: 2.

This may be useful elsewhere some day, but I suppose we can always
pull it back out of git if we need it.

 * gut the comparison of the leading sort key only from
 comparetup_heap, comparetup_index_btree, etc (I collectively refer to
 these functions as tuple-class encapsulating comparators, distinct
 from their comparator proper).

 * Instantiate two specialisations of qsort_arg: single key and
 multiple key (exactly the same number as has already been agreed to,
 since we lost the generic version). However, there is one key
 difference now: we call one of the tuple class encapsulating functions
 through a function pointer if the first comparison gives equality.
 Early indications are that this is almost as much of a win as the
 single-key case. So the code effectively does this (but through a
 function pointer):

 #define MULTI_ADDITIONAL_CODE(COMPAR)										\
 {																			\
 	return comparetup_heap(aT, bT, state);									\
 }

 This works because most comparisons won't actually need to look at the
 second key, and because the cut-down tuple-class encapsulating
 comparator that is used in the last revision of the patch doesn't
 actually make any assumption about the particular tuple-class
 encapsulating comparator in use.

 It may not even be worth-while having a separate copy of the qsort
 function for the multiple key case, so the question of binary bloat
 becomes entirely irrelevant, as there would be a net gain of zero
 copies of qsort_arg.

I'm not sure I entirely follow all this, but I'll look at the code
once you have it.  Are you saying that all the comparetup_yadda
functions are redundant to each other in the single-key case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-09 Thread Bruce Momjian
On Thu, Feb 09, 2012 at 07:24:49AM -0500, Noah Misch wrote:
 This patch has gotten more than its fair share of attention for bloat, and I
 think that's mostly because there's a dial-a-bloat-level knob sticking out and
 begging to be frobbed.

I already emailed Peter privately stating that he shouldn't feel bad
about the many critiques of his patch --- this dial-a-bloat-level knob
is new for us, and we have to get used to the idea.  We are more
discussing the idea of a dial-a-bloat-level knob rather than his patch.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-02-09 Thread Simon Riggs
On Wed, Feb 8, 2012 at 2:05 PM, Noah Misch n...@leadboat.com wrote:
 On Wed, Feb 08, 2012 at 11:40:34AM +, Simon Riggs wrote:
 On Wed, Feb 8, 2012 at 3:24 AM, Noah Misch n...@leadboat.com wrote:
  On Tue, Feb 07, 2012 at 08:58:59PM +, Simon Riggs wrote:
  On Thu, Jan 26, 2012 at 8:20 PM, Noah Misch n...@leadboat.com wrote:
   This patch uses FPIs to guard against torn hint writes, even when the
   checksums are disabled.  One could not simply condition them on the
   page_checksums setting, though.  Suppose page_checksums=off and we're hinting
   a page previously written with page_checksums=on.  If that write tears,
   leaving the checksum intact, that block will now fail verification.  A couple
   of ideas come to mind.  (a) Read the on-disk page and emit an FPI only if the
   old page had a checksum.  (b) Add a checksumEnableLSN field to pg_control.
   Whenever the cluster starts with checksums disabled, set the field to
   InvalidXLogRecPtr.  Whenever the cluster starts with checksums enabled and the
   field is InvalidXLogRecPtr, set the field to the next LSN.  When a checksum
   failure occurs in a page with LSN older than the stored one, emit either a
   softer warning or no message at all.
 
  We can only change page_checksums at restart (now) so the above seems moot.
 
  If we crash with FPWs enabled we repair any torn pages.
 
  There's no live bug, but that comes at a high cost: the patch has us emit
  full-page images for hint bit writes regardless of the page_checksums setting.

 Sorry, I don't understand what you mean. I don't see any failure cases
 that require that.

 page_checksums can only change at a shutdown checkpoint,

 The statement "If that write tears, leaving the checksum intact, that block
 will now fail verification." cannot happen, ISTM.

 If we write out a block we update the checksum if page_checksums is
 set, or we clear it. If we experience a torn page at crash, the FPI
 corrects that, so the checksum never does fail verification. We only
 need to write a FPI when we write with checksums.

 If that's wrong, please explain a failure case in detail.

 In the following description, I will treat a heap page as having two facts.
 The first is either CHKSUM or !CHKSUM, and the second is either HINT or !HINT.
 A torn page write lacking the protection of a full-page image can update one
 or both facts despite the transaction having logically updated both.  (This is
 just the normal torn-page scenario.)  Given that, consider the following
 sequence of events:

 1. startup with page_checksums = on
 2. update page P with facts CHKSUM, !HINT
 3. clean write of P
 4. clean shutdown

 5. startup with page_checksums = off
 6. update P with facts !CHKSUM, HINT.  no WAL since page_checksums=off
 7. begin write of P
 8. crash.  torn write of P only updates HINT.  disk state now CHKSUM, HINT

 9. startup with page_checksums = off
 10. crash recovery.  does not touch P, because step 6 generated no WAL
 11. clean shutdown

 12. startup with page_checksums = on
 13. read P.  checksum failure

 Again, this cannot happen with checksum16_with_wallogged_hint_bits.v7.patch,
 because step 6 _does_ write WAL.  That ought to change for the sake of
 page_checksums=off performance, at which point we must protect the above
 scenario by other means.

Thanks for explaining.

Logic fixed.

   PageSetLSN() is not atomic, so the shared buffer content lock we'll be
   holding is insufficient.
 
  Am serialising this by only writing PageLSN while holding buf hdr lock.
 
  That means also taking the buffer header spinlock in every PageGetLSN() caller
  holding only a shared buffer content lock.  Do you think that will pay off,
  versus the settled pattern of trading here your shared buffer content lock for
  an exclusive one?

 Yes, I think it will pay off. This is the only code location where we
 do that, and we are already taking the buffer header lock, so there is
 effectively no overhead.

 The sites I had in mind are the calls to PageGetLSN() in FlushBuffer()
 (via BufferGetLSN()) and XLogCheckBuffer().  We don't take the buffer header
 spinlock at either of those locations.  If they remain safe under the new
 rules, we'll need comments explaining why.  I think they may indeed be safe,
 but it's far from obvious.

OK, some slight rework required to do that, no problems.

I checked all other call sites and have updated README to explain.
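
For the archives, the shape of the change at those call sites is roughly this
(a sketch only, not the committed diff):

	/*
	 * PageSetLSN() is now done while holding the buffer header spinlock, so
	 * a reader that has only a shared content lock (e.g. FlushBuffer())
	 * takes the spinlock briefly to read a consistent LSN.
	 */
	LockBufHdr(buf);
	recptr = BufferGetLSN(buf);
	UnlockBufHdr(buf);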

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new keywords SNAPSHOT and TYPES to the keyword list in gram.

2012-02-09 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Heikki Linnakangas's message of jue feb 09 06:42:12 -0300 2012:
 src/tools/check_keywords.pl is your friend.

 I wondered a couple of weeks ago if we could, instead, generate the
 lists in gram.y from kwlist.h.

We've looked into that in the past.  bison does not seem to have any
sort of include directive, which means the only way to do it would be
to have the gram.y file be built from pieces at compile time.  Although
Michael is successfully doing something of the sort for ecpg, I find it
way too ugly in return for the amount of benefit we'd get.

If people are sufficiently worried about this, a better answer would be
to teach the makefiles to run check_keywords.pl during every build.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-09 Thread Peter Geoghegan
On 9 February 2012 14:51, Robert Haas robertmh...@gmail.com wrote:
 I'm not sure I entirely follow all this, but I'll look at the code
 once you have it.  Are you saying that all the comparetup_yadda
 functions are redundant to each other in the single-key case?

Yes, I am. The main reason that the loops exist in those functions
(which is the only way that they substantially differ) is because they
each have to get the other keys through various ways that characterise
the tuple class that they encapsulate (index_getattr(),
heap_getattr(), etc).

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Notify system doesn't recover from No space error

2012-02-09 Thread Kevin Grittner
On a development system which wasn't being monitored very closely
I think we've uncovered a bug in the listen/notify behavior.  The
system was struggling with lack of disk space for about a week
before this surfaced, so it's probably not the most critical sort
of bug we could have, but it probably merits a fix.
 
We were getting all sorts of logging about the space problem, and
PostgreSQL soldiered bravely on, doing the best it could under such
adverse conditions.  (That's pretty impressive, really.)  A lot of
messages like these:
 
[2012-02-05 01:24:36.816 CST] 14950 cc cc 127.0.0.1(35321) ERROR: 
could not extend file base/16401/46507734: wrote only 4096 of 8192
bytes at block 0
[2012-02-05 01:24:36.816 CST] 14950 cc cc 127.0.0.1(35321) HINT: 
Check free disk space.

[2012-02-05 01:25:40.643 CST] 5323 LOG:  could not write temporary
statistics file pg_stat_tmp/pgstat.tmp: No space left on device

[2012-02-05 01:26:38.971 CST] 15065 cc cc 127.0.0.1(53258) ERROR: 
could not write to hash-join temporary file: No space left on device

I don't know how much of a clue this is, but less than a minute
before our first message about pg_notify problems, the statistics
messages went from complaining about not being able to *write* the
file to also complaining about not being able to *close* the file --
like this:
 
[2012-02-05 01:26:41.159 CST] 5323 LOG:  could not close temporary
statistics file pg_stat_tmp/pgstat.tmp: No space left on device
[2012-02-05 01:26:44.663 CST] 5323 LOG:  could not write temporary
statistics file pg_stat_tmp/pgstat.tmp: No space left on device
[2012-02-05 01:26:45.705 CST] 16252 WARNING:  pgstat wait timeout
 
We got three errors in the log with exactly this DETAIL (file,
offset and message):
 
[2012-02-05 01:27:36.878 CST] 14892 cc cc 127.0.0.1(35320) ERROR: 
could not access status of transaction 0
[2012-02-05 01:27:36.878 CST] 14892 cc cc 127.0.0.1(35320) DETAIL:
 Could not write to file pg_notify/03A5 at offset 188416: No space
left on device.
[2012-02-05 01:27:36.878 CST] 14892 cc cc 127.0.0.1(35320)
STATEMENT:  COMMIT
[2012-02-05 01:27:36.882 CST] 14892 cc cc 127.0.0.1(35320) LOG: 
disconnection: session time: 0:07:00.600 user=cc database=cc
host=127.0.0.1 port=35320
 
After that the message changed to a new offset and a message of
"Success." for the failure:
 
[2012-02-05 01:30:36.952 CST] 16383 cc cc 127.0.0.1(38931) ERROR: 
could not access status of transaction 0
[2012-02-05 01:30:36.952 CST] 16383 cc cc 127.0.0.1(38931) DETAIL:
 Could not read from file pg_notify/03A5 at offset 253952:
Success.
[2012-02-05 01:30:36.952 CST] 16383 cc cc 127.0.0.1(38931)
STATEMENT:  COMMIT
[2012-02-05 01:30:36.956 CST] 16383 cc cc 127.0.0.1(38931) LOG: 
disconnection: session time: 0:01:59.537 user=cc database=cc
host=127.0.0.1 port=38931
 
There were 4591 messages over the course of five days with exactly
that filename and offset and a message of "Success.", all for
transaction 0.  About midway through that time disk space was freed
and all other symptoms cleared.  It really is pretty impressive that
PostgreSQL survived that and was able to resume normal operations in
all other respects once disk space was freed for it.  Only this one
feature stayed in a bad state.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] index-only quals vs. security_barrier views

2012-02-09 Thread Robert Haas
When Heikki worked up his original index-only scan patches (which
didn't end up looking much like what eventually got committed), he had
the notion of an index-only qual.  That is, given a query like this:

select sum(1) from foo where substring(a,1,3) = 'abc';

We could evaluate the substring qual before performing the heap fetch,
and fetch the tuple from the heap only if the qual passes.  The
current code is capable of generating an index-only scan plan for this
query, but the MVCC visibility check always happens first: if the page
is all-visible, we go ahead and evaluate the qual using only the index
data, but if the page is not all-visible, we do the heap fetch first
and then check the qual only if the tuple is visible to our snapshot.
It would be nice to have the ability to do those checks in the other
order, in the case where the projected cost of checking the qual is
less than the projected cost of the heap fetch.  This would allow
index-only scans to win in more situations than they can right now,
because we'd conceivably avoid quite a bit of random I/O if the qual
is fairly selective and there are a decent number of visibility map
bits that are unset.
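
(For concreteness, the kind of setup I'm imagining for that example, with
invented names, is:

create table foo (a text, b integer);
create index foo_a_idx on foo (a);

sum(1) needs no heap columns and the qual only touches "a", which is in the
index, so an index-only scan is already possible today; the point would be to
check the substring qual against the index tuple before visiting the heap
when the page isn't all-visible.)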

In fact, this technique might pay off even if we don't have a covering index:

select * from foo where substring(a,1,3) = 'abc';

If the expected selectivity of the qual is low and the index is a lot
smaller than the table, we might want to iterate through all the index
tuples in the entire index and fetch the heap tuples for only those
where the qual passes.  This would allow index-only scans - or
whatever term you want to use, since there's not much that's index
only about this case - to potentially win even when no
visibility-map bits are set at all.

Now, there's a fly in the ointment here, which is that applying
arbitrary user-defined functions to tuples that might not be visible
doesn't sound very safe.  The user-defined function in question might
perform some action based on those invisible tuples that has side
effects, which would be bad, because now we're violating MVCC
semantics.  Or it might throw an error, killing the scan dead on the
basis of the contents of some tuple that the scan shouldn't even see.
 However, there is certainly a class of functions for which this type
of optimization would be safe, and it's an awful lot like the set of
functions that can be safely pushed down through a security_barrier
view - namely, things that don't have side effects or throw errors.
So it's possible that the proleakproof flag KaiGai is proposing to add
to pg_proc could do double duty, serving also to identify when it's
safe to apply a qual to an index tuple when the corresponding heap
tuple might be invisible.  However, I have some reservations about
assuming that the two concepts are exactly the same.  For one thing,
people are inevitably going to want to cheat a little bit more here
than is appropriate for security views, and encouraging people to mark
things LEAKPROOF when they're really not is a security disaster
waiting to happen.  For another thing, there are some important cases
that this doesn't cover, like:

select * from foo where substring(a,1,3) like '%def%';

The like operator doesn't seem to be leakproof in the security sense,
because it can throw an error if the pattern is something like a
single backslash (ERROR:  LIKE pattern must not end with escape
character) and indeed it doesn't seem like it would be safe here
either if the pattern were stored in the table.  But if the pattern
were constant, it'd be OK, or almost OK: there's still the edge case
where the table contains invisible rows but no visible ones - whether
or not we complain about the pattern there ought to be the same as
whether or not we complain about it on a completely empty table.  If
we got to that point, then we might as well consider the qual
leakproof for security purposes under the same set of circumstances
we'd consider it OK to apply to possibly-invisible tuples.
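
For reference, the sort of marking KaiGai's patch would introduce looks
roughly like this (from memory of the proposal, so treat the syntax as
approximate):

create function first3(text) returns text
    language sql immutable leakproof
    as $$ select substring($1, 1, 3) $$;

i.e. the function author promises that it neither throws data-dependent
errors nor has side effects, and the planner can then push it around freely.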

I don't have any appetite for trying to do anything more with
index-only scans for 9.2, though maybe someone else will think
otherwise.  But I would like very much to get KaiGai's leakproof stuff
committed, and so it seems like a good idea to reconcile the needs of
that machinery with what might eventually be needed here.

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-09 Thread Bruce Momjian
On Thu, Feb 09, 2012 at 03:36:23PM +, Peter Geoghegan wrote:
 On 9 February 2012 14:51, Robert Haas robertmh...@gmail.com wrote:
  I'm not sure I entirely follow all this, but I'll look at the code
  once you have it.  Are you saying that all the comparetup_yadda
  functions are redundant to each other in the single-key case?
 
 Yes, I am. The main reason that the loops exist in those functions
 (which is the only way that they substantially differ) is because they
 each have to get the other keys through various ways that characterise
 the tuple class that they encapsulate (index_getattr(),
 heap_getattr(), etc).

Does this help all types for sorting, including strings?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add protransform for numeric, varbit, and temporal types

2012-02-09 Thread Robert Haas
On Thu, Feb 9, 2012 at 7:18 AM, Noah Misch n...@leadboat.com wrote:
 The code has a thinko; a given interval typmod ultimately implies a single
 point from which we truncate rightward.  The precision only matters if the
 range covers SECOND.  Thanks; the attached patch improves this.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-09 Thread Peter Geoghegan
On 9 February 2012 17:16, Bruce Momjian br...@momjian.us wrote:
 Yes, I am. The main reason that the loops exist in those functions
 (which is the only way that they substantially differ) is because they
 each have to get the other keys through various ways that characterise
 the tuple class that they encapsulate (index_getattr(),
 heap_getattr(), etc).

 Does this help all types for sorting, including strings?

Yes, it does, though of course that's not expected to make too much
difference with text when the C locale isn't used, because the
comparisons are inherently much more expensive, and there isn't a
whole lot we can do about that, at least here.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notify system doesn't recover from No space error

2012-02-09 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 We got three errors in the log with exactly this DETAIL (file,
 offset and message):
 
 [2012-02-05 01:27:36.878 CST] 14892 cc cc 127.0.0.1(35320) ERROR: 
 could not access status of transaction 0
 [2012-02-05 01:27:36.878 CST] 14892 cc cc 127.0.0.1(35320) DETAIL:
  Could not write to file pg_notify/03A5 at offset 188416: No space
 left on device.
 
 After that the message changed to a new offset and a message of
 Success. for the failure:
 
 [2012-02-05 01:30:36.952 CST] 16383 cc cc 127.0.0.1(38931) ERROR: 
 could not access status of transaction 0
 [2012-02-05 01:30:36.952 CST] 16383 cc cc 127.0.0.1(38931) DETAIL:
  Could not read from file pg_notify/03A5 at offset 253952:
 Success.

IIRC, the "could not access status of transaction" bits are artifacts
stemming from the use of the SLRU infrastructure for pg_notify access.
Sometime we ought to think a bit harder about how to specialize SLRU's
error messages for the cases where what it's manipulating aren't XIDs.

Also, the "Could not read ...: Success" bit presumably ought to be
spelled "Could not read ...: unexpected EOF".  I thought we'd fixed
that in most places, but we must've missed a spot in SLRU.
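
For illustration, the usual shape of the problem and of the fix looks
something like this (a sketch, not the slru.c code as it stands):

	errno = 0;
	if (read(fd, buf, BLCKSZ) != BLCKSZ)
	{
		/*
		 * A short read hits EOF without setting errno, which is how %m ends
		 * up printing "Success".  Report it as unexpected EOF instead.
		 */
		if (errno == 0)
			ereport(ERROR,
					(errmsg("could not read file \"%s\": unexpected EOF",
							path)));
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\": %m", path)));
	}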

However, both of those points are just cosmetic.  The substantive issue
is that you say it didn't resume normal behavior once space became
available again.  Can you provide more info about that?  In particular,
what behavior was being seen by the application?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notify system doesn't recover from No space error

2012-02-09 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 
 [2012-02-05 01:30:36.952 CST] 16383 cc cc 127.0.0.1(38931)
 ERROR:  could not access status of transaction 0
 [2012-02-05 01:30:36.952 CST] 16383 cc cc 127.0.0.1(38931)
 DETAIL:  Could not read from file pg_notify/03A5 at offset
 253952: Success.
 
 The substantive issue is that you say it didn't resume normal
 behavior once space became available again.  Can you provide more
 info about that?  In particular, what behavior was being seen by
 the application?
 
The application LISTENs on channel tcn and a trigger function is
attached to most permanent tables to NOTIFY for DML on that channel.
(See the recently committed triggered_change_notification patch if
you want to see the trigger function.)  At this point it might be
hard to determine (at least without re-creating the problem) whether
it was only transactions which issued a NOTIFY, but the application
never really got very far because of COMMIT statements failing with
the above message.
 
The report to us was that testers were unable to start the
application.  I believe that the above error on COMMIT kept the
application from getting past initial tests that the connection was
good.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notify system doesn't recover from No space error

2012-02-09 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 The application LISTENs on channel tcn and a trigger function is
 attached to most permanent tables to NOTIFY for DML on that channel.
 ...
 The report to us was that testers were unable to start the
 application.  I believe that the above error on COMMIT kept the
 application from getting past initial tests that the connection was
 good.

OK, so it was issuing a LISTEN and the LISTEN was failing at commit?
(It's a bit hard to believe that you'd be triggering one of the NOTIFYs
during initial connection tests.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new keywords SNAPSHOT and TYPES to the keyword list in gram.

2012-02-09 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue feb 09 12:17:59 -0300 2012:
 
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Heikki Linnakangas's message of jue feb 09 06:42:12 -0300 
  2012:
  src/tools/check_keywords.pl is your friend.
 
  I wondered a couple of weeks ago if we could, instead, generate the
  lists in gram.y from kwlist.h.
 
 We've looked into that in the past.  bison does not seem to have any
 sort of include directive, which means the only way to do it would be
 to have the gram.y file be built from pieces at compile time.

Doh, bummer.

 Although
 Michael is successfully doing something of the sort for ecpg, I find it
 way too ugly in return for the amount of benefit we'd get.

Agreed.

 If people are sufficiently worried about this, a better answer would be
 to teach the makefiles to run check_keywords.pl during every build.

FWIW that script is throwing a warning here:
Use of assignment to $[ is deprecated at 
/pgsql/source/HEAD/src/tools/check_keywords.pl line 19.

Also, we should at least do the attached to simplify the process.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


check-keywords.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new keywords SNAPSHOT and TYPES to the keyword list in gram.

2012-02-09 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of jue feb 09 12:17:59 -0300 2012:
 If people are sufficiently worried about this, a better answer would be
 to teach the makefiles to run check_keywords.pl during every build.

 FWIW that script is throwing a warning here:
 Use of assignment to $[ is deprecated at 
 /pgsql/source/HEAD/src/tools/check_keywords.pl line 19.

Any Perl hackers want to improve that script?

 Also, we should at least do the attached to simplify the process.
 +check:
 + $(top_srcdir)/src/tools/check_keywords.pl $(top_srcdir)

Actually, what would make sense in that line is to attach it to the
existing maintainer-check target, no?  I don't think top-level
make check descends into this directory.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only quals vs. security_barrier views

2012-02-09 Thread Jesper Krogh

On 2012-02-09 18:02, Robert Haas wrote:

I don't have any appetite for trying to do anything more with
index-only scans for 9.2, though maybe someone else will think
otherwise.  But I would like very much to get KaiGai's leakproof stuff
committed, and so it seems like a good idea to reconcile the needs of
that machinery with what might eventually be needed here.

Those were a couple of nice cases where index-only scans
could win more than they do today. I have another one here:

2012-02-09 19:17:28.788 jk=# \d testtable
                         Table "public.testtable"
 Column |   Type   |                        Modifiers
--------+----------+---------------------------------------------------------
 id     | integer  | not null default nextval('testtable_id_seq'::regclass)
 fts    | tsvector |
Indexes:
    "prk_idx" UNIQUE, btree (id)
    "fts_id" gin (fts)

2012-02-09 19:19:39.054 jk=# explain select id from testtable where fts 
@@ to_tsquery('english','test1000');

  QUERY PLAN
---
 Bitmap Heap Scan on testtable  (cost=20.29..161.28 rows=37 width=4)
   Recheck Cond: (fts @@ '''test1000'''::tsquery)
   ->  Bitmap Index Scan on fts_id  (cost=0.00..20.28 rows=37 width=0)
         Index Cond: (fts @@ '''test1000'''::tsquery)
(4 rows)

Time: 0.494 ms
2012-02-09 19:19:52.748 jk=#

In this situation the tuple can be regenerated from the index, but
not from the index satisfying the where clause. Allowing that would permit
significantly more complex where-clauses, and it may also benefit situations
where we are only going for one or more of the primary-key/foreign-key
columns for join-conditions.

The above situation does not need to involve a gin-index; a btree index
where the where clause can be matched up using one index, and the tuple
constructed using another, falls into the same category.
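
A plain-btree sketch of the same idea might look like this (illustrative only;
"some_col" is a hypothetical extra column, and the behaviour described is the
wished-for one, not what the planner does today):

  CREATE INDEX testtable_some_col_idx ON testtable (some_col); -- matches the qual
  -- prk_idx on (id) already covers the output column
  SELECT id FROM testtable WHERE some_col = 42;

Here the where clause is answered by testtable_some_col_idx, while the only
returned column, id, is present in prk_idx rather than in the qual index.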


--
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notify system doesn't recover from No space error

2012-02-09 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 The application LISTENs on channel tcn and a trigger function is
 attached to most permanent tables to NOTIFY for DML on that
 channel.
 ...
 The report to us was that testers were unable to start the
 application.  I believe that the above error on COMMIT kept the
 application from getting past initial tests that the connection
 was good.
 
 OK, so it was issuing a LISTEN and the LISTEN was failing at
 commit?  (It's a bit hard to believe that you'd be triggering one
 of the NOTIFYs during initial connection tests.)
 
In continued digging through logs I found something to indicate the
transaction on which COMMIT failed for 2528 of the failures.  In all
cases the transaction made a change which would have fired a NOTIFY
at commit (in a deferred trigger).
 
So that initial assumption about the cause doesn't hold up.  I'm not
sure at this point why the tester perception was that they couldn't
get in.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Notify system doesn't recover from No space error

2012-02-09 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 In continued digging through logs I found something to indicate
 the transaction on which COMMIT failed for 2528 of the failures. 
 In all cases the transaction made a change which would have fired
 a NOTIFY at commit (in a deferred trigger).
 
Arg.  Sorry, I got confused there.  No deferred trigger is involved in
the triggered change notification; that is only in the replication
code.  I haven't seen anything to suggest that the replication
trigger is contributing to this, but I guess I couldn't rule it out,
either.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] When do we lose column names?

2012-02-09 Thread Andrew Dunstan



On 02/07/2012 03:03 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 02/07/2012 12:47 PM, Tom Lane wrote:

In general I think we'd have to require that colnames be supplied in all
RowExprs if we go this way.  Anyplace that's trying to slide by without
will have to be fixed.  I don't recall how many places that is.

I just had a thought that maybe we could make this simpler by dummying
up a list of colnames if we don't have one, instead of that assertion.
Or am I on the wrong track?

Well, if there are more than one or two RowExpr creators for which a
dummy set of colnames is the best we can do anyway, that might be a
reasonable answer.  But I think it would encourage people to be lazy
and let the dummy colnames be used even when they can do better, so
I'd rather not take this as a first-choice solution.




OK, the one place that needs to be fixed to avoid the crash caused by 
the json regression tests with the original patch is in


   src/backend/parser/parse_expr.c:transformRowExpr().

Other candidates I have found that don't set colnames and should 
probably use dummy names are:


 * src/backend/parser/gram.y (row: production)
 * src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator()
 * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator()
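
For context, the sort of query that reaches transformRowExpr() with no
colnames is a bare ROW() construct, e.g. (illustrative, not necessarily the
exact regression query):

   SELECT row_to_json(ROW(1, 'foo'));

The grammar's row: production builds the RowExpr without colnames, and that
is where the original patch tripped over the missing names.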


Given a function:

   extern List *makeDummyColnames(List * args);


what's the best place to put it? I couldn't see a terribly obvious place 
in the source.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] When do we lose column names?

2012-02-09 Thread Alvaro Herrera

Excerpts from Andrew Dunstan's message of jue feb 09 16:38:27 -0300 2012:

 OK, the one place that needs to be fixed to avoid the crash caused by 
 the json regression tests with the original patch is in
 
 src/backend/parser/parse_expr.c:transformRowExpr().
 
 Other candidates I have found that don't set colnames and should 
 probably use dummy names are:
 
   * src/backend/parser/gram.y (row: production)
   * src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator()
   * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator()
 
 
 Given a function:
 
 extern List *makeDummyColnames(List * args);
 
 
 what's the best place to put it? I couldn't see a terribly obvious place 
 in the source.

parse_relation.c?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] When do we lose column names?

2012-02-09 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 OK, the one place that needs to be fixed to avoid the crash caused by 
 the json regression tests with the original patch is in

 src/backend/parser/parse_expr.c:transformRowExpr().

 Other candidates I have found that don't set colnames and should 
 probably use dummy names are:

   * src/backend/parser/gram.y (row: production)
   * src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator()
   * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator()

Hm, can't the last two get real column names from somewhere?  Also, why
would it be the responsibility of the grammar production to fill in the
list, rather than transformRowExpr?

 Given a function:
 extern List *makeDummyColnames(List * args);
 what's the best place to put it? I couldn't see a terribly obvious place 
 in the source.

If there are enough potential users to justify having such a global
function at all, then we might as well not bother but just let execQual
fill in dummy names on the fly.  Providing such a function means that
there is nothing whatever pushing writers of new RowExpr constructors to
not be lazy --- the path of least resistance will be to call
makeDummyColnames.  I was hoping that there would be few enough places
where no information is available that we'd not need to have a global
function like that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] When do we lose column names?

2012-02-09 Thread Andrew Dunstan



On 02/09/2012 02:54 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

OK, the one place that needs to be fixed to avoid the crash caused by
the json regression tests with the original patch is in
 src/backend/parser/parse_expr.c:transformRowExpr().
Other candidates I have found that don't set colnames and should
probably use dummy names are:
   * src/backend/parser/gram.y (row: production)
   * src/backend/optimizer/prep/prepunion.c:adjust_appendrel_attrs_mutator()
   * src/backend/optimizer/util/var.c:flatten_join_alias_vars_mutator()

Hm, can't the last two get real column names from somewhere?


Possibly. I'll dig a bit deeper.


Also, why
would it be the responsibility of the grammar production to fill in the
list, rather than transformRowExpr?





Sure. I'll just comment the source accordingly.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only quals vs. security_barrier views

2012-02-09 Thread Robert Haas
On Thu, Feb 9, 2012 at 1:33 PM, Jesper Krogh jes...@krogh.cc wrote:
 On 2012-02-09 18:02, Robert Haas wrote:

 I don't have any appetite for trying to do anything more with
 index-only scans for 9.2, though maybe someone else will think
 otherwise.  But I would like very much to get KaiGai's leakproof stuff
 committed, and so it seems like a good idea to reconcile the needs of
 that machinery with what might eventually be needed here.

 Those were a couple of nice cases where index-only scans
 could win more than they do today. I have another one here:

 2012-02-09 19:17:28.788 jk=# \d testtable
                          Table "public.testtable"
  Column |   Type   |                        Modifiers
 --------+----------+---------------------------------------------------------
  id     | integer  | not null default nextval('testtable_id_seq'::regclass)
  fts    | tsvector |
 Indexes:
     "prk_idx" UNIQUE, btree (id)
     "fts_id" gin (fts)

 2012-02-09 19:19:39.054 jk=# explain select id from testtable where fts @@
 to_tsquery('english','test1000');
                              QUERY PLAN
 ---
  Bitmap Heap Scan on testtable  (cost=20.29..161.28 rows=37 width=4)
   Recheck Cond: (fts @@ '''test1000'''::tsquery)
   ->  Bitmap Index Scan on fts_id  (cost=0.00..20.28 rows=37 width=0)
         Index Cond: (fts @@ '''test1000'''::tsquery)
 (4 rows)

 Time: 0.494 ms
 2012-02-09 19:19:52.748 jk=#

 In this situation the tuple can be regenerated from the index, but
 not from the index satisfying the where clause. Allowing that would permit
 significantly more complex where-clauses and may also benefit situations
 where we are only going for one or more of the primary-key/foreign-key
 columns for join-conditions.

I don't understand what you're saying here.

 The above situation does not need to involve a gin-index; a btree index
 where the where clause can be matched up using one index, and the tuple
 constructed using another, falls into the same category.

That doesn't make sense to me.  If you probe index A for rows where a
= 1 and find that CTID (100,1) is such a row, and now want to return a
column value b that is not present in that index, the fastest way to
get the row is going to be to fetch block 100 from the heap and return
the data out of the first tuple.  To get the value out of some other
index that does include column b would require scanning the entire
index looking for that CTID, just so you could then grab the
corresponding index tuple, which wouldn't make any sense at all.
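
In plan terms, that case is just the ordinary index scan plus heap fetch we
already produce; a sketch (table and index names made up):

  EXPLAIN SELECT b FROM t WHERE a = 1;
              QUERY PLAN
  --------------------------------------
   Index Scan using t_a_idx on t
     Index Cond: (a = 1)

where the index supplies the qual and the heap supplies column b.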

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-09 Thread Jay Levitt

Tom Lane wrote:

Jay Levitt jay.lev...@gmail.com writes:

[Posted at Andres's request]
TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in
various builds.



1. In 9.1.2, inserting 10x rows takes 19x the time.
 - 9.1-HEAD and 9.2 fix this; it now slows down linearly
 - but: 10s  8s  5s!
 - but: comparing Ubuntu binary w/vanilla source build on virtual disks,
might not be significant


FWIW, I find it really hard to believe that there is any real difference
between 9.1.2 and 9.1 branch tip on this.  There have been no
significant changes in either the gist or contrib/cube code in that
branch.  I suspect you have a measurement issue there.


I suspect you're right, given that five runs in a row produced times from 7s 
to 10s.  I just wanted to include it for completeness and in case it 
triggered any a-ha moments.



4. 9.1-HEAD never successfully indexes 10 million rows (never = at least
20 minutes on two runs; I will follow up in a few hours)


Works for me (see above), though it's slower than you might've expected.


So my pre-built 9.1.2 takes 434s, my source-built 9.2 takes 509s, and 
(probably both of our) 9.1-HEAD takes 1918s... is that something to worry 
about, and if so, are there any tests I can run to assist? That bug doesn't 
affect me personally, but y'know, community and all that.  Also, I wonder if 
it's something like "9.2 got way faster doing X, but meanwhile, HEAD got way 
slower doing Y", and this is a canary in the coal mine.


Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql tab completion for SELECT

2012-02-09 Thread Peter Eisentraut
In his blog entry http://www.depesz.com/2011/07/08/wish-list-for-psql/
depesz described a simple way to do tab completion for SELECT in psql:


Make tab-completion complete also function names – like: SELECT
pg_get<tab><tab> to see all functions that start with pg_get.

Make tab-completion work for columns in SELECT. I know that when writing
SELECT clause, psql doesn’t know which table it will deal with, but it
could search through all the columns in database.


That seems pretty useful, and it's more or less a one-line change, as in
the attached patch.

diff --git i/src/bin/psql/tab-complete.c w/src/bin/psql/tab-complete.c
index 3854f7f..416aa2f 100644
--- i/src/bin/psql/tab-complete.c
+++ w/src/bin/psql/tab-complete.c
@@ -2555,7 +2555,9 @@ psql_completion(char *text, int start, int end)
  		COMPLETE_WITH_CONST("IS");
 
 /* SELECT */
-	/* naah . . . */
+	/* complete with columns and functions */
+	else if (pg_strcasecmp(prev_wd, "SELECT") == 0)
+		COMPLETE_WITH_QUERY("SELECT name FROM (SELECT attname FROM pg_attribute UNION SELECT proname || '(' FROM pg_proc) t (name) WHERE substring(name,1,%d)='%s'");
 
 /* SET, RESET, SHOW */
 	/* Complete with a variable name */
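
For what it's worth, the query behind the macro can be tried by hand to see
what it would offer; e.g. (illustrative, completing on "pg_get"):

  SELECT name
    FROM (SELECT attname FROM pg_attribute
          UNION SELECT proname || '(' FROM pg_proc) t (name)
   WHERE substring(name,1,6) = 'pg_get';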

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] RFC: Making TRUNCATE more MVCC-safe

2012-02-09 Thread Marti Raudsepp
Hi!

I've always been a little wary of using the TRUNCATE command due to
the warning in the documentation about it not being MVCC-safe:
queries may silently give wrong results and it's hard to tell when
they are affected.

That got me thinking, why can't we handle this like a standby server
does -- if some query has data removed from underneath it, it aborts
with a serialization failure.

Does this solution sound like a good idea?

The attached patch is a lame attempt at implementing this. I added a
new pg_class.relvalidxmin attribute which tracks the Xid of the last
TRUNCATE (maybe it should be called reltruncatexid?). Whenever
starting a relation scan with a snapshot older than relvalidxmin, an
error is thrown. This seems to work out well since TRUNCATE updates
pg_class anyway, and anyone who sees the new relfilenode automatically
knows when it was truncated.
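
With the patch applied the cutoff can also simply be inspected alongside the
relfilenode, e.g. (relvalidxmin being the column name used in the patch):

  SELECT relname, relfilenode, relvalidxmin FROM pg_class WHERE relname = 'foo';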

Am I on the right track? Are there any better ways to attach this
information to a relation?
Should I also add another counter to pg_stat_database_conflicts?
Currently this table is only used on standby servers.

Since I wrote it just this afternoon, there are a few things still
wrong with the patch (it doesn't handle xid wraparound for one), so
don't be too picky about the code yet. :)

Example:
  CREATE TABLE foo (i int);
Session A:
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT txid_current(); -- Force snapshot open
Session B:
  TRUNCATE TABLE foo;
Session A:
  SELECT * FROM foo;
ERROR:  canceling statement due to conflict with TRUNCATE TABLE on foo
DETAIL:  Rows visible to this transaction have been removed.


Patch also available in my github 'truncate' branch:
https://github.com/intgr/postgres/commits/truncate

Regards,
Marti
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
new file mode 100644
index 99a431a..e909b17
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*** heap_beginscan_internal(Relation relatio
*** 1175,1180 
--- 1175,1196 
  	HeapScanDesc scan;
  
  	/*
+ 	 * If the snapshot is older than relvalidxmin then that means TRUNCATE has
+ 	 * removed data that we should still see. Abort rather than giving
+ 	 * potentially bogus results.
+ 	 */
+ 	if (relation->rd_rel->relvalidxmin != InvalidTransactionId &&
+ 		snapshot->xmax != InvalidTransactionId &&
+ 		NormalTransactionIdPrecedes(snapshot->xmax, relation->rd_rel->relvalidxmin))
+ 	{
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ 				 errmsg("canceling statement due to conflict with TRUNCATE TABLE on %s",
+ 						NameStr(relation->rd_rel->relname)),
+ 				 errdetail("Rows visible to this transaction have been removed.")));
+ 	}
+ 
+ 	/*
  	 * increment relation ref count while scanning relation
  	 *
  	 * This is just to make really sure the relcache entry won't go away while
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
new file mode 100644
index aef410a..3f9bd5d
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
*** InsertPgClassTuple(Relation pg_class_des
*** 787,792 
--- 787,793 
  	values[Anum_pg_class_relhastriggers - 1] = BoolGetDatum(rd_rel->relhastriggers);
  	values[Anum_pg_class_relhassubclass - 1] = BoolGetDatum(rd_rel->relhassubclass);
  	values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid);
+ 	values[Anum_pg_class_relvalidxmin - 1] = TransactionIdGetDatum(rd_rel->relvalidxmin);
  	if (relacl != (Datum) 0)
  		values[Anum_pg_class_relacl - 1] = relacl;
  	else
*** AddNewRelationTuple(Relation pg_class_de
*** 884,889 
--- 885,891 
  		new_rel_reltup->relfrozenxid = InvalidTransactionId;
  	}
  
+ 	new_rel_reltup->relvalidxmin = InvalidTransactionId;
  	new_rel_reltup->relowner = relowner;
  	new_rel_reltup->reltype = new_type_oid;
  	new_rel_reltup->reloftype = reloftype;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
new file mode 100644
index bfbe642..0d96a6a
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
*** reindex_index(Oid indexId, bool skip_con
*** 2854,2860 
  		}
  
  		/* We'll build a new physical relation for the index */
! 		RelationSetNewRelfilenode(iRel, InvalidTransactionId);
  
  		/* Initialize the index and rebuild */
  		/* Note: we do not need to re-establish pkey setting */
--- 2854,2860 
  		}
  
  		/* We'll build a new physical relation for the index */
! 		RelationSetNewRelfilenode(iRel, InvalidTransactionId, InvalidTransactionId);
  
  		/* Initialize the index and rebuild */
  		/* Note: we do not need to re-establish pkey setting */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
new file mode 100644
index 1f95abc..ab4aec2
*** a/src/backend/commands/sequence.c
--- b/src/backend/commands/sequence.c
*** ResetSequence(Oid seq_relid)
*** 287,293 
  	 * Create a new storage file for the sequence.	We want to keep the
  	 * sequence's 

Re: [HACKERS] index-only quals vs. security_barrier views

2012-02-09 Thread Jesper Krogh

On 2012-02-09 21:09, Robert Haas wrote:

That doesn't make sense to me.  If you probe index A for rows where a
= 1 and find that CTID (100,1) is such a row, and now want to return a
column value b that is not present in that index, the fastest way to
get the row is going to be to fetch block 100 from the heap and return
the data out of the first tuple.  To get the value out of some other
index that does include column b would require scanning the entire
index looking for that CTID, just so you could then grab the
corresponding index tuple, which wouldn't make any sense at all.


You're right. In my head, everything is wired up against my primary
keys; of course that isn't the case for the DB. Sorry for the noise.

--
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql tab completion for SELECT

2012-02-09 Thread Dimitri Fontaine
Peter Eisentraut pete...@gmx.net writes:
 Make tab-completion complete also function names – like: SELECT
 pg_get<tab><tab> to see all functions that start with pg_get.

 Make tab-completion work for columns in SELECT. I know that when writing
 SELECT clause, psql doesn’t know which table it will deal with, but it
 could search through all the columns in database.

 That seems pretty useful, and it's more or less a one-line change, as in
 the attached patch.

Does that include support for completing SRF functions in the FROM clause?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog location arithmetic

2012-02-09 Thread Euler Taveira de Oliveira
On 08-02-2012 09:35, Fujii Masao wrote:

Fujii, new patch attached. Thanks for your tests.

 But another problem happened. When I changed pg_proc.h so that the unused
 OID was assigned to pg_xlog_location_diff(), and executed the above again,
 I encountered the segmentation fault:
 
I reproduced the problems on my old 32-bit laptop. I fixed it by casting to
int64. I also updated the duplicated OID.

 Why OID needs to be reassigned?
 
There isn't a compelling reason. It is just a way to say: hey, it is another
function with the same old name.

I'll not attach another version for pg_size_pretty because it is a matter of
updating a duplicated OID.
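
The intended usage on the primary would be something along these lines
(illustrative):

  SELECT client_addr,
         pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS lag_bytes
    FROM pg_stat_replication;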


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 236a60a..826f002 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14446,11 +14446,15 @@ SELECT set_config('log_statement_stats', 'off', false);
   <indexterm>
    <primary>pg_xlogfile_name_offset</primary>
   </indexterm>
+   <indexterm>
+    <primary>pg_xlog_location_diff</primary>
+   </indexterm>
 
   <para>
    The functions shown in <xref
    linkend="functions-admin-backup-table"> assist in making on-line backups.
-    These functions cannot be executed during recovery.
+	These functions cannot be executed during recovery (except
+	<function>pg_xlog_location_diff</function>).
   </para>
 
   <table id="functions-admin-backup-table">
@@ -14518,6 +14522,13 @@ SELECT set_config('log_statement_stats', 'off', false);
       <entry><type>text</>, <type>integer</></entry>
       <entry>Convert transaction log location string to file name and decimal byte offset within file</entry>
      </row>
+  <row>
+   <entry>
+    <literal><function>pg_xlog_location_diff(<parameter>location</> <type>text</>, <parameter>location</> <type>text</>)</function></literal>
+   </entry>
+   <entry><type>numeric</></entry>
+   <entry>Calculate the difference between two transaction log locations</entry>
+  </row>
     </tbody>
    </tgroup>
   </table>
@@ -14611,6 +14622,13 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
   </para>
 
   <para>
+	<function>pg_xlog_location_diff</> calculates the difference in bytes
+	between two transaction log locations. It can be used with
+	<structname>pg_stat_replication</structname> or some functions shown in
+	<xref linkend="functions-admin-backup-table"> to get the replication lag.
+   </para>
+
+   <para>
    For details about proper usage of these functions, see
    <xref linkend="continuous-archiving">.
   </para>
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 2e10d4d..be7d388 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -26,6 +26,7 @@
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
+#include "utils/numeric.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -465,3 +466,83 @@ pg_is_in_recovery(PG_FUNCTION_ARGS)
 {
 	PG_RETURN_BOOL(RecoveryInProgress());
 }
+
+static void
+validate_xlog_location(char *str)
+{
+#define	MAXLSNCOMPONENT		8
+
+	int	len1, len2;
+
+	len1 = strspn(str, "0123456789abcdefABCDEF");
+	if (len1 < 1 || len1 > MAXLSNCOMPONENT || str[len1] != '/')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for transaction log location: \"%s\"", str)));
+	len2 = strspn(str + len1 + 1, "0123456789abcdefABCDEF");
+	if (len2 < 1 || len2 > MAXLSNCOMPONENT || str[len1 + 1 + len2] != '\0')
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+			 errmsg("invalid input syntax for transaction log location: \"%s\"", str)));
+}
+
+/*
+ * Compute the difference in bytes between two WAL locations.
+ */
+Datum
+pg_xlog_location_diff(PG_FUNCTION_ARGS)
+{
+	text		*location1 = PG_GETARG_TEXT_P(0);
+	text		*location2 = PG_GETARG_TEXT_P(1);
+	char		*str1, *str2;
+	XLogRecPtr	loc1, loc2;
+	Numeric		result;
+
+	/*
+	 * Read and parse input
+	 */
+	str1 = text_to_cstring(location1);
+	str2 = text_to_cstring(location2);
+
+	validate_xlog_location(str1);
+	validate_xlog_location(str2);
+
+	if (sscanf(str1, "%X/%X", &loc1.xlogid, &loc1.xrecoff) != 2)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			errmsg("could not parse transaction log location \"%s\"", str1)));
+	if (sscanf(str2, "%X/%X", &loc2.xlogid, &loc2.xrecoff) != 2)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			errmsg("could not parse transaction log location \"%s\"", str2)));
+
+	/*
+	 * Sanity check
+	 */
+	if (loc1.xrecoff > XLogFileSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			errmsg("xrecoff \"%X\" is out of valid range, 0..%X", loc1.xrecoff, XLogFileSize)));
+	if (loc2.xrecoff > XLogFileSize)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			errmsg("xrecoff \"%X\" is out of valid range, 0..%X", loc2.xrecoff, XLogFileSize)));
+
+	/*
+	 * result = 

[HACKERS] Patch: fix pg_dump for inherited defaults not-null flags

2012-02-09 Thread Tom Lane
Attached is a proposed patch to deal with the issue described here:
http://archives.postgresql.org/pgsql-bugs/2012-02/msg0.php

Even though we'd previously realized that comparing the text of
inherited CHECK expressions is an entirely unsafe way to detect
expression equivalence (cf comments for guessConstraintInheritance),
pg_dump is still doing that for inherited DEFAULT expressions, with
the predictable result that it does the wrong thing in this sort
of example.

Furthermore, as I looked more closely at the code, I realized that
there is another pretty fundamental issue: if an inherited column has a
default expression or NOT NULL bit that it did not inherit from its
parent, flagInhAttrs forces the column to be treated as non-inherited,
so that it will be emitted as part of the child table's CREATE TABLE
command.  This is *wrong* if the column is not attislocal; it will
result in the column incorrectly having the attislocal property after
restore.  (Note: such a situation could only arise if the user had
altered the column's default or NOT NULL property with ALTER TABLE after
creation.)
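
For concreteness, the problematic shape is something like this (illustrative):

  CREATE TABLE parent (a int NOT NULL DEFAULT 1);
  CREATE TABLE child () INHERITS (parent);
  ALTER TABLE child ALTER COLUMN a SET DEFAULT 2;

Here child.a is still attislocal = false, but because it now has its own
default the old logic emits it in child's CREATE TABLE, and after restore
the column would wrongly be marked attislocal.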

All of this logic predates the invention of attislocal, and really is
attempting to make up for the lack of that bit, so it's not all that
surprising that it falls down.

So the attached patch makes the emit-column-or-not behavior depend only
on attislocal (except for binary upgrade which has its own kluge
solution for the problem; though note that the whether-to-dump tests now
exactly match the special cases for binary upgrade, which they did not
before).  Also, I've dropped the former attempts to exploit inheritance
of defaults, and so the code will now emit a default explicitly for each
child in an inheritance hierarchy, even if it didn't really need to.
Since the backend doesn't track whether defaults are inherited, this
doesn't cause any failure to restore the catalog state properly.

Although this is a bug fix, it's a nontrivial change in the logic and
so I'm hesitant to back-patch into stable branches.  Given the lack of
prior complaints, maybe it would be best to leave it unfixed in existing
branches?  Not sure.  Thoughts?

regards, tom lane

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 5a72d0da917a368ebbdbf164406c183693eb939c..266441df61d2c1eedd7faa39b8987254fe076f1c 100644
*** a/src/bin/pg_dump/common.c
--- b/src/bin/pg_dump/common.c
*** flagInhTables(TableInfo *tblinfo, int nu
*** 281,287 
  
  /* flagInhAttrs -
   *	 for each dumpable table in tblinfo, flag its inherited attributes
!  * so when we dump the table out, we don't dump out the inherited attributes
   *
   * modifies tblinfo
   */
--- 281,293 
  
  /* flagInhAttrs -
   *	 for each dumpable table in tblinfo, flag its inherited attributes
!  *
!  * What we need to do here is detect child columns that inherit NOT NULL
!  * bits from their parents (so that we needn't specify that again for the
!  * child) and child columns that have DEFAULT NULL when their parents had
!  * some non-null default.  In the latter case, we make up a dummy AttrDefInfo
!  * object so that we'll correctly emit the necessary DEFAULT NULL clause;
!  * otherwise the backend will apply an inherited default to the column.
   *
   * modifies tblinfo
   */
*** flagInhAttrs(TableInfo *tblinfo, int num
*** 297,303 
  		TableInfo  *tbinfo = &(tblinfo[i]);
  		int			numParents;
  		TableInfo **parents;
- 		TableInfo  *parent;
  
  		/* Sequences and views never have parents */
  		if (tbinfo-relkind == RELKIND_SEQUENCE ||
--- 303,308 
*** flagInhAttrs(TableInfo *tblinfo, int num
*** 314,445 
  		if (numParents == 0)
  			continue;			/* nothing to see here, move along */
  
! 		/*
! 		 * For each attr, check the parent info: if no parent has an attr
! 		 * with the same name, then it's not inherited. If there *is* an
! 		 * attr with the same name, then only dump it if:
! 		 *
! 		 * - it is NOT NULL and zero parents are NOT NULL
! 		 *	 OR
! 		 * - it has a default value AND the default value does not match
! 		 *	 all parent default values, or no parents specify a default.
! 		 *
! 		 * See discussion on -hackers around 2-Apr-2001.
! 		 *
! 		 */
  		for (j = 0; j < tbinfo->numatts; j++)
  		{
- 			bool		foundAttr;		/* Attr was found in a parent */
  			bool		foundNotNull;	/* Attr was NOT NULL in a parent */
! 			bool		defaultsMatch;	/* All non-empty defaults match */
! 			bool		defaultsFound;	/* Found a default in a parent */
! 			AttrDefInfo *attrDef;
! 
! 			foundAttr = false;
! 			foundNotNull = false;
! 			defaultsMatch = true;
! 			defaultsFound = false;
  
! 			attrDef = tbinfo->attrdefs[j];
  
  			for (k = 0; k < numParents; k++)
  			{
  int			inhAttrInd;
  
- parent = parents[k];
  inhAttrInd = strInArray(tbinfo->attnames[j],
  			

Re: [HACKERS] psql tab completion for SELECT

2012-02-09 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 That seems pretty useful, and it's more or less a one-line change, as in
 the attached patch.

That seems pretty nearly entirely bogus.  What is the argument for
supposing that the word right after SELECT is a function name?  I would
think it would be a column name (from who-knows-what table) much more
often.  Also, if it is useful, people will expect it to work in more
places than just the first word after SELECT --- for instance, somebody
who didn't realize what a kluge it was would expect it to also work
right after a top-level comma after SELECT.  Or probably after a left
parenthesis in the SELECT list.  Etc.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers