Re: [HACKERS] visibility map

2010-11-23 Thread 高增琦
Can we just log the change of VM in log_heap_clean() for redo?
Thanks

--
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com


On Tue, Nov 23, 2010 at 3:24 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Jun 14, 2010 at 1:19 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  I *think* that the answer to this parenthesized question is no.
  When we vacuum a page, we set the LSN on both the heap page and the
  visibility map page.  Therefore, neither of them can get written to
  disk until the WAL record is flushed, but they could get flushed in
  either order.  So the visibility map page could get flushed before the
  heap page, as the non-parenthesized portion of the comment indicates.
 
  Right.
 
  However, at least in theory, it seems like we could fix this up during
  redo.
 
  Setting a bit in the visibility map is currently not WAL-logged, but yes
  once we add WAL-logging, that's straightforward to fix.

 Eh, so.  Suppose - for the sake of argument - we do the following:

 1. Allocate an additional infomask(2) bit that means "xmin is frozen,
 no need to call XidInMVCCSnapshot()".  When we freeze a tuple, we set
 this bit in lieu of overwriting xmin.  Note that freezing pages is
 already WAL-logged, so redo is possible.

 2. Modify VACUUM so that, when the page is observed to be all-visible,
 it will freeze all tuples on the page, set PD_ALL_VISIBLE, and set the
 visibility map bit, writing a single XLOG record for the whole
 operation (possibly piggybacking on XLOG_HEAP2_CLEAN if the same
 vacuum already removed tuples; otherwise and/or when no tuples were
 removed writing XLOG_HEAP2_FREEZE or some new record type).  This
 loses no forensic information because of (1).  (If the page is NOT
 observed to be all-visible, we freeze individual tuples only when they
 hit the current age thresholds.)

 Setting the visibility map bit is now crash-safe.
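
 A minimal standalone sketch of step 1's shortcut (the bit value and the
 name HEAP_XMIN_FROZEN are illustrative assumptions, not part of any
 actual patch):

 #include <stdbool.h>
 #include <stdint.h>
 #include <stdio.h>

 #define HEAP_XMIN_FROZEN 0x2000    /* hypothetical infomask2 bit */

 typedef struct
 {
     uint16_t    infomask2;
     uint32_t    xmin;       /* preserved for forensics, not overwritten */
 } TupleHeaderSketch;

 /*
  * A frozen tuple is known visible to everyone; only unfrozen tuples
  * would fall through to the usual XidInMVCCSnapshot() machinery.
  */
 static bool
 xmin_known_visible(const TupleHeaderSketch *tup)
 {
     return (tup->infomask2 & HEAP_XMIN_FROZEN) != 0;
 }

 int
 main(void)
 {
     TupleHeaderSketch tup = {HEAP_XMIN_FROZEN, 42};

     printf("skip snapshot check: %d, xmin still readable: %u\n",
            xmin_known_visible(&tup), (unsigned) tup.xmin);
     return 0;
 }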

 Please poke holes.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] ALTER OBJECT any_name SET SCHEMA name

2010-11-23 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Especially because you also posted some revs of the ALTER EXTENSION ..
 SET SCHEMA patch on this thread

Yes, I tried to answer wherever questions were raised, and that's not
helping so much at review time. That's why I take the time to update the
commit fest application each time I send a new patch version.

 Do you still want me to prepare another patch for adding in the tests
 the set schema variants that already existed but are not yet covered?
 Which are the one you did spot, btw?

 [rhaas pgsql]$ git grep 'SET SCHEMA' src/test/regress/
 [rhaas pgsql]$

The existing 'set schema' tests are in lower case, so I just did the
same; try with git grep -i maybe :)

grep -c 'set schema' ../postgresql-extension-patches/set_schema.v8.patch 
28

 Please do.  Tab completion support should really be included in the
 patch - adding it as a separate patch is better than not having it, of
 course.

Ok, will learn about this part of psql soon'ish, hopefully by today,
given a reasonable amount of other distractions.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-11-23 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 Hmm, the first thought that comes to mind is that the GucContext param
 to ParseConfigFile is unused and can be removed.  This is probably an
 oversight from when include files were introduced in 2006.

Thanks for caring about that part.

 I don't like the fact that this code handles custom_variable_classes
 internally.  I think this would be exposed to the parsing of extension
 control files, which is obviously wrong.

Well, in fact, not that much. The extension code has a special error
case when dealing with custom variables if the class hasn't already been
parsed, and what ParseConfigFile() is doing is pushing the
custom_variable_classes setting in front of the list.

guc-file.l says:
/*
 * This variable must be processed first as it controls
 * the validity of other variables; so it goes at the head
 * of the result list.  If we already found a value for it,
 * replace with this one.
 */

extension.c says:
ereport(ERROR,
        (errmsg("Unsupported parameter '%s' in file: %s",
                tok1, filename),
         errhint("Be sure to have 'custom_variable_classes' set "
                 "in a line before any custom variable.")));

So if we don't change the code in ParseConfigFile() that will push
custom_variable_classes in front of the list, all I have to change in
the extension.c file is the error message.

I fail to see a future usage of custom_variable_classes where it
wouldn't help to have that in the list before any user setting that
depends on it.
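
For what it's worth, the head-insertion itself is trivial; here is a
standalone sketch (stand-in types, not the real GUC list structures, and
"myext" is a made-up custom class):

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the parsed name/value items built by ParseConfigFile(). */
typedef struct ConfigItem
{
    const char *name;
    const char *value;
    struct ConfigItem *next;
} ConfigItem;

/*
 * Push an item in front of the list; doing this for
 * custom_variable_classes guarantees it is processed before any custom
 * variable that depends on it.
 */
static ConfigItem *
push_head(ConfigItem *head, const char *name, const char *value)
{
    ConfigItem *item = malloc(sizeof(ConfigItem));

    item->name = name;
    item->value = value;
    item->next = head;
    return item;
}

int
main(void)
{
    ConfigItem *list = NULL;

    list = push_head(list, "myext.knob", "on");
    list = push_head(list, "custom_variable_classes", "myext");

    for (ConfigItem *it = list; it != NULL; it = it->next)
        printf("%s = %s\n", it->name, it->value);
    return 0;
}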

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map

2010-11-23 Thread Heikki Linnakangas

On 22.11.2010 21:24, Robert Haas wrote:

Eh, so.  Suppose - for the sake of argument - we do the following:

1. Allocate an additional infomask(2) bit that means "xmin is frozen,
no need to call XidInMVCCSnapshot()".  When we freeze a tuple, we set
this bit in lieu of overwriting xmin.  Note that freezing pages is
already WAL-logged, so redo is possible.

2. Modify VACUUM so that, when the page is observed to be all-visible,
it will freeze all tuples on the page, set PD_ALL_VISIBLE, and set the
visibility map bit, writing a single XLOG record for the whole
operation (possibly piggybacking on XLOG_HEAP2_CLEAN if the same
vacuum already removed tuples; otherwise and/or when no tuples were
removed writing XLOG_HEAP2_FREEZE or some new record type).  This
loses no forensic information because of (1).  (If the page is NOT
observed to be all-visible, we freeze individual tuples only when they
hit the current age thresholds.)

Setting the visibility map bit is now crash-safe.


That's an interesting idea. You piggyback setting the vm bit on the
freeze WAL record, on the assumption that you have to write the freeze
record anyway. However, if that assumption doesn't hold, because the 
tuples are deleted before they reach vacuum_freeze_min_age, it's no 
better than the naive approach of WAL-logging the vm bit set separately. 
Whether that's acceptable or not, I don't know.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-11-23 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 the handling of relative vs absolute paths is bogus here.  I think it'd
 make more sense to have a bool "are we including"; and if that's false and
 the path is not absolute, then the file is relative to CWD; or maybe we
 make it absolute by prepending PGDATA; maybe something else?  (need to
 think of something that makes sense for both recovery.conf and extension
 control files)

Current coding in extensions prepends any control or script file name with
sharepath, so that we're only dealing with absolute filenames here. The
idea is that it's no business of any other part of the code to have to
know where we decide to install control and script files.

My feeling is that when !is_absolute_path(config_file) and calling_file
is NULL we should make the config_file absolute by prepending PGDATA.
Please find that done in attached v4 of the cfparser patch.
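
Roughly, the rule I have in mind looks like this (a standalone sketch;
is_absolute_path() is simplified and the other names are stand-ins):

#include <stdio.h>

#define MAXPGPATH 1024
/* simplified stand-in for the backend's is_absolute_path() macro */
#define is_absolute_path(p) ((p)[0] == '/')

/*
 * If the file name is relative and we are not being included from
 * another file, anchor it at the data directory.
 */
static void
absolutize_config_file(char *dest, const char *config_file,
                       const char *calling_file, const char *data_directory)
{
    if (!is_absolute_path(config_file) && calling_file == NULL)
        snprintf(dest, MAXPGPATH, "%s/%s", data_directory, config_file);
    else
        snprintf(dest, MAXPGPATH, "%s", config_file);
}

int
main(void)
{
    char        path[MAXPGPATH];

    absolutize_config_file(path, "recovery.conf", NULL, "/var/lib/pgsql/data");
    printf("%s\n", path);
    return 0;
}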

 If that looks ok, do we want to add some documentation about the new
 lexer capabilities?

 beyond extra code comments?  probably not.

Great.

 Also, for what good reason would we want to prevent
 people from using the include facility?

 Not sure about this

Ok, nothing special here.
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** str_time(pg_time_t tnow)
*** 5024,5200 ****
  }
  
  /*
-  * Parse one line from recovery.conf. 'cmdline' is the raw line from the
-  * file. If the line is parsed successfully, returns true, false indicates
-  * syntax error. On success, *key_p and *value_p are set to the parameter
-  * name and value on the line, respectively. If the line is an empty line,
-  * consisting entirely of whitespace and comments, function returns true
-  * and *key_p and *value_p are set to NULL.
-  *
-  * The pointers returned in *key_p and *value_p point to an internal buffer
-  * that is valid only until the next call of parseRecoveryCommandFile().
-  */
- static bool
- parseRecoveryCommandFileLine(char *cmdline, char **key_p, char **value_p)
- {
- 	char	   *ptr;
- 	char	   *bufp;
- 	char	   *key;
- 	char	   *value;
- 	static char *buf = NULL;
- 
- 	*key_p = *value_p = NULL;
- 
- 	/*
- 	 * Allocate the buffer on first use. It's used to hold both the parameter
- 	 * name and value.
- 	 */
- 	if (buf == NULL)
- 		buf = malloc(MAXPGPATH + 1);
- 	bufp = buf;
- 
- 	/* Skip any whitespace at the beginning of line */
- 	for (ptr = cmdline; *ptr; ptr++)
- 	{
- 		if (!isspace((unsigned char) *ptr))
- 			break;
- 	}
- 	/* Ignore empty lines */
- 	if (*ptr == '\0' || *ptr == '#')
- 		return true;
- 
- 	/* Read the parameter name */
- 	key = bufp;
- 	while (*ptr && !isspace((unsigned char) *ptr) &&
- 		   *ptr != '=' && *ptr != '\'')
- 		*(bufp++) = *(ptr++);
- 	*(bufp++) = '\0';
- 
- 	/* Skip to the beginning quote of the parameter value */
- 	ptr = strchr(ptr, '\'');
- 	if (!ptr)
- 		return false;
- 	ptr++;
- 
- 	/* Read the parameter value to *bufp. Collapse any '' escapes as we go. */
- 	value = bufp;
- 	for (;;)
- 	{
- 		if (*ptr == '\'')
- 		{
- 			ptr++;
- 			if (*ptr == '\'')
- *(bufp++) = '\'';
- 			else
- 			{
- /* end of parameter */
- *bufp = '\0';
- break;
- 			}
- 		}
- 		else if (*ptr == '\0')
- 			return false;		/* unterminated quoted string */
- 		else
- 			*(bufp++) = *ptr;
- 
- 		ptr++;
- 	}
- 	*(bufp++) = '\0';
- 
- 	/* Check that there's no garbage after the value */
- 	while (*ptr)
- 	{
- 		if (*ptr == '#')
- 			break;
- 		if (!isspace((unsigned char) *ptr))
- 			return false;
- 		ptr++;
- 	}
- 
- 	/* Success! */
- 	*key_p = key;
- 	*value_p = value;
- 	return true;
- }
- 
- /*
   * See if there is a recovery command file (recovery.conf), and if so
   * read in parameters for archive recovery and XLOG streaming.
   *
!  * XXX longer term intention is to expand this to
!  * cater for additional parameters and controls
!  * possibly use a flex lexer similar to the GUC one
   */
  static void
  readRecoveryCommandFile(void)
  {
  	FILE	   *fd;
- 	char		cmdline[MAXPGPATH];
  	TimeLineID	rtli = 0;
  	bool		rtliGiven = false;
! 	bool		syntaxError = false;
  
  	fd = AllocateFile(RECOVERY_COMMAND_FILE, "r");
  	if (fd == NULL)
  	{
  		if (errno == ENOENT)
! 			return;				/* not there, so no archive recovery */
  		ereport(FATAL,
  				(errcode_for_file_access(),
  				 errmsg("could not open recovery command file \"%s\": %m",
  						RECOVERY_COMMAND_FILE)));
  	}
  
! 	/*
! 	 * Parse the file...
! 	 */
! 	while (fgets(cmdline, sizeof(cmdline), fd) != NULL)
! 	{
! 		char	   *tok1;
! 		char	   *tok2;
! 
! 		if (!parseRecoveryCommandFileLine(cmdline, &tok1, &tok2))
! 		{
! 			syntaxError = true;
! 			break;
! 		}
! 		if (tok1 == NULL)
! 			continue;
  
! 		if (strcmp(tok1, "restore_command") == 0)
  		{
! 			recoveryRestoreCommand = pstrdup(tok2);
  			ereport(DEBUG2,
  					(errmsg("restore_command = '%s'",
  							recoveryRestoreCommand)));
  		}
! 		else if 

Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-23 Thread Heikki Linnakangas

On 23.11.2010 12:09, Shigeru HANADA wrote:

On Sun, 21 Nov 2010 21:16:05 -0500
Robert Haas robertmh...@gmail.com wrote:
snip

Ultimately, we probably want and need to get this patch down to chunks
of less than 2000 lines each.  But for starters, it looks quite simple
to break this into three chunks: one for the PostgreSQL FDW, one for
the CSV FDW, and one for the core functionality.  I think that the CSV
FDW, like the PG FDW, should be a loadable module.


I've separated the patch into three parts.  Each contains code,
documentation and tests, and file_fdw and pgsql_fdw can be applied onto
fdw_core individually, or together.  I hope the separation helps the
review of the patches.


The docs need some work. The CREATE FOREIGN TABLE reference page seems 
to be copy-pasted from CREATE TABLE, because it talks about constraints 
and WITH/WITHOUT OIDS which surely don't apply to foreign tables.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-11-23 Thread Heikki Linnakangas

On 22.11.2010 03:35, Robert Haas wrote:

On Sun, Nov 21, 2010 at 8:10 PM, Itagaki Takahiro
itagaki.takah...@gmail.com  wrote:

On Wed, Oct 20, 2010 at 01:36, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

Ah yes, thinking it's an easy patch is not helping. Please find attached
a revised version of it.


I checked cfparser.v2.patch.

It exports the static parseRecoveryCommandFileLine() in xlog.c
as the global cfParseOneLine() in cfparser.c without modification.

It generates one warning, but it can be easily fixed.
  cfparser.c:34: warning: no previous prototype for 'cfParseOneLine'

Some discussions about the patch:

* Is "cf" the best name for the prefix? Less abbreviated forms might
  be less confusable. Personally, I prefer "conf".

* Can we export ParseConfigFile() in guc-file.l rather than
  parseRecoveryCommandFileLine()? It can solve the issue that unquoted
  parameter values in recovery.conf are not recognized. Even if we
  won't merge them, just allowing unquoted values would be useful.


I'd really like to see postgresql.conf and recovery.conf parsing
merged, and I suspect, as Itagaki-san says, that postgresql.conf
parsing is the better model for any new code.


+1. There was unanimous agreement in the synchronous replication threads 
that recovery.conf should be parsed with the GUC parser. The current 
recovery.conf parser doesn't support escaping, for example.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-23 Thread Shigeru HANADA
On Tue, 23 Nov 2010 12:30:52 +0200
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 The docs need some work. The CREATE FOREIGN TABLE reference page seems 
 to be copy-pasted from CREATE TABLE, because it talks about constraints 
 and WITH/WITHOUT OIDS which surely don't apply to foreign tables.

Thanks for the comments.

The page you pointed out has been edited for foreign tables.  In the
current design, the OID system column and CHECK constraints are supported.

OID is supported to get oid from the source table (yes, it works only
for postgresql_fdw but it seems useful to support).  CHECK constraint
is supported to enable constraint exclusion.  In addition, TABLEOID is
supported to show which table is the actual source.

I agree that some kind of document, such as "How to create a new FDW",
should be written.

Regards,
--
Shigeru Hanada



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-11-23 Thread Craig Ringer

However, I am not clear what benefit we get from moving this into core
in 9.1.  If it's still going to require a full postmaster restart, the
GUC you have to change may as well be shared_preload_libraries as a
new one.


There's no reason it should require a postmaster restart. New backends 
spawned after the handler is turned on would enable it, and existing 
backends could be signalled to enable it as well.



on-by-default is what we gain. I think that's fairly big...


More than that. If a crash handler is in core, then:

- It can be inited much earlier, catching crashes that happen during 
loading of libraries and during later backend init; and


- It's basically free when the cost of shared library loading is 
removed, so it can be left on in production or even be on by default. I 
need to do some testing on this, but given the apparently near-zero cost 
of initing the handler I won't be surprised if testing a GUC to see if 
the handler should be on or not costs more than loading it does.


I still wouldn't support on-by-default because you'd need an automatic 
process to weed out old crash dumps or limit the number stored. That's a 
bigger job. So I think the admin should have to turn it on, but it'd be 
good to make it easy to turn on in production without a restart; I don't 
see why that'd be hard.


I'll try to put together a patch that does just that, time permitting. 
Things are kind of hectic ATM.


--
Craig Ringer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-11-23 Thread Craig Ringer

On 11/23/2010 01:30 AM, Tom Lane wrote:


I'm not really sure why we're even considering the notion of
back-patching this item.  Doing so would not fit with any past practice
or agreed-on project management practices, not even under our lax
standards for contrib (and I keep hearing people claim that contrib
is or should be as trustworthy as core, anyway).  Since when do we
back-patch significant features that have not been through a beta test
cycle?


I see no advantage to back-patching. It's easy to provide a drop-in 
binary DLL for older versions of Pg on Windows, who're the only people 
this will work for.


If the EDB folks saw value in it, they could bundle the DLL with updated 
point releases of the installer for Windows. No back-patching is necessary.


--
Craig Ringer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-11-23 Thread Craig Ringer

On 11/23/2010 01:46 AM, Tom Lane wrote:


* However, when storing it in "crashdumps", I think the code would need
to create that directory if it does not exist, doesn't it?


If it didn't do so, then manual creation/removal of the directory could
be used as an on/off switch for the feature.


Yep. That's how I'd want to do it in 9.1 - test for the directory and 
use that to decide whether to set the handler during early backend 
startup. That way you don't need a GUC, and should be able to load it 
*very* early in backend startup.
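
A minimal sketch of that directory-as-switch check (Win32 calls from
memory; the "crashdumps" name and the handler body are placeholders):

#include <windows.h>
#include <stdio.h>

/*
 * Placeholder handler; the real one would call MiniDumpWriteDump()
 * from dbghelp.dll to write the dump file.
 */
static LONG WINAPI
crash_handler(EXCEPTION_POINTERS *exceptionInfo)
{
    (void) exceptionInfo;
    return EXCEPTION_EXECUTE_HANDLER;
}

/* Early in backend startup: the directory's existence is the switch. */
static void
maybe_install_crash_handler(const char *dumpdir)
{
    DWORD attrs = GetFileAttributesA(dumpdir);

    if (attrs != INVALID_FILE_ATTRIBUTES &&
        (attrs & FILE_ATTRIBUTE_DIRECTORY) != 0)
        SetUnhandledExceptionFilter(crash_handler);
}

int
main(void)
{
    maybe_install_crash_handler("crashdumps");
    printf("handler installed if ./crashdumps exists\n");
    return 0;
}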



I haven't looked at the patch but this
discussion makes it sound like the dumper is dependent on an
uncomfortably large amount of backend code being functional.  You need
to minimize the number of assumptions of that sort.


It doesn't need to have any backend code working, really; all it needs 
is a usable stack and the ability to allocate off the heap. It won't 
save you in total OOM situations, stack exhaustion, or severe stack 
corruption, but should work pretty much any other time.


The crash dump facility in dbghelp.dll offers *optional* features where 
apps can stream in additional app-specific data like recent log 
excerpts, etc. I didn't try to implement anything like that in the 
initial module partly because I want to minimize the amount of the 
backend that must be working for a crash dump to succeed.


--
Craig Ringer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-11-23 Thread Craig Ringer

On 11/23/2010 01:56 AM, Heikki Linnakangas wrote:

On 22.11.2010 19:47, Robert Haas wrote:

I am as conservative about back-patching as anybody here, but
debugging on Windows is not an easy thing to do, and I strongly
suspect we are going to point people experiencing crashes on Windows
to this code whether it's part of our official distribution or not.


This whole thing makes me wonder: is there truly no reliable, simple
method to tell Windows to create a core dump on crash, without writing
custom code for it. I haven't seen one, but I find it amazing if there
isn't. We can't be alone with this.


Search for 'dbghelp.dll' on your average Windows system and you'll be 
surprised how many apps use it. Steam (the software distribution system) 
does, as does the Adobe Creative Suite on my machine.


If you're running in interactive mode with access to the user's display 
you can use Windows error reporting. Services running in isolated user 
accounts don't seem to be able to use that. In any case, windows error 
reporting only collects *tiny* dumps with barely anything beyond the 
stack contents; they're a nightmare to use, and really require psychic 
powers and deep knowledge of scary Windows APIs for effective debugging.


--
Craig Ringer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST seems to drop left-branch leaf tuples

2010-11-23 Thread Peter Tanski
Thanks for the advice.  I ran a row-by-row test, including debug output.  I'll
put a test case together as well, but I believe I have narrowed down the problem
somewhat.  The first split occurs when the 6th row is inserted and there are 6
calls to Compress(); however, picksplit only receives 4 of those 6 tuples and
the other two are dropped.

postgres=# \i xaa
psql:xaa:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat   
---
 Number of levels:  1 +
 Number of pages:   1 +
 Number of leaf pages:  1 +
 Number of tuples:  1 +
 Number of invalid tuples:  0 +
 Number of leaf tuples: 1 +
 Total size of tuples:  1416 bytes+
 Total size of leaf tuples: 1416 bytes+
 Total size of index:   8192 bytes+

postgres=# \i xab
psql:xab:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat   
---
 Number of levels:  1 +
 Number of pages:   1 +
 Number of leaf pages:  1 +
 Number of tuples:  2 +
 Number of invalid tuples:  0 +
 Number of leaf tuples: 2 +
 Total size of tuples:  2820 bytes+
 Total size of leaf tuples: 2820 bytes+
 Total size of index:   8192 bytes+

postgres=# \i xac
psql:xac:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat   
---
 Number of levels:  1 +
 Number of pages:   1 +
 Number of leaf pages:  1 +
 Number of tuples:  3 +
 Number of invalid tuples:  0 +
 Number of leaf tuples: 3 +
 Total size of tuples:  4224 bytes+
 Total size of leaf tuples: 4224 bytes+
 Total size of index:   8192 bytes+

postgres=# \i xad
psql:xad:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat   
---
 Number of levels:  1 +
 Number of pages:   1 +
 Number of leaf pages:  1 +
 Number of tuples:  4 +
 Number of invalid tuples:  0 +
 Number of leaf tuples: 4 +
 Total size of tuples:  5628 bytes+
 Total size of leaf tuples: 5628 bytes+
 Total size of index:   8192 bytes+

postgres=# \i xae
psql:xae:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat   
---
 Number of levels:  1 +
 Number of pages:   1 +
 Number of leaf pages:  1 +
 Number of tuples:  5 +
 Number of invalid tuples:  0 +
 Number of leaf tuples: 5 +
 Total size of tuples:  7032 bytes+
 Total size of leaf tuples: 7032 bytes+
 Total size of index:   8192 bytes+

postgres=# \i xaf
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_picksplit:660] entered picksplit
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_picksplit:812] split: 2 left, 2 right
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat

 Number of levels:  2  +
 Number of pages:   3  +
 Number of leaf pages:  2  +
 Number of tuples:  6  +
 Number of invalid tuples:  0  +
 Number of leaf tuples: 4  +
 Total size of tuples:  8460 bytes +
 Total size of leaf tuples: 5640 bytes +
 Total size of index:   24576 bytes+

postgres=# 

There are checks inside the Picksplit() function for the number of entries:

  OffsetNumber maxoff = entryvec->n - 1;
  int n_entries, j;
  n_entries = Max(maxoff, 1) - 1;

  j = 0;
  for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
    FPrint* v = deserialize_fprint(entv[i].key);
if 

Re: [HACKERS] GiST seems to drop left-branch leaf tuples

2010-11-23 Thread Alvaro Herrera
Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300 2010:

 There are checks inside the Picksplit() function for the number of entries:
 
   OffsetNumber maxoff = entryvec->n - 1;
   int n_entries, j;
   n_entries = Max(maxoff, 1) - 1;
 
   j = 0;
   for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
     FPrint* v = deserialize_fprint(entv[i].key);

Isn't this off by one?  Offset numbers are 1-based, so the maxoff
computation is wrong.
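
For reference, the convention in standalone form (stand-in types; if I
recall correctly, in the real GistEntryVector slot 0 is unused and the
valid entries run from 1 through n - 1 inclusive):

#include <stdio.h>

typedef unsigned short OffsetNumber;

#define FirstOffsetNumber ((OffsetNumber) 1)
#define OffsetNumberNext(off) ((OffsetNumber) ((off) + 1))

/* Stand-in for GistEntryVector: vector[0] is unused padding. */
typedef struct
{
    int         n;          /* one more than the number of valid entries */
    int         vector[8];
} EntryVecSketch;

int
main(void)
{
    EntryVecSketch ev = {7, {0, 10, 11, 12, 13, 14, 15, 0}};   /* 6 entries */
    OffsetNumber i;
    OffsetNumber maxoff = ev.n - 1;     /* last valid offset, inclusive */

    /* note <=, not <: with "i < maxoff" the last entry is never visited */
    for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
        printf("entry %u = %d\n", (unsigned) i, ev.vector[i]);
    return 0;
}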

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map

2010-11-23 Thread Robert Haas
On Tue, Nov 23, 2010 at 3:42 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 That's an interesting idea. You piggyback setting the vm bit on the freeze
 WAL record, on the assumption that you have to write the freeze record
 anyway. However, if that assumption doesn't hold, because the tuples are
 deleted before they reach vacuum_freeze_min_age, it's no better than the
 naive approach of WAL-logging the vm bit set separately. Whether that's
 acceptable or not, I don't know.

I don't know, either.  I was trying to think of the cases where this
would generate a net increase in WAL before I sent the email, but
couldn't fully wrap my brain around it at the time.  Thanks for
summarizing.

Here's another design to poke holes in:

1. Imagine that the visibility map is divided into granules.  For the
sake of argument let's suppose there are 8K bits per granule; thus
each granule covers 64M of the underlying heap and 1K of space in the
visibility map itself.

2. In shared memory, create a new array called the "visibility vacuum
array" (VVA), each element of which has room for a backend ID, a
relfilenode, a granule number, and an LSN.  Before setting bits in the
visibility map, a backend is required to allocate a slot in this
array, XLOG the slot allocation, and fill in its backend ID,
relfilenode number, and the granule number whose bits it will be
manipulating, plus the LSN of the slot allocation XLOG record.  It
then sets as many bits within that granule as it likes.  When done, it
sets the backend ID of the VVA slot to InvalidBackendId but does not
remove it from the array immediately; such a slot is said to have been
released.

3. When visibility map bits are set, the LSN of the page is set to the
new-VVA-slot XLOG record, so that the visibility map page can't hit
the disk before the new-VVA-slot XLOG record.  Also, the contents of
the VVA, sans backend IDs, are XLOG'd at each checkpoint.  Thus, on
redo, we can compute a list of all VVA slots for which visibility-bit
changes might already be on disk; we go through and clear both the
visibility map bit and the PD_ALL_VISIBLE bits on the underlying
pages.

4. To free a VVA slot that has been released, we must xlog-flush as far
as the record that allocated the slot and sync the visibility map and
heap segments containing that granule.  Thus, all slots released
before a checkpoint starts can be freed after it completes.
Alternatively, an individual backend can free a previously-released
slot by performing the xlog flush and syncs itself.  (This might
require a few more bookkeeping details to be stored in the VVA, but it
seems manageable.)
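
To make the bookkeeping concrete, here is a standalone sketch of the slot
array (every name, field width, and the slot count are illustrative
assumptions, not settled design):

#include <stdint.h>
#include <stdio.h>

#define NUM_VVA_SLOTS 64

typedef enum {SLOT_FREE, SLOT_IN_USE, SLOT_RELEASED} VVASlotState;

/*
 * One visibility-vacuum-array slot, per step 2.  With 8K bits per
 * granule, each granule covers 64MB of heap (8192 pages of 8KB) and
 * 1KB of visibility map.
 */
typedef struct
{
    VVASlotState state;
    int         backendId;      /* owner while SLOT_IN_USE */
    uint32_t    relfilenode;    /* relation whose VM bits are being set */
    uint32_t    granule;        /* which 8K-bit granule of the map */
    uint64_t    lsn;            /* LSN of the slot-allocation XLOG record */
} VVASlot;

static VVASlot vva[NUM_VVA_SLOTS];  /* would live in shared memory */

/* Step 2: claim and fill a slot (and XLOG it) before touching the granule. */
static VVASlot *
vva_allocate(int backendId, uint32_t relfilenode, uint32_t granule,
             uint64_t alloc_lsn)
{
    for (int i = 0; i < NUM_VVA_SLOTS; i++)
    {
        if (vva[i].state == SLOT_FREE)
        {
            vva[i] = (VVASlot) {SLOT_IN_USE, backendId,
                                relfilenode, granule, alloc_lsn};
            return &vva[i];
        }
    }
    return NULL;    /* caller would wait, or free a released slot itself */
}

/* Done setting bits: release, but keep the entry until step 4's flush. */
static void
vva_release(VVASlot *slot)
{
    slot->backendId = -1;       /* InvalidBackendId */
    slot->state = SLOT_RELEASED;
}

/* Step 4, after the XLOG flush and the map/heap segment syncs: */
static void
vva_free(VVASlot *slot)
{
    slot->state = SLOT_FREE;
}

int
main(void)
{
    VVASlot    *slot = vva_allocate(17, 16384, 0, 123456);

    vva_release(slot);
    vva_free(slot);
    printf("slot cycled: state=%d\n", (int) slot->state);
    return 0;
}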

One problem with this design is that the visibility map bits never get
set on standby servers.  If we don't XLOG setting the bit then I
suppose that doesn't happen now either, but it's more sucky (that's
the technical term) if you're relying on it for index-only scans
(which are also relevant on the standby, either during HS or if
promoted) versus if you're only relying on it for vacuum (which
doesn't happen on the standby anyway unless and until it's promoted).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-11-23 Thread Robert Haas
On Tue, Nov 23, 2010 at 2:33 AM, Kris Jurka bo...@ejurka.com wrote:


 On Mon, 22 Nov 2010, Itagaki Takahiro wrote:

 On Fri, Oct 15, 2010 at 03:40, Radosław Smogura
 rsmog...@softperience.eu wrote:

 Regarding JDBC in the CF process -- other interfaces are handled
 there.  I haven't seen one patch this size for JDBC since I've been
 involved, let alone two competing patches to implement the same
 feature.  Small patches which can be quickly handled don't make sense
 to put into the process, but it seemed reasonable for these.

 In any case I'm sending this patch, and I will put it under
 Miscellaneous in the CF. This cleaned patch takes only 47k (the
 uncleaned one included some binary read classes) and about 50% of it
 is a big test case.

 I changed the patch's topic to JDBC.
 https://commitfest.postgresql.org/action/patch_view?id=399


 I don't think it makes sense to try to manage anything other than core code
 in the commitfest app.  The other patch touched the backend, so it made
 sense to put it in the commitfest, but as far as I understand it, this one
 is pure Java code.  There is a backlog of JDBC issues to deal with, but I
 think it needs its own commitfest instead of trying to tack on to the main
 project's.

We could have separate JDBC CommitFests inside the app if that's
helpful - the CommitFests are by convention named YYYY-MM, but the app
will support arbitrary names.  The only problem I see is that it would
mess up the calculation of the "currently open" CF, the "currently
in progress" CF, and the "most recently closed" CF.  I'd be willing to
put in the work to fix that, though, if you guys want to use the app
too.

For now I suggest we mark this Returned with Feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-11-23 Thread Robert Haas
On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Nov 22, 2010 at 8:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 On balance I'm inclined to leave the unique key as per previous proposal
  (with a "purpose" column) and add the which-sort-order-is-that
 information as payload columns that aren't part of the key.

 This is probably OK too, although I confess I'm a lot less happy about
 it now that you've pointed out the need for those payload columns.

 The reason I said columns is that I can foresee eventually wanting to
 specify a pathkey in its entirety --- opfamily, asc/desc, nulls_first,
 and whatever we come up with for collation.  We don't currently need to
 store more than the opfamily, since the others can never need to have
 non-default values given the current implementation of KNNGIST.  But the
 idea that they might all be there eventually makes me feel that we don't
 want to try to incorporate this data in pg_amop's unique key.  I'm
 satisfied to say that only one sort order can be associated with a
 particular operator in a particular opclass, which is what would be
 implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.

Does that imply that KNNGIST would only be able to support one
ordering per AMOP_ORDER-operator, or does it imply that each such
ordering would require a separate strategy number?  The second might
be OK, but the first sounds bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-11-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm satisfied to say that only one sort order can be associated with a
 particular operator in a particular opclass, which is what would be
 implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.

 Does that imply that KNNGIST would only be able to support one
 ordering per AMOP_ORDER-operator, or does it imply that each such
 ordering would require a separate strategy number?  The second might
 be OK, but the first sounds bad.

It would be the first, because simply assigning another strategy number
only satisfies one of the unique constraints on pg_amop.  To allow
arbitrary flexibility here, we would have to include all components of
the ordering specification in the unique constraint that's presently
just (amopopr, amopfamily) and is proposed to become
(amopopr, amopfamily, amoppurpose).  I think that's an undue amount of
complexity to support something that's most likely physically impossible
from the index's standpoint anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-23 Thread Greg Stark
On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 No, especially if it results in queries that used to work breaking,
 which it well could.  But I'm not sure where to go with it from there,
 beyond throwing up my hands.

 Well, that's why there's been no movement on this since 2004 :-(.  The
 amount of work needed for a better solution seems far out of proportion
 to the benefits.

We could extend the existing logic to handle multi-byte characters
though, couldn't we? It's not going to fix all the problems but at
least it'll do something sane.
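
(For context, a standalone sketch of roughly what the existing logic
does - not the exact backend code, which as I understand it also folds
high-bit single-byte characters via tolower():)

#include <stdio.h>

/*
 * Roughly what downcase_truncate_identifier() does for the safe case:
 * fold only A-Z, regardless of locale.
 */
static char
ascii_tolower(char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return ch + ('a' - 'A');
    return ch;              /* multibyte sequences pass through untouched */
}

int
main(void)
{
    const char *ident = "MyTäble";      /* the non-ASCII bytes are left alone */

    for (const char *p = ident; *p; p++)
        putchar(ascii_tolower(*p));
    putchar('\n');
    return 0;
}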




-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-11-23 Thread Robert Haas
On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm satisfied to say that only one sort order can be associated with a
 particular operator in a particular opclass, which is what would be
 implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.

 Does that imply that KNNGIST would only be able to support one
 ordering per AMOP_ORDER-operator, or does it imply that each such
 ordering would require a separate strategy number?  The second might
 be OK, but the first sounds bad.

 It would be the first, because simply assigning another strategy number
 only satisfies one of the unique constraints on pg_amop.  To allow
 arbitrary flexibility here, we would have to include all components of
 the ordering specification in the unique constraint that's presently
 just (amopopr, amopfamily) and is proposed to become
 (amopopr, amopfamily, amoppurpose).  I think that's an undue amount of
 complexity to support something that's most likely physically impossible
 from the index's standpoint anyway.

Or, you'd need to pass these details separately to the AM, which seems
like a more flexible way of doing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-11-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It would be the first, because simply assigning another strategy number
 only satisfies one of the unique constraints on pg_amop.  To allow
 arbitrary flexibility here, we would have to include all components of
 the ordering specification in the unique constraint that's presently
 just (amopopr, amopfamily) and is proposed to become
 (amopopr, amopfamily, amoppurpose).  I think that's an undue amount of
 complexity to support something that's most likely physically impossible
 from the index's standpoint anyway.

 Or, you'd need to pass these details separately to the AM, which seems
 like a more flexible way of doing it.

A more flexible way of doing what?  The first requirement here is that
the catalog entries provide sufficient information to determine the
semantics.  You can't just say this opclass supports ordering, you
have to specify what that ordering is.  Punting to the index AM helps
not at all, unless your proposal is to hard-wire this in GIST rather
than in the core planner.

We will probably *also* want to pass these details explicitly to the
index AM, but that doesn't solve the problem that some catalog somewhere
has to say what it is that the opclass can do.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST seems to drop left-branch leaf tuples

2010-11-23 Thread Peter Tanski
I should correct what I just wrote: the first and last entries in
entryvec->vector are invalid.


On Nov 23, 2010, at 11:39 AM, Peter Tanski wrote:

Picksplit() seems to be an exceptional case here: the first and last  
numbers in entryvec are invalid so


entryvec->vector[entryvec->n - 1]

is invalid.  All the other GiST Picksplit() functions use the
same convention.  For example, see the btree_gist picksplit
function, at

http://doxygen.postgresql.org/btree__utils__num_8c-source.html#l00241

OffsetNumber i,
 maxoff = entryvec->n - 1;


On Nov 23, 2010, at 10:22 AM, Alvaro Herrera wrote:

Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300  
2010:


There are checks inside the Picksplit() function for the number of  
entries:


OffsetNumber maxoff = entryvec->n - 1;
int n_entries, j;
n_entries = Max(maxoff, 1) - 1;

j = 0;
for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
  FPrint* v = deserialize_fprint(entv[i].key);


Isn't this off by one?  Offset numbers are 1-based, so the maxoff
computation is wrong.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-11-23 Thread Tom Lane
I wrote:
 We will probably *also* want to pass these details explicitly to the
 index AM, but that doesn't solve the problem that some catalog somewhere
 has to say what it is that the opclass can do.

... although having said that, the obvious question is why that catalog
has to be pg_amop.  Maybe we should leave pg_amop alone (so that it
represents only search operators) and invent a new catalog pg_amorderop.
I envision it having the same columns as pg_amop, plus an ordering
opclass OID (and maybe we might as well stick in asc/desc and nulls_first).
The reason to do this instead of just adding those columns to pg_amop
is that then we can have a different set of unique indexes.  I'm
thinking about (amopfamily, amoplefttype, amoprighttype, amopstrategy),
which would be the same as in pg_amop, plus
(amopopr, amopfamily, amopstrategy).  This would allow a single operator
to be registered under multiple strategy numbers, which presumably would
carry different payload sort-order-specification columns.

This still seems like overkill to me, because I don't actually believe
that it's practical for an index to support multiple sort orders.
But if anyone would like to make an argument why that's not pie in the
sky, this might be the way to represent it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-11-23 Thread Robert Haas
On Tue, Nov 23, 2010 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It would be the first, because simply assigning another strategy number
 only satisfies one of the unique constraints on pg_amop.  To allow
 arbitrary flexibility here, we would have to include all components of
 the ordering specification in the unique constraint that's presently
 just (amopopr, amopfamily) and is proposed to become
 (amopopr, amopfamily, amoppurpose).  I think that's an undue amount of
 complexity to support something that's most likely physically impossible
 from the index's standpoint anyway.

 Or, you'd need to pass these details separately to the AM, which seems
 like a more more flexible way of doing it.

 A more flexible way of doing what?  The first requirement here is that
 the catalog entries provide sufficient information to determine the
 semantics.  You can't just say this opclass supports ordering, you
 have to specify what that ordering is.  Punting to the index AM helps
 not at all, unless your proposal is to hard-wire this in GIST rather
 than in the core planner.

Eh, let's just do it the way you want to do it.  It's probably going
to have to be redone the next time somebody wants to make an
enhancement in this area, but I guess it's going to be easier to do it
then than to figure out where to go with it now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST seems to drop left-branch leaf tuples

2010-11-23 Thread Peter Tanski
Picksplit() seems to be an exceptional case here: the first and last  
numbers in entryvec are invalid so


entryvec->vector[entryvec->n - 1]

is invalid.  All the other GiST Picksplit() functions use the
same convention.  For example, see the btree_gist picksplit function, at

http://doxygen.postgresql.org/btree__utils__num_8c-source.html#l00241

OffsetNumber i,
  maxoff = entryvec->n - 1;


On Nov 23, 2010, at 10:22 AM, Alvaro Herrera wrote:

Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300  
2010:


There are checks inside the Picksplit() function for the number of  
entries:


  OffsetNumber maxoff = entryvec->n - 1;
  int n_entries, j;
  n_entries = Max(maxoff, 1) - 1;

  j = 0;
  for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
    FPrint* v = deserialize_fprint(entv[i].key);


Isn't this off by one?  Offset numbers are 1-based, so the maxoff
computation is wrong.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] NLS builds on windows and lc_messages

2010-11-23 Thread Stefan Kaltenbrunner

So I'm still trying to look into:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg00686.php


which is still broken in 8.4-STABLE even after Tom's patch. To get better
debugging I was trying to get non-translated error messages in the log,
which seems to be completely impossible on Windows (neither through
postgresql.conf nor in a session - at least on my custom build).



* lc_messages on Windows just accepts anything (including ridiculously
large values with escapes and all that - which might actually be a
problem) but does not seem to do anything
* the docs claim "If you want the system to behave as if it had no
locale support, use the special locale C or POSIX." - setting it to 'C'
on Windows does not work either
* if that stuff is not working at all on Windows we clearly need to add
an appropriate sentence or two in the docs - but this seems like a
really annoying and stupid limitation




Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-23 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that's why there's been no movement on this since 2004 :-(.  The
 amount of work needed for a better solution seems far out of proportion
 to the benefits.

  We could extend the existing logic to handle multi-byte characters
 though, couldn't we? It's not going to fix all the problems but at
 least it'll do something sane.

Not easily, cheaply, or portably.  The closest you could get in that
line would be to use towlower(), which doesn't exist everywhere
(though I grant probably most platforms have it by now).  The much much
bigger problem though is that we don't know what character representation
towlower() deals in.  We recently kluged the regex code to assume that
the wchar_t representation for UTF8 locales is the standardized Unicode
code point.  I haven't heard of that breaking, but 9.0 hasn't been out
that long.  In other multibyte encodings we have no idea how to use that
function, short of invoking mbstowcs/wcstombs or local equivalent, which
is expensive and doesn't readily allow a short-circuit for ASCII.

And, after you've hacked your way through all that, you still end up
with case-folding behavior that depends on the prevailing locale.
Which is dangerous for the previously cited reasons, and arguably not
spec-compliant.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-23 Thread Andrew Dunstan



On 11/23/2010 11:14 AM, Greg Stark wrote:

On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:

No, especially if it results in queries that used to work breaking,
which it well could.  But I'm not sure where to go with it from there,
beyond throwing up my hands.

Well, that's why there's been no movement on this since 2004 :-(.  The
amount of work needed for a better solution seems far out of proportion
to the benefits.

We could extend the existing logic to handle multi-byte characters
though, couldn't we? It's not going to fix all the problems but at
least it'll do something sane.


What casing rules will you apply? How will you know what is an upper 
case character and what its lower case character is? The sad, short 
answer is that there are no simple rules beyond ASCII. See the URL I 
posted upthread.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-23 Thread Greg Stark
On Tue, Nov 23, 2010 at 5:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 And, after you've hacked your way through all that, you still end up
 with case-folding behavior that depends on the prevailing locale.
 Which is dangerous for the previously cited reasons, and arguably not
 spec-compliant.


So I thought the problem with the Turkish locale definition was that
it redefined how a capital ASCII character present in standard SQL
identifiers was lowercased, resulting in standard SQL syntax not
working.

I'm not sure I understand the danger if a user creates an object in a
database with a particular encoding and locale using that locale for
downcasing in the future. We don't currently support changing the
locale of a database or using different locales in the future. Even
with Peter's patch I think we can reasonably require the user to
specify a single locale which controls how the SQL identifiers are
interpreted regardless of the collations used in the operations.

The points about the C API being limited and nonportable are a
different issue. I guess I would need to do research to see if we're
missing something which would help here. Otherwise I might be
beginning to see the value in that /other/ library which I've argued
against in the past.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-23 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 I'm not sure I understand the danger if a user creates an object in a
 database with a particular encoding and locale using that locale for
 downcasing in the future.

The case I was worried about is dumping from one database and reloading
into another one with a different locale.  Although I suppose there are
enough *other* reasons why that might fail that adding changes of
downcasing behavior might not be a big deal.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-23 Thread Heikki Linnakangas

On 23.11.2010 14:22, Shigeru HANADA wrote:

On Tue, 23 Nov 2010 12:30:52 +0200
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:

The docs need some work. The CREATE FOREIGN TABLE reference page seems
to be copy-pasted from CREATE TABLE, because it talks about constraints
and WITH/WITHOUT OIDS which surely don't apply to foreign tables.


Thanks for the comments.

The page you pointed has been edited for foreign table.  In current
design, OID system column and CHECK constraints are supported.


Oh, ok.


OID is supported to get oid from the source table (yes, it works only
for postgresql_fdw but it seems useful to support).


I don't think that's worthwhile. OIDs on user tables are a legacy
feature, not recommended for new applications. And if you have to access 
an existing table that uses oids, you can define a regular column for 
the oid:


CREATE FOREIGN TABLE foreigntable (oid oid, data int4) SERVER myserver;


 CHECK constraint
is supported to enable constraint exclusion.


Hmm, my gut reaction is that that's a premature optimization. But what 
about DEFAULTs then, surely that doesn't make sense for a foreign table?



I agree that some kind of document, such as "How to create a new FDW",
should be written.


A well-documented file FDW implementation goes a long way for that. But
a chapter that explains SQL/MED - how to create foreign tables, servers,
user mappings etc., and how they behave - that we need.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-23 Thread Greg Stark
On Tue, Nov 23, 2010 at 5:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The case I was worried about is dumping from one database and reloading
 into another one with a different locale.  Although I suppose there are
 enough *other* reasons why that might fail that adding changes of
 downcasing behavior might not be a big deal.

If you dump the whole database then pg_dump would create the new
database with the correct encoding and locale. If you change it then
that can already cause it to fail if the data can't be converted to
the new encoding.  And as you point out there are all kinds of ways
you can cause that to fail by making the context incompatible with the
definitions you're loading.

The lesson we learned in the past is that we have to ignore the locale
for all the characters present in the standard identifiers. Beyond
that I think this is just an implementation problem which may be a
show stopper in itself but if we can do anything with mulitbyte
characters it's probably an improvement over what we do now.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-23 Thread Joshua D. Drake
On Tue, 2010-11-23 at 20:18 +0200, Heikki Linnakangas wrote:
 On 23.11.2010 14:22, Shigeru HANADA wrote:

  OID is supported to get oid from the source table (yes, it works only
  for postgresql_fdw but it seems useful to support).
 
 I don't think that's worthwhile. OIDs on user tables are a legacy
 feature, not recommended for new applications.

Agreed. We should do everything we can to NOT encourage their use.

JD
-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-23 Thread Robert Haas
On Tue, Nov 23, 2010 at 5:09 AM, Shigeru HANADA
han...@metrosystems.co.jp wrote:
 I've separated the patch into three parts.  Each contains code,
 documentation and tests, and file_fdw and pgsql_fdw can be applied onto
 fdw_core individually, or together.  I hope the separation helps the
 review of the patches.  Contents of each patch are:

  fdw_core : DDL for FDW HANDLER and FOREIGN TABLE, ForeignScan
  file_fdw : FDW for file, as contrib (loadable) module
  pgsql_fdw: FDW for PG, as contrib module

I think it would be useful to repost each of these on their own thread
and adjust the CF app to have an entry referring to each thread.  That
way we can keep discussion and review for each one separate from the
others.  Note that this thread is already totally off-topic anyway -
it started as a question about when SQL/MED would land and then it got
patches posted to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] final patch - plpgsql: for-in-array

2010-11-23 Thread Robert Haas
On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 ok, I can only recapitulate: this feature was proposed circa two
 months ago, and at minimum Tom and maybe you agreed - with a
 request on syntax - do you remember? I am a little bit tired that this
 agreement was changed after I spent my time on this.

I went back and reread the thread I believe you're speaking about.
The first post is here:

http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php

I cannot find one single email on that thread where Tom or I or anyone
else endorsed the syntax you've proposed here; indeed, it and some
other suggestions were roundly criticized.  You responded to that by
saying that the arguments against it were all wrong, but no one other
than you ever appeared to believe that.  There are a few emails on
that thread where other people agreed that it would be nice, in
theory, to have some syntax for this, but there is not one single
email that I see saying that any syntax you proposed was a good one.
If you read that thread and concluded that there was consensus to
implement this syntax, you did not read it very carefully.

If we had ELEMENT as a reserved keyword (which apparently it is in
some versions of the SQL standard), maybe FOR ELEMENT wunk IN
wunkarray... would be sufficiently unambiguous.  But it's not even an
unreserved keyword right now, and I have a hard time thinking it would
be worth reserving it just for this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] s/LABEL/VALUE/ for ENUMs

2010-11-23 Thread David E. Wheeler
On Nov 22, 2010, at 6:03 PM, Josh Berkus wrote:

 ... original patch.  Sorry.  Let's not fiddle with the names.

To be clear, as things stand now, the new command is:

ALTER TYPE name ADD VALUE new_enum_value [ { BEFORE | AFTER } 
existing_enum_value ]

So while the term in the SQL statement is VALUE, it's called a label in the 
documentation. I think that's confusing. Does anyone else?

Best

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()

2010-11-23 Thread Peter Eisentraut
On sön, 2010-11-21 at 18:48 -0500, Tom Lane wrote:
 Yeah.  I'm actually not sure that the SQL committee has thought very
 hard about this, because the spec is worded as though they think that
 Unicode case normalization is all they have to say to uniquely
 define what to do.  The Unicode guys recognize that case mapping is
 locale-specific, which puts us right back at square one.  But leaving
 spec compliance aside, we know from bitter experience that we cannot
 use a definition that lets the Turkish locale fool with the mapping of
 i/I. I suspect that locale-dependent mappings of any other characters
 are just as bad, we simply haven't had enough users burnt by such
 cases to have an institutional memory of it.

The number of locale-dependent case mappings in the entire universe of
Unicode is actually limited to 7 cases for Lithuanian and 8 cases for
Turkish. (ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt)  So it
would be fair to say that there is a default case mapping, and that is
what the SQL standard presumably refers to.

One thing that we could do is let the user declare that he thinks his
current locale is consistent with the Unicode case normalization, and
apply the full Unicode conversion if so.
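
For illustration, a minimal sketch in C of what that could look like; the
flag and the mapping-table lookup are assumptions standing in for such a
user declaration, not existing backend code:

#include <stdbool.h>

/* Assumed flag: the user has declared the current locale consistent
 * with the default Unicode case mapping. */
extern bool locale_is_unicode_consistent;

/* Assumed lookup into the default, locale-independent Unicode
 * one-to-one lowercase mapping. */
extern unsigned int unicode_default_tolower(unsigned int cp);

static unsigned int
downcase_codepoint(unsigned int cp)
{
    /* ASCII is always mapped locale-independently, so a Turkish
     * locale can never turn 'I' into a dotless i. */
    if (cp >= 'A' && cp <= 'Z')
        return cp - 'A' + 'a';

    /* Beyond ASCII, apply the default Unicode mapping only when the
     * user has declared the locale consistent with it. */
    if (cp > 127 && locale_is_unicode_consistent)
        return unicode_default_tolower(cp);

    return cp;
}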


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: fix performance problems with repated decomprimation of varlena values in plpgsql

2010-11-23 Thread Alvaro Herrera
Excerpts from Pavel Stehule's message of lun nov 22 10:01:23 -0300 2010:

 sorry - it is related to tests from FOR-IN-ARRAY thread

 test query: select avg(array_upper(filter02(x,'%AA%', 10),1)) from t1000;

Yeah, I can measure a 25x improvement in that test with the patch
applied.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] s/LABEL/VALUE/ for ENUMs

2010-11-23 Thread Robert Haas
On Tue, Nov 23, 2010 at 2:01 PM, David E. Wheeler da...@kineticode.com wrote:
 On Nov 22, 2010, at 6:03 PM, Josh Berkus wrote:

 ... original patch.  Sorry.  Let's not fiddle with the names.

 To be clear, as things stand now, the new command is:

    ALTER TYPE name ADD VALUE new_enum_value [ { BEFORE | AFTER } 
 existing_enum_value ]

 So while the term in the SQL statement is VALUE, it's called a label in 
 the documentation. I think that's confusing. Does anyone else?

Yes.  As between the two options, I favor changing the command.  And
let's also paint it pink.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] s/LABEL/VALUE/ for ENUMs

2010-11-23 Thread David E. Wheeler
On Nov 23, 2010, at 11:48 AM, Robert Haas wrote:

 So while the term in the SQL statement is VALUE, it's called a label in 
 the documentation. I think that's confusing. Does anyone else?
 
 Yes.  As between the two options, I favor changing the command.  And
 let's also paint it pink.

Would that go with Andrew's boots and buttons?

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] s/LABEL/VALUE/ for ENUMs

2010-11-23 Thread Joshua D. Drake
On Tue, 2010-11-23 at 14:48 -0500, Robert Haas wrote:
 On Tue, Nov 23, 2010 at 2:01 PM, David E. Wheeler da...@kineticode.com 
 wrote:
  On Nov 22, 2010, at 6:03 PM, Josh Berkus wrote:
 
  ... original patch.  Sorry.  Let's not fiddle with the names.
 
  To be clear, as things stand now, the new command is:
 
 ALTER TYPE name ADD VALUE new_enum_value [ { BEFORE | AFTER } 
  existing_enum_value ]
 
  So while the term in the SQL statement is VALUE, it's called a label in 
  the documentation. I think that's confusing. Does anyone else?
 
 Yes.  As between the two options, I favor changing the command.  And
 let's also paint it pink.

We discussed this at West, remember? When asked to paint a wall black we
ask, "which shade?".

JD

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST seems to drop left-branch leaf tuples

2010-11-23 Thread Peter Tanski
On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote:
 j = 0;
 for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
   FPrint* v = deserialize_fprint(entv[i].key);
 
 Isn't this off by one?  Offset numbers are 1-based, so the maxoff
 computation is wrong.
 The first for loop of all others compare with i <= maxoff instead of i <
 maxoff.

You are right: I am missing the last one, there.  (During a memory-debugging 
phase entv[entryvec->n - 1] was always invalid, probably as a memory overwrite 
error but I fixed that later and never changed it back.)
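
So the corrected loop would presumably use the standard inclusive form (a
sketch reusing the names from the code above):

OffsetNumber i;
OffsetNumber maxoff = entryvec->n - 1;  /* last valid 1-based entry */

for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
{
    FPrint *v = deserialize_fprint(entv[i].key);
    /* ... examine v ... */
}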

On the other hand, there are two problems:

1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size 
Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 
64-bit machine).  The example test cases I used were smaller in order to get 
around that issue: they are 2326 bytes base size.  

2. Even after fixing the Picksplit() loop, the dropped-leaf problem still 
manifests itself:

postgres=# set enable_seqscan=false;
SET
postgres=# set enable_indexscan=true;
SET
postgres=# create table fps2 (id serial, soid character(24) not null, 
fingerprint fprint not null);
NOTICE:  CREATE TABLE will create implicit sequence "fps2_id_seq" for serial 
column "fps2.id"
CREATE TABLE
postgres=# create index fps2_fingerprint_ix on fps2 using gist (fingerprint 
fprint_gist_ops);
CREATE INDEX
postgres=# \i xaa
psql:xaa:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# \i xab
psql:xab:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# \i xac
psql:xac:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# \i xad
psql:xad:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat   
---
 Number of levels:  1 +
 Number of pages:   1 +
 Number of leaf pages:  1 +
 Number of tuples:  4 +
 Number of invalid tuples:  0 +
 Number of leaf tuples: 4 +
 Total size of tuples:  5628 bytes+
 Total size of leaf tuples: 5628 bytes+
 Total size of index:   8192 bytes+

postgres=# \i xae
psql:xae:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat   
---
 Number of levels:  1 +
 Number of pages:   1 +
 Number of leaf pages:  1 +
 Number of tuples:  5 +
 Number of invalid tuples:  0 +
 Number of leaf tuples: 5 +
 Total size of tuples:  7032 bytes+
 Total size of leaf tuples: 7032 bytes+
 Total size of index:   8192 bytes+

postgres=# \i xaf
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_picksplit:659] entered picksplit
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_picksplit:838] split: 3 left, 2 right
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE:  [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
   gist_stat

 Number of levels:  2  +
 Number of pages:   3  +
 Number of leaf pages:  2  +
 Number of tuples:  7  +
 Number of invalid tuples:  0  +
 Number of leaf tuples: 5  +
 Total size of tuples:  9864 bytes +
 Total size of leaf tuples: 7044 bytes +
 Total size of index:   24576 bytes+

postgres=# select id, soid from fps2;
 id |   soid   
+--
  1 | 4c65a39d4d9bca2c3382
  2 | 4c65a39d4d9bca2c338a
  3 | 4c65a39d4d9bca2c3390
  4 | 4c65a39d4d9bca2c3399
  5 | 4c65a39d4d9bca2c33a5
  6 | 4c65a39d4d9bca2c33a8

postgres=# select f1.id, f2.id, fprint_cmp(f1.fingerprint, f2.fingerprint) from 
fps2 f1 join fps2 f2 on f1.fingerprint=f2.fingerprint;
 id | id |fprint_cmp
++--
  1 |  1 | 1.00031467691569
  2 |  2 | 1.00031467691569
  4 |  4 | 1.00031467691569
  5 |  5 | 1.00031467691569
  6 |  6 | 1.00031467691569
 
So GiST does not include a tuple for row 3; one of the old tuples.  

After inserting a few more rows to trigger another Picksplit():

postgres=# \i xag

[HACKERS] Horizontal Write Scaling

2010-11-23 Thread Eliot Gable
I know there has been a lot of talk about replication getting built into
Postgres and I know of many projects that aim to fill the role. However, I
have not seen much in the way of a serious attempt at multi-master write
scaling. I understand the fundamental problem with write scaling across
multiple nodes is Disk I/O and inter-node communication latency and that in
the conventional synchronous, multi-master replication type setup you would
be limited to the speed of the slowest node, plus the communication protocol
overhead and latency. However, it occurs to me that if you had a shared disk
system via either iSCSI, Fiber Channel, NFS, or whatever (which also had
higher I/O capabilities than a single server could utilize), if you used a
file system that supported locks on a particular section (extent) of a file,
it should theoretically be possible for multiple Postgres instances on
multiple systems sharing the database to read and write to the database
without causing corruption. Obviously, it would take some carefully designed
code in terms of file extent locking, but it seems like it would also
eliminate the need entirely for any type of replication system. Lustre seems
to support the type of file locking required. Of course, I am assuming the
disk system would be RAID 1, RAID 10, RAID 5, or RAID 6 for reliability
purposes and that it is sufficiently redundant that you don't have to worry
about an outage of your storage system.
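
For what it's worth, here is a minimal sketch of the kind of extent locking
I mean, using POSIX fcntl() byte-range locks; the descriptor and offsets are
illustrative, and it assumes the cluster file system (e.g. Lustre) actually
propagates such locks between nodes, which not all of them do:

#include <fcntl.h>
#include <unistd.h>

/* Take an exclusive lock on one extent of a shared data file,
 * blocking until any competing holder releases it. */
static int
lock_extent(int fd, off_t start, off_t len)
{
    struct flock fl;

    fl.l_type = F_WRLCK;        /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = start;         /* first byte of the extent */
    fl.l_len = len;             /* length of the extent in bytes */

    return fcntl(fd, F_SETLKW, &fl);
}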

Has anyone put any thought into what it would take to do this in Postgres?
Is it simply a matter of making the database file interaction code aware of
extent locking, or is it considerably more involved than that? It also
occurs to me that you probably need some form of transaction ordering
mechanism across the nodes based on synchronized timestamps, but it seems
Postgres-R has the required code to do that portion already written. That
may not even be needed since all nodes would actually be working on the same
database files and the locks would ensure a strict ordering of queries. The
only place I see that possibly causing a problem is someone load balancing
across the servers and sending a delete and insert really close to each
other to different nodes such that if the insert executes first, it would be
deleted by the delete even though the intent was to have the delete run
first. Timestamp ordering of the queries or just some shared transaction ID
across the nodes would eliminate that possibility. Other than that, is there
anything else I am missing? Wouldn't this type of setup be far simpler to
implement and provide better scalability than trying to do multi-master
replication using log shipping or binary object shipping or any other
techniques? Wouldn't it also be far more efficient since you don't need to
have a copy of your data on each master node and therefore also don't have to
ship your data to each node and have each node process it?

I am mostly asking for educational purposes, and I would appreciate
technical (and hopefully specific) explanations as to what in Postgres would
need to change to support this.


Re: [HACKERS] ALTER TYPE recursion to typed tables

2010-11-23 Thread Peter Eisentraut
On ons, 2010-11-17 at 21:05 +0100, Dimitri Fontaine wrote:
 Code wise, though, I wonder about the name of the recursing
 parameter of the renameatt_internal function is
 src/backend/commands/tablecmds.c,
 which seems to only get used to detect erroneous attempt at renaming
 the table column directly. Maybe it's only me not used enough to
 PostgreSQL code yet, but here it distract the code reader. Having
 another parameter called recurse is not helping, too, but I don't
 see this one needs to be changed.

This parameter has only minimal use in the renameatt case, but the same
terminology is used throughout the ALTER TABLE code, so I think it's
wise to keep it.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal Write Scaling

2010-11-23 Thread Eliot Gable
On Tue, Nov 23, 2010 at 3:43 PM, Eliot Gable
egable+pgsql-hack...@gmail.com wrote:
snip

 Other than that, is there anything else I am missing? Wouldn't this type of
 setup be far simpler to implement and provide better scalability than trying
 to do multi-master replication using log shipping or binary object shipping
 or any other techniques? Wouldn't it also be far more efficient since you
 don't need to have a copy of your data on each master node and therefore also
 don't have to ship your data to each node and have each node process it?

 I am mostly asking for educational purposes, and I would appreciate
 technical (and hopefully specific) explanations as to what in Postgres would
 need to change to support this.


Now that I think about this more, it seems you would still need to ship the
transactions to your other nodes and have some form of processing system on
each that knew which node was supposed to be executing each transaction and
whether that node is currently online. It would also have to have designated
backup nodes to execute the transaction on. Otherwise, you could end up
waiting forever for a transaction to finish that was sent to one node right
before that node lost power. However, if a transaction manager on each node
is able to figured out the ordering of the transactions for itself based on
some globally incrementing transaction ID and able to figure out which node
will be executing the transaction and which node is the backup if the first
one fails, etc., then if the backup sees the primary for that transaction go
offline, it could execute the transaction instead.

Then, I suppose you also need some system in Postgres which can allow
concurrent processing of transactions such that they don't process stuff in
a transaction which is dependent on a transaction that has not yet been
committed, but can process other stuff. So, evaluation of deterministic
functions could take place, but anything volatile could not until all
previous transactions finished. I assume Postgres already has something like
this in order to scale across multiple cores in a single box. This setup
would basically make all the master nodes for the database look like just
extra memory and CPU cores.
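
To make the globally incrementing transaction ID concrete, the simplest
sketch is a shared ticket counter that every node atomically increments.
This is purely illustrative (the counter is assumed to live somewhere all
nodes can see) and it ignores the hard distributed-failure cases:

#include <stdint.h>

/* Assumed to be visible to all nodes, e.g. via a locked region of
 * the shared disk or a small coordination service. */
static volatile uint64_t global_txn_counter;

/* Each node takes the next ticket; the tickets impose a total order
 * on transactions across all nodes. */
static uint64_t
next_global_txn_id(void)
{
    return __sync_add_and_fetch(&global_txn_counter, 1);
}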


Re: [HACKERS] Horizontal Write Scaling

2010-11-23 Thread Kevin Grittner
Eliot Gable egable+pgsql-hack...@gmail.com wrote:
 
 the locks would ensure a strict ordering of queries.
 
PostgreSQL doesn't support S2PL.  I'm not sure what locks you mean.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-11-23 Thread Magnus Hagander
On Tue, Nov 23, 2010 at 15:02, Craig Ringer cr...@postnewspapers.com.au wrote:
 However, I am not clear what benefit we get from moving this into core
 in 9.1.  If it's still going to require a full postmaster restart, the
 GUC you have to change may as well be shared_preload_libraries as a
 new one.

 There's no reason it should require a postmaster restart. New backends
 spawned after the handler is turned on would enable it, and existing
 backends could be signalled to enable it as well.

I think that came from my comment that we could store the on/off flag in
the startup shared memory block. It'd then be the only way to get it into
any existing backends.


 on-by-default is what we gain. I think that's fairly big...

 More than that. If a crash handler is in core, then:

 - It can be inited much earlier, catching crashes that happen during loading
 of libraries and during later backend init; and

Yeah.


 - It's basically free when the cost of shared library loading is removed, so
 it can be left on in production or even be on by default. I need to do some
 testing on this, but given the apparently near-zero cost of initing the
 handler I won't be surprised if testing a GUC to see if the handler should
 be on or not costs more than loading it does.

I'm fairly certain it does. The GUC would be there to be able to turn
the whole thing off because you don't want the dumps, not for
performance reasons.

 I still wouldn't support on-by-default because you'd need an automatic
 process to weed out old crash dumps or limit the number stored. That's a
 bigger job. So I think the admin should have to turn it on, but it'd be good
 to make it easy to turn on in production without a restart; I don't see why
 that'd be hard.

I think the admin should deal with that - just like the admin has to
clear out the old logs.


 I'll try to put together a patch that does just that, time permitting.
 Things are kind of hectic ATM.

Let me know if you want me to look at adapting the patch for that - i
can do that if you prefer.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-11-23 Thread Magnus Hagander
On Tue, Nov 23, 2010 at 15:09, Craig Ringer cr...@postnewspapers.com.au wrote:
 On 11/23/2010 01:46 AM, Tom Lane wrote:

 * However, when storing it in crashdumps, I think the code would need
 to create that directory if it does not exist, doesn't it?

 If it didn't do so, then manual creation/removal of the directory could
 be used as an on/off switch for the feature.

 Yep. That's how I'd want to do it in 9.1 - test for the directory and use
 that to decide whether to set the handler during early backend startup. That
 way you don't need a GUC, and should be able to load it *very* early in
 backend startup.

Or you set the handler always, and have the handler only actually
create the dump if the directory exists. That way you can add the
directory and still get a dump from both existing backends and the
postmaster itself without a restart.
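
For concreteness, a rough sketch of that approach (not the submitted patch;
the directory name, dump path, and dump type are illustrative):

#include <windows.h>
#include <dbghelp.h>

static LONG WINAPI
crash_handler(EXCEPTION_POINTERS *pExceptionInfo)
{
    DWORD attrs = GetFileAttributes("crashdumps");

    /* Only write a dump if the crashdumps directory exists, so that
     * creating or removing it acts as the on/off switch. */
    if (attrs != INVALID_FILE_ATTRIBUTES &&
        (attrs & FILE_ATTRIBUTE_DIRECTORY))
    {
        HANDLE dumpFile = CreateFile("crashdumps\\postgres.mdmp",
                                     GENERIC_WRITE, 0, NULL,
                                     CREATE_ALWAYS,
                                     FILE_ATTRIBUTE_NORMAL, NULL);

        if (dumpFile != INVALID_HANDLE_VALUE)
        {
            MINIDUMP_EXCEPTION_INFORMATION mei;

            mei.ThreadId = GetCurrentThreadId();
            mei.ExceptionPointers = pExceptionInfo;
            mei.ClientPointers = FALSE;
            MiniDumpWriteDump(GetCurrentProcess(),
                              GetCurrentProcessId(), dumpFile,
                              MiniDumpNormal, &mei, NULL, NULL);
            CloseHandle(dumpFile);
        }
    }
    return EXCEPTION_EXECUTE_HANDLER;
}

/* Installed once, very early in process startup:
 *     SetUnhandledExceptionFilter(crash_handler);
 */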


 The crash dump facility in dbghelp.dll offers *optional* features where apps
 can stream in additional app-specific data like recent log excerpts, etc. I
 didn't try to implement anything like that in the initial module partly
 because I want to minimize the amount of the backend that must be working
 for a crash dump to succeed.

Yeah, we already have the logs in the logfile etc. Let's keep this
uncomplicated so that it stays working :-)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-11-23 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Tue, Nov 23, 2010 at 15:09, Craig Ringer cr...@postnewspapers.com.au 
 wrote:
 Yep. That's how I'd want to do it in 9.1 - test for the directory and use
 that to decide whether to set the handler during early backend startup. That
 way you don't need a GUC, and should be able to load it *very* early in
 backend startup.

 Or you set the handler always, and have the handler only actually
 create the dump if the directory exists.

+1 for that way.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal Write Scaling

2010-11-23 Thread Mark Kirkwood

On 24/11/10 09:43, Eliot Gable wrote:
However, it occurs to me that if you had a shared disk system via 
either iSCSI, Fiber Channel, NFS, or whatever (which also had higher 
I/O capabilities than a single server could utilize)


Yeah, current Postgres multi-master projects seem to be focusing on 
shared-nothing architecture as opposed to a shared-disk one. I guess the 
advantage of the former is that specialized (i.e. expensive) hardware is 
not required to attempt to overcome the single point of failure of 
shared-disk systems - the disk they share.


Cheers

Mark


Re: [HACKERS] Re: Proposed Windows-specific change: Enable crash dumps (like core files)

2010-11-23 Thread Craig Ringer

On 11/24/2010 05:18 AM, Magnus Hagander wrote:


Or you set the handler always, and have the handler only actually
create the dump if the directory exists. That way you can add the
directory and still get  a dump from both existing backends and the
postmaster itself without a restart.


That's way smarter. No extra filesystem access during startup, even if 
it is cheap.


--
Craig Ringer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unlogged tables

2010-11-23 Thread Robert Haas
On Sun, Nov 21, 2010 at 11:07 PM, Andy Colson a...@squeakycode.net wrote:
 After all this... there are too many numbers for me.  I have no idea what
 this means.

I think what it means that is that, for you, unlogged tables were
almost as fast as shutting off all of synchronous_commit,
full_page_writes, and fsync, and further setting
bgwriter_lru_maxpages=0.  Now, that seems a little strange, because
you'd think if anything it would be faster.  I'm not sure what
accounts for the difference, although I wonder if checkpoints are part
of it.  With the current code, which doesn't exclude unlogged table
pages from checkpoints, a checkpoint will still be faster with
fsync=off than with unlogged tables.  It seems like we're agreed that
this is a problem to be fixed in phase two, though, either by fsyncing
every unlogged table we can find at shutdown time, or else by
providing two durability options, one that works as the current code
does (but survives clean shutdowns) and another that excludes dirty
pages from checkpoints (and does not survive clean shutdowns).
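
A sketch of the first option, for clarity; GetAllUnloggedRelations() is an
assumed helper, not an existing function, and the rest is only the shape of
the idea:

/* Hypothetical shutdown hook: flush unlogged tables to disk so their
 * contents survive a clean shutdown. */
static void
FlushUnloggedRelationsAtShutdown(void)
{
    List       *rels = GetAllUnloggedRelations();   /* assumed helper */
    ListCell   *lc;

    foreach(lc, rels)
    {
        Relation    rel = (Relation) lfirst(lc);

        FlushRelationBuffers(rel);      /* write out dirty pages */
        smgrimmedsync(rel->rd_smgr, MAIN_FORKNUM);  /* fsync the files */
    }
}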

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] final patch - plpgsql: for-in-array

2010-11-23 Thread Cédric Villemain
2010/11/23 Robert Haas robertmh...@gmail.com:
 On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 ok, I can only recapitulate: this feature was proposed circa two
 months ago, and at minimum Tom and maybe you agreed - with a
 request on syntax - do you remember? I am a little bit tired that this
 agreement was changed after I spent my time on this.

 I went back and reread the thread I believe you're speaking about.
 The first post is here:

 http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php

Here perhaps ? (or before)

http://archives.postgresql.org/pgsql-hackers/2010-09/msg01983.php


 I cannot find one single email on that thread where Tom or I or anyone
 else endorsed the syntax you've proposed here;

Nah, but you didn't disagree on the main idea, you just said: 'like
Tom, I agree that the syntax must be updated to something better', more or
less

 indeed, it and some
 other suggestions were roundly criticized.  You responded to that by
 saying that the arguments against it were all wrong, but no one other
 than you ever appeared to believe that.  There are a few emails on
 that thread where other people agreed that it would be nice, in
 theory, to have some syntax for this, but there is not one single
 email that I see saying that any syntax you proposed was a good one.
 If you read that thread and concluded that there was consensus to
 implement this syntax, you did not read it very carefully.

I think you (Robert) dramatically misunderstood what Pavel tried to do.
Pavel did excellent optimization work on a specific point. This
specific point looks crucial to me given the current behavior of
PostgreSQL[1]. AFAICS Pavel didn't want to implement an ingenious syntax,
but an optimization feature.

I don't care about the syntax; I care about Tom's explanation of it, but no more.

I hear the objection that this patch is just a quick way to cut into the iceberg.
It is.
And?

And we might do it better with deeper analysis and more refactoring,
I agree...
Still, this patch is interesting enough from a performance point of view not
to trash it that quickly, IMO.


 If we had ELEMENT as a reserved keyword (which apparently it is in
 some versions of the SQL standard), maybe FOR ELEMENT wunk IN
 wunkarray... would be sufficiently unambiguous.  But it's not even an
 unreserved keyword right now, and I have a hard time thinking it would
 be worth reserving it just for this.

I am not aware of what the SQL spec says precisely about that.
David, does your recent post about the UNNEST stuff look relevant to this
thread? I mean, can we elaborate something from your suggestion to
improve the situation of the current patch (and vice versa)?

[1] data compression in the array allows inserting billions of values with
a small size footprint. (I know it is not pure design; it is just a very
effective end-user solution.)

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2010-11-23 Thread Cédric Villemain
2010/11/21 Andres Freund and...@anarazel.de:
 On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:
 For a similar problem we had (kernel buffering too much) we had success
 using the fadvise and madvise WONTNEED syscalls to force the data to
 exit the cache much sooner than it would otherwise. This was on Linux
 and it had the side-effect that the data was deleted from the kernel
 cache, which we wanted, but probably isn't appropriate here.
 Yep, works fine. Although it has the issue that the data will get read again 
 if
 archiving/SR is enabled.

mmhh. the current code does call DONTNEED or WILLNEED for WAL
depending on whether archiving is off or on.

This matters *only* once the data is written (fsync, fdatasync); before
that it should not have an effect.
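
(For reference, the call in question is roughly of this shape - a sketch,
not the actual xlog.c code:)

#include <fcntl.h>

/* After a WAL segment has been fsync'ed, tell the kernel we won't
 * read it again; only sensible when archiving/streaming will not
 * re-read the file. */
static void
drop_wal_from_cache(int fd)
{
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}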


 There is also sync_file_range, but that's linux specific, although
 close to what you want I think. It would allow you to work with blocks
 smaller than 1GB.
 Unfortunately that puts the data under quite high write-out pressure inside
 the kernel - which is not what you actually want because it limits reordering
 and such significantly.

 It would be nicer if you could get a mix of both semantics (looking at it,
 depending on the approach that seems to be about a 10 line patch to the
 kernel). I.e. indicate that you want to write the pages soonish, but don't put
 it on the head of the writeout queue.

 Andres

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Suggested easy TODO: pg_dump --from-list

2010-11-23 Thread Josh Berkus
Hackers,

I just thought of a good feature to put on the [E] list for TODO, if
people agree: a switch to pg_dump called --from-list which would take
a filename and dump only the objects listed in that file.

Objections to putting this on the TODO?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-23 Thread Josh Kupershmidt
On Fri, Oct 29, 2010 at 10:33 AM, David Fetter da...@fetter.org wrote:
 That seems like a matter for a separate patch.  Looking this over, I
 found I'd created a query that can never get used, so please find
 enclosed the next version of the patch :)

I like "deletables" better than "deleteables" for
Query_for_list_of_deleteables. Sources: dictionary.com and a git grep
through the rest of the PG source.

Josh

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] final patch - plpgsql: for-in-array

2010-11-23 Thread Robert Haas
On Tue, Nov 23, 2010 at 8:56 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2010/11/23 Robert Haas robertmh...@gmail.com:
 On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 ok, I can only recapitulate: this feature was proposed circa two
 months ago, and at minimum Tom and maybe you agreed - with a
 request on syntax - do you remember? I am a little bit tired that this
 agreement was changed after I spent my time on this.

 I went back and reread the thread I believe you're speaking about.
 The first post is here:

 http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php

 Here perhaps ? (or before)

 http://archives.postgresql.org/pgsql-hackers/2010-09/msg01983.php

Dang.  You're right.  I stand corrected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security hooks on object creation

2010-11-23 Thread KaiGai Kohei
Thanks for your review, and sorry for the late reply.
I've not been available for a few days.

(2010/11/22 12:11), Robert Haas wrote:
2010/11/12 KaiGai Kohei kai...@ak.jp.nec.com:
 (2010/11/12 19:34), KaiGai Kohei wrote:
 I revised my patch according to the prior suggestions.

 I'm sorry. I revised my patch, but not attached.

 Please see this attached one.
 
 I'm satisfied with this approach, although I intend to change
 InvokeObjectAccessHook0 to simply InvokeObjectAccessHook before
 committing it;

OK. We have no other object-access-type which takes any arguments
right now. It is a purely cosmetic thing, so we may be able to add
the number of arguments later, as SysCache does.

 and correct your use of AttributeRelationId to
 RelationRelationId for consistency with the rest of the code.

Oops, it was my bug. I'll fix it.

 What
 I'm not quite sure about is where to put the definitions you've added
 to a new file utils/hooks.h; I don't feel that's a very appropriate
 location.  It's tempting to put them in utils/acl.h just because this
 is vaguely access-control related and that header is already included
 in most of the right places, but maybe that's too much of a stretch;
 or perhaps catalog/catalog.h, although that doesn't feel quite right
 either.  If we are going to add a new header file, I still don't like
 utils/hooks.h much - it's considerably more generic than can be
 justified by its contents.
 
I don't think utils/acl.h is the right place in the long run, because we
intend not to restrict the purpose of these hooks to access controls,
as you mentioned.

I think somewhere under the catalog/ directory is a good idea, because
these hooks capture events where the user wants (eventually) to modify
system catalogs. How about catalog/hooks.h instead of utils/hooks.h?

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security hooks on object creation

2010-11-23 Thread Robert Haas
2010/11/23 KaiGai Kohei kai...@ak.jp.nec.com:
 What
 I'm not quite sure about is where to put the definitions you've added
 to a new file utils/hooks.h; I don't feel that's a very appropriate
 location.  It's tempting to put them in utils/acl.h just because this
 is vaguely access-control related and that header is already included
 in most of the right places, but maybe that's too much of a stretch;
 or perhaps catalog/catalog.h, although that doesn't feel quite right
 either.  If we are going to add a new header file, I still don't like
 utils/hooks.h much - it's considerably more generic than can be
 justified by its contents.

 I don't think utils/acl.h is long-standing right place, because we
 intended not to restrict the purpose of this hooks to access controls
 as you mentioned.

 I think somewhere under the catalog/ directory is a good idea because
 it hooks events that user wants (eventually) to modify system catalogs.
 How about catalog/hooks.h, instead of utils/hooks.h?

Well, if we're going to create a new header file for this, I think it
should be called something like catalog/objectaccess.h, rather than
just hooks.h.  But I'd rather reuse something that's already there,
all things being equal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST seems to drop left-branch leaf tuples

2010-11-23 Thread Peter Tanski
I found another off-by-one error in my Picksplit() algorithm and the GiST index 
contains one leaf tuple for each row in the table now.  The error was to start 
from 1 instead of 0 when assigning the entries.  Thanks to everyone for your 
help.

For the record, this is the only GiST index I know of where the keys are over 
2000 bytes in size.  So GiST definitely handles large keys.  Perhaps the 
maximum size for intarray could be increased.

On Nov 23, 2010, at 4:01 PM, Yeb Havinga wrote:

 On 2010-11-23 20:54, Peter Tanski wrote:
 On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote:
 j = 0;
 for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
   FPrint* v = deserialize_fprint(entv[i].key);
 Isn't this off by one?  Offset numbers are 1-based, so the maxoff
 computation is wrong.
 The first for loop of all others compare with i <= maxoff instead of i <
 maxoff.
 You are right: I am missing the last one, there.  (During a memory-debugging 
 phase entv[entryvec->n - 1] was always invalid, probably as a memory 
 overwrite error but I fixed that later and never changed it back.)
 
 On the other hand, there are two problems:
 
 1. the maximum size on a GiST page is 4240 bytes, so I cannot add a 
 full-size Datum using this kind of hash-key setup (the base Datum size is 
 4230 bytes on a 64-bit machine).  The example test cases I used were smaller 
 in order to get around that issue: they are 2326 bytes base size.
 
 2. Even after fixing the Picksplit() loop, the dropped-leaf problem still 
 manifests itself:
 I noticed an n_entries initialization in one of your earlier mails that might 
 also be a source of trouble. I was under the impression that gistentryvectors 
 have n-1 entries (not n-2 as you say), because the first element (0 / 
 InvalidOffsetNumber) must be skipped. E.g. entryvec->n = 5. This means that 
 there are 4 entries, which are in array positions 1,2,3,4.
 
 btw: interesting topic, audio fingerprinting!
 
 regards,
 Yeb Havinga
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-23 Thread Robert Haas
On Mon, Nov 22, 2010 at 6:54 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 21.11.2010 15:18, Robert Haas wrote:

 On Sat, Nov 20, 2010 at 4:07 PM, Tom Lane t...@sss.pgh.pa.us  wrote:

 Robert Haas robertmh...@gmail.com  writes:

 So what DO we need to guard against here?

 I think the general problem can be stated as process A changes two or
 more values in shared memory in a fairly short span of time, and process
 B, which is concurrently examining the same variables, sees those
 changes occur in a different order than A thought it made them in.

 In practice we do not need to worry about changes made with a kernel
 call in between, as any sort of context swap will cause the kernel to
 force cache synchronization.

 Also, the intention is that the locking primitives will take care of
 this for any shared structures that are protected by a lock.  (There
 were some comments upthread suggesting maybe our lock code is not
 bulletproof; but if so that's something to fix in the lock code, not
 a logic error in code using the locks.)

 So what this boils down to is being an issue for shared data structures
 that we access without using locks.  As, for example, the latch
 structures.

 So is the problem case a race involving owning/disowning a latch vs.
 setting that same latch?

 No. (or maybe that as well, but that's not what we've been concerned about
 here). As far as I've understood correctly, the problem is that process A
 does something like this:

 /* set a shared variable */
 ((volatile bool *) shmem)->variable = true;
 /* Wake up process B to notice that we changed the variable */
 SetLatch();

 And process B does this:

 for (;;)
 {
  ResetLatch();
  if (((volatile bool *) shmem)->variable)
    DoStuff();

  WaitLatch();
 }

 This is the documented usage pattern of latches. The problem arises if
 process A runs just before ResetLatch, but the effect of setting the shared
 variable doesn't become visible until after the if-test in process B.
 Process B will clear the is_set flag in ResetLatch(), but it will not
 DoStuff(), so it in effect misses the wakeup from process A and goes back to
 sleep even though it would have work to do.

 This situation doesn't arise in the current use of latches, because the
 shared state comparable to shmem-variable in the above example is protected
 by a spinlock. But it might become an issue in some future use case.

Eh, so, should we do anything about this?
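
If we do, presumably the fix looks like putting a memory barrier into the
latch primitives, along these lines - a sketch, assuming a
pg_memory_barrier() primitive that we don't actually have yet:

void
ResetLatch(volatile Latch *latch)
{
    latch->is_set = false;

    /* Make the reset globally visible before the caller re-checks its
     * shared flag, so a concurrent SetLatch() cannot be missed. */
    pg_memory_barrier();
}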

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for view triggers in psql

2010-11-23 Thread David Fetter
On Tue, Nov 23, 2010 at 09:37:57PM -0500, Josh Kupershmidt wrote:
 On Fri, Oct 29, 2010 at 10:33 AM, David Fetter da...@fetter.org wrote:
  That seems like a matter for a separate patch.  Looking this over, I
  found I'd created a query that can never get used, so please find
  enclosed the next version of the patch :)
 
 I like "deletables" better than "deleteables" for
 Query_for_list_of_deleteables. Sources: dictionary.com and a git grep
 through the rest of the PG source.

Thanks for the review.

Please find attached a patch changing both this and updateable to
updatable, also per the very handy git grep I just learned about :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
***
*** 303,308  static const SchemaQuery Query_for_list_of_tables = {
--- 303,359 
NULL
  };
  
+ /* The bit masks for the following three functions come from
+  * src/include/catalog/pg_trigger.h.
+  */
+ static const SchemaQuery Query_for_list_of_insertables = {
+   /* catname */
+   pg_catalog.pg_class c,
+   /* selcondition */
+   c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS 
+   (SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND 
t.tgtype | (1 << 2) = t.tgtype)),
+   /* viscondition */
+   pg_catalog.pg_table_is_visible(c.oid),
+   /* namespace */
+   c.relnamespace,
+   /* result */
+   pg_catalog.quote_ident(c.relname),
+   /* qualresult */
+   NULL
+ };
+ 
+ static const SchemaQuery Query_for_list_of_deletables = {
+   /* catname */
+   pg_catalog.pg_class c,
+   /* selcondition */
+   c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS 
+   (SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND 
t.tgtype | (1 << 3) = t.tgtype)),
+   /* viscondition */
+   pg_catalog.pg_table_is_visible(c.oid),
+   /* namespace */
+   c.relnamespace,
+   /* result */
+   pg_catalog.quote_ident(c.relname),
+   /* qualresult */
+   NULL
+ };
+ 
+ static const SchemaQuery Query_for_list_of_updatables = {
+   /* catname */
+   pg_catalog.pg_class c,
+   /* selcondition */
+   c.relkind = 'r' OR (c.relkind = 'v' AND c.relhastriggers AND EXISTS 
+   (SELECT 1 FROM pg_catalog.pg_trigger t WHERE t.tgrelid = c.oid AND 
t.tgtype | (1 << 4) = t.tgtype)),
+   /* viscondition */
+   pg_catalog.pg_table_is_visible(c.oid),
+   /* namespace */
+   c.relnamespace,
+   /* result */
+   pg_catalog.quote_ident(c.relname),
+   /* qualresult */
+   NULL
+ };
+ 
  static const SchemaQuery Query_for_list_of_tisv = {
/* catname */
pg_catalog.pg_class c,
***
*** 333,338  static const SchemaQuery Query_for_list_of_tsv = {
--- 384,404 
NULL
  };
  
+ static const SchemaQuery Query_for_list_of_tv = {
+   /* catname */
+   pg_catalog.pg_class c,
+   /* selcondition */
+   c.relkind IN ('r', 'v'),
+   /* viscondition */
+   pg_catalog.pg_table_is_visible(c.oid),
+   /* namespace */
+   c.relnamespace,
+   /* result */
+   pg_catalog.quote_ident(c.relname),
+   /* qualresult */
+   NULL
+ };
+ 
  static const SchemaQuery Query_for_list_of_views = {
/* catname */
pg_catalog.pg_class c,
***
*** 630,636  psql_completion(char *text, int start, int end)
   *prev2_wd,
   *prev3_wd,
   *prev4_wd,
!  *prev5_wd;
  
static const char *const sql_commands[] = {
ABORT, ALTER, ANALYZE, BEGIN, CHECKPOINT, CLOSE, 
CLUSTER,
--- 696,703 
   *prev2_wd,
   *prev3_wd,
   *prev4_wd,
!  *prev5_wd,
!  *prev6_wd;
  
static const char *const sql_commands[] = {
ABORT, ALTER, ANALYZE, BEGIN, CHECKPOINT, CLOSE, 
CLUSTER,
***
*** 669,675  psql_completion(char *text, int start, int end)
completion_info_charp2 = NULL;
  
/*
!* Scan the input line before our current position for the last five
 * words. According to those we'll make some smart decisions on what the
 * user is probably intending to type. TODO: Use strtokx() to do this.
 */
--- 736,742 
completion_info_charp2 = NULL;
  
/*
!* Scan the input line before our current position for the last six
 * words. According to those we'll make some smart decisions on what the
 * 

Re: [HACKERS] Suggested easy TODO: pg_dump --from-list

2010-11-23 Thread Andrew Dunstan



On 11/23/2010 09:33 PM, Josh Berkus wrote:

Hackers,

I just thought of a good feature to put on the [E] list for TODO, if
people agree: a switch to pg_dump called --from-list which would take
a filename and dump only the objects listed in that file.

Objections to putting this on the TODO?


Well, very little about pg_dump is very [E], IMNSHO. The question in my 
mind here is what format the list file will take. For example, how would 
we specify a function? Would we need to specify all the argument types 
(or at least the IN arguments)? It's not as easy as a list with 
pg_restore, which is just a list of TOC ids, and all the rest is just a 
comment in the list file.


I certainly don't think we should put this on the list without at least 
having the idea fleshed out some more.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggested easy TODO: pg_dump --from-list

2010-11-23 Thread Joachim Wieland
On Tue, Nov 23, 2010 at 10:24 PM, Andrew Dunstan and...@dunslane.net wrote:
 Well, very little about pg_dump is very [E], IMNSHO. The question in my mind
 here is what format the list file will take. For example, how would we
 specify a function? Would we need to specify all the argument types (or at
 least the IN arguments)? It's not as easy as a list with pg_restore, which
 is just a list of TOC ids, and all the rest is just a comment in the list
 file.

 I certainly don't think we should put this on the list without at least
 having the idea fleshed out some more.

I think the list should be generated by pg_dump itself in a first run,
by building a complete TOC and then dumping a pg_restore -l style
list format (without dumpIds) from which the user just deletes the
objects that he doesn't want dumped. The list wouldn't contain dumpIds,
but catalogIds, and those should be sufficiently unique and easy to
parse and compare.

Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Assertion failure on hot standby

2010-11-23 Thread Fujii Masao
Hi,

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php

When I performed some unusual operations (e.g., suspending bgwriter with
gdb, running pgbench -i, and issuing txid_current many times) on the
master in order to try to reproduce the above HS error, I encountered
the following assertion failure.

Since I compiled the standby postgres with WAL_DEBUG and
ran it with wal_debug = on, all the replayed WAL records were
logged.


sby LOG:  REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid
23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09
sby LOG:  REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid
23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09
sby LOG:  consistent recovery state reached at 0/134C0510
sby LOG:  REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid
23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09
sby LOG:  REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0;
len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409
sby LOG:  REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0;
len 20: Standby -  running xacts: nextXid 23256 latestCompletedXid
23255 oldestRunningXid 23256
TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
"twophase.c", Line: 1209)
sby LOG:  database system is ready to accept read only connections
sby LOG:  startup process (PID 32666) was terminated by signal 6: Aborted
sby LOG:  terminating any other active server processes


Does anyone know what the cause of the problem is?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-23 Thread Joachim Wieland
On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 19.11.2010 23:46, Joachim Wieland wrote:

 FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
 pArray->maxKnownAssignedXids: 6890

 Hmm, that's a lot of entries in KnownAssignedXids.

 Can you recompile with WAL_DEBUG, and run the recovery again with
 wal_debug=on ? That will print all the replayed WAL records, which is a lot
 of data, but it might give a hint what's going on.

Sure, but this gives me only one more line:

[...]
LOG:  redo starts at 1F8/FC00E978
LOG:  REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 4587) exited with exit code 1
LOG:  terminating any other active server processes


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggested easy TODO: pg_dump --from-list

2010-11-23 Thread Josh Berkus
 Well, very little about pg_dump is very [E], IMNSHO. The question in my
 mind here is what format the list file will take

I was thinking same format as pg_restore -l, only without the dumpIDs.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] profiling connection overhead

2010-11-23 Thread Robert Haas
Per previous threats, I spent some time tonight running oprofile
(using the directions Tom Lane was foolish enough to provide me back
in May).  I took testlibpq.c and hacked it up to just connect to the
server and then disconnect in a tight loop without doing anything
useful, hoping to measure the overhead of starting up a new
connection.  Ha, ha, funny about that:

120899   18.0616  postgres AtProcExit_Buffers
56891 8.4992  libc-2.11.2.so   memset
30987 4.6293  libc-2.11.2.so   memcpy
26944 4.0253  postgres hash_search_with_hash_value
26554 3.9670  postgres AllocSetAlloc
20407 3.0487  libc-2.11.2.so   _int_malloc
17269 2.5799  libc-2.11.2.so   fread
13005 1.9429  ld-2.11.2.so do_lookup_x
11850 1.7703  ld-2.11.2.so _dl_fixup
10194 1.5229  libc-2.11.2.so   _IO_file_xsgetn

In English: the #1 overhead here is actually something that happens
when processes EXIT, not when they start.  Essentially all the time is
in two lines:

 56920  6.6006 :for (i = 0; i < NBuffers; i++)
   :{
 98745 11.4507 :if (PrivateRefCount[i] != 0)

Non-default configs:

max_connections = 100
shared_buffers = 480MB
work_mem = 4MB
maintenance_work_mem = 128MB
checkpoint_segments = 30
random_page_cost = 2.0

Anything we can do about this?  That's a lot of overhead, and it'd be
a lot worse on a big machine with 8GB of shared_buffers.
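
One obvious idea - just a sketch, with an assumed counter that would have to
be maintained wherever PrivateRefCount[i] moves between zero and nonzero -
is to track how many entries are pinned so the common case can skip the scan
entirely:

static int PrivateRefCountNonZero = 0;  /* assumed new bookkeeping */

static void
AtProcExit_Buffers_sketch(void)
{
    int     i;

    /* Common case: nothing was left pinned, so don't touch all
     * NBuffers entries of PrivateRefCount[]. */
    if (PrivateRefCountNonZero == 0)
        return;

    for (i = 0; i < NBuffers; i++)
    {
        if (PrivateRefCount[i] != 0)
        {
            /* existing cleanup/assertion logic goes here */
        }
    }
}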

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-23 Thread Robert Haas
On Wed, Nov 24, 2010 at 12:07 AM, Robert Haas robertmh...@gmail.com wrote:
 Per previous threats, I spent some time tonight running oprofile
 (using the directions Tom Lane was foolish enough to provide me back
 in May).  I took testlibpq.c and hacked it up to just connect to the
 server and then disconnect in a tight loop without doing anything
 useful, hoping to measure the overhead of starting up a new
 connection.

Oh, right: attachments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
CPU: AMD64 family10, speed 2200 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 10
samples  %image name   symbol name
120899   18.0616  postgres AtProcExit_Buffers
56891 8.4992  libc-2.11.2.so   memset
30987 4.6293  libc-2.11.2.so   memcpy
26944 4.0253  postgres hash_search_with_hash_value
26554 3.9670  postgres AllocSetAlloc
20407 3.0487  libc-2.11.2.so   _int_malloc
17269 2.5799  libc-2.11.2.so   fread
13005 1.9429  ld-2.11.2.so do_lookup_x
11850 1.7703  ld-2.11.2.so _dl_fixup
10194 1.5229  libc-2.11.2.so   _IO_file_xsgetn
10087 1.5069  postgres MemoryContextAllocZero
9143  1.3659  ld-2.11.2.so strcmp
8957  1.3381  postgres load_relcache_init_file
7132  1.0655  postgres fmgr_info_cxt_security
5630  0.8411  libc-2.11.2.so   vfprintf
5029  0.7513  ld-2.11.2.so _dl_lookup_symbol_x
4997  0.7465  postgres _bt_getroot
3935  0.5879  libc-2.11.2.so   memcmp
3874  0.5788  postgres hash_seq_search
3718  0.5554  postgres LWLockAcquire
3666  0.5477  postgres guc_name_compare
3457  0.5165  libc-2.11.2.so   __strlen_sse2
3297  0.4926  postgres load_relmap_file
3175  0.4743  libc-2.11.2.so   malloc
3170  0.4736  postgres LockAcquireExtended
3139  0.4689  postgres hash_any
3113  0.4651  postgres MemoryContextAlloc
2946  0.4401  postgres _bt_compare
2936  0.4386  postgres index_getnext
2885  0.4310  ld-2.11.2.so _dl_sort_fini
2873  0.4292  libc-2.11.2.so   _int_free
2619  0.3913  postgres MemoryContextCreate
2579  0.3853  ld-2.11.2.so check_match.12146
2485  0.3712  postgres heap_page_prune_opt
2457  0.3671  postgres LWLockRelease
2438  0.3642  postgres CreateTemplateTupleDesc
2322  0.3469  ld-2.11.2.so _dl_fini
2301  0.3438  postgres set_config_option
2253  0.3366  postgres _bt_first
2225  0.3324  postgres PinBuffer
2140  0.3197  postgres BeginReportingGUCOptions
2091  0.3124  postgres _bt_preprocess_keys
2085  0.3115  libc-2.11.2.so   _IO_vfscanf
2051  0.3064  postgres element_alloc
1962  0.2931  postgres ServerLoop
1936  0.2892  postgres CreateTupleDescCopyConstr
1884  0.2815  libc-2.11.2.so   __strcpy_sse2
1846  0.2758  libkrb5.so.3.3   /lib64/libkrb5.so.3.3
1801  0.2691  postgres FunctionCall2
1797  0.2685  postgres hash_create
1782  0.2662  postgres PgstatCollectorMain
1761  0.2631  postgres _bt_checkpage
1728  0.2582  postgres AllocSetFree
1597  0.2386  libselinux.so.1  /lib64/libselinux.so.1
1579  0.2359  libc-2.11.2.so   _IO_default_xsputn
1543  0.2305  libc-2.11.2.so   free
1531  0.2287  postgres SearchCatCache
1528  0.2283  postgres BuildHardcodedDescriptor
1506  0.2250  libc-2.11.2.so   strchrnul
1475  0.2204  postgres _bt_checkkeys
1457  0.2177  postgres ResourceOwnerForgetRelationRef
1451  0.2168  ld-2.11.2.so _dl_runtime_resolve
1443  0.2156  postgres InitCatCache
1443  0.2156  postgres hash_search
1382  0.2065  ld-2.11.2.so _dl_name_match_p
1360  0.2032  postgres PostgresMain
1347  0.2012  postgres pgstat_report_stat
1342  0.2005  libssl.so.1.0.0b /usr/lib64/libssl.so.1.0.0b
1340  0.2002  postgres systable_beginscan
1311  0.1959  libgssapi_krb5.so.2.2/lib64/libgssapi_krb5.so.2.2
1254  0.1873  postgres errstart
1247  0.1863  libc-2.11.2.so   __strncmp_sse2
1245

Re: [HACKERS] Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running

2010-11-23 Thread Bruce Momjian
Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   Tom Lane wrote:
   Possibly the cleanest fix is to implement pg_ping as a libpq function.
   You do have to distinguish connection failures (ie connection refused)
   from errors that came back from the postmaster, and the easiest place to
   be doing that is inside libpq.
  
   OK, so a new libpq function --- got it.  Would we just pass the status
   from the backend or can it be done without backend modifications?
  
  It would definitely be better to do it without backend mods, so that
  the functionality would work against back-branch postmasters.
  
  To my mind, the entire purpose of such a function is to classify the
  possible errors so that the caller doesn't have to.  So I wouldn't
  consider that it ought to pass back the status from the backend.
  I think what we basically want is a function that takes a conninfo
  string (or one of the variants of that) and returns an enum defined
  more or less like this:
  
  * failed to connect to postmaster
  * connected, but postmaster is not accepting sessions
  * postmaster is up and accepting sessions
  
  I'm not sure those are exactly the categories we want, but something
  close to that.  In particular, I don't know if there's any value in
  subdividing the "not accepting sessions" status --- pg_ctl doesn't
  really care, but other use-cases might want to tell the difference
  between the various canAcceptConnections failure states.
  
  BTW, it is annoying that we can't definitively distinguish "postmaster
  is not running" from "a connectivity problem", but I can't see a way
  around that.
 
 Agreed.  I will research this.

I have researched this and developed the attached patch.  It implements
PQping() and PQpingParams() in libpq, and has pg_ctl use them for pg_ctl
-w server status detection.

The new output for cases where .pgpass does not allow a connection
is:

$ pg_ctl -w -l /dev/null start
waiting for server to start done
server started
However, could not connect, perhaps due to invalid authentication or
misconfiguration.

The code basically checks the connection status between PQconnectStart()
and connectDBComplete() to see if the server is running but we failed to
connect for some reason.
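
For illustration, a caller of the new interface might look like this (a
minimal sketch against the API documented in the patch below; the
conninfo string is an arbitrary example, and PQACCESS/PQREJECT/PQNORESPONSE
are the names this patch version uses):

#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
	/* PQping() probes the server without completing a full connection. */
	PGPing		status = PQping("host=localhost port=5432");

	switch (status)
	{
		case PQACCESS:
			printf("server is running and allows access\n");
			break;
		case PQREJECT:
			printf("server is running but rejected the connection\n");
			break;
		case PQNORESPONSE:
			printf("server did not respond\n");
			break;
	}
	return 0;
}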

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index a911c50..32c58a5 100644
*** /tmp/b2EvXa_libpq.sgml	Tue Nov 23 17:41:50 2010
--- doc/src/sgml/libpq.sgml	Tue Nov 23 17:36:32 2010
*** int PQbackendPID(const PGconn *conn);
*** 1511,1516 ****
--- 1511,1584 ----
   </listitem>
  </varlistentry>
 
+ <varlistentry id="libpq-pqpingparams">
+  <term><function>PQpingParams</function><indexterm><primary>PQpingParams</></></term>
+  <listitem>
+   <para>
+    <function>PQpingParams</function> indicates the status of the
+    server.  The currently recognized parameter key words are the
+    same as <function>PQconnectParams</>.
+ 
+ <synopsis>
+ PGPing PQpingParams(const char **keywords, const char **values, int expand_dbname);
+ </synopsis>
+ 
+    It returns one of the following values:
+ 
+    <variablelist>
+     <varlistentry id="libpq-pqpingparams-pqaccess">
+      <term><literal>PQACCESS</literal></term>
+      <listitem>
+       <para>
+        The server is running and allows access.
+       </para>
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry id="libpq-pqpingparams-pqreject">
+      <term><literal>PQREJECT</literal></term>
+      <listitem>
+       <para>
+        The server is running but rejected a connection request.
+       </para>
+      </listitem>
+     </varlistentry>
+ 
+     <varlistentry id="libpq-pqpingparams-pqnoresponse">
+      <term><literal>PQNORESPONSE</literal></term>
+      <listitem>
+       <para>
+        The server did not respond.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+ 
+   </para>
+ 
+  </listitem>
+ </varlistentry>
+ 
+ <varlistentry id="libpq-pqping">
+  <term><function>PQping</function><indexterm><primary>PQping</></></term>
+  <listitem>
+   <para>
+    Returns the status of the server.
+ 
+ <synopsis>
+ PGPing PQping(const char *conninfo);
+ </synopsis>
+   </para>
+ 
+   <para>
+    This function uses the same <literal>conninfo</literal> parameter
+    key words as <function>PQconnectdb</>.  It returns the same
+    values as <function>PQpingParams</> above.
+   </para>
+ 
+  </listitem>
+ </varlistentry>
+ 
  <varlistentry id="libpq-pqconnectionneedspassword">
   <term><function>PQconnectionNeedsPassword</function><indexterm><primary>PQconnectionNeedsPassword</></></term>
   <listitem>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 14d36b5..7a5bb7a 100644
*** 

Re: [HACKERS] Instrument checkpoint sync calls

2010-11-23 Thread Greg Smith

Robert Haas wrote:

Did this get eaten by the email goblin, or you're still working on it?
  


Fell behind due to an unfortunately timed bit of pneumonia.  Hurray for 
the health benefits of cross country flights.  Will fix this up, rebase 
my other patch, and head toward some more review/'Fest cleanup now that 
I'm feeling better.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] final patch - plpgsql: for-in-array

2010-11-23 Thread Tom Lane
Cédric Villemain cedric.villemain.deb...@gmail.com writes:
 I think you (Robert) dramatically misunderstood what Pavel is trying
 to do. Pavel did excellent optimization work on a specific point, and
 that point looks crucial to me given the current behavior of
 PostgreSQL[1]. AFAIS Pavel didn't want to invent an ingenious syntax,
 but an optimization feature.

As near as I can tell, Pavel is bullheadedly insisting on adding new
syntax, not on the optimization aspect of it.  I already pointed out
how he could get 100% of the performance benefit using the existing
syntax, but he doesn't appear to be willing to pursue that route.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST seems to drop left-branch leaf tuples

2010-11-23 Thread Oleg Bartunov
Peter, glad to know you succeeded.  FYI, a year ago we developed a GiST 
extension for rdkit.org.


Oleg
On Tue, 23 Nov 2010, Peter Tanski wrote:


I found another off-by-one error in my Picksplit() algorithm, and the GiST index 
now contains one leaf tuple for each row in the table.  The error was starting 
from 1 instead of 0 when assigning the entries.  Thanks to everyone for your 
help.

For the record, this is the only GiST index I know of where the keys are over 
2000 bytes in size.  So GiST definitely handles large keys.  Perhaps the 
maximum size for intarray could be increased.

On Nov 23, 2010, at 4:01 PM, Yeb Havinga wrote:


On 2010-11-23 20:54, Peter Tanski wrote:

On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote:

j = 0;
for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
  FPrint* v = deserialize_fprint(entv[i].key);

Isn't this off by one?  Offset numbers are 1-based, so the maxoff
computation is wrong.

The first for loops everywhere else compare with i <= maxoff instead of i < maxoff.

You are right: I am missing the last one there.  (During a memory-debugging 
phase entv[entryvec->n - 1] was always invalid, probably due to a memory 
overwrite error, but I fixed that later and never changed it back.)

On the other hand, there are two problems:

1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size 
Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 
64-bit machine).  The example test cases I used were smaller in order to get 
around that issue: they are 2326 bytes base size.

2. Even after fixing the Picksplit() loop, the dropped-leaf problem still 
manifests itself:

I noticed an n_entries initialization in one of your earlier mails that might also 
be a source of trouble. I was under the impression that GiST entry vectors have n-1 
entries (not n-2 as you say), because the first element (0 / InvalidOffsetNumber) 
must be skipped. E.g. entryvec->n = 5 means that there are 4 entries, 
which are in array positions 1,2,3,4.
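
To make the convention concrete, a corrected version of the loop quoted
above would look roughly like this (a sketch only; FPrint and
deserialize_fprint are from Peter's code and are assumed here):

	OffsetNumber i;
	OffsetNumber maxoff = entryvec->n - 1;
	int			j = 0;

	/* Offsets are 1-based and maxoff is inclusive, hence <= rather than < */
	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
	{
		FPrint	   *v = deserialize_fprint(entryvec->vector[i].key);

		/* ... examine v and assign entry i to one side of the split ... */
		j++;
	}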

btw: interesting topic, audio fingerprinting!

regards,
Yeb Havinga







Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal Write Scaling

2010-11-23 Thread Greg Smith

Eliot Gable wrote:
However, I have not seen much in the way of a serious attempt at 
multi-master write scaling. 


Scaling writes across nodes using PL/Proxy works.

Of course, I am assuming the disk system would be RAID 1, RAID 10, 
RAID 5, or RAID 6 for reliability purposes and that it is sufficiently 
redundant that you don't have to worry about an outage of your storage 
system.


The idea that you'll have a system that needs better write scalability 
that isn't limited by the storage system is an unusual one, not the 
expected case.  And the trend everywhere in the industry is away from 
giant redundant systems, and toward having multiple cheaper redundant 
copies of all the data instead.  It's impossible to protect against 
things like environmental failure at any single location.  Once you've 
accepted that you have to be able to replicate this beast too if you 
want high availability, you're back at having a multi-node problem 
again.  This is why the most active work is on distributed designs that 
start on that basis, rather than projects trying to build more scalable 
monoliths. 


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggested easy TODO: pg_dump --from-list

2010-11-23 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Well, very little about pg_dump is very [E], IMNSHO. The question in my
 mind here is what format the list file will take

 I was thinking same format as pg_restore -l, only without the dumpIDs.

Nope ... those strings are just helpful comments, they aren't really
guaranteed to be unique identifiers.  In any case, it seems unlikely
that a user could expect to get the more complicated cases exactly right
other than by consulting pg_dump | pg_restore -l output.  Which makes
the use-case kind of dubious to me.

I don't say that this wouldn't be a useful feature, but you need a
better spec than this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] JDBC and Binary protocol error, for some statements

2010-11-23 Thread Radosław Smogura
Hi,

I am working on implementing the binary protocol.  During testing,
select ?::int was executed a few times so that it would be compiled
into a server prepared statement, but after this the backend returned
field format = text while sending the field data in binary mode.

The parameter is a long and is sent in text mode, with oid 0, as value
2.  The result is oid=23, format=(0) T, value = 0x00,0x00,0x00,0x02.

When I send the parameter explicitly, in binary mode and with the int4
oid, the problem doesn't occur.

Here is debug of protocol

---
07:52:02.995 (54) simple execute, 
handler=org.postgresql.jdbc2.abstractjdbc2statement$statementresulthand...@508aeb74,

maxRows=0, fetchSize=0, flags=17
07:52:02.996 (54)  FE=> Parse(stmt=null,query=SELECT $1::int,oids={0})
07:52:02.996 (54)  FE=> Bind(stmt=null,portal=null,$1='2')
07:52:02.996 (54)  FE=> Describe(portal=null)
07:52:02.996 (54)  FE=> Execute(portal=null,limit=0)
07:52:02.996 (54)  FE=> Sync
07:52:02.997 (54)  <=BE ParseComplete [null]
07:52:02.997 (54)  <=BE BindComplete [null]
07:52:02.998 (54)  <=BE RowDescription(1)
07:52:02.998 (54) Field(,INT4,4,B)
07:52:02.998 (54)  <=BE DataRow(len=4)
07:52:02.998 (54)  <=BE CommandStatus(SELECT)
07:52:02.999 (54)  <=BE ReadyForQuery(I)


---
Compiled stat exec
07:52:06.059 (54) simple execute, 
handler=org.postgresql.jdbc2.abstractjdbc2statement$statementresulthand...@128ae45a,

maxRows=0, fetchSize=0, flags=16
07:52:06.059 (54)  FE=> Parse(stmt=S_1,query=SELECT $1::int,oids={0})
07:52:06.059 (54)  FE=> Describe(statement=S_1)
07:52:06.060 (54)  FE=> Bind(stmt=S_1,portal=null,$1='2')
07:52:06.060 (54)  FE=> Execute(portal=null,limit=0)
07:52:06.060 (54)  FE=> Sync
07:52:06.060 (54)  <=BE ParseComplete [S_1]
07:52:06.061 (54)  <=BE ParameterDescription
07:52:06.061 (54)  <=BE RowDescription(1)
07:52:06.061 (54) Field(,INT4,4,T)
07:52:06.061 (54)  <=BE BindComplete [null]
07:52:06.061 (54)  <=BE DataRow(len=4)
07:52:06.062 (54)  <=BE CommandStatus(SELECT)
07:52:06.062 (54)  <=BE ReadyForQuery(I)
org.postgresql.jdbc4.exceptions.PSQLDataException: Zła wartość dla typu
int:  (Polish for: Bad value for type int)

Kind regards,
Radek


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] profiling connection overhead

2010-11-23 Thread Heikki Linnakangas

On 24.11.2010 07:07, Robert Haas wrote:

Per previous threats, I spent some time tonight running oprofile
(using the directions Tom Lane was foolish enough to provide me back
in May).  I took testlibpq.c and hacked it up to just connect to the
server and then disconnect in a tight loop without doing anything
useful, hoping to measure the overhead of starting up a new
connection.  Ha, ha, funny about that:

120899   18.0616  postgres AtProcExit_Buffers
56891 8.4992  libc-2.11.2.so   memset
30987 4.6293  libc-2.11.2.so   memcpy
26944 4.0253  postgres hash_search_with_hash_value
26554 3.9670  postgres AllocSetAlloc
20407 3.0487  libc-2.11.2.so   _int_malloc
17269 2.5799  libc-2.11.2.so   fread
13005 1.9429  ld-2.11.2.so do_lookup_x
11850 1.7703  ld-2.11.2.so _dl_fixup
10194 1.5229  libc-2.11.2.so   _IO_file_xsgetn

In English: the #1 overhead here is actually something that happens
when processes EXIT, not when they start.  Essentially all the time is
in two lines:

  56920  6.6006 :for (i = 0; i < NBuffers; i++)
:{
  98745 11.4507 :if (PrivateRefCount[i] != 0)


Oh, that's quite surprising.


Anything we can do about this?  That's a lot of overhead, and it'd be
a lot worse on a big machine with 8GB of shared_buffers.


Micro-optimizing that search for the non-zero value helps a little bit 
(attached). Reduces the percentage shown by oprofile from about 16% to 
12% on my laptop.


For bigger gains, I think you need to somehow make the PrivateRefCount 
smaller. Perhaps only use one byte for each buffer instead of int32, and 
use some sort of an overflow list for the rare case that a buffer is 
pinned more than 255 times. Or make it a hash table instead of a simple 
lookup array. But whatever you do, you have to be very careful to not 
add overhead to PinBuffer/UnPinBuffer, those can already be quite high 
in oprofile reports of real applications. It might be worth 
experimenting a bit; at the moment PrivateRefCount takes up 0.5MB of 
memory per 1GB of shared_buffers (4 bytes per 8KB buffer). Machines with 
a high shared_buffers setting have no shortage of memory, but a large 
array like that might waste a lot of precious CPU cache.
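
For example, the one-byte idea could be shaped roughly like this (purely
illustrative, with hypothetical names; this is not part of the attached
patch, and the decrement path is omitted):

#include <stdint.h>

#define MAX_OVERFLOW_PINS 32

typedef struct OverflowPin
{
	int			buf_id;			/* buffer whose local count overflowed */
	int32_t		extra_pins;		/* pins beyond 255 */
} OverflowPin;

static uint8_t *PrivateRefCountByte;	/* NBuffers entries, zeroed at startup */
static OverflowPin overflow_pins[MAX_OVERFLOW_PINS];
static int	n_overflow_pins = 0;

static void
IncrPrivateRefCount(int buf_id)
{
	int			i;

	if (PrivateRefCountByte[buf_id] < UINT8_MAX)
	{
		PrivateRefCountByte[buf_id]++;
		return;
	}

	/* Slow path: find or create an overflow entry for this buffer. */
	for (i = 0; i < n_overflow_pins; i++)
	{
		if (overflow_pins[i].buf_id == buf_id)
		{
			overflow_pins[i].extra_pins++;
			return;
		}
	}
	overflow_pins[n_overflow_pins].buf_id = buf_id;
	overflow_pins[n_overflow_pins].extra_pins = 1;
	n_overflow_pins++;
}

The point being that the array a backend scans at exit, and touches in
the pin/unpin hot path, shrinks by a factor of four.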


Now, the other question is whether this really matters. Even if we 
eliminate that loop in AtProcExit_Buffers altogether, will 
connect/disconnect still be so slow that you have to use a connection pooler if you do that a lot?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 54c7109..03593fd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1665,11 +1665,20 @@ static void
 AtProcExit_Buffers(int code, Datum arg)
 {
 	int			i;
+	int		   *ptr;
+	int		   *end;
 
 	AbortBufferIO();
 	UnlockBuffers();
 
-	for (i = 0; i < NBuffers; i++)
+	/* Fast search for the first non-zero entry in PrivateRefCount */
+	end = (int *) &PrivateRefCount[NBuffers - 1];
+	ptr = (int *) PrivateRefCount;
+	while (ptr < end && *ptr == 0)
+		ptr++;
+	i = ((int32 *) ptr) - PrivateRefCount;
+
+	for (; i < NBuffers; i++)
 	{
 		if (PrivateRefCount[i] != 0)
 		{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] final patch - plpgsql: for-in-array

2010-11-23 Thread Pavel Stehule
2010/11/24 Robert Haas robertmh...@gmail.com:
 On Tue, Nov 23, 2010 at 8:56 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 2010/11/23 Robert Haas robertmh...@gmail.com:
 On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 ok, I can only recapitulate that this feature was proposed circa two
 months ago, and at minimum Tom and maybe you agreed - with a request
 about the syntax - do you remember? I am a little bit tired that this
 agreement was changed after I had spent my time on this.

 I went back and reread the thread I believe you're speaking about.
 The first post is here:

 http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php

 Here perhaps? (or before)

 http://archives.postgresql.org/pgsql-hackers/2010-09/msg01983.php

 Dang.  You're right.  I stand corrected.


Sorry, I thought you and Tom didn't have a problem with the
FOR-IN-ARRAY syntax (which is Kevin Grittner's proposal). So only my
original FOR-IN-expr proposal is problematic, but the proposed feature
isn't rejected.

My proposal isn't really ingenious - it's true that my first motivation
was to replace the pattern array_lower(var,1)..array_upper(var,1).
That is relatively simple in Ada, where the FOR statement is defined
over a range type, and relatively impossible in PL/pgSQL, where no
range type exists. A special construct in PL/pgSQL can solve iteration
over an array significantly better and more simply than any other
solution - it really doesn't have to be the FOR-IN-ARRAY syntax - and
with every further test and code check I am more sure:

why:
* there is a clear indication that the developer wants to process all
items in the array
* there is no random access to the array
* it makes it possible to reuse varlena types stored in the array
without a temporary copy

I am sorry that I didn't speak about these facts earlier.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] final patch - plpgsql: for-in-array

2010-11-23 Thread Pavel Stehule
Sorry, there was a broken message.

2010/11/24 Pavel Stehule pavel.steh...@gmail.com:
 2010/11/24 Robert Haas robertmh...@gmail.com:
 On Tue, Nov 23, 2010 at 8:56 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 2010/11/23 Robert Haas robertmh...@gmail.com:
 On Mon, Nov 22, 2010 at 11:55 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 ok, I can only recapitulate that this feature was proposed circa two
 months ago, and at minimum Tom and maybe you agreed - with a request
 about the syntax - do you remember? I am a little bit tired that this
 agreement was changed after I had spent my time on this.

 I went back and reread the thread I believe you're speaking about.
 The first post is here:

 http://archives.postgresql.org/pgsql-hackers/2010-09/msg01945.php

 Here perhaps? (or before)

 http://archives.postgresql.org/pgsql-hackers/2010-09/msg01983.php

 Dang.  You're right.  I stand corrected.


 Sorry, I thought you and Tom didn't have a problem with the
 FOR-IN-ARRAY syntax (which is Kevin Grittner's proposal). So only my
 original FOR-IN-expr proposal is problematic, but the proposed feature
 isn't rejected.


Sorry, I thought you and Tom didn't have a problem with the
FOR-IN-ARRAY syntax (which is Kevin Grittner's proposal). I thought
the only problematic part was my original FOR-IN-expr proposal, but
the proposed feature itself isn't a problem.

My proposal isn't really ingenious - it's true that my first motivation
was to replace the unwanted pattern
array_lower(var,1)..array_upper(var,1). That is relatively simple in
Ada, where the FOR statement is defined over a range type, and
relatively impossible in PL/pgSQL, where no range type exists yet. A
special construct in PL/pgSQL can solve iteration over an array
significantly better and more simply than any other solution - it need
not be the FOR-IN-ARRAY syntax - and with every further test and code
check I am more sure:

why?:
* there is a clear indication that the developer wants to process all
items in the array, or almost all
* there is no random access to the array!!
* it makes it possible to reuse a varlena value stored in the array
without a temporary copy - maybe with some trick!!
* the overhead is very low

I am sorry that I didn't mention these points earlier.

I have thought about another possible syntax - what do you think about
FOR var OVER expr LOOP ... END LOOP ? OVER is already a keyword.

Regards

Pavel
 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers