Re: [HACKERS] Getting to beta1

2010-03-18 Thread Simon Riggs
On Wed, 2010-03-17 at 23:29 -0400, Bruce Momjian wrote:
 Simon Riggs wrote:
  On Sat, 2010-03-13 at 11:26 -0800, Josh Berkus wrote:
The list has been reduced greatly in the past week.  What about HS/SR
open items?
   
   I'd like to see vacuum_defer_cleanup_age added to the Archive section
   of postgresql.conf,
  
  Not all parameters are in postgresql.conf.sample. Encouraging people to
  do this is the wrong approach.
  

 I agree with Josh Berkus that vacuum_defer_cleanup_age should be in
 postgresql.conf.  We don't stop listing items just because they are
 dangerous, e.g. fsync, or to discourage their use.  I believe Greg Smith
 also felt it should be included.

Added to postgresql.conf.sample

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQftype implementation

2010-03-18 Thread Pavel Golub
Hello, Tom.

Yes, you are absolutely right. My bad!

Sorry guys! :)

You wrote:

TL> Pavel Golub pa...@microolap.com writes:
 Here I created user-defined type my_varchar for internal tests. But
 PQftype returns 1043 (varchar oid) for the info column.

TL> Really?  I tried it and got 172069, which is about right for where the
TL> OID counter is in my database.  I think you messed up your test.

TL> res = PQexec(conn, "select * from my_varchar_test");
TL> if (PQresultStatus(res) != PGRES_TUPLES_OK)
TL> {
TL> fprintf(stderr, "SELECT failed: %s", PQerrorMessage(conn));
TL> PQclear(res);
TL> exit_nicely(conn);
TL> }

TL> nFields = PQnfields(res);
TL> for (i = 0; i < nFields; i++)
TL> printf("%-15s %d\n", PQfname(res, i), PQftype(res, i));

TL> regards, tom lane



-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram
On Thu, Mar 18, 2010 at 2:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Davis pg...@j-davis.com writes:
  There are all kinds of challenges there, but it might be worth thinking
  about. Visibility information is highly compressible, and requires
  constant maintenance (updates, deletes, freezing, etc.). It also might
  make it possible to move to 64-bit xids, if we wanted to.

 If you want it to be cheaply updatable (or even cheaply readable),
 compression is not what you're going to do.

regards, tom lane

 +1..


Re: [HACKERS] Command to prune archive at restartpoints

2010-03-18 Thread Heikki Linnakangas
Committed.

Heikki Linnakangas wrote:
 One awkward omission in the new built-in standby mode, mainly used for
 streaming replication, is that there is no easy way to delete old
 archived files like you do with the %r parameter to restore_command.
 This was discussed at
 http://archives.postgresql.org/pgsql-hackers/2010-02/msg01003.php, among
 other things.
 
 Per discussion, attached patch adds a new restartpoint_command option to
 recovery.conf. That's an external shell command just like
 recovery_end_command that's executed at every restartpoint. You can use
 the %r parameter to pass the filename of the oldest WAL file that needs
 to be retained.
 
 While developing this I noticed that %r in recovery_end_command is not
 working correctly:
 
 LOG:  redo done at 0/14000C10
 LOG:  last completed transaction was at log time 2000-01-01
 02:21:08.816445+02
 cp: cannot stat
 `/home/hlinnaka/pgsql.cvshead/walarchive/00010014': No
 such file or directory
 cp: cannot stat
 `/home/hlinnaka/pgsql.cvshead/walarchive/0002.history': No such file
 or directory
 LOG:  selected new timeline ID: 2
 cp: cannot stat
 `/home/hlinnaka/pgsql.cvshead/walarchive/0001.history': No such file
 or directory
 LOG:  archive recovery complete
 LOG:  checkpoint starting: end-of-recovery immediate wait
 LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
 file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
 total=0.003 s
 LOG:  executing recovery_end_command "echo recovery_end_command %r"
 recovery_end_command 
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started
 
 Note how %r is always expanded to "". That's
 because %r is expanded only when InRedo is true, which makes sense for
 restore_command where that piece of code was copy-pasted from, but it's
 never true anymore when recovery_end_command is run. The attached patch
 fixes that too.
 
 Barring objections, I will commit this later today.
 
 


-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-03-18 Thread Simon Riggs
On Wed, 2010-03-17 at 11:37 +0200, Heikki Linnakangas wrote:

 One awkward omission in the new built-in standby mode, mainly used for
 streaming replication, is that there is no easy way to delete old
 archived files like you do with the %r parameter to restore_command.
 This was discussed at
 http://archives.postgresql.org/pgsql-hackers/2010-02/msg01003.php, among
 other things.

...

 Barring objections, I will commit this later today.

Would it be better to call this archive_cleanup_command? That might
help people understand the need for and the use of this parameter.
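
For illustration, either name would appear in recovery.conf the same way;
the cleanup script below is hypothetical, standing in for anything that
removes archived WAL files older than the one named by %r:

archive_cleanup_command = 'clean_wal_archive.sh /mnt/walarchive %r'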

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram

  Secondly there's the whole retail vacuum problem -- any
 index entries referring to this page would be left dangling unless
 there's some kind of retail vacuum or perhaps a page version number.


The issue can be divided into two:
a) volatile functions
b) broken datatypes

For a), I think the volatile function issue can be resolved by using hidden
columns in the heap itself. This will create a duplication of data, but
since the index will get created on it, it will always point to the right
heap tuple.

For b), we are already suffering from this issue in any index lookups, index
based updates/deletes, unique constraints, referential integrity maintenance,
etc. So hopefully one day we can extend this list to include more :))

Gokul.


Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram


 I didn't mean that we'd want to compress it to the absolute minimum
 size. I had envisioned that it would be a simple scheme designed only to
 eliminate long runs of identical visibility information (perhaps only
 the frozen and always visible regions would be compressed).

 The extra level of indirection would be slower, but if we freeze tuples
 more aggressively (which would be much cheaper if we didn't have to go
 to the heap), there might be a small number of tuples with interesting
 visibility information at any particular time.


 This should be achievable with the current proposal of Heikki, but I think it
is useful only for tables which won't have many concurrent operations, and on
databases without any long-running queries. So if we had an option to
create a visibility map (on the lines of a materialized view) whenever we
want for a table, it would help a good number of use cases, I suppose.

Thanks,
Gokul.


Re: [HACKERS] WIP: shared ispell dictionary

2010-03-18 Thread Heikki Linnakangas
Pavel Stehule wrote:
 the attached patch adds the possibility to share an ispell dictionary
 between processes. The reason for this is the slowness of the first
 tsearch query and the size of allocated memory per process. When I tested
 loading of the ispell dictionary (for the Czech language) I got about
 500 ms and 48MB. With a simple allocator it uses only 25 MB. If we remove
 some checks and the tolower string transformation from the loading stage
 it needs only 200 ms, but with a broken dict or affix file it can produce
 wrong results. This patch significantly reduces the load on servers that
 use ispell dictionaries.
 
 I know Tom worries about the use of shared memory; I think that worry is
 unnecessary. After loading, the data from the dictionary are only read,
 never modified. A second idea: this dictionary template can be distributed
 as a separate project (it needs a few changes in core - and a simple
 allocator).

Fixed-size shared memory blocks are always problematic. Would it be
possible to do the preloading with shared_preload_libraries somehow?
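
For reference, preloading via shared_preload_libraries would mean loading
the dictionary module at server start with a postgresql.conf line along
these lines; the module name here is hypothetical:

shared_preload_libraries = 'shared_ispell'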

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: shared ispell dictionary

2010-03-18 Thread Pavel Stehule
2010/3/18 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 Pavel Stehule wrote:
 the attached patch adds the possibility to share an ispell dictionary
 between processes. The reason for this is the slowness of the first
 tsearch query and the size of allocated memory per process. When I tested
 loading of the ispell dictionary (for the Czech language) I got about
 500 ms and 48MB. With a simple allocator it uses only 25 MB. If we remove
 some checks and the tolower string transformation from the loading stage
 it needs only 200 ms, but with a broken dict or affix file it can produce
 wrong results. This patch significantly reduces the load on servers that
 use ispell dictionaries.

 I know Tom worries about the use of shared memory; I think that worry is
 unnecessary. After loading, the data from the dictionary are only read,
 never modified. A second idea: this dictionary template can be distributed
 as a separate project (it needs a few changes in core - and a simple
 allocator).

 Fixed-size shared memory blocks are always problematic. Would it be
 possible to do the preloading with shared_preload_libraries somehow?

Maybe. But there are some disadvantages: a) you have to copy the
dictionary info to the config, b) on some systems a lot of memory per
process can be a problem (probably not on Linux). You still have to build
some bridge between the tsearch cache and the preloaded data.

Pavel


 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-18 Thread Fujii Masao
On Wed, Mar 17, 2010 at 7:35 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Fujii Masao wrote:
 I found another missing feature in new file-based log shipping (i.e.,
 standby_mode is enabled and 'cp' is used as restore_command).

 After the trigger file is found, the startup process with pg_standby
 tries to replay all of the WAL files in both pg_xlog and the archive.
 So, when the primary fails, if the latest WAL file in pg_xlog of the
 primary can be read, we can prevent the data loss by copying it to
 pg_xlog of the standby before creating the trigger file.

 On the other hand, the startup process with standby mode doesn't
 replay the WAL files in pg_xlog after the trigger file is found. So
 failover always causes the data loss even if the latest WAL file can
 be read from the primary. And if the latest WAL file is copied to the
 archive instead, it can be replayed but a PANIC error would happen
 because it's not filled.

 Should we remove this restriction?

 Looking into this, I realized that we have a bigger problem related to
 this. Although streaming replication stores the streamed WAL files in
 pg_xlog, so that they can be re-replayed after a standby restart without
 connecting to the master, we don't try to replay those either. So if you
 restart standby, it will fail to start up if the WAL it needs can't be
 found in archive or by connecting to the master. That must be fixed.

I agree that this is a bigger problem. Since the standby always starts
walreceiver before replaying any WAL files in pg_xlog, walreceiver tries
to receive the WAL files following the REDO starting point even if they
have already been in pg_xlog. IOW, the same WAL files might be shipped
from the primary to the standby many times. This behavior is unsmart,
and should be addressed.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: shared ispell dictionary

2010-03-18 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 I know Tom worries about the use of shared memory.

You're right, and if I have any say in the matter no patch like this
will ever go in.

What I would suggest looking into is some way of preprocessing the raw
text dictionary file into a format that can be slurped into memory
quickly.  The main problem compared to the way things are done now
is that the current internal format relies heavily on pointers.
Maybe you could replace those by offsets?
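
As a minimal sketch of that idea (mine, not code from any patch here),
each node's "next" pointer becomes an offset from the start of a single
buffer, so the whole structure can be dumped to disk and slurped back
with one read and used at whatever address it lands on:

#include <stdio.h>
#include <stdlib.h>

typedef struct FlatNode
{
	int		value;
	size_t	next_off;	/* offset of the next node in the buffer; 0 = end */
} FlatNode;

/* walk a chain of offsets relative to the buffer start */
static void
walk(const char *buf, size_t first_off)
{
	size_t	off = first_off;

	while (off != 0)
	{
		const FlatNode *n = (const FlatNode *) (buf + off);

		printf("%d\n", n->value);
		off = n->next_off;
	}
}

int
main(void)
{
	/*
	 * Build three nodes in one contiguous buffer.  Offset 0 is reserved
	 * as the "NULL" value, so the first node starts at sizeof(FlatNode).
	 */
	char   *buf = calloc(4, sizeof(FlatNode));
	int		i;

	for (i = 0; i < 3; i++)
	{
		size_t		off = (i + 1) * sizeof(FlatNode);
		FlatNode   *n = (FlatNode *) (buf + off);

		n->value = i * 10;
		n->next_off = (i < 2) ? off + sizeof(FlatNode) : 0;
	}

	walk(buf, sizeof(FlatNode));
	free(buf);
	return 0;
}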

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: shared ispell dictionary

2010-03-18 Thread Pavel Stehule
2010/3/18 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 I know Tom worries about the use of shared memory.

 You're right, and if I have any say in the matter no patch like this
 will ever go in.

 What I would suggest looking into is some way of preprocessing the raw
 text dictionary file into a format that can be slurped into memory
 quickly.  The main problem compared to the way things are done now
 is that the current internal format relies heavily on pointers.
 Maybe you could replace those by offsets?

Then you have to maintain a new application :( and there can be new kinds
of bugs.

I am playing with the preload solution now, and I found a new issue.

I don't know why, but when I preload a library with a large amount of
memory like ispell, then all subsequent operations are ten times slower :(

[pa...@nemesis tsearch]$ psql-dev3 postgres
Timing is on.
psql-dev3 (9.0devel)
Type "help" for help.

postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 0,611 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 0,277 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 0,266 ms
postgres=# select 10;
 ?column?
----------
       10
(1 row)

Time: 0,348 ms
postgres=# select * from ts_debug('cs','Jmenuji se Pavel Stěhule a
bydlím ve Skalici');
   alias   |    description    |  token  |       dictionaries        |    dictionary    |     lexemes
-----------+-------------------+---------+---------------------------+------------------+-----------------
 asciiword | Word, all ASCII   | Jmenuji | {preloaded_cspell,simple} | preloaded_cspell | {jmenovat}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | se      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Pavel   | {preloaded_cspell,simple} | preloaded_cspell | {pavel,pavla}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | Stěhule | {preloaded_cspell,simple} | preloaded_cspell | {stěhule}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | a       | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | bydlím  | {preloaded_cspell,simple} | preloaded_cspell | {bydlet,bydlit}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | ve      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Skalici | {preloaded_cspell,simple} | preloaded_cspell | {skalice}
(15 rows)

Time: 24,495 ms

Re: [HACKERS] WIP: shared ispell dictionary

2010-03-18 Thread Pavel Stehule
2010/3/18 Pavel Stehule pavel.steh...@gmail.com:
 2010/3/18 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 I know Tom worries about the use of shared memory.

 You're right, and if I have any say in the matter no patch like this
 will ever go in.

 What I would suggest looking into is some way of preprocessing the raw
 text dictionary file into a format that can be slurped into memory
 quickly.  The main problem compared to the way things are done now
 is that the current internal format relies heavily on pointers.
 Maybe you could replace those by offsets?

 Then you have to maintain a new application :( and there can be new kinds
 of bugs.

 I am playing with the preload solution now, and I found a new issue.

 I don't know why, but when I preload a library with a large amount of
 memory like ispell, then all subsequent operations are ten times slower :(


this strange issue comes from the very large memory context. When I don't
join the tsearch cached context with the working context, then this issue
doesn't exist.

Datum
dpreloaddict_init(PG_FUNCTION_ARGS)
{
	if (prepd == NULL)
		return dispell_init(fcinfo);	/* use without preloading */
	else
	{
		//return PointerGetDatum(prepd);

		/*
		 * Add the preload context to the current context -- when
		 * this code is active, then I have the issue.
		 */
		preload_ctx->parent = CurrentMemoryContext;
		preload_ctx->nextchild = CurrentMemoryContext->firstchild;
		CurrentMemoryContext->firstchild = preload_ctx;

		return PointerGetDatum(prepd);
	}
}

Pavel


 When I reduce the memory with the simple allocator, then this issue is
 removed, but it is strange.

 Pavel



                        regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER ROLE/DATABASE RESET ALL versus security

2010-03-18 Thread Alvaro Herrera
Bruce Momjian wrote:
 Alvaro Herrera wrote:
  Tom Lane wrote:
   Alvaro Herrera alvhe...@commandprompt.com writes:
Tom Lane wrote:
It looks to me like the code in AlterSetting() will allow an ordinary
user to blow away all settings for himself.  Even those that are for
SUSET variables and were presumably set for him by a superuser.  Isn't
this a security hole?  I would expect that an unprivileged user should
not be able to change such settings, not even to the extent of
reverting to the installation-wide default.
   
Yes, it is, but this is not a new hole.  This works just fine in 8.4
too:
   
   So I'd argue for changing it in 8.4 too.
  
  Understood.  I'm starting to look at what this requires.
 
 Any progress on this?

I have come up with the attached patch.  I haven't tested it fully yet,
and I need to backport it.  The gist of it is: we can't simply remove
the pg_db_role_setting tuple, we need to ask GUC to reset the settings
array, for which it checks superuser-ness on each setting.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Index: src/backend/catalog/pg_db_role_setting.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/catalog/pg_db_role_setting.c,v
retrieving revision 1.3
diff -c -p -r1.3 pg_db_role_setting.c
*** src/backend/catalog/pg_db_role_setting.c	26 Feb 2010 02:00:37 -	1.3
--- src/backend/catalog/pg_db_role_setting.c	18 Mar 2010 15:43:14 -
*** AlterSetting(Oid databaseid, Oid roleid,
*** 49,55 
  	/*
  	 * There are three cases:
  	 *
! 	 * - in RESET ALL, simply delete the pg_db_role_setting tuple (if any)
  	 *
  	 * - in other commands, if there's a tuple in pg_db_role_setting, update
  	 * it; if it ends up empty, delete it
--- 49,56 
  	/*
  	 * There are three cases:
  	 *
! 	 * - in RESET ALL, request GUC to reset the settings array and update the
! 	 * catalog if there's anything left, delete it otherwise
  	 *
  	 * - in other commands, if there's a tuple in pg_db_role_setting, update
  	 * it; if it ends up empty, delete it
*** AlterSetting(Oid databaseid, Oid roleid,
*** 60,66 
  	if (setstmt->kind == VAR_RESET_ALL)
  	{
  		if (HeapTupleIsValid(tuple))
! 			simple_heap_delete(rel, tuple->t_self);
  	}
  	else if (HeapTupleIsValid(tuple))
  	{
--- 61,101 
  	if (setstmt->kind == VAR_RESET_ALL)
  	{
  		if (HeapTupleIsValid(tuple))
! 		{
! 			ArrayType  *new = NULL;
! 			Datum		datum;
! 			bool		isnull;
! 
! 			datum = heap_getattr(tuple, Anum_pg_db_role_setting_setconfig,
!  RelationGetDescr(rel), &isnull);
! 
! 			if (!isnull)
! new = GUCArrayReset(DatumGetArrayTypeP(datum));
! 
! 			if (new)
! 			{
! Datum		repl_val[Natts_pg_db_role_setting];
! bool		repl_null[Natts_pg_db_role_setting];
! bool		repl_repl[Natts_pg_db_role_setting];
! HeapTuple	newtuple;
! 
! memset(repl_repl, false, sizeof(repl_repl));
! 
! repl_val[Anum_pg_db_role_setting_setconfig - 1] =
! 	PointerGetDatum(new);
! repl_repl[Anum_pg_db_role_setting_setconfig - 1] = true;
! repl_null[Anum_pg_db_role_setting_setconfig - 1] = false;
! 
! newtuple = heap_modify_tuple(tuple, RelationGetDescr(rel),
! 			 repl_val, repl_null, repl_repl);
! simple_heap_update(rel, tuple->t_self, newtuple);
! 
! /* Update indexes */
! CatalogUpdateIndexes(rel, newtuple);
! 			}
! 			else
! simple_heap_delete(rel, tuple-t_self);
! 		}
  	}
  	else if (HeapTupleIsValid(tuple))
  	{
Index: src/backend/utils/misc/guc.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.543
diff -c -p -r1.543 guc.c
*** src/backend/utils/misc/guc.c	26 Feb 2010 02:01:14 -	1.543
--- src/backend/utils/misc/guc.c	18 Mar 2010 15:39:15 -
*** ParseLongOption(const char *string, char
*** 7099,7105 
  
  
  /*
!  * Handle options fetched from pg_database.datconfig, pg_authid.rolconfig,
   * pg_proc.proconfig, etc.	Caller must specify proper context/source/action.
   *
   * The array parameter must be an array of TEXT (it must not be NULL).
--- 7099,7105 
  
  
  /*
!  * Handle options fetched from pg_db_role_setting.setconfig,
   * pg_proc.proconfig, etc.	Caller must specify proper context/source/action.
   *
   * The array parameter must be an array of TEXT (it must not be NULL).
*** ProcessGUCArray(ArrayType *array,
*** 7151,7156 
--- 7151,7157 
  		free(name);
  		if (value)
  			free(value);
+ 		pfree(s);
  	}
  }
  
*** GUCArrayDelete(ArrayType *array, const c
*** 7285,7290 
--- 7286,7370 
  			 val[strlen(name)] == '=')
  			continue;
  
+ 
+ 		/* else add it to the output array */
+ 		if (newarray)
+ 		{
+ 			newarray = array_set(newarray, 1, &index,
+ 

Re: [HACKERS] An idle thought

2010-03-18 Thread Chris Browne
si...@2ndquadrant.com (Simon Riggs) writes:
 On Tue, 2010-03-16 at 15:29 +, Greg Stark wrote:

 big batch delete

 Is one of the reasons for partitioning, allowing the use of truncate.

Sure, but it would be even nicer if DELETE could be thus made cheaper
without needing to interfere with the schema.

The concurrency issue might be resolved (*might!*) by the following
complication...

 - A delete request is looking at a page, and concludes, "oh, all the
   tuples here are now marked dead!"

 - It flags the page as *possibly* dead.  Almost what Greg suggests for
   the visibility map, but this is just marking it as proposed dead.

 - It throws the page number, along with xid, into a side map.

When something wants to do something with the page (e.g. - vacuum), it
sees that it's possibly dead, and looks at the side map for the list
of xids that wanted to mark the page dead.

for each xid:
   if xid is still active
      do nothing with it
   else
      remove xid entry from the map

if all xids were failed
   remove flag from page
if any xid committed
   empty the page; the tuples are all dead

I'm less confident about that last clause - I *think* that if *any*
page-clearing XID is found, that means the page is well and truly clear,
doesn't it?

The extra map mayn't be a nice thing.

It's food for thought, anyways.
-- 
let name="cbbrowne" and tld="linuxfinances.info" in String.concat "@"
[name;tld];;
The real problem with the year 2000 is that there are too many
zero bits and that adversely affects the global bit density.
-- Boyd Roberts b...@france3.fr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER ROLE/DATABASE RESET ALL versus security

2010-03-18 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 I have come up with the attached patch.  I haven't tested it fully yet,
 and I need to backport it.  The gist of it is: we can't simply remove
 the pg_db_role_setting tuple, we need to ask GUC to reset the settings
 array, for which it checks superuser-ness on each setting.

I think you still want to have a code path whereby the tuple will be
deleted once the array is empty.  Failing to check that is inefficient
and also exposes clients such as pg_dump to corner case bugs.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: shared ispell dictionary

2010-03-18 Thread Pavel Stehule
2010/3/18 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 I know Tom worries about the use of shared memory.

 You're right, and if I have any say in the matter no patch like this
 will ever go in.

I wrote a second patch based on preloading. For real use it needs
parametrisation to be designed. It is working well - on Linux. It is
simple and fast (with the simple allocator). I am not sure about other
systems. Minimally it can exist as a contrib module.

Pavel


 What I would suggest looking into is some way of preprocessing the raw
 text dictionary file into a format that can be slurped into memory
 quickly.  The main problem compared to the way things are done now
 is that the current internal format relies heavily on pointers.
 Maybe you could replace those by offsets?

                        regards, tom lane



preload.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] C libpq frontend library fetchsize

2010-03-18 Thread Yeb Havinga

Robert Haas wrote:

 On Fri, Feb 26, 2010 at 3:28 PM, Yeb Havinga yebhavi...@gmail.com wrote:
  I'm wondering if there would be community support for adding the use of the
  execute message with a rownum > 0 in the c libpq client library, as it is
  used by the jdbc driver with setFetchSize.

 Not sure I follow what you're asking...  what would the new/changed
 function signature be?

Hello Robert, list

I'm sorry I did not catch your reply until I searched the archives on
libpq; I hope you are not offended. However, I think the question is
answered somewhat in a reply I sent to Takahiro Itagaki, viz:
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00015.php


The recent posting in PERFORM where someone compares MySQL vs PostgreSQL
speed is caused by libpq transferring the whole pgresult at one time.
(http://archives.postgresql.org/pgsql-performance/2010-03/msg00228.php)


ISTM that using cursors and then FETCH is not an adequate solution,
because 1) someone must realise that the pgresult object is
gathered/transferred under the hood of libpq completely before the first
row can be used by the application, and 2) the structure of the application
layer is altered to make use of partial results.
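
For reference, the cursor-plus-FETCH workaround referred to above looks
roughly like the sketch below; the connection string and the table name
"bigtable" are placeholders, and only long-standing libpq calls are used:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;
	int			nrows = 0;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		exit(1);
	}

	/* a cursor only lives inside a transaction block */
	PQclear(PQexec(conn, "BEGIN"));
	PQclear(PQexec(conn, "DECLARE c NO SCROLL CURSOR FOR SELECT * FROM bigtable"));

	do
	{
		/* each FETCH transfers at most 1000 rows into one pgresult */
		res = PQexec(conn, "FETCH 1000 FROM c");
		if (PQresultStatus(res) != PGRES_TUPLES_OK)
		{
			fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
			PQclear(res);
			break;
		}
		nrows = PQntuples(res);
		if (nrows > 0)
			printf("batch of %d rows, first value: %s\n",
				   nrows, PQgetvalue(res, 0, 0));
		PQclear(res);
	} while (nrows > 0);

	PQclear(PQexec(conn, "COMMIT"));
	PQfinish(conn);
	return 0;
}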


What if the default operation of e.g. php using libpq would be as
follows: set some default fetchsize (e.g. 1000 rows), then just issue
getrow. In the php pg handling, a function like getnextrow would wait
for the first pgresult with 1000 rows. Then if the pgresult is depleted
or almost depleted, request the next pgresult automatically. I see a lot
of benefits like less memory requirements in libpq, less new users with
"why is my query so slow before the first row", and almost no concerns. A
small overhead of row description messages perhaps. Maybe the biggest
benefit of a pgsetfetchsize api call would be to raise awareness of
the fact that pgresults are transferred completely (or partially, if there
is interest in me/a colleague of mine working on a patch for this).


Besides that, another approach to get data to clients faster could be
perhaps using lzo, much in the same way that google uses zippy (see e.g.
http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/)
to speed up data transfer and delivery. LZO has been mentioned before on
mailing lists for pg_dump compression, but I think that with
--enable-lzo, libpq could benefit too.
(http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php)


regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] C libpq frontend library fetchsize

2010-03-18 Thread Tom Lane
Yeb Havinga yebhavi...@gmail.com writes:
 What if the default operation of e.g. php using libpq would be as 
 follows: set some default fetchsize (e.g. 1000 rows), then just issue 
 getrow. In the php pg handling, a function like getnextrow would wait 
 for the first pgresult with 1000 rows. Then if the pgresult is depleted 
 or almost depleted, request the next pgresult automatically. I see a lot 
 of benefits like less memory requirements in libpq, less new users with 
 "why is my query so slow before the first row", and almost no concerns.

You are blithely ignoring the reasons why libpq doesn't do this.  The
main one being that it's impossible to cope sanely with queries that
fail partway through execution.  The described implementation would not
cope tremendously well with nonsequential access to the resultset, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Josh Berkus

 It's already in the docs, so if they read it and understand it they can
 add it to the postgresql.conf if they so choose.
 
 I agree with Josh Berkus that vacuum_defer_cleanup_age should be in
 postgresql.conf.  We don't stop listing items just because they are
 dangerous, e.g. fsync, or to discourage their use.  I believe Greg Smith
 also felt it should be included.

Or, let's put it another way: I've made my opinion clear in the past
that I think that we ought to ship with a minimal postgresql.conf with
maybe 15 items in it.  If we are going to continue to ship with
the postgresql.conf kitchen sink version, however, it should include
vacuum_defer_cleanup_age.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] C libpq frontend library fetchsize

2010-03-18 Thread Yeb Havinga

Tom Lane wrote:

 Yeb Havinga yebhavi...@gmail.com writes:
  What if the default operation of e.g. php using libpq would be as
  follows: set some default fetchsize (e.g. 1000 rows), then just issue
  getrow. In the php pg handling, a function like getnextrow would wait
  for the first pgresult with 1000 rows. Then if the pgresult is depleted
  or almost depleted, request the next pgresult automatically. I see a lot
  of benefits like less memory requirements in libpq, less new users with
  "why is my query so slow before the first row", and almost no concerns.

 You are blithely ignoring the reasons why libpq doesn't do this.  The
 main one being that it's impossible to cope sanely with queries that
 fail partway through execution.

I'm sorry I forgot to add a reference to your post at
http://archives.postgresql.org/pgsql-general/2010-02/msg00956.php, which
is the only reference to queries failing partway that I know of. But
"blithely" is not a good description of me ignoring it. I thought about
how queries could fail, but can't think of anything else than e.g. memory
exhaustion, and that is just one of the things that is improved this
way. Maybe a user-defined type with an error on certain data values, but
then the same arguing could be: why support UDTs? And if a query fails
during execution, does that mean that the rows returned until that point
are wrong?

 The described implementation would not
 cope tremendously well with nonsequential access to the resultset, either.

That's why I'm not proposing to replace the current way pgresults are
made complete, but just an extra option to let developers using the
libpq library make the choice themselves.


regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Joshua D. Drake
On Thu, 2010-03-18 at 10:18 -0700, Josh Berkus wrote:
  It's already in the docs, so if they read it and understand it they can
  add it to the postgresql.conf if they so choose.
  
  I agree with Josh Berkus that vacuum_defer_cleanup_age should be in
  postgresql.conf.  We don't stop listing items just because they are
  dangerous, e.g. fsync, or to discourage their use.  I believe Greg Smith
  also felt it should be included.
 
 Or, let's put it another way: I've made my opinion clear in the past
 that I think that we ought to ship with a minimal postgresql.conf with
 maybe 15 items in it.  If we are going to continue to ship with
 the postgresql.conf kitchen sink version, however, it should include
 vacuum_defer_cleanup_age.

+1

As usual, the postgresql.conf is entirely too full. We should ship with
the top 15. If this gains any traction, I am sure that Greg Smith,
Berkus and I could provide that list with nothing but a care bear
discussion.

Joshua D. Drake

 
 -- 
   -- Josh Berkus
  PostgreSQL Experts Inc.
  http://www.pgexperts.com
 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or 
Sir.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Thu, 2010-03-18 at 14:29 +0530, Gokulakannan Somasundaram wrote:

 If you want it to be cheaply updatable (or even cheaply
 readable),
 compression is not what you're going to do.
 
regards, tom lane
 
 
 
 +1.. 

The visibility map itself is already an example of compression. If
visibility information were randomly distributed among tuples, the
visibility map would be nearly useless.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-18 Thread Greg Smith

Takahiro Itagaki wrote:

 The log filenames are pgbench_log.main-process-id.thread-serial-number
 for each thread, but the first thread (including single-threaded) still uses
 pgbench_log.main-process-id for the name because of compatibility.

Attached is an updated version that I think is ready to commit.  Only 
changes are docs--I rewrote those to improve the wording some.  The code 
looked and tested fine to me.  I just added support for the new format 
to pgbench-tools and am back to happily running large batches of tests 
using it again.


I confirmed a few things:

-On my CentOS system, the original problem is masked if you have 
--enable-thread-safety on; the multi-threaded output shows up without 
any broken lines into the single file.  As I suspected it's only the 
multi-process implementation that shows the issue here.  Since Tom 
points out that's luck rather than something that should be relied upon, 
I don't think that actually changes what to do here, it just explains 
why this wasn't obvious in earlier testing--normally I have thread 
safety on nowadays.


-Patch corrects the problem.  I took a build without thread safety on, 
demonstrated the issue with its pgbench.  Apply the patch, rebuild just 
pgbench, run again; new multiple log files have no issue.


-It's easy to convert existing scripts to utilize the new multiple log 
format.  Right now the current idiom you're forced into using when 
running pgbench scripts is to track the PID it's run as, then use 
something like:


mv pgbench_log.${PID} pgbench.log

To convert to a stable filename for later processing.  Now you just use 
something like this instead:


cat pgbench_log.${PID}* > pgbench.log
rm -f pgbench_log.${PID}*

And that works fine. 


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 0019db4..28a8c84 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -131,11 +131,9 @@ int			fillfactor = 100;
 #define ntellers	10
#define naccounts	100000
 
-FILE	   *LOGFILE = NULL;
-
 bool		use_log;			/* log transaction latencies to a file */
-
-int			is_connect;			/* establish connection for each transaction */
+bool		is_connect;			/* establish connection for each transaction */
+int			main_pid;			/* main process id used in log filename */
 
char	   *pghost = "";
char	   *pgport = "";
@@ -183,6 +181,7 @@ typedef struct
  */
 typedef struct
 {
+	int			tid;			/* thread id */
 	pthread_t	thread;			/* thread handle */
 	CState	   *state;			/* array of CState */
 	int			nstate;			/* length of state[] */
@@ -741,7 +740,7 @@ clientDone(CState *st, bool ok)
 
 /* return false iff client should be disconnected */
 static bool
-doCustom(CState *st, instr_time *conn_time)
+doCustom(CState *st, instr_time *conn_time, FILE *log_file)
 {
 	PGresult   *res;
 	Command   **commands;
@@ -778,7 +777,7 @@ top:
 		/*
 		 * transaction finished: record the time it took in the log
 		 */
-		if (use_log && commands[st->state + 1] == NULL)
+		if (log_file && commands[st->state + 1] == NULL)
 		{
 			instr_time	now;
 			instr_time	diff;
@@ -791,12 +790,12 @@ top:
 
 #ifndef WIN32
 			/* This is more than we really ought to know about instr_time */
-			fprintf(LOGFILE, "%d %d %.0f %d %ld %ld\n",
+			fprintf(log_file, "%d %d %.0f %d %ld %ld\n",
 	st->id, st->cnt, usec, st->use_file,
 	(long) now.tv_sec, (long) now.tv_usec);
 #else
 			/* On Windows, instr_time doesn't provide a timestamp anyway */
-			fprintf(LOGFILE, "%d %d %.0f %d 0 0\n",
+			fprintf(log_file, "%d %d %.0f %d 0 0\n",
 	st->id, st->cnt, usec, st->use_file);
 #endif
 		}
@@ -857,7 +856,7 @@ top:
 		INSTR_TIME_ACCUM_DIFF(*conn_time, end, start);
 	}
 
-	if (use_log && st->state == 0)
+	if (log_file && st->state == 0)
 		INSTR_TIME_SET_CURRENT(st->txn_begin);
 
 	if (commands[st->state]->type == SQL_COMMAND)
@@ -1833,7 +1832,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'C':
-is_connect = 1;
+is_connect = true;
 break;
 			case 's':
 scale_given = true;
@@ -1955,6 +1954,12 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	/*
+	 * save main process id in the global variable because process id will be
+	 * changed after fork.
+	 */
+	main_pid = (int) getpid();
+
 	if (nclients > 1)
 	{
 		state = (CState *) realloc(state, sizeof(CState) * nclients);
@@ -1980,20 +1985,6 @@ main(int argc, char **argv)
 		}
 	}
 
-	if (use_log)
-	{
-		char		logpath[64];
-
-		snprintf(logpath, 64, "pgbench_log.%d", (int) getpid());
-		LOGFILE = fopen(logpath, "w");
-
-		if (LOGFILE == NULL)
-		{
-			fprintf(stderr, "Couldn't open logfile \"%s\": %s", logpath, strerror(errno));
-			exit(1);
-		}
-	}
-
 	if (debug)
 	{
 		if (duration <= 0)
@@ -2111,6 +2102,7 @@ main(int argc, char **argv)
 	threads = (TState *) malloc(sizeof(TState) * nthreads);
 	for (i = 0; i < nthreads; i++)
 	{
+		threads[i].tid = i;
 		threads[i].state = &state[nclients / nthreads 

Re: [HACKERS] Getting to beta1

2010-03-18 Thread Greg Smith

Joshua D. Drake wrote:

 As usual, the postgresql.conf is entirely too full. We should ship with
 the top 15.


Maybe, but what we should do is ship, and then talk about this again 
when it's appropriate--earlier in the release cycle.  Let me try and cut 
this one off before it generates a bunch of traffic by summarizing where 
this is stuck at.


We started this release with a good plan for pulling off a major 
postgresql.conf trimming effort that I still like a lot ( 
http://wiki.postgresql.org/wiki/PgCon_2009_Developer_Meeting#Auto-Tuning 
)  The first step was switching over to a directory-based structure that 
allowed being all things to all people just by selecting which of the 
files provided you put into there.  We really need the things initdb 
touches to go into a separate file, rather than the bloated sample, in a 
way that it's easy to manage; if you just drop files into a directory 
and the server reads them all that's the easiest route.  Extending to 
include the top 15 or whatever other subset people want is easy after that.


Now, that didn't go anywhere in this release due to development focus
constraints, but I'm willing to take "has what we can advertise as
built-in replication" as a disappointing but acceptable substitute in
lieu of that.  (rolls eyes)  I think it will fit nicely into the "9.1
adds the polish" theme already gathering around the replication features
being postponed to the next release.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Marc G. Fournier

On Thu, 18 Mar 2010, Joshua D. Drake wrote:
 On Thu, 2010-03-18 at 10:18 -0700, Josh Berkus wrote:
   It's already in the docs, so if they read it and understand it they can
   add it to the postgresql.conf if they so choose.

   I agree with Josh Berkus that vacuum_defer_cleanup_age should be in
   postgresql.conf.  We don't stop listing items just because they are
   dangerous, e.g. fsync, or to discourage their use.  I believe Greg Smith
   also felt it should be included.

  Or, let's put it another way: I've made my opinion clear in the past
  that I think that we ought to ship with a minimal postgresql.conf with
  maybe 15 items in it.  If we are going to continue to ship with
  the postgresql.conf kitchen sink version, however, it should include
  vacuum_defer_cleanup_age.

 +1

 As usual, the postgresql.conf is entirely too full. We should ship with
 the top 15. If this gains any traction, I am sure that Greg Smith,
 Berkus and I could provide that list with nothing but a care bear
 discussion.

+1 ... but, why the 'top 15'?  why not just those that are uncommented to
start with, and leave those that are commented out as 'in the docs' ... ?



Marc G. FournierHub.Org Hosting Solutions S.A.
scra...@hub.org http://www.hub.org

Yahoo:yscrappySkype: hub.orgICQ:7615664MSN:scra...@hub.org

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram
 The visibility map itself is already an example of compression. If
 visibility information were randomly distributed among tuples, the
 visibility map would be nearly useless.


 I believe it is very difficult to make the visibility map update-friendly
without compromising durability.  But such functionality is very much
wanted in PG still.


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Fri, 2010-03-19 at 01:59 +0530, Gokulakannan Somasundaram wrote:
 
 The visibility map itself is already an example of
 compression. If
 visibility information were randomly distributed among tuples,
 the
 visibility map would be nearly useless.
 
 
 I believe it is very difficult to make the visibility map update-friendly
 without compromising durability.  But such functionality is very
 much wanted in PG still.

Surely the VM is already update-friendly. If you update a tuple in a
page with the visibility bit set, the bit must be unset or you will get
wrong results.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Fri, 2010-03-19 at 01:59 +0530, Gokulakannan Somasundaram wrote:
 I believe it is very difficult to make the visibility map update-friendly
 without compromising durability.  But such functionality is very
 much wanted in PG still.

 Surely the VM is already update-friendly. If you update a tuple in a
 page with the visibility bit set, the bit must be unset or you will get
 wrong results.

The VM is (a) not compressed and (b) not correctness-critical.
Wrong bit values don't do any serious damage.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Thu, 2010-03-18 at 16:50 -0400, Tom Lane wrote:
 The VM is (a) not compressed and (b) not correctness-critical.
 Wrong bit values don't do any serious damage.

The VM causes wrong results if a bit is set that's not supposed to be --
right? Am I missing something? How does a seq scan skip visibility
checks and still produce right results, if it doesn't rely on the bit?

The visibility map would obviously not be very useful if visibility
information was randomly distributed among tuples. Whether that
qualifies as compression or not was not my primary point. The point is
that it may be possible to use some structure that is significantly
smaller than holding xmin/xmax for every tuple in the heap, and at the
same time may be acceptably fast to update.
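
As a toy illustration of such a structure -- mine, not anything from a
patch in this thread -- runs of identical visibility can be kept as a
sorted array of run starts, so a long frozen or all-visible stretch
collapses to one entry while lookup stays a binary search:

#include <stdio.h>

typedef struct VisRun
{
	unsigned	first;		/* first tuple index covered by this run */
	int			visible;	/* 1 = all-visible for the whole run */
} VisRun;

/* covers tuples 0..999 in three entries; must stay sorted by "first" */
static const VisRun runs[] = {
	{0, 1},					/* tuples 0..899 all-visible (e.g. frozen) */
	{900, 0},				/* tuples 900..949 recently modified */
	{950, 1},				/* tuples 950..999 all-visible again */
};
#define NRUNS	(sizeof(runs) / sizeof(runs[0]))

/* binary search for the run containing tuple index t */
static int
tuple_visible(unsigned t)
{
	int			lo = 0,
				hi = NRUNS - 1;

	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;

		if (runs[mid].first <= t)
			lo = mid;
		else
			hi = mid - 1;
	}
	return runs[lo].visible;
}

int
main(void)
{
	printf("tuple 10:  %d\n", tuple_visible(10));	/* prints 1 */
	printf("tuple 920: %d\n", tuple_visible(920));	/* prints 0 */
	printf("tuple 999: %d\n", tuple_visible(999));	/* prints 1 */
	return 0;
}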

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Alex Hunsaker
On Thu, Mar 18, 2010 at 15:07, Jeff Davis pg...@j-davis.com wrote:
 On Thu, 2010-03-18 at 16:50 -0400, Tom Lane wrote:
 The VM is (a) not compressed and (b) not correctness-critical.
 Wrong bit values don't do any serious damage.

 The VM causes wrong results if a bit is set that's not supposed to be --
 right? Am I missing something? How does a seq scan skip visibility
 checks and still produce right results, if it doesn't rely on the bit?

Isn't it only really used for VACUUM at this point?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Thu, 2010-03-18 at 16:50 -0400, Tom Lane wrote:
 The VM is (a) not compressed and (b) not correctness-critical.
 Wrong bit values don't do any serious damage.

 The VM causes wrong results if a bit is set that's not supposed to be --
 right? Am I missing something? How does a seq scan skip visibility
 checks and still produce right results, if it doesn't rely on the bit?

It doesn't.  The only thing we currently rely on the VM for is deciding
whether a page needs vacuuming --- and even that we don't trust it for
when doing anti-wraparound vacuuming.  The worst-case consequence of a
wrong bit is failure to free some dead tuples until the vacuum freeze
limit expires.

In order to do things like not visiting a page during scans, we'll have
to solve the reliability issues.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Greg Stark
On Thu, Mar 18, 2010 at 9:07 PM, Jeff Davis pg...@j-davis.com wrote:
 The VM causes wrong results if a bit is set that's not supposed to be --
 right? Am I missing something? How does a seq scan skip visibility
 checks and still produce right results, if it doesn't rely on the bit?


There's also a PD_ALL_VISIBLE flag on the page header. We WAL-log when
we clear that bit, so we can trust that if it's set then all the tuples
really are visible. I forget whether we can trust it if it's *not* set,
but there's not much point -- all tuples could become visible
spontaneously even while the page is sitting on disk.
-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Thu, 2010-03-18 at 17:17 -0400, Tom Lane wrote:
  The VM causes wrong results if a bit is set that's not supposed to be --
  right? Am I missing something? How does a seq scan skip visibility
  checks and still produce right results, if it doesn't rely on the bit?
 
 It doesn't.  The only thing we currently rely on the VM for is deciding
 whether a page needs vacuuming

Oh, my mistake. I misremembered the discussion and I thought the seq
scan optimization made it in.

 In order to do things like not visiting a page during scans, we'll have
 to solve the reliability issues.

Yeah, and also for the index-only scans.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Thu, 2010-03-18 at 14:48 -0700, Jeff Davis wrote:
 On Thu, 2010-03-18 at 17:17 -0400, Tom Lane wrote:
   The VM causes wrong results if a bit is set that's not supposed to be --
   right? Am I missing something? How does a seq scan skip visibility
   checks and still produce right results, if it doesn't rely on the bit?
  
  It doesn't.  The only thing we currently rely on the VM for is deciding
  whether a page needs vacuuming
 
 Oh, my mistake. I misremembered the discussion and I thought the seq
 scan optimization made it in.
 

Hmm...

From heapgetpage() in heapam.c, CVS HEAD:
/*
 * If the all-visible flag indicates that all tuples on the page are
 * visible to everyone, we can skip the per-tuple visibility tests. But

I tested in gdb, and it calls HeapTupleSatisfiesMVCC, until I VACUUM a
few times, and then it doesn't call it any more. So, apparently the seq
scan optimization _is_ there. And that means it is correctness-critical.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 I tested in gdb, and it calls HeapTupleSatisfiesMVCC, until I VACUUM a
 few times, and then it doesn't call it any more. So, apparently the seq
 scan optimization _is_ there. And that means it is correctness-critical.

The page header bit is critical.  Not the VM.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Robert Haas
On Thu, Mar 18, 2010 at 4:28 PM, Marc G. Fournier scra...@hub.org wrote:
 On Thu, 18 Mar 2010, Joshua D. Drake wrote:

 On Thu, 2010-03-18 at 10:18 -0700, Josh Berkus wrote:

 It's already in the docs, so if they read it and understand it they can
 add it to the postgresql.conf if they so choose.

 I agree with Josh Berkus that vacuum_defer_cleanup_age should be in
 postgresql.conf.  We don't stop listing items just because they are
 dangerous, e.g. fsync, or to discourage their use.  I believe Greg Smith
 also felt it should be included.

 Or, let's put it another way: I've made my opinion clear in the past
 that I think that we ought to ship with a minimal postgresql.conf with
 maybe 15 items in it.  If we are going to continue to ship with
 the postgresql.conf kitchen sink version, however, it should include
 vacuum_defer_cleanup_age.

 +1

 As usual, the postgresql.conf is entirely too full. We should ship with
 the top 15. If this gains any traction, I am sure that Greg Smith,
 Berkus and I could provide that list with nothing but a care bear
 discussion.

 +1 ... but, why the 'top 15'?  why not just those that are uncommented to
 start with, and leave those that are commented out as 'in the docs' ... ?

+1 to either proposal.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Joshua D. Drake
On Thu, 2010-03-18 at 19:10 -0400, Robert Haas wrote:
 On Thu, Mar 18, 2010 at 4:28 PM, Marc G. Fournier scra...@hub.org wrote:
  On Thu, 18 Mar 2010, Joshua D. Drake wrote:
 
  On Thu, 2010-03-18 at 10:18 -0700, Josh Berkus wrote:
 
  It's already in the docs, so if they read it and understand it they can
  add it to the postgresql.conf if they so choose.
 
  I agree with Josh Berkus that vacuum_defer_cleanup_age should be in
  postgresql.conf.  We don't stop listing items just because they are
  dangerous, e.g. fsync, or to discourage their use.  I believe Greg Smith
  also felt it should be included.
 
  Or, let's put it another way: I've made my opinion clear in the past
  that I think that we ought to ship with a minimal postgresql.conf with
  maybe 15 items in it.  If we are going to continue to ship with
  the postgresql.conf kitchen sink version, however, it should include
  vacuum_defer_cleanup_age.
 
  +1
 
  As usual, the postgresql.conf is entirely too full. We should ship with
  the top 15. If this gains any traction, I am sure that Greg Smith,
  Berkus and I could provide that list with nothing but a care bear
  discussion.
 
  +1 ... but, why the 'top 15'?  why not just those that are uncommented to
  start with, and leave those that are commented out as 'in the docs' ... ?
 
 +1 to either proposal.

I think "top 15" was arbitrary. The point is that our postgresql.conf is
ridiculous. For 99% of installations, only a dozen to a dozen and a half
of the options are relevant.

 
 ...Robert
 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or 
Sir.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Greg Smith

Robert Haas wrote:
 On Thu, Mar 18, 2010 at 4:28 PM, Marc G. Fournier scra...@hub.org wrote:
  On Thu, 18 Mar 2010, Joshua D. Drake wrote:
   On Thu, 2010-03-18 at 10:18 -0700, Josh Berkus wrote:
    Or, let's put it another way: I've made my opinion clear in the past
    that I think that we ought to ship with a minimal postgresql.conf with
    maybe 15 items in it.

   +1

  +1 ... but, why the 'top 15'?  why not just those that are uncommented to
  start with, and leave those that are commented out as 'in the docs' ... ?

 +1 to either proposal.

If this is turning into a vote:  -1 from me for any work on this until 
http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items is cleared.  
It boggles my mind that anyone could have a different prioritization 
right now.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Josh Berkus
On 3/18/10 4:54 PM, Greg Smith wrote:
 
 If this is turning into a vote:  -1 from me for any work on this until
 http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items is cleared. 
 It boggles my mind that anyone could have a different prioritization
 right now.

Yes.  I wasn't suggesting that we change postgresql.conf.sample for this
release.  Feature Freeze was a while ago.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting to beta1

2010-03-18 Thread Robert Haas
On Thu, Mar 18, 2010 at 7:54 PM, Greg Smith g...@2ndquadrant.com wrote:
 If this is turning into a vote:  -1 from me for any work on this until
 http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items is cleared.  It
 boggles my mind that anyone could have a different prioritization right now.

This isn't about priority, at least in my mind.  It's about consensus.
 If we have consensus, the work can get done for 9.1.  It seems like
everyone is on the same page - that is a good thing.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers