Re: [HACKERS] a faster compression algorithm for pg_dump
On Fri, Apr 9, 2010 at 12:17 AM, Joachim Wieland wrote:
> One question that I do not yet see answered is, do we risk violating a
> patent even if we just link against a compression library, for example
> liblzf, without shipping the actual code?

Generally, patents are infringed when the process is used, so whether we link against the code or ship it isn't really relevant: the user running the software would need a patent license either way. We want Postgres to be usable without depending on any copyright or patent licenses.

Linking against it as an option isn't nearly as bad, since the user compiling it can choose whether to include the restricted feature; that's what we do with readline. However, it's much less attractive when it restricts what file formats Postgres supports -- it means someone might generate backup dump files that they later discover they don't have a legal right to read and restore :(

-- greg

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] C-Language Fun on VC2005 ERROR: could not load library
Hi,

I'm using VC2005 to create a PG C-language function in my contrib module xml_index, which imports another library. I added the include and lib directories by changing Mkvcbuild.pm and config.pl. I then executed the following commands:

build DEBUG
perl install.pl "C:\Program Files\PostgreSQL\8.3"
initdb.exe -D "C:\Program Files\PostgreSQL\8.3\data" -E UTF8 --locale=C
pg_ctl -D "C:/Program Files/PostgreSQL/8.3/data" -l logfile start
createdb test
psql test

When I try to create the function with:

CREATE OR REPLACE FUNCTION create_xml_value_index(text,text,text) RETURNS bool
AS '$libdir/xml_index'
LANGUAGE C STRICT IMMUTABLE;

it reports:

ERROR: could not load library "C:\Program Files\PostgreSQL\8.3\lib/xml_index.dll": The specified module could not be found.

I checked the directory C:\Program Files\PostgreSQL\8.3\lib, and xml_index.dll exists. I also tried replacing "$libdir/xml_index" with "C:\\Program Files\\PostgreSQL\\8.3\\lib\\xml_index", but the error remains. Has anyone ever encountered this problem? Thanks in advance.

Best Regards
Cristian
Re: [HACKERS] GSOC PostgreSQL partitioning issue
Greg Smith wrote:
> An introduction to the current state of work in progress for adding
> improved partitioning features to PostgreSQL is documented at
> http://wiki.postgresql.org/wiki/Table_partitioning

Also, here is my latest patch for it: http://repo.or.cz/w/pgsql-fdw.git/shortlog/refs/heads/partition

I'd like to ask Necati what problem you will solve, rather than what module you will develop. Performance? Usability? Or others?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
[HACKERS] extended operator classes vs. type interfaces
There are a couple of features that were considered but not committed for 9.0 which require additional information about the properties of various data types. At PGEast I had a chance to talk with Jeff Davis about this briefly, and I wanted to write up some of what we talked about, plus some further thoughts of my own, before they went out of my head. The features I'm thinking about are:

1. knngist wants to use index scans to speed up queries of the form SELECT ... ORDER BY column op constant (as opposed to the existing machinery, which only knows how to use an index for SELECT ... ORDER BY column).

2. Window functions want to define windows over a range of values defined by the underlying data type. To do this, we need to define what addition and subtraction mean for a particular data type.

3. Jeff Davis is interested in implementing range types. When the underlying base type is discrete, e.g. integers, you can say that [1,3] = [1,4), but only if you know that 3 and 4 are consecutive (in that order).

All of these problems seem loosely related: we're teaching the database more about how certain types behave. But there are some important differences. In case #1, we're trying to teach the planner that if it sees a certain operator, it can map that operator onto an AM strategy - so the knowledge is fundamentally AM-specific. In the remaining cases, the information needed is a property of the underlying data type, with no natural or logical relationship to an access method. For that reason, I don't think there's going to be any clean way to create a single mechanism that encompasses all of these needs, though maybe someone has a clever idea I'm not thinking of.

What we discussed doing before for #1 (and it makes sense to me) is to add a column pg_amop.amopcategory. The existing operator class functions will constitute one category - map a boolean operator onto an index strategy number that can be used to treat a filter condition as an index qual. The functions knngist cares about will be a second category - map an operator that returns some arbitrary type that is legal in the context of an ORDER BY clause onto an index strategy number that can regurgitate the tuples in the order defined by the return type.

While it might be possible to shoehorn the remaining cases into the operator class machinery, it seems likely that it would be nothing but ugly. The whole charter of the operator class machinery, at least AIUI, is to map operators onto AM-specific index strategy numbers, and there is neither an applicable AM nor a strategy number in any of these cases. So I think it's time to create a separate concept of type interfaces (Jeff Davis proposed this name, and I like it).

What might this look like? Given a type T, I think we'd like to be able to define a type U as "the natural type to be added to or subtracted from T". As Jeff pointed out to me, this is not necessarily the same as the underlying type. For example, if T is a timestamp, U is an interval; if T is a numeric, U is also a numeric; if T is a cidr, U is an integer. Then we'd like to define a canonical addition operator and a canonical subtraction operator. I think that would be sufficient for the needs of RANGE BETWEEN ... PRECEDING AND ... FOLLOWING. It would also be nearly sufficient for range types, but in that case you also need to specify the unit increment within U - i.e. a "1" value for the datatype. It may or may not be worth building the concept of a unit increment into the type interface machinery, though: one could imagine two different range types built over the same base type with different unit increments - e.g. one timestamp range with unit increment = 1s, and one with unit increment = 1m. Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm].
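As a purely hypothetical sketch (no such syntax exists; the CREATE TYPE INTERFACE keyword and all option names below are invented for illustration), a type interface for timestamp along the lines described above might declare:

```sql
-- Hypothetical syntax only: nothing here exists in PostgreSQL.  The
-- point is just to show what a "type interface" would need to capture:
-- the delta type U, the canonical arithmetic operators, and (optionally)
-- a unit increment for building discrete range types.
CREATE TYPE INTERFACE timestamp_arith FOR timestamp (
    DELTA TYPE     = interval,                 -- "U": the natural type to add/subtract
    ADD OPERATOR   = +(timestamp, interval),   -- canonical addition
    SUB OPERATOR   = -(timestamp, interval),   -- canonical subtraction
    UNIT INCREMENT = interval '1 second'       -- optional, per range type
);
```

Moving UNIT INCREMENT out of the interface and into the range type definition itself would be one way to allow the 1s and 1m timestamp ranges mentioned above to coexist over the same base type.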
In a previous thread on this topic, in response to a question about other possible needs for operator knowledge, Hitoshi Harada mentioned range partitioning as another possible case. If we have a partition where x ranges from 1 to 100, that's basically saying that x >= 1 and x <= 100, for suitable values of >= and <=. I think we're already habituated to doing this kind of thing by looking at the default btree opclass for the data type and looking for the operator that implements, e.g., BTLessEqualStrategyNumber. It might be cleaner to get all this information from type interfaces, but I'm not sure whether it's reasonable (either for reasons of complexity or of performance) to think about untangling all the places where we've already made this assumption and redoing them.

Thoughts?

...Robert
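For reference, the btree-opclass lookup described above can be seen directly in the system catalogs; a query along these lines shows the operator bound to btree strategy number 4 (BTLessEqualStrategyNumber, i.e. <=) for int4:

```sql
-- Find the operator implementing btree strategy 4 (<=) for integers.
-- Btree strategy numbers 1..5 correspond to <, <=, =, >=, >.
SELECT opf.opfname AS opfamily,
       amop.amopstrategy,
       amop.amopopr::regoperator AS operator
FROM pg_amop amop
JOIN pg_opfamily opf ON opf.oid = amop.amopfamily
JOIN pg_am am ON am.oid = opf.opfmethod
WHERE am.amname = 'btree'
  AND amop.amoplefttype = 'int4'::regtype
  AND amop.amoprighttype = 'int4'::regtype
  AND amop.amopstrategy = 4;
```

This is the kind of strategy-number assumption that is baked into the partitioning and planner code paths mentioned above, and that a type-interface mechanism would have to either replace or sit alongside.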
[HACKERS] a faster compression algorithm for pg_dump
I'd like to revive the discussion about offering another compression algorithm than zlib, at least for pg_dump. There has been a previous discussion here: http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php and it ended without any real result. The results so far were:

- There exist BSD-licensed compression algorithms
- Nobody knows of a patent that is in our way
- Nobody can confirm that no patent is in our way

I do see a very real demand for replacing zlib, which compresses quite well but is slow as hell. For pg_dump, what people want is cheap compression; they usually prefer an algorithm that compresses less optimally but is really fast.

One question that I do not yet see answered is: do we risk violating a patent even if we just link against a compression library, for example liblzf, without shipping the actual code?

I have checked what other projects do, especially regarding liblzf, which would be my favorite choice (BSD license, available for quite some time...), and there are other projects that actually ship the lzf code (I haven't found a project that just links to it). The most prominent projects are:

- KOffice (implements a derived version in koffice-2.1.2/libs/store/KoXmlReader.cpp)
- VirtualBox (ships it in vbox-ose-1.3.8/src/libs/liblzf-1.51)
- TuxOnIce (formerly known as suspend2 - a Linux kernel patch; ships it in the patch)

We have pg_lzcompress.c, which implements the compression routines for the tuple toaster. Are we sure that we don't violate any patents with this algorithm?

Joachim
Re: [HACKERS] Thoughts on pg_hba.conf rejection
On Wed, Apr 07, 2010 at 01:07:21PM -0400, Robert Haas wrote:
> On Wed, Apr 7, 2010 at 10:46 AM, Tom Lane wrote:
> > Simon Riggs writes:
> >> When there is a specific reject rule, why does the server say
> >> FATAL: no pg_hba.conf entry
> >
> > It's intentional. We try to expose the minimum amount of knowledge
> > about the contents of pg_hba.conf to potential attackers.
>
> The problem with the message is not that it's uninformative, but that
> it's counterfactual.
>
> ...Robert

I agree (I noticed and was bothered by this today, as a matter of irrelevant fact). I can support the idea of exposing as little as possible of pg_hba.conf, but ISTM the "no pg_hba.conf entry" message is exposing too much, by that standard. Just say something like "connection disallowed" and leave it at that -- either it's disallowed by lack of a rule, or by existence of a "reject" rule, or by something else entirely. As long as the message isn't clearly wrong in the "reject" case, as it is now.

-- Joshua Tolley / eggyknap / End Point Corporation / http://www.endpoint.com
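For context, a reject rule of the sort being discussed looks like this in pg_hba.conf (the address ranges here are made up for illustration):

```
# TYPE  DATABASE  USER  CIDR-ADDRESS  METHOD
host    all       all   10.0.0.0/8    reject
host    all       all   0.0.0.0/0     md5
```

A connection from 10.x.x.x matches the first line and is refused, yet the server reports "no pg_hba.conf entry" -- which is the counterfactual part: an entry was found; it just said to reject.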
Re: [HACKERS] GSOC PostgreSQL partitioning issue
An introduction to the current state of work in progress for adding improved partitioning features to PostgreSQL is documented at http://wiki.postgresql.org/wiki/Table_partitioning

If you can find a small, targeted piece of that overall plan that builds on the work already done, and is in the direction of the final goal here, you may be able to make useful progress in a few months' time. This area is extremely well explored already. There are 13 mailing list threads you'll need to browse through carefully just to have enough background that you're likely to build something new, rather than just wandering down a path that's already been followed but leads to a dead end.

You have picked a PostgreSQL feature that is dramatically more difficult than it appears to be, and I wouldn't expect you'll actually finish even a fraction of your goals in a summer of work. You're at least in plentiful company--most students do the same. As a rule, if you see a feature on our TODO list that looks really useful and fun to work on, it's only still there because people have tried multiple times to build it completely but not managed to do so, because it's harder than it appears. This is certainly the case with improving the partitioning support that's built into the database.

-- Greg Smith / 2ndQuadrant US / Baltimore, MD / PostgreSQL Training, Services and Support / g...@2ndquadrant.com / www.2ndQuadrant.us
Re: [HACKERS] Unsafe threading in syslogger on Windows
Andrew Dunstan wrote:
> Heikki Linnakangas wrote:
>> I'm going to see what happens if I remove all the #ifdef WIN32 blocks in
>> syslogger, and let it use pgpipe() and select() instead of the extra
>> thread.
>
> Sounds reasonable. Let's see how big the changes are on HEAD. I'm not
> sure it's worth creating a different smaller fix for the back branches.

I tried that, and got a crash somewhere in the code that inherits the syslogger pipe/socket to the child process. I don't understand why, and I don't feel like debugging any deeper into that right now. If you or someone else wants to give it a shot, that would be good. If not, I might try again some other day after sleeping on it.

Anyway, here's the patch I had in mind for the back branches.

-- Heikki Linnakangas / EnterpriseDB / http://www.enterprisedb.com

diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 22cf5f2..ee449ac 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -117,7 +117,7 @@ HANDLE syslogPipe[2] = {0, 0};
 #ifdef WIN32
 static HANDLE threadHandle = 0;
-static CRITICAL_SECTION sysfileSection;
+static CRITICAL_SECTION sysloggerSection;
 #endif
@@ -268,7 +268,8 @@ SysLoggerMain(int argc, char *argv[])
 #ifdef WIN32
 	/* Fire up separate data transfer thread */
-	InitializeCriticalSection(&sysfileSection);
+	InitializeCriticalSection(&sysloggerSection);
+	EnterCriticalSection(&sysloggerSection);

 	threadHandle = (HANDLE) _beginthreadex(NULL, 0, pipeThread, NULL, 0, NULL);
 	if (threadHandle == 0)
@@ -423,8 +424,16 @@ SysLoggerMain(int argc, char *argv[])
 		 * On Windows we leave it to a separate thread to transfer data and
 		 * detect pipe EOF. The main thread just wakes up once a second to
 		 * check for SIGHUP and rotation conditions.
+		 *
+		 * Server code isn't generally thread-safe, so we ensure that only
+		 * one of the threads is active at a time by entering the critical
+		 * section whenever we're not sleeping.
 		 */
+		LeaveCriticalSection(&sysloggerSection);
+
 		pg_usleep(1000000L);
+
+		EnterCriticalSection(&sysloggerSection);
 #endif   /* WIN32 */

 		if (pipe_eof_seen)
@@ -911,17 +920,9 @@ write_syslogger_file(const char *buffer, int count, int destination)
 	if (destination == LOG_DESTINATION_CSVLOG && csvlogFile == NULL)
 		open_csvlogfile();

-#ifdef WIN32
-	EnterCriticalSection(&sysfileSection);
-#endif
-
 	logfile = destination == LOG_DESTINATION_CSVLOG ? csvlogFile : syslogFile;
 	rc = fwrite(buffer, 1, count, logfile);

-#ifdef WIN32
-	LeaveCriticalSection(&sysfileSection);
-#endif
-
 	/* can't use ereport here because of possible recursion */
 	if (rc != count)
 		write_stderr("could not write to log file: %s\n", strerror(errno));
@@ -945,11 +946,21 @@ pipeThread(void *arg)
 	for (;;)
 	{
 		DWORD		bytesRead;
+		BOOL		result;
+
+		result = ReadFile(syslogPipe[0],
+						  logbuffer + bytes_in_logbuffer,
+						  sizeof(logbuffer) - bytes_in_logbuffer,
+						  &bytesRead, 0);

-		if (!ReadFile(syslogPipe[0],
-					  logbuffer + bytes_in_logbuffer,
-					  sizeof(logbuffer) - bytes_in_logbuffer,
-					  &bytesRead, 0))
+		/*
+		 * Enter critical section before doing anything that might touch
+		 * global state shared by the main thread. Anything that uses
+		 * palloc()/pfree() in particular are not safe outside the critical
+		 * section.
+		 */
+		EnterCriticalSection(&sysloggerSection);
+		if (!result)
 		{
 			DWORD		error = GetLastError();
@@ -966,6 +977,7 @@ pipeThread(void *arg)
 			bytes_in_logbuffer += bytesRead;
 			process_pipe_input(logbuffer, &bytes_in_logbuffer);
 		}
+		LeaveCriticalSection(&sysloggerSection);
 	}

 	/* We exit the above loop only upon detecting pipe EOF */
@@ -974,6 +986,7 @@ pipeThread(void *arg)

 	/* if there's any data left then force it out now */
 	flush_pipe_input(logbuffer, &bytes_in_logbuffer);
+	LeaveCriticalSection(&sysloggerSection);

 	_endthread();
 	return 0;
 }
@@ -1097,18 +1110,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 	_setmode(_fileno(fh), _O_TEXT); /* use CRLF line endings on Windows */
 #endif

-	/* On Windows, need to interlock against data-transfer thread */
-#ifdef WIN32
-	EnterCriticalSection(&sysfileSection);
-#endif
-
 	fclose(syslogFile);
 	syslogFile = fh;

-#ifdef WIN32
-	LeaveCriticalSection(&sysfileSection);
-#endif
-
 	/* instead of pfree'ing filename, remember it for next time */
 	if (last_file_name != NULL)
 		pfree(last_file_name);
@@ -1164,18 +1168,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 	_setmode(_fileno(fh), _O_TEXT); /* use CRLF line endings on Windows */
 #endif

-	/* On Windows, need to interlock against data-transfer thread */
-#ifdef WIN32
-	EnterCriticalSection(&sysfileSection);
-#endif
-
 	fclose(csvlogFile);
 	csvlogFile = fh;

-#ifdef WIN32
-	LeaveCriticalSection(&sysfileSection);
-#endif
-
 	/* instead of pfree'ing filename, remember it for next time */
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 20:35, Kevin Grittner wrote:
> Greg Stark wrote:
>
>> But all it means is you get a random subset of the messages.
>> You're still missing most of the admin or sql or performance
>> related threads since they're mostly on -general anyways. Those
>> three categories cover pretty much all of -general.
>
> Perhaps -general should be eliminated in favor of more specific
> lists?

That sounds like a great way to make things harder for newbies and outsiders.

-- Magnus Hagander / Me: http://www.hagander.net/ / Work: http://www.redpill-linpro.com/
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 7:58 PM, Ned Lilly wrote:
> +1 for the idea, and +1 for the Zork reference. Hello sailor.

FWIW, it's older than Zork. It comes from Adventure (http://en.wikipedia.org/wiki/Colossal_Cave_Adventure)

-- greg
Re: [HACKERS] GSOC PostgreSQL partitioning issue
On Thu, Apr 8, 2010 at 4:23 PM, Necati Batur wrote:
> In order to make a system change a student need to be more informed
> and experienced for the issue. Nonetheless, this step of work is
> actually not the phase of determining all the details, I guess.
> Otherwise, I would just do a few weeks of coding in summer to
> complete the project and I would be the person in charge in project
> management :)

Well, obviously there are going to be details that won't get worked out until you really settle down to do the project. But I don't think you've even really defined what exactly you would be working on. Going through your deliverables one by one:

> *The trigger based operations can be done automatically

What trigger-based operations are you talking about, and how do you plan to automate them?

> *The stored procedures can help us to do some functionalities like
> check constraint problem

What check constraint problem? PostgreSQL already has check constraints. If you think they have a problem, say what it is.

> *manual VACUUM or ANALYZE commands can be handled by using triggers
> DBMS SQL can help to provide faster executions

We already have an autovacuum daemon that automates VACUUM and ANALYZE commands, and it works pretty well. Certainly, there's room for improvement, but what do you think needs improving? What about the current implementation do you not like, and how do you propose to fix it?

> *Some more functionalities can be added to UPDATE operations to make
> administrations easy

What administrative tasks are currently difficult, and what functionality do you propose to add to simplify them?

...Robert
[HACKERS] some more data corruption
Hi,

So I'm still investigating data corruption issues. They have some serious TOAST index corruption too; for example, notice how an index scan returns invalid chunk_ids here:

select xmin, xmax, ctid, cmin, chunk_id, chunk_seq, length(chunk_data)
from pg_toast.pg_toast_18141
where chunk_id between 194679500 and 194679700
order by chunk_id;

  xmin   |  xmax   |    ctid     | cmin | chunk_id | chunk_seq | length
---------+---------+-------------+------+----------+-----------+--------
 9767631 |       0 | (2855683,1) |   27 | 94929948 |         1 |     26
 9767659 |       0 | (2855683,5) |   28 | 94929949 |         1 |     26
 9767659 |       0 | (2855683,6) |   32 | 94929950 |         1 |     26
 9767659 |       0 | (2855683,7) |   34 | 94929951 |         1 |     26
 9767797 | 9773223 | (2855684,4) |   36 | 94929958 |         1 |     27
 9767797 | 9773223 | (2855684,5) |   38 | 94929959 |         1 |     27
 9767797 |       0 | (2855684,6) |   41 | 94929960 |         1 |     27

One thing that surprised me is that they are seeing such short values being pushed out to TOAST. The rows are typically 156 bytes wide, and the values being pushed out are 30 bytes, so I fail to see why it would get done. Any theories? Is this normal?

-- Alvaro Herrera / http://www.CommandPrompt.com/ / PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] GSOC PostgreSQL partitioning issue
In order to make a system change, a student needs to be more informed and experienced about the issue. Nonetheless, this step of work is not actually the phase of determining all the details, I guess. Otherwise, I would just do a few weeks of coding in the summer to complete the project, and I would be the person in charge of project management :)

2010/4/8 Necati Batur :
> Benefits of Project
>
> Partitioning refers to splitting what is logically one large table
> into smaller physical pieces. Partitioning can provide several
> benefits:
>
> Query performance can be improved dramatically in certain situations,
> particularly when most of the heavily accessed rows of the table are
> in a single partition or a small number of partitions. The
> partitioning substitutes for leading columns of indexes, reducing
> index size and making it more likely that the heavily-used parts of
> the indexes fit in memory.
>
> When queries or updates access a large percentage of a single
> partition, performance can be improved by taking advantage of a
> sequential scan of that partition instead of using an index and random
> access reads scattered across the whole table.
>
> Bulk loads and deletes can be accomplished by adding or removing
> partitions, if that requirement is planned into the partitioning
> design. ALTER TABLE is far faster than a bulk operation. It also
> entirely avoids the VACUUM overhead caused by a bulk DELETE.
>
> Seldom-used data can be migrated to cheaper and slower storage media.
>
> Deliverables
>
> * The trigger-based operations can be done automatically
>
> * The stored procedures can help us to implement some functionality,
>   such as the check constraint problem
>
> * Manual VACUUM or ANALYZE commands can be handled by using triggers;
>   DBMS SQL can help to provide faster execution
>
> * Some more functionality can be added to UPDATE operations to make
>   administration easy
>
> Timeline (not exact, but most probable)
>
> Start June 7 and end around September 7
>
> * Warm up to the PostgreSQL environment (1-2 weeks)
>
> * Determine the exact operations to be added to PostgreSQL
>
> * Initial coding according to the work breakdown structure
>
> * Start implementing in a distributed environment to check that the
>   initial functions work
>
> * Write test cases for the code
>
> * Further implementation to fully support the ideas
>
> * Write it up on the discussion site and collect feedback
>
> * More support based on feedback
>
> * Last tests and documentation of the final operations
>
> About me
>
> I am a senior student in computer engineering at IZTECH in Turkey. My
> areas of interest are information management, OOP (Object-Oriented
> Programming), and currently bioinformatics. I have been working with an
> Assistant Professor (Jens Allmer) in the molecular biology and genetics
> department for one year. First, we worked on a protein database, 2DB,
> and we presented the project at HIBIT09. The project was "Database
> management system independence by amending 2DB with a database access
> layer". Currently, I am working on another project (Kerb) as my senior
> project, which is a general sequential task management system intended
> to reduce errors and increase time savings in biological experiments.
> We will present this project at HIBIT2010 too. Moreover, I am good at
> data structures and implementations in C.
>
> Contact: e-mails; necatiba...@gmail.com , necati_ba...@hotmail.com (msn)
Re: [HACKERS] GSOC PostgreSQL partitioning issue
On Thu, Apr 8, 2010 at 4:14 PM, Necati Batur wrote:
> The more specific of the items will just be the exact job I guess.
> However the detailed form will be hard to write now.
> Or should I explain the sql issues for each point?

Partitioning is a big project. It seems to me that if you want to have any chance of making a meaningful contribution in one summer, you're going to need to have a pretty specific idea of what you hope to accomplish up front, and I don't think you have that right now. The hard changes are not going to be adjustments to SQL syntax, but in the guts of the planner, executor, system catalogs, etc.

...Robert
Re: [HACKERS] GSOC PostgreSQL partitioning issue
The more specific version of the items will just be the exact job, I guess. However, the detailed form will be hard to write now. Or should I explain the SQL issues for each point?

2010/4/8 Necati Batur :
> Benefits of Project
>
> Partitioning refers to splitting what is logically one large table
> into smaller physical pieces. Partitioning can provide several
> benefits:
>
> Query performance can be improved dramatically in certain situations,
> particularly when most of the heavily accessed rows of the table are
> in a single partition or a small number of partitions. The
> partitioning substitutes for leading columns of indexes, reducing
> index size and making it more likely that the heavily-used parts of
> the indexes fit in memory.
>
> When queries or updates access a large percentage of a single
> partition, performance can be improved by taking advantage of a
> sequential scan of that partition instead of using an index and random
> access reads scattered across the whole table.
>
> Bulk loads and deletes can be accomplished by adding or removing
> partitions, if that requirement is planned into the partitioning
> design. ALTER TABLE is far faster than a bulk operation. It also
> entirely avoids the VACUUM overhead caused by a bulk DELETE.
>
> Seldom-used data can be migrated to cheaper and slower storage media.
>
> Deliverables
>
> * The trigger-based operations can be done automatically
>
> * The stored procedures can help us to implement some functionality,
>   such as the check constraint problem
>
> * Manual VACUUM or ANALYZE commands can be handled by using triggers;
>   DBMS SQL can help to provide faster execution
>
> * Some more functionality can be added to UPDATE operations to make
>   administration easy
>
> Timeline (not exact, but most probable)
>
> Start June 7 and end around September 7
>
> * Warm up to the PostgreSQL environment (1-2 weeks)
>
> * Determine the exact operations to be added to PostgreSQL
>
> * Initial coding according to the work breakdown structure
>
> * Start implementing in a distributed environment to check that the
>   initial functions work
>
> * Write test cases for the code
>
> * Further implementation to fully support the ideas
>
> * Write it up on the discussion site and collect feedback
>
> * More support based on feedback
>
> * Last tests and documentation of the final operations
>
> About me
>
> I am a senior student in computer engineering at IZTECH in Turkey. My
> areas of interest are information management, OOP (Object-Oriented
> Programming), and currently bioinformatics. I have been working with an
> Assistant Professor (Jens Allmer) in the molecular biology and genetics
> department for one year. First, we worked on a protein database, 2DB,
> and we presented the project at HIBIT09. The project was "Database
> management system independence by amending 2DB with a database access
> layer". Currently, I am working on another project (Kerb) as my senior
> project, which is a general sequential task management system intended
> to reduce errors and increase time savings in biological experiments.
> We will present this project at HIBIT2010 too. Moreover, I am good at
> data structures and implementations in C.
>
> Contact: e-mails; necatiba...@gmail.com , necati_ba...@hotmail.com (msn)
Re: [HACKERS] GSOC PostgreSQL partitioning issue
On Thu, Apr 8, 2010 at 3:58 PM, Necati Batur wrote:
> *The trigger based operations can be done automatically
>
> *The stored procedures can help us to do some functionalities like
> check constraint problem
>
> *manual VACUUM or ANALYZE commands can be handled by using triggers
> DBMS SQL can help to provide faster executions
>
> *Some more functionalities can be added to UPDATE operations to make
> administrations easy

I think you need to be a LOT more specific about each of these items and what you intend to do about them. You also need to explain the relationship between your work and Itagaki Takahiro's existing patch.

...Robert
[HACKERS] GSOC PostgreSQL partitioning issue
Benefits of Project

Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:

Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.

When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of using an index and random access reads scattered across the whole table.

Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE is far faster than a bulk operation. It also entirely avoids the VACUUM overhead caused by a bulk DELETE.

Seldom-used data can be migrated to cheaper and slower storage media.

Deliverables

* The trigger-based operations can be done automatically

* The stored procedures can help us to implement some functionality, such as the check constraint problem

* Manual VACUUM or ANALYZE commands can be handled by using triggers; DBMS SQL can help to provide faster execution

* Some more functionality can be added to UPDATE operations to make administration easy

Timeline (not exact, but most probable)

Start June 7 and end around September 7

* Warm up to the PostgreSQL environment (1-2 weeks)

* Determine the exact operations to be added to PostgreSQL

* Initial coding according to the work breakdown structure

* Start implementing in a distributed environment to check that the initial functions work

* Write test cases for the code

* Further implementation to fully support the ideas

* Write it up on the discussion site and collect feedback

* More support based on feedback

* Last tests and documentation of the final operations

About me

I am a senior student in computer engineering at IZTECH in Turkey. My areas of interest are information management, OOP (Object-Oriented Programming), and currently bioinformatics. I have been working with an Assistant Professor (Jens Allmer) in the molecular biology and genetics department for one year. First, we worked on a protein database, 2DB, and we presented the project at HIBIT09. The project was "Database management system independence by amending 2DB with a database access layer". Currently, I am working on another project (Kerb) as my senior project, which is a general sequential task management system intended to reduce errors and increase time savings in biological experiments. We will present this project at HIBIT2010 too. Moreover, I am good at data structures and implementations in C.

Contact: e-mails; necatiba...@gmail.com , necati_ba...@hotmail.com (msn)
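As background, the partitioning available in PostgreSQL at the time of writing is built from table inheritance, CHECK constraints, and a redirecting trigger; a minimal sketch (table and column names invented for illustration) looks like:

```sql
-- Minimal sketch of inheritance-based range partitioning (the approach
-- available in PostgreSQL 8.x/9.0); names are made up for illustration.
CREATE TABLE measurement (id int, logdate date, value int);

CREATE TABLE measurement_2010q1 (
    CHECK (logdate >= DATE '2010-01-01' AND logdate < DATE '2010-04-01')
) INHERITS (measurement);

CREATE TABLE measurement_2010q2 (
    CHECK (logdate >= DATE '2010-04-01' AND logdate < DATE '2010-07-01')
) INHERITS (measurement);

-- A trigger on the parent routes INSERTs to the right child.
CREATE OR REPLACE FUNCTION measurement_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate < DATE '2010-04-01' THEN
        INSERT INTO measurement_2010q1 VALUES (NEW.*);
    ELSE
        INSERT INTO measurement_2010q2 VALUES (NEW.*);
    END IF;
    RETURN NULL;   -- suppress the insert into the parent itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_trig
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert();

-- With constraint_exclusion on, the planner skips partitions whose
-- CHECK constraint contradicts the query's WHERE clause.
SET constraint_exclusion = on;
EXPLAIN SELECT * FROM measurement WHERE logdate = DATE '2010-02-14';
```

The manual bookkeeping above (triggers, constraints, partition DDL) is exactly the part that proposals in this area aim to automate.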
Re: [HACKERS] autovacuum and temp tables support
On Thu, Apr 8, 2010 at 3:22 PM, Oleg Bartunov wrote:
>> One general thought I've had is that it would be nice if the first attempt to SELECT against a table with no statistics would trigger an automatic ANALYZE by the backend on which the query was executed. It's pretty common to populate a table using INSERT, or CTAS, or COPY and then try to immediately run a query against it, and I've often found that it's necessary to insert manual analyze statements in there to get decent query plans.
>
> Oracle does this. So, isn't it worth adding support (configurable, like Oracle's optimizer_dynamic_sampling)?

Well, dynamic sampling is considerably more complicated than what I proposed, which is just to force an ordinary ANALYZE before the first query against the table. It would be a very powerful feature if we could use it to ameliorate, for example, the gross statistical errors that sometimes occur when multiple, correlated filter conditions are applied to the same base table; but I don't think it's in the direct path of solving the present complaint.

...Robert

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum and temp tables support
On Thu, 8 Apr 2010, Robert Haas wrote:

> On Thu, Apr 8, 2010 at 2:53 PM, Alvaro Herrera wrote:
>> Oleg Bartunov wrote:
>>> our client complained about a slow query, which involves temporary tables. Analyzing them manually solved the problem. I don't remember the arguments against temporary table support by autovacuum. I'd appreciate any pointers.
>>
>> Autovacuum can't process temp tables; they could reside in a backend's private temp buffers (local memory, not shared).
>
> One general thought I've had is that it would be nice if the first attempt to SELECT against a table with no statistics would trigger an automatic ANALYZE by the backend on which the query was executed. It's pretty common to populate a table using INSERT, or CTAS, or COPY and then try to immediately run a query against it, and I've often found that it's necessary to insert manual analyze statements in there to get decent query plans.

Oracle does this. So, isn't it worth adding support (configurable, like Oracle's optimizer_dynamic_sampling)?

Regards, Oleg

Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 3:09 PM, Joshua D. Drake wrote:
> On Thu, 2010-04-08 at 15:06 -0400, Jaime Casanova wrote:
>> if we want specific topics, then remove -general, -novice, -admin
>
> This will likely never fly, see the archives.

Well, -novice should be easy... actually, it has no reason to exist. After all, what are the rules? Should you subscribe there for your first six months of using Postgres, or until you take a course?

-- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum and temp tables support
Robert Haas wrote: > On Thu, Apr 8, 2010 at 2:53 PM, Alvaro Herrera >> Autovacuum can't process temp tables; they could reside in a >> backend's private temp buffers (local memory, not shared). > > it would be nice if the first attempt to SELECT against a table > with no statistics would trigger an automatic ANALYZE by the > backend on which the query was executed. +1 as an RFE -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, 2010-04-08 at 15:06 -0400, Jaime Casanova wrote:
> On Thu, Apr 8, 2010 at 2:31 PM, Robert Haas wrote:
>> On Thu, Apr 8, 2010 at 2:30 PM, Greg Stark wrote:
>>> On Thu, Apr 8, 2010 at 5:09 PM, Kevin Grittner wrote:
>>>> My set is different, but the principle is the same -- I can't find the time to read all messages to all lists (really, I've tried), so I limit by list to try to target the issues of most interest to me.
>>>
>>> But all it means is you get a random subset of the messages. You're still missing most of the admin or sql or performance related threads since they're mostly on -general anyways. Those three categories cover pretty much all of -general.
>>
>> Maybe we should remove -general. :-)
>
> if we want specific topics, then remove -general, -novice, -admin

This will likely never fly, see the archives.

Joshua D. Drake

> -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157

-- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum and temp tables support
On Thu, Apr 8, 2010 at 2:53 PM, Alvaro Herrera wrote:
> Oleg Bartunov wrote:
>> our client complained about a slow query, which involves temporary tables. Analyzing them manually solved the problem. I don't remember the arguments against temporary table support by autovacuum. I'd appreciate any pointers.
>
> Autovacuum can't process temp tables; they could reside in a backend's private temp buffers (local memory, not shared).

One general thought I've had is that it would be nice if the first attempt to SELECT against a table with no statistics would trigger an automatic ANALYZE by the backend on which the query was executed. It's pretty common to populate a table using INSERT, or CTAS, or COPY and then try to immediately run a query against it, and I've often found that it's necessary to insert manual analyze statements in there to get decent query plans.

...Robert

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
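The workaround Robert describes — inserting a manual ANALYZE between the bulk load and the first query — looks like this in practice (the table names here are invented for illustration):

```sql
-- Populate a brand-new table with CTAS; it has no statistics yet.
CREATE TABLE recent_orders AS
    SELECT * FROM orders WHERE shipped_at > now() - interval '7 days';

-- Without this, the planner falls back on default estimates and may
-- pick a poor plan for the very next query against the table.
ANALYZE recent_orders;

-- The first real query now sees accurate row counts and distributions.
SELECT customer_id, count(*) FROM recent_orders GROUP BY customer_id;
```

The proposal in this thread is essentially to have the backend run that ANALYZE step implicitly on the first SELECT against a statistics-less table.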
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 2:31 PM, Robert Haas wrote: > On Thu, Apr 8, 2010 at 2:30 PM, Greg Stark wrote: >> On Thu, Apr 8, 2010 at 5:09 PM, Kevin Grittner >> wrote: >>> My set is different, but the principle is the same -- I can't find >>> the time to read all messages to all lists (really, I've tried), so >>> I limit by list to try to target the issues of most interest to me. >> >> But all it means is you get a random subset of the messages. You're >> still missing most of the admin or sql or performance related threads >> since they're mostly on -general anyways. Those three categories cover >> pretty much all of -general. > > Maybe we should remove -general. :-) > > if we want specific topics, then remove -general, -novice, -admin -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
+1 for the idea, and +1 for the Zork reference. Hello sailor.

On 4/8/2010 1:11 AM Greg Stark wrote:
> I've often said in the past that we have too many mailing lists with overlapping and vague charters. I submit the following thread as evidence that this causes real problems. http://archives.postgresql.org/message-id/g2o4b46b5f01004010610ib8625426uae6ee90ac1435...@mail.gmail.com
>
> Because the poster chose to send it to pgsql-admin instead of pgsql-general (or pgsql-bugs) very few of the usual suspects had a chance to see it. 7 days later a question about a rather serious database corruption problem had no responses.
>
> I've never understood what the point of pgsql-admin is; just about every question posted is an "admin" question of some sort. Likewise I don't think we should have pgsql-performance or pgsql-sql or pgsql-novice -- any thread appropriate for any of these would be better served by sending it to pgsql-general anyways (with the exception of pgsql-performance, which has a weird combination of hacker threads and user performance-tuning threads). Sending threads to pgsql-general would get more eyes on them and would avoid a lot of the cross-posting headaches. What would someone subscribed to one of these lists but not pgsql-general get anyways but some random sample of threads that might be vaguely performance or admin related? They would still miss most of the administration and performance questions and discussions which happen on -general and -hackers as appropriate.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
Greg Stark wrote:
> But all it means is you get a random subset of the messages. You're still missing most of the admin or sql or performance related threads since they're mostly on -general anyways. Those three categories cover pretty much all of -general.

Well, one of these more specific lists must be getting over half of the messages relevant to its title, unless things are freakishly evenly divided. Message counts in the last 30 days:

  143  -novice
  199  -sql
  321  -admin
  436  -performance
 1099  *subtotal*
 1102  -general
 2201  **total**

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum and temp tables support
Oleg Bartunov wrote:
> our client complained about slow query, which involves temporary tables. Analyzing them manually solved the problem. I don't remember arguments against temporary tables support by autovacuum. I'd appreciate any pointers.

Autovacuum can't process temp tables; they could reside in a backend's private temp buffers (local memory, not shared).

-- Alvaro Herrera  http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
Greg Stark wrote: > But all it means is you get a random subset of the messages. > You're still missing most of the admin or sql or performance > related threads since they're mostly on -general anyways. Those > three categories cover pretty much all of -general. Perhaps -general should be eliminated in favor of more specific lists? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 2:30 PM, Greg Stark wrote: > On Thu, Apr 8, 2010 at 5:09 PM, Kevin Grittner > wrote: >> My set is different, but the principle is the same -- I can't find >> the time to read all messages to all lists (really, I've tried), so >> I limit by list to try to target the issues of most interest to me. > > But all it means is you get a random subset of the messages. You're > still missing most of the admin or sql or performance related threads > since they're mostly on -general anyways. Those three categories cover > pretty much all of -general. Maybe we should remove -general. :-) ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 5:09 PM, Kevin Grittner wrote: > My set is different, but the principle is the same -- I can't find > the time to read all messages to all lists (really, I've tried), so > I limit by list to try to target the issues of most interest to me. But all it means is you get a random subset of the messages. You're still missing most of the admin or sql or performance related threads since they're mostly on -general anyways. Those three categories cover pretty much all of -general. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] autovacuum and temp tables support
Hi there,

our client complained about a slow query which involves temporary tables. Analyzing them manually solved the problem. I don't remember the arguments against temporary table support by autovacuum. I'd appreciate any pointers. Also, is it worth adding an autovacuum_enable_temp_tables variable to control autovacuum behaviour?

Regards, Oleg

Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
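Because autovacuum cannot see into a backend's local temp buffers (as Alvaro notes later in the thread), statistics for temporary tables have to be gathered by the session that owns them. A sketch of the manual fix the client applied (table names are invented for illustration):

```sql
-- Temp tables live in the creating backend's local buffers, so
-- autovacuum never vacuums or analyzes them; the owning session must.
CREATE TEMP TABLE scratch AS
    SELECT * FROM big_table WHERE flag = true;

-- Run in the same session, right after populating the temp table.
ANALYZE scratch;

-- Subsequent queries in this session now plan against real statistics.
SELECT count(*) FROM scratch s JOIN big_table b USING (id);
```

Any proposed autovacuum_enable_temp_tables setting would have to work around this visibility restriction, which is why the manual per-session ANALYZE is the current answer.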
Re: [HACKERS] Set LC_COLLATE to de_DE_phoneb
On Thursday, 08.04.2010 at 10:27 +0900, Takahiro Itagaki wrote:
> Frank Jagusch wrote:
>> "de_DE_phoneb" is the name of an alternative sort order in German (only a few languages have alternate sort orders). You may find some information when you search the MSDN for "de_DE_phoneb", i.e. http://msdn.microsoft.com/en-en/library/ms404373.aspx This alternate sorting is supported by the OS, but I don't know how it is supported in the msvcrt.
>
> Hmmm, I found "de-DE_phoneb" in MSDN: http://msdn.microsoft.com/en-us/library/dd374060 but setlocale("de-DE_phoneb") always fails, at least on my machine.

Is it a Windows box? Maybe you need to install some German language support?

> The doc says "de-DE_phoneb" is a locale name for MAKELCID(MAKELANGID(LANG_GERMAN, SUBLANG_GERMAN), SORT_GERMAN_PHONE_BOOK). Some of the native Win32 APIs could accept the locale and sort-order combination, but setlocale() in the CRT seems to reject it.
>
> So, you could use the locale if you find a setlocale-compatible name for "de-DE_phoneb".

As far as I investigated, I didn't find one. I hoped to find such a setlocale-compatible name, or a hint about where to search, here...

> Or, you cannot use it, unless we modify PostgreSQL to use Win32 locale functions instead of standard libc ones -- but it is hardly acceptable.

I thought PostgreSQL was using OS-specific functions for sorting (meaning Win32 functions?). This does not sound good for my request. So I ask the whole audience: are others out there asking for support for these alternate sort orders? Is it worth discussing further in this direction? Here is another idea: is there a way to define a custom collation for a database?

Thanks so far, Frank Jagusch

-- http://www.jagusch-online.de/frank

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
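On Frank's last question: as of the 8.4/9.0 era there is no way to define a custom collation inside PostgreSQL itself, but since 8.4 the collation can at least be pinned per database at creation time. A sketch — the phone-book locale name is the one under discussion here and will only be accepted if the operating system's setlocale() knows it, which is exactly what is in doubt in this thread:

```sql
-- Per-database collation (PostgreSQL 8.4 and later). template0 must be
-- used as the source when the locale differs from the cluster default.
CREATE DATABASE phonebook_db
    TEMPLATE   template0
    LC_COLLATE 'de-DE_phoneb'   -- accepted only if the OS setlocale() knows it
    LC_CTYPE   'de-DE_phoneb';
```

If setlocale() in the CRT rejects the name, as Takahiro reports, this CREATE DATABASE will fail with an "invalid locale name" error, so finding a setlocale-compatible spelling remains the prerequisite.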
Re: [HACKERS] A maze of twisty mailing lists all the same
On 4/7/10 10:11 PM, Greg Stark wrote: > Likewise I don't think we should have pgsql-performance or pgsql-sql > or pgsql-novice -- any thread appropriate for any of these would be > better served by sending it to pgsql-general anyways (with the > exception of pgsql-performance which has a weird combination of hacker > threads and user performance tuning threads). Sending threads to > pgsql-general would get more eyes on them and would avoid a lot of the > cross-posting headaches. What would someone subscribed to one of these > lists but not pgsql-general get anyways but some random sample of > threads that might be vaguely performance or admin related. They would > still miss most of the administration and performance questions and > discussions which happen on -general and -hackers as appropriate. (1) Regarding -sql and -performance, I couldn't disagree more. I agree that the charter of -admin is extremely vague. (2) This is *definitely* the wrong list for this discussion; it should be on -www. And, no, #2 was not meant to be ironic, even if it is. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint
On Thu, 2010-04-08 at 19:02 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> OK, that seems better. I'm happy with that instead.
>>
>> Have you tested this? Is it ready to commit?
>
> Only very briefly. I think the code is ready, but please review and test to see I didn't miss anything.

I'm going to need you to commit this. I'm on holiday now until 14 April, so it's not going to get a retest before then; it's not smart to commit and then go on holiday, IIRC. I've reviewed your changes and they look correct to me; the main chunk of code is mine and that was tested by me.

-- Simon Riggs www.2ndQuadrant.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
> Perhaps further clarifying the charters of the various lists would help, but folding too much into any one list is likely to reduce the number of readers or cause "spotty" attention. (When I was attempting to follow all the lists, I'd typically give up when I fell about 6000 messages behind, and try to start up again "cold" after having missed a big interval of messages.)

I don't quite agree with this, because -general is *already* at the level where it takes a significant chunk of daily time to keep up with it. All the other mergeable lists pale in comparison to its volume. I stopped trying to read lists completely some time ago, and merely read subject lines at this point, diving into ones that seem interesting or important.

Merging the smaller lists that have a huge overlap of topics with -general would thus be a win, as there would be a larger audience to reply to, and fewer lists to administer and for people to keep track of. It would also reduce the confusion of "which list should I post this to?" I think -admin should absolutely be folded in, -sql really should as well, and slightly less so -performance and -interfaces.

-- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201004081214 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
Robert Haas wrote:
> Dave Page wrote:
>> Greg Stark wrote:
>>> Because the poster chose to send it to pgsql-admin instead of pgsql-general (or pgsql-bugs) very few of the usual suspects had a chance to see it. 7 days later a question about a rather serious database corruption problem had no responses.

I do monitor that list, and try to respond to those issues I can, but had no clue what that message was about -- so I left it for someone else to take up. I often see Tom responding to posts on that list, so I kinda figure anything serious (or where I get it wrong) will be addressed by him, but this thread makes me wonder whether we should advise people not to post there when there is any indication of possible corruption or bugs.

>>> I've never understood what the point of pgsql-admin is; just about every question posted is an "admin" question of some sort.

I think you just answered your own question. I've considered it to be a list for DBAs (or those filling that role, regardless of title) to discuss administrative and operational issues and "best practices". That seems useful to me.

>> a counter argument is that merging lists would significantly increase the traffic on -general, which may not be appreciated by the many people that are only subscribed to one or two of the affected lists. I would wager that the majority of people aren't subscribed to more than a small number of the available lists.
>
> Yeah. I read -performance, -hackers, -bugs, but not -sql, -admin, -general.

My set is different, but the principle is the same -- I can't find the time to read all messages to all lists (really, I've tried), so I limit by list to try to target the issues of most interest to me.

> Consolidating multiple mailing lists to increase viewership of certain messages is only going to work if everyone who now follows each of the smaller mailing lists does an equally good job following the bigger one. That doesn't seem like a safe assumption.

Agreed. Perhaps further clarifying the charters of the various lists would help, but folding too much into any one list is likely to reduce the number of readers or cause "spotty" attention. (When I was attempting to follow all the lists, I'd typically give up when I fell about 6000 messages behind, and try to start up again "cold" after having missed a big interval of messages.)

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint
Simon Riggs wrote: > OK, that seems better. I'm happy with that instead. > > Have you tested this? Is it ready to commit? Only very briefly. I think the code is ready, but please review and test to see I didn't miss anything. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint
On Thu, 2010-04-08 at 18:35 +0300, Heikki Linnakangas wrote:
>> So I have introduced the new mode ("snapshot mode") to enter hot standby anyway. That avoids us having to screw around with the loop logic for redo. I don't see any need to support the case where we have no WAL source defined, yet we want Hot Standby but we also want to allow somebody to drop a WAL file into pg_xlog at some future point. That has no use case of value AFAICS and is too complex to add at this stage of the release cycle.
>
> You don't need a new mode for that. Just do the same "are we consistent now?" check you do in the loop once before calling ReadRecord to fetch the record that follows the checkpoint pointer. Attached is a patch to show what I mean. We just need to let postmaster know that recovery has started a bit earlier, right after processing the checkpoint record, not delaying it until we've read the first record after it.

OK, that seems better. I'm happy with that instead.

Have you tested this? Is it ready to commit?

-- Simon Riggs www.2ndQuadrant.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unsafe threading in syslogger on Windows
Heikki Linnakangas wrote:
> [...]
> I'm going to see what happens if I remove all the #ifdef WIN32 blocks in syslogger, and let it use pgpipe() and select() instead of the extra thread.

Sounds reasonable. Let's see how big the changes are on HEAD. I'm not sure it's worth creating a different smaller fix for the back branches.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint
Simon Riggs wrote: > In StartupXlog() when we get to the point where we "Find the first > record that logically follows the checkpoint", in the current code > ReadRecord() loops forever, spitting out > LOG: record with zero length at 0/C88 > ... > > That prevents us from going further down StartupXLog() to the point > where we start the InRedo loop and hence start hot standby. As long as > we retry we cannot progress further: this is the main problem. > > So in the patch, I have modified the retry test in ReadRecord() so it no > longer retries iff there is no WAL source defined. Now, when > ReadRecord() exits, record == NULL at that point and so we do not (and > cannot) enter the redo loop. Oh, I see. > So I have introduced the new mode ("snapshot mode") to enter hot standby > anyway. That avoids us having to screw around with the loop logic for > redo. I don't see any need to support the case of where we have no WAL > source defined, yet we want Hot Standby but we also want to allow > somebody to drop a WAL file into pg_xlog at some future point. That has > no use case of value AFAICS and is too complex to add at this stage of > the release cycle. You don't need a new mode for that. Just do the same "are we consistent now?" check you do in the loop once before calling ReadRecord to fetch the record that follows the checkpoint pointer. Attached is a patch to show what I mean. We just need to let postmaster know that recovery has started a bit earlier, right after processing the checkpoint record, not delaying it until we've read the first record after it. 
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
*************** PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
*** 1719,1724 ****
--- 1719,1806 ----
  }
  
  /*
+  * StandbyRecoverPreparedTransactions
+  *
+  * Scan the pg_twophase directory and setup all the required information to
+  * allow standby queries to treat prepared transactions as still active.
+  * This is never called at the end of recovery - we use
+  * RecoverPreparedTransactions() at that point.
+  *
+  * Currently we simply call SubTransSetParent() for any subxids of prepared
+  * transactions.
+  */
+ void
+ StandbyRecoverPreparedTransactions(bool can_overwrite)
+ {
+ 	DIR		   *cldir;
+ 	struct dirent *clde;
+ 
+ 	cldir = AllocateDir(TWOPHASE_DIR);
+ 	while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
+ 	{
+ 		if (strlen(clde->d_name) == 8 &&
+ 			strspn(clde->d_name, "0123456789ABCDEF") == 8)
+ 		{
+ 			TransactionId xid;
+ 			char	   *buf;
+ 			TwoPhaseFileHeader *hdr;
+ 			TransactionId *subxids;
+ 			int			i;
+ 
+ 			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
+ 
+ 			/* Already processed? */
+ 			if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
+ 			{
+ 				ereport(WARNING,
+ 						(errmsg("removing stale two-phase state file \"%s\"",
+ 								clde->d_name)));
+ 				RemoveTwoPhaseFile(xid, true);
+ 				continue;
+ 			}
+ 
+ 			/* Read and validate file */
+ 			buf = ReadTwoPhaseFile(xid, true);
+ 			if (buf == NULL)
+ 			{
+ 				ereport(WARNING,
+ 						(errmsg("removing corrupt two-phase state file \"%s\"",
+ 								clde->d_name)));
+ 				RemoveTwoPhaseFile(xid, true);
+ 				continue;
+ 			}
+ 
+ 			/* Deconstruct header */
+ 			hdr = (TwoPhaseFileHeader *) buf;
+ 			if (!TransactionIdEquals(hdr->xid, xid))
+ 			{
+ 				ereport(WARNING,
+ 						(errmsg("removing corrupt two-phase state file \"%s\"",
+ 								clde->d_name)));
+ 				RemoveTwoPhaseFile(xid, true);
+ 				pfree(buf);
+ 				continue;
+ 			}
+ 
+ 			/*
+ 			 * Examine subtransaction XIDs ... they should all follow main
+ 			 * XID, and they may force us to advance nextXid.
+ 			 */
+ 			subxids = (TransactionId *)
+ 				(buf + MAXALIGN(sizeof(TwoPhaseFileHeader)));
+ 			for (i = 0; i < hdr->nsubxacts; i++)
+ 			{
+ 				TransactionId subxid = subxids[i];
+ 
+ 				Assert(TransactionIdFollows(subxid, xid));
+ 				SubTransSetParent(xid, subxid, can_overwrite);
+ 			}
+ 		}
+ 	}
+ 	FreeDir(cldir);
+ }
+ 
+ /*
   * RecoverPreparedTransactions
   *
   * Scan the pg_twophase directory and reload shared-memory state for each

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 494,499 ****
--- 494,501 ----
  static XLogRecPtr minRecoveryPoint; /* local copy of
  								 * ControlFile->minRecoveryPoint */
  static bool updateMinRecoveryPoint = true;
+ static bool reachedMinRecoveryPoint = false;
+ static bool InRedo = false;
  
  /*
***************
*** 547,552 ****
--- 549,555 ----
  static void ValidateXLOGDirectoryStructure(void);
  static void CleanupBackupHistory(void);
  static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
  static XLogRecord *ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt);
+ static void CheckRecoveryConsistency(void);
  static bool ValidXLOGHeader(XLogPageHead
[HACKERS] Unsafe threading in syslogger on Windows
On Windows, syslogger uses two threads. The main thread loops and polls if any SIGHUPs have been received or if the log file needs to be rotated. Another thread, "pipe thread", does ReadFile() on the pipe that other processes send their log messages to. ReadFile() blocks, and whenever new data arrives, it is processed in the pipe thread. Both threads use palloc()/pfree(), which are not thread-safe :-(. It's hard to trigger a crash because the main thread mostly just sleeps, and the pipe thread only uses palloc()/pfree() when it receives chunked messages, larger than 512 bytes. Browsing the CVS history, this was made visibly broken by the patch that introduced the message chunking. Before that the pipe thread just read from the pipe and wrote to the log file, which was safe. It has always used ereport() to report read errors, though, which can do palloc(), but there shouldn't normally be any read errors. I chatted with Magnus about this, and he suggested using a Windows critical section to make sure that only one of the threads is active at a time. That seems suitable for back-porting, but I'd like to get rid of this threading in CVS head, it seems too error-prone. The reason it uses threads like this on Windows is explained in the comments: > /* > * Worker thread to transfer data from the pipe to the current logfile. > * > * We need this because on Windows, WaitForSingleObject does not work on > * unnamed pipes: it always reports "signaled", so the blocking ReadFile won't > * allow for SIGHUP; and select is for sockets only. > */ But Magnus pointed out that our pgpipe() implementation on Windows actually creates a pair of sockets instead of pipes, for exactly that reason, so that you can use select() on the returned file descriptor. For some reason syslogger explicitly doesn't use pgpipe() on Windows, though, but calls CreatePipe(). I don't see any explanation why. 
I'm going to see what happens if I remove all the #ifdef WIN32 blocks in syslogger, and let it use pgpipe() and select() instead of the extra thread. Thoughts? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining Streaming Replication Open Items
On Thu, Apr 8, 2010 at 9:56 AM, Simon Riggs wrote: > On Thu, 2010-04-08 at 09:40 -0400, Robert Haas wrote: >> On Thu, Apr 8, 2010 at 8:00 AM, Simon Riggs wrote: >> > On Thu, 2010-04-08 at 07:53 -0400, Robert Haas wrote: >> > >> >> > I do. I see no reason to do the latter, ever, so should not be added to >> >> > any TODO. >> >> >> >> Well, stopping recovery earlier would mean fewer locks, which would >> >> mean a better chance for the read-only backends to finish their work >> >> and exit quickly. But I'm not sure how much it's worth worrying >> >> about. >> > >> > The purpose of the lock is to prevent access to objects when they are in >> > inappropriate states for access. If we stopped startup and allowed >> > access, how do we know that things are in sufficiently good state to >> > allow access? We don't. If the Startup process is holding a lock then >> > that is the only safe thing to do. Otherwise we might allow access to a >> > table with a partially built index or other screw ups. >> >> Hmm. Good point. I guess you could really only stop the startup >> process safely when it wasn't holding any locks anyhow - you couldn't >> just kill it and have it release the locks. > > ... and if it isn't holding any locks at all, there is no reason to kill > Startup first => no TODO item. I think you could shut it down at the first point at which it is holding no locks, rather than letting it continue recovering and potentially retake some new locks. That would be more consistent with the general idea of what a smart shutdown is supposed to be about. I think the real question is whether it's worth the code complexity. I suspect most people use fast shutdown most of the time anyway in real-world applications. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining Streaming Replication Open Items
On Thu, 2010-04-08 at 09:40 -0400, Robert Haas wrote: > On Thu, Apr 8, 2010 at 8:00 AM, Simon Riggs wrote: > > On Thu, 2010-04-08 at 07:53 -0400, Robert Haas wrote: > > > >> > I do. I see no reason to do the latter, ever, so should not be added to > >> > any TODO. > >> > >> Well, stopping recovery earlier would mean fewer locks, which would > >> mean a better chance for the read-only backends to finish their work > >> and exit quickly. But I'm not sure how much it's worth worrying > >> about. > > > > The purpose of the lock is to prevent access to objects when they are in > > inappropriate states for access. If we stopped startup and allowed > > access, how do we know that things are in sufficiently good state to > > allow access? We don't. If the Startup process is holding a lock then > > that is the only safe thing to do. Otherwise we might allow access to a > > table with a partially built index or other screw ups. > > Hmm. Good point. I guess you could really only stop the startup > process safely when it wasn't holding any locks anyhow - you couldn't > just kill it and have it release the locks. ... and if it isn't holding any locks at all, there is no reason to kill Startup first => no TODO item. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgadmin-hackers] Feature request: limited deletions
Hi all,

On Thu, 2010-04-08 at 07:45 -0400, Robert Haas wrote:
> >> 2010/4/8 Thom Brown :
> >> > So you could write:
> >> >
> >> > DELETE FROM massive_table WHERE id < 4000 LIMIT 1;

> I've certainly worked around the lack of this syntax more than once.
> And I bet it's not even that hard to implement.

The fact that it's not implemented has nothing to do with its complexity (in fact it is probably just a matter of enabling it) - you'll have a hard time convincing some old-time hackers on this list that the non-determinism inherent in this kind of query is acceptable ;-)

There is a workaround, which in fact works quite well:

delete from massive_table where ctid = any(array(select ctid from massive_table WHERE id < 4000 LIMIT 1));

Run an explain on it and you'll see it won't get any better, but beware that it might be less optimal than you think: you will likely be sequentially scanning the table for each chunk unless you put some selective where conditions on it too - and even then you'll still scan the whole deleted part and not just the next chunk. The deleted records won't get out of the way magically; you need to vacuum, and that's probably a problem too on a big table. So most likely it will help you less than you think on a massive table: the run time per chunk will increase with each chunk unless you're able to vacuum efficiently. In any case you need to balance the chunk size against the scanned portion of the table so that you get a reasonable run time per chunk, without too much overhead from the chunking process...

Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] system table/view and sequence
On 7 avr, 17:44, nicolas.barb...@gmail.com (Nicolas Barbier) wrote:
> 2010/4/7 Olivier Baheux :
>
> > i'm trying to find where the sequence definition
> > (increment, minvalue, maxvalue, start, cache) is stored in system tables.
> > Atm I found everything except sequences.
>
> It's in the sequence itself (which can be accessed like a table). The
> fact that this "table" is in fact a sequence is stored in pg_class:
>
> 8<
> itsme=# CREATE TABLE a (i serial);
> NOTICE: CREATE TABLE will create implicit sequence "a_i_seq" for serial column "a.i"
> CREATE TABLE
> itsme=# SELECT * FROM a_i_seq;
>  sequence_name | last_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called
> ---------------+------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
>  a_i_seq       |          1 |            1 | 9223372036854775807 |         1 |           1 |       1 | f         | f
> (1 row)
>
> itsme=# SELECT relkind FROM pg_class WHERE relname = 'a_i_seq';
>  relkind
> ---------
>  S
> (1 row)
> >8
>
> Nicolas
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org)
> To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

It works, thanks so much. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint
On Thu, Apr 8, 2010 at 6:16 AM, Simon Riggs wrote: > If standby_mode is enabled and there is no source of WAL, then we get a > stream of messages saying > > LOG: record with zero length at 0/C88 > ... > > but most importantly we never get to the main recovery loop, so Hot > Standby never gets to start at all. We can't keep retrying the request > for WAL and at the same time enter the retry loop, executing lots of > things that expect non-NULL pointers using a NULL xlog pointer. This is pretty much a corner case, so I don't think it's a good idea to add a new mode to handle it. It also seems like it would be pretty inconsistent if we allow WAL to be dropped in pg_xlog, but only if we are also doing archive recovery or streaming replication. If we can't support this case with the same code path we use otherwise, I think we should revert to disallowing it. Having said that, I guess I don't understand how having a source of WAL solves the problem described above. Do we always have to read at least 1 byte of WAL from either SR or the archive before starting up? If not, why do we need to do so here? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining Streaming Replication Open Items
On Thu, Apr 8, 2010 at 8:00 AM, Simon Riggs wrote: > On Thu, 2010-04-08 at 07:53 -0400, Robert Haas wrote: > >> > I do. I see no reason to do the latter, ever, so should not be added to >> > any TODO. >> >> Well, stopping recovery earlier would mean fewer locks, which would >> mean a better chance for the read-only backends to finish their work >> and exit quickly. But I'm not sure how much it's worth worrying >> about. > > The purpose of the lock is to prevent access to objects when they are in > inappropriate states for access. If we stopped startup and allowed > access, how do we know that things are in sufficiently good state to > allow access? We don't. If the Startup process is holding a lock then > that is the only safe thing to do. Otherwise we might allow access to a > table with a partially built index or other screw ups. Hmm. Good point. I guess you could really only stop the startup process safely when it wasn't holding any locks anyhow - you couldn't just kill it and have it release the locks. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining Streaming Replication Open Items
On Thu, 2010-04-08 at 07:53 -0400, Robert Haas wrote: > > I do. I see no reason to do the latter, ever, so should not be added to > > any TODO. > > Well, stopping recovery earlier would mean fewer locks, which would > mean a better chance for the read-only backends to finish their work > and exit quickly. But I'm not sure how much it's worth worrying > about. The purpose of the lock is to prevent access to objects when they are in inappropriate states for access. If we stopped startup and allowed access, how do we know that things are in sufficiently good state to allow access? We don't. If the Startup process is holding a lock then that is the only safe thing to do. Otherwise we might allow access to a table with a partially built index or other screw ups. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining Streaming Replication Open Items
On Thu, Apr 8, 2010 at 7:37 AM, Simon Riggs wrote: > On Thu, 2010-04-08 at 06:58 -0400, Robert Haas wrote: > >> >> Thanks. Committed. >> > >> > Thanks. The following TODO item should be removed? >> > >> > "Redefine smart shutdown in standby mode to exist as soon as all >> > read-only connections are gone." >> > http://wiki.postgresql.org/wiki/Todo#Standby_server_mode > >> > Or change it to something like? >> > >> > "Change smart shutdown in standby mode so that it kills the startup >> > and walreceiver process before waiting for the regular backends to die >> > off" >> >> Yeah, we should do one of those two things, but I don't much care which. > > I do. I see no reason to do the latter, ever, so should not be added to > any TODO. Well, stopping recovery earlier would mean fewer locks, which would mean a better chance for the read-only backends to finish their work and exit quickly. But I'm not sure how much it's worth worrying about. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgadmin-hackers] Feature request: limited deletions
On Thu, Apr 8, 2010 at 7:05 AM, Thom Brown wrote: > On 8 April 2010 11:55, Ian Barwick wrote: >> >> 2010/4/8 Thom Brown : >> > I couldn't find any discussion on this, but the request is quite >> > straightforward. Implement a LIMIT on DELETE statements like SELECT >> > statements. >> > >> > So you could write: >> > >> > DELETE FROM massive_table WHERE id < 4000 LIMIT 1; >> > >> > This would allow deletions in smaller batches rather than waiting >> > potentially hours for the server to mark all those rows as deleted and >> > commit it as one massive transaction. >> >> Is this a PgAdmin-specific question? If it is, apologies I am missing >> the context. >> >> If not, this is totally the wrong list, but why not use a subquery to >> control what is deleted? > > Erm... my mistake, I thought this was on the generic hackers list. Moving > it over in this reply. I've certainly worked around the lack of this syntax more than once. And I bet it's not even that hard to implement. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remaining Streaming Replication Open Items
On Thu, 2010-04-08 at 06:58 -0400, Robert Haas wrote: > >> Thanks. Committed. > > > > Thanks. The following TODO item should be removed? > > > > "Redefine smart shutdown in standby mode to exist as soon as all > > read-only connections are gone." > > http://wiki.postgresql.org/wiki/Todo#Standby_server_mode > > Or change it to something like? > > > > "Change smart shutdown in standby mode so that it kills the startup > > and walreceiver process before waiting for the regular backends to die off" > > Yeah, we should do one of those two things, but I don't much care which. I do. I see no reason to do the latter, ever, so should not be added to any TODO. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint
On Thu, 2010-04-08 at 13:33 +0300, Heikki Linnakangas wrote:
> > If standby_mode is enabled and there is no source of WAL, then we get a
> > stream of messages saying
> >
> > LOG: record with zero length at 0/C88
> > ...
> >
> > but most importantly we never get to the main recovery loop, so Hot
> > Standby never gets to start at all. We can't keep retrying the request
> > for WAL and at the same time enter the retry loop, executing lots of
> > things that expect non-NULL pointers using a NULL xlog pointer.
>
> You mean it can't find even the checkpoint record to start replaying?

Clearly I don't mean that. Otherwise it wouldn't be "start from a shutdown checkpoint". I think you are misunderstanding me. Let me explain in more detail, though please read the patch before replying.

The patch I submitted at the top of this thread works for allowing Hot Standby during recovery. Yes, of course that occurs when the database is consistent. The trick is to get recovery to the point where it can be enabled. The second patch on this thread presents a way to get the database to that point; it touches some of the other recovery code that you and Masao have worked on. We *must* touch that code if we are to enable Hot Standby in the way you desire.

In StartupXLOG(), when we get to the point where we "Find the first record that logically follows the checkpoint", in the current code ReadRecord() loops forever, spitting out

LOG: record with zero length at 0/C88
...

That prevents us from going further down StartupXLOG() to the point where we start the InRedo loop and hence start hot standby. As long as we retry we cannot progress further: this is the main problem. So in the patch, I have modified the retry test in ReadRecord() so it no longer retries when there is no WAL source defined. Now, when ReadRecord() exits, record == NULL at that point and so we do not (and cannot) enter the redo loop. So I have introduced the new mode ("snapshot mode") to enter hot standby anyway. 
That avoids us having to screw around with the loop logic for redo. I don't see any need to support the case of where we have no WAL source defined, yet we want Hot Standby but we also want to allow somebody to drop a WAL file into pg_xlog at some future point. That has no use case of value AFAICS and is too complex to add at this stage of the release cycle. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgadmin-hackers] Feature request: limited deletions
On 8 April 2010 11:55, Ian Barwick wrote: > 2010/4/8 Thom Brown : > > I couldn't find any discussion on this, but the request is quite > > straightforward. Implement a LIMIT on DELETE statements like SELECT > > statements. > > > > So you could write: > > > > DELETE FROM massive_table WHERE id < 4000 LIMIT 1; > > > > This would allow deletions in smaller batches rather than waiting > > potentially hours for the server to mark all those rows as deleted and > > commit it as one massive transaction. > > Is this a PgAdmin-specific question? If it is, apologies I am missing > the context. > > If not, this is totally the wrong list, but why not use a subquery to > control what is deleted? > > > Ian Barwick > Erm... my mistake, I thought this was on the generic hackers list. Moving it over in this reply. Thom
Re: [HACKERS] Remaining Streaming Replication Open Items
On Thu, Apr 8, 2010 at 2:54 AM, Fujii Masao wrote: > On Thu, Apr 8, 2010 at 10:41 AM, Robert Haas wrote: >> On Wed, Apr 7, 2010 at 8:17 AM, Simon Riggs wrote: >>> OK, that looks a lot less risky than I had understood from discussions. >>> The main thing for me is it doesn't interfere with Startup or >>> WalReceiver, so assuming it works I've got no objections. Thanks for >>> chasing this down, good addition. >> >> Thanks. Committed. > > Thanks. The following TODO item should be removed? > > "Redefine smart shutdown in standby mode to exist as soon as all > read-only connections are gone." > http://wiki.postgresql.org/wiki/Todo#Standby_server_mode > > Or change it to something like? > > "Change smart shutdown in standby mode so that it kills the startup > and walreceiver process before waiting for the regular backends to die off" Yeah, we should do one of those two things, but I don't much care which. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 3:46 AM, Dave Page wrote: > On Thu, Apr 8, 2010 at 6:11 AM, Greg Stark wrote: >> Because the poster chose to send it to pgsql-admin instead of >> pgsql-general (or pgsql-bugs) very few of the usual suspects had a >> chance to see it. 7 days later a question about a rather serious >> database corruption problem had no responses. I've never understand >> what the point of pgsql-admin is; just about every question posted is >> an "admin" question of some sort. > > I can't argue with that... but a counter argument is that merging > lists would significantly increase the traffic on -general would may > not be appreciated by the many people that are only subscribed to one > or two of the affected lists. I would wager that the majority of > people aren't subscribed to more than a small number of the available > lists. Yeah. I read -performance, -hackers, -bugs, but not -sql, -admin, -general. Consolidating multiple mailing lists to increase viewership of certain messages is only going to work if everyone who now follows each of the smaller mailing lists does an equally good job following the bigger one. That doesn't seem like a safe assumption. I might be able to buy an argument that -admin is too fuzzy to be readily distinguishable, although I don't really know since I don't read it. But -performance seems to have a fairly well-defined charter and it's a subset of messages I enjoy reading. Of course if some performance questions get posted elsewhere, yeah, I'll miss them, but oh well: reading every message on every topic hasn't seemed like a good way to address that problem. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint
Simon Riggs wrote:
> On Tue, 2010-04-06 at 10:22 +0100, Simon Riggs wrote:
>
>> Initial patch. I will be testing over next day. No commit before at
>> least midday on Wed 7 Apr.
>
> Various previous discussions sidelined a very important point: what
> exactly does it mean to "start recovery from a shutdown checkpoint"?

Hot standby should be possible as soon as we know that the database is consistent. That is, as soon as we've replayed WAL past the minRecoveryPoint/backupStartPoint point indicated in pg_control.

> If standby_mode is enabled and there is no source of WAL, then we get a
> stream of messages saying
>
> LOG: record with zero length at 0/C88
> ...
>
> but most importantly we never get to the main recovery loop, so Hot
> Standby never gets to start at all. We can't keep retrying the request
> for WAL and at the same time enter the retry loop, executing lots of
> things that expect non-NULL pointers using a NULL xlog pointer.

You mean it can't find even the checkpoint record to start replaying? I think the behavior in that scenario is fine as it is. The database isn't consistent (or at least we can't know if it is, because we don't know the redo pointer) until you read and replay the first checkpoint record.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby: Startup at shutdown checkpoint
On Tue, 2010-04-06 at 10:22 +0100, Simon Riggs wrote: > Initial patch. I will be testing over next day. No commit before at > least midday on Wed 7 Apr. Various previous discussions sidelined a very important point: what exactly does it mean to "start recovery from a shutdown checkpoint"? If standby_mode is enabled and there is no source of WAL, then we get a stream of messages saying LOG: record with zero length at 0/C88 ... but most importantly we never get to the main recovery loop, so Hot Standby never gets to start at all. We can't keep retrying the request for WAL and at the same time enter the retry loop, executing lots of things that expect non-NULL pointers using a NULL xlog pointer. What we are asking for here is a completely new state: the database is not "in recovery" - by definition there is nothing at all to recover. The following patch adds "Snapshot Mode", a very simple variation on the existing code - emphasis on the "simple": LOG: entering snapshot mode LOG: record with zero length at 0/C88 LOG: consistent recovery state reached at 0/C88 LOG: database system is ready to accept read only connections this mode does *not* continually check to see if new WAL files have been added. Startup just sits and waits, backends allowed. If a trigger file is specified, then we can leave recovery. Otherwise Startup process just sits doing nothing. There's possibly an argument for inventing some more special modes where we do allow read only connections but don't start the bgwriter. I don't personally wish to do this at this stage of the release cycle. The attached patch is non-invasive and safe and I want to leave it at that. I will be committing later today, unless major objections, but I ask you to read the patch before you sharpen your pen. It's simple. 
-- 
Simon Riggs
www.2ndQuadrant.com

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e2566a4..365cd17 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1719,6 +1719,88 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
 }
 
 /*
+ * StandbyRecoverPreparedTransactions
+ *
+ * Scan the pg_twophase directory and setup all the required information to
+ * allow standby queries to treat prepared transactions as still active.
+ * This is never called at the end of recovery - we use
+ * RecoverPreparedTransactions() at that point.
+ *
+ * Currently we simply call SubTransSetParent() for any subxids of prepared
+ * transactions.
+ */
+void
+StandbyRecoverPreparedTransactions(bool can_overwrite)
+{
+	DIR		   *cldir;
+	struct dirent *clde;
+
+	cldir = AllocateDir(TWOPHASE_DIR);
+	while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
+	{
+		if (strlen(clde->d_name) == 8 &&
+			strspn(clde->d_name, "0123456789ABCDEF") == 8)
+		{
+			TransactionId xid;
+			char	   *buf;
+			TwoPhaseFileHeader *hdr;
+			TransactionId *subxids;
+			int			i;
+
+			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
+
+			/* Already processed? */
+			if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
+			{
+				ereport(WARNING,
+						(errmsg("removing stale two-phase state file \"%s\"",
+								clde->d_name)));
+				RemoveTwoPhaseFile(xid, true);
+				continue;
+			}
+
+			/* Read and validate file */
+			buf = ReadTwoPhaseFile(xid, true);
+			if (buf == NULL)
+			{
+				ereport(WARNING,
+						(errmsg("removing corrupt two-phase state file \"%s\"",
+								clde->d_name)));
+				RemoveTwoPhaseFile(xid, true);
+				continue;
+			}
+
+			/* Deconstruct header */
+			hdr = (TwoPhaseFileHeader *) buf;
+			if (!TransactionIdEquals(hdr->xid, xid))
+			{
+				ereport(WARNING,
+						(errmsg("removing corrupt two-phase state file \"%s\"",
+								clde->d_name)));
+				RemoveTwoPhaseFile(xid, true);
+				pfree(buf);
+				continue;
+			}
+
+			/*
+			 * Examine subtransaction XIDs ... they should all follow main
+			 * XID, and they may force us to advance nextXid.
+			 */
+			subxids = (TransactionId *)
+				(buf + MAXALIGN(sizeof(TwoPhaseFileHeader)));
+			for (i = 0; i < hdr->nsubxacts; i++)
+			{
+				TransactionId subxid = subxids[i];
+
+				Assert(TransactionIdFollows(subxid, xid));
+				SubTransSetParent(xid, subxid, can_overwrite);
+			}
+		}
+	}
+	FreeDir(cldir);
+}
+
+/*
  * RecoverPreparedTransactions
  *
  * Scan the pg_twophase directory and reload shared-memory state for each
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3000ab7..d26b369 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -180,6 +180,7 @@ static TimestampTz recoveryLastXTime = 0;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyMode = false;
+static bool SnapshotMode = false;
 static char *PrimaryConnInfo = NULL;
 char	   *TriggerFile = NULL;
 
@@ -3845,7 +3846,7 @@ next_record_is_invalid:
 	}
 
 	/* In standby-mode, keep trying *
Re: [HACKERS] Oddly indented raw_expression_tree_walker
Takahiro Itagaki writes: > I found raw_expression_tree_walker() is oddly indented in 8.4 and HEAD. > I expected pgindent would fix those clutter, but it could not. > Should we cleanup it manually, or leave it as-is? There is exactly zero point in a manual cleanup, because pgindent will just do it again next time. If you want to try to fix pgindent, though, have at it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 9:46 AM, Dave Page wrote: > On Thu, Apr 8, 2010 at 6:11 AM, Greg Stark wrote: >> Because the poster chose to send it to pgsql-admin instead of >> pgsql-general (or pgsql-bugs) very few of the usual suspects had a >> chance to see it. 7 days later a question about a rather serious >> database corruption problem had no responses. I've never understand >> what the point of pgsql-admin is; just about every question posted is >> an "admin" question of some sort. > > I can't argue with that... but a counter argument is that merging > lists would significantly increase the traffic on -general would may > not be appreciated by the many people that are only subscribed to one > or two of the affected lists. I would wager that the majority of > people aren't subscribed to more than a small number of the available > lists. That's actually something we can easily find out, if we can get a list of the subscribers emails into a Real Database. I know this bunch of database geeks who write strange "ess-cue-ell kweriis", or whatever they call it, to make such analysis... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walreceiver is uninterruptible on win32
On Wed, Apr 7, 2010 at 1:45 AM, Magnus Hagander wrote:
> No, I don't mean that. I mean store it in one place, and copy/link it
> into where it's used. Look at for example how crypt.c and
> getaddrinfo.c are handled in libpq.

Thanks for the advice!

> Not sure how that will play with PGXS, though, but I'm not entirely
> sure we care if it can be built that way?

Probably yes.

> If it does, there should be
> some way to get PGXS to execute that rule as well, I'm sure.

If we can copy/link the source file defining the "new PQexec" when we compile dblink, a separate DLL doesn't seem to be required. So I'll stop creating a new DLL for PGXS.

> Also note that per Tom's comments this is not a win32 only fix, so it
> shouldn't be called pgwin32_*().

Yep.

Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 8:46 AM, Dave Page wrote: > I can't argue with that... but a counter argument is ... Yes, I know. Clearly it's coffee time :-p -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset()
On Thu, 2010-04-08 at 16:41 +0900, Fujii Masao wrote: > >> > Why? The tli of the last WAL record received is always the > >> > recovery target tli currently. > >> > >> True. > > > > Only in streaming mode. If you use the current TLI as I have suggested > > then it will be correct in more cases. > > pg_last_xlog_receive_location() might be executed also after archive > recovery ends. In this case, using the current tli seems not correct > because it's always different from the recovery target tli after recovery. Which is why the code I write says "if (RecoveryInProgress())" > >> Hmm, currently pg_last_xlog_receive_location() returns the last location > >> streamed via streaming replication. Should that be changed so that it > >> also advances when a WAL segment is restored from archive? It seems > >> strange that pg_last_xlog_receive_location() can be smaller than > >> pg_last_xlog_replay_location(). > > > > Yes, it should be changed. > > Should it advance when WAL file in pg_xlog is read? If not, > pg_last_xlog_receive_location() can be smaller than > pg_last_xlog_replay_location(). > > And, how far should it advance when WAL file is > partially-filled for some reasons? Just make pg_last_xlog_receive_location() do exactly the same thing as pg_last_xlog_replay_location() when working with files. No need to try to keep two things exactly in sync. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A maze of twisty mailing lists all the same
On Thu, Apr 8, 2010 at 6:11 AM, Greg Stark wrote:
> Because the poster chose to send it to pgsql-admin instead of
> pgsql-general (or pgsql-bugs) very few of the usual suspects had a
> chance to see it. 7 days later a question about a rather serious
> database corruption problem had no responses. I've never understood
> what the point of pgsql-admin is; just about every question posted is
> an "admin" question of some sort.

I can't argue with that... but a counter argument is that merging lists would significantly increase the traffic on -general, which may not be appreciated by the many people that are only subscribed to one or two of the affected lists. I would wager that the majority of people aren't subscribed to more than a small number of the available lists.

-- Dave Page EnterpriseDB UK: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset()
On Thu, Apr 8, 2010 at 4:06 PM, Simon Riggs wrote:
> On Thu, 2010-04-08 at 09:54 +0300, Heikki Linnakangas wrote:
>> Fujii Masao wrote:
>> > On Wed, Apr 7, 2010 at 7:23 PM, Heikki Linnakangas wrote:
>> >> This commit is a stop-gap solution until we figure out what exactly to
>> >> do about that. Masao-san wrote a patch that included the TLI in the
>> >> string returned by pg_last_xlog_receive/replay_location() (see
>> >> http://archives.postgresql.org/message-id/3f0b79eb1003030603ibd0cbadjebb09fa424930...@mail.gmail.com
>> >> and
>> >> http://archives.postgresql.org/message-id/3f0b79eb1003300214r6cf98c46tc9be5d563ccf4...@mail.gmail.com),
>> >> but it still wasn't clear it did the right thing in corner-cases where
>> >> the TLI changes. Using GetRecoveryTargetTLI() for the tli returned by
>> >> pg_last_receive_location() seems bogus, at least.
>> >
>> > Why? The tli of the last WAL record received is always the
>> > recovery target tli currently.
>>
>> True.
>
> Only in streaming mode. If you use the current TLI as I have suggested
> then it will be correct in more cases.

pg_last_xlog_receive_location() might also be executed after archive
recovery ends. In this case, using the current tli seems incorrect,
because it's always different from the recovery target tli after recovery.

>> Hmm, currently pg_last_xlog_receive_location() returns the last location
>> streamed via streaming replication. Should that be changed so that it
>> also advances when a WAL segment is restored from archive? It seems
>> strange that pg_last_xlog_receive_location() can be smaller than
>> pg_last_xlog_replay_location().
>
> Yes, it should be changed.

Should it advance when a WAL file in pg_xlog is read? If not,
pg_last_xlog_receive_location() can be smaller than
pg_last_xlog_replay_location().

And, how far should it advance when a WAL file is
partially filled for some reason?
Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
[HACKERS] Re: [COMMITTERS] pgsql: Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset()
On Thu, 2010-04-08 at 09:54 +0300, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > On Wed, Apr 7, 2010 at 7:23 PM, Heikki Linnakangas wrote:
> >> This commit is a stop-gap solution until we figure out what exactly to
> >> do about that. Masao-san wrote a patch that included the TLI in the
> >> string returned by pg_last_xlog_receive/replay_location() (see
> >> http://archives.postgresql.org/message-id/3f0b79eb1003030603ibd0cbadjebb09fa424930...@mail.gmail.com
> >> and
> >> http://archives.postgresql.org/message-id/3f0b79eb1003300214r6cf98c46tc9be5d563ccf4...@mail.gmail.com),
> >> but it still wasn't clear it did the right thing in corner-cases where
> >> the TLI changes. Using GetRecoveryTargetTLI() for the tli returned by
> >> pg_last_receive_location() seems bogus, at least.
> >
> > Why? The tli of the last WAL record received is always the
> > recovery target tli currently.
>
> True.

Only in streaming mode. If you use the current TLI as I have suggested
then it will be correct in more cases.

> Hmm, currently pg_last_xlog_receive_location() returns the last location
> streamed via streaming replication. Should that be changed so that it
> also advances when a WAL segment is restored from archive? It seems
> strange that pg_last_xlog_receive_location() can be smaller than
> pg_last_xlog_replay_location().

Yes, it should be changed.

-- 
 Simon Riggs           www.2ndQuadrant.com