Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread tomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Wed, Jul 11, 2007 at 12:38:09AM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
[...]
> > It might make sense then to clear the pg_twophase directory on DB
> > startup.
> 
>   I fear you have 100% misunderstood the point.  The *only*
> reason for that feature is to survive DB crashes.

Ah -- so it is intentional that it keeps the DB from starting again.
OK, then I misunderstood. Sorry for the noise.

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGlJG5Bcgs9XrR2kYRApC9AJsF+wm9z5zJXpZ98ThuV/gn9ozpVwCfbf3L
G4OA0pu3rh/o2rOL/OvZ9bU=
=+fd6
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Bgwriter strategies

2007-07-11 Thread Heikki Linnakangas
In the last couple of days, I've been running a lot of DBT-2 tests and 
smaller microbenchmarks with different bgwriter settings and 
experimental patches, but I have not been able to produce a repeatable 
test case where any of the bgwriter configurations perform better than 
not having bgwriter at all.


I encountered a strange phenomenon that I don't understand. I ran a 
small test case with DELETEs in random order, using an index, on a table 
~300MB table, with shared_buffers smaller than that. I expected that to 
be dominated by the speed postgres can swap pages in and out of the 
shared buffer cache, but surprisingly the test starts to block on the 
write I/O, even though the table fits completely in OS cache. I was able 
to reproduce the phenomenon with a simple C program that writes 8k 
blocks in random order to a fixed size file. I've attached it along with 
output of running it on my test server. The output shows how the writes 
start to periodically block after a while. I was able to reproduce the 
problem on my laptop as well. Can anyone explain what's going on?


Anyone out there have a repeatable test case where bgwriter helps?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
#include 
#include 
#include 
#include 
#include 
#include 

int main(int argc, char **argv)
{
  int fd;
  off_t len;
  char buf[8192];
  int i;
  int size;
  struct timeval begin_t;

  if (argc != 3)
  {
printf("Usage: writetest  \n");
exit(1);
  }

  fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, S_IWUSR | S_IRUSR);
  if (fd == -1)
  {
perror(NULL);
exit(1);
  }
  size = atoi(argv[2]) * 1024 * 1024;

  for(i=0; i < size;)
i += write(fd, buf, sizeof(buf));

  len = i;

  fsync(fd);

  gettimeofday(&begin_t, NULL);
  for(i = 0; i < 1000; i++)
  {
lseek(fd, ((random() % (len / sizeof(buf * sizeof(buf), SEEK_SET);
write(fd, buf, sizeof(buf));
if(i % 4 == 0)
{
  struct timeval t;
  long msecs;

  gettimeofday(&t, NULL);
  msecs = (t.tv_sec - begin_t.tv_sec) * 1000 +(t.tv_usec - begin_t.tv_usec) / 1000;
  printf("%d blocks written, time=%ld ms\n", i, msecs);
  begin_t = t;
}
  }
}
./writetest /mnt/data/writetest-data 80
0 blocks written, time=0 ms
4 blocks written, time=251 ms
8 blocks written, time=241 ms
12 blocks written, time=241 ms
16 blocks written, time=241 ms
20 blocks written, time=242 ms
24 blocks written, time=242 ms
28 blocks written, time=241 ms
32 blocks written, time=241 ms
36 blocks written, time=242 ms
40 blocks written, time=241 ms
44 blocks written, time=241 ms
48 blocks written, time=241 ms
52 blocks written, time=242 ms
56 blocks written, time=241 ms
60 blocks written, time=241 ms
64 blocks written, time=242 ms
68 blocks written, time=242 ms
72 blocks written, time=242 ms
76 blocks written, time=241 ms
80 blocks written, time=242 ms
84 blocks written, time=4579 ms
88 blocks written, time=244 ms
92 blocks written, time=242 ms
96 blocks written, time=4752 ms
100 blocks written, time=241 ms
104 blocks written, time=4618 ms
108 blocks written, time=242 ms
112 blocks written, time=4614 ms
116 blocks written, time=246 ms
120 blocks written, time=243 ms
124 blocks written, time=4619 ms
128 blocks written, time=242 ms
132 blocks written, time=242 ms
136 blocks written, time=4605 ms
140 blocks written, time=242 ms


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Bgwriter strategies

2007-07-11 Thread Pavan Deolasee

On 7/11/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:


I was able
to reproduce the phenomenon with a simple C program that writes 8k
blocks in random order to a fixed size file. I've attached it along with
output of running it on my test server. The output shows how the writes
start to periodically block after a while. I was able to reproduce the
problem on my laptop as well. Can anyone explain what's going on?




I think you are assuming that the next write of the same block won't
use another OS cache block. I doubt if thats the way writes are handled
by the kernel. Each write would typically end up being queued up in the
kernel
where each write will have its own copy of the block to the written. Isn't
it ?


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Bgwriter strategies

2007-07-11 Thread Alvaro Herrera
Pavan Deolasee escribió:
> On 7/11/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> >
> >I was able
> >to reproduce the phenomenon with a simple C program that writes 8k
> >blocks in random order to a fixed size file. I've attached it along with
> >output of running it on my test server. The output shows how the writes
> >start to periodically block after a while. I was able to reproduce the
> >problem on my laptop as well. Can anyone explain what's going on?
> >
> I think you are assuming that the next write of the same block won't
> use another OS cache block. I doubt if thats the way writes are handled
> by the kernel. Each write would typically end up being queued up in the
> kernel
> where each write will have its own copy of the block to the written. Isn't
> it ?

I don't think so -- at least not on Linux.  See
https://ols2006.108.redhat.com/2007/Reprints/zijlstra-Reprint.pdf
where he talks about a patch to the page cache.  He describes the
current page cache there; each page is kept on a tree, so a second write
to the same page would "overwrite" the page of the original write.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Las mujeres son como hondas:  mientras más resistencia tienen,
 más lejos puedes llegar con ellas"  (Jonas Nightingale, Leap of Faith)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] doubt

2007-07-11 Thread Narasimha Rao P.A
Does postgreSQL support distributive query processing

   
-
 Get the freedom to save as many mails as you wish. Click here to know how.

Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Chris Browne
[EMAIL PROTECTED] writes:
> On Wed, Jul 11, 2007 at 12:38:09AM -0400, Tom Lane wrote:
>> [EMAIL PROTECTED] writes:
> [...]
>> > It might make sense then to clear the pg_twophase directory on DB
>> > startup.
>> 
>>   I fear you have 100% misunderstood the point.  The *only*
>> reason for that feature is to survive DB crashes.
>
> Ah -- so it is intentional that it keeps the DB from starting again.
> OK, then I misunderstood. Sorry for the noise.

I don't think that is so much "intentional" as it is an "emergent
property."

The usual point to 2PC is that once transactions are PREPAREd, they
*need* to be stored robustly enough to survive even a DB crash.

If one locks certain vital system resources, as part of that PREPAREd
transaction, that evidently causes some problems, alas...

The right resolution to this is not, a priori, evident yet.
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "linuxdatabases.info")
http://cbbrowne.com/info/linuxdistributions.html
Rules of the Evil Overlord #38. "If  an enemy I have just killed has a
younger sibling or offspring anywhere,  I will find them and have them
killed immediately, instead  of waiting for them to  grow up harboring
feelings of vengeance towards me in my old age."


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Bgwriter strategies

2007-07-11 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> I think you are assuming that the next write of the same block won't
> use another OS cache block. I doubt if thats the way writes are handled
> by the kernel. Each write would typically end up being queued up in the
> kernel
> where each write will have its own copy of the block to the written. Isn't
> it ?

A kernel that worked like that would have a problem doing read(), ie,
it'd have to search to find the latest version of the block.  So I'd
expect that most systems would prefer to keep only one in-memory copy
of any given block and overwrite it at write() time.  No sane kernel
designer will optimize write() at the expense of read() performance,
especially when you consider that a design as above really pessimizes
write() too --- it does more I/O than is necessary when the same block
is modified repeatedly in a short time.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Andrew Sullivan
On Wed, Jul 11, 2007 at 10:43:23AM -0400, Chris Browne wrote:
> The right resolution to this is not, a priori, evident yet.

_A posteriori_, though, it seems to me the right resolution is "don't
do that" ;-)

A
-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."
--Damien Katz

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] PQescapeBytea* version for parameters

2007-07-11 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> Do we want something like this which provides a PQescapeByteaParam for
>> escaping bytea strings before passing them as text-mode parameters in
>> PQexecParam?
>
> Seems a lot easier and more efficient to just pass out-of-line bytea
> parameters as binary mode.

Hm, the cause of the problem with using PQescapeBytea with
standard_comforming_strings as a cheap substitute for an actual
PQescapeByteaParam is that it currently escapes ' as '' regardless of the
setting of standard_conforming_string.

else if (*vp == '\'')
{
*rp++ = '\'';
*rp++ = '\'';
}

Shouldn't it escape ' as \' and not '' if standard_conforming_strings is
false?

What I would actually suggest is that it just escape ' and \ the same way it
does binary characters by inserting the bytea escapes \047 and \134. That
actually simplifies the code quite a bit and avoids a lot of special cases for
standard_conforming_strings.


Index: fe-exec.c
===
RCS file: /home/stark/src/REPOSITORY/pgsql/src/interfaces/libpq/fe-exec.c,v
retrieving revision 1.192
diff -u -r1.192 fe-exec.c
--- fe-exec.c   5 Jan 2007 22:20:01 -   1.192
+++ fe-exec.c   11 Jul 2007 15:34:25 -
@@ -2755,28 +2755,13 @@
vp = from;
for (i = from_length; i > 0; i--, vp++)
{
-   if (*vp < 0x20 || *vp > 0x7e)
+   if (*vp < 0x20 || *vp > 0x7e || *vp == '\'' || *vp == '\\')
{
if (!std_strings)
*rp++ = '\\';
(void) sprintf((char *) rp, "\\%03o", *vp);
rp += 4;
}
-   else if (*vp == '\'')
-   {
-   *rp++ = '\'';
-   *rp++ = '\'';
-   }
-   else if (*vp == '\\')
-   {
-   if (!std_strings)
-   {
-   *rp++ = '\\';
-   *rp++ = '\\';
-   }
-   *rp++ = '\\';
-   *rp++ = '\\';
-   }
else
*rp++ = *vp;
}


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] minor compiler warning on OpenBSD

2007-07-11 Thread Stefan Kaltenbrunner
while looking at some other stuff I noticed that we have the following
compiler warning on OpenBSD 4.0/amd64 with the OS supplied compiler:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
-Wendif-labels -fno-strict-aliasing -DFRONTEND
-I../../../src/interfaces/libpq -I../../../src/include
-I/usr/include/kerberosV/  -c -o initdb.o initdb.c
initdb.c: In function `locale_date_order':
initdb.c:2187: warning: `%x' yields only last 2 digits of year in some
locales


$ gcc -v
Reading specs from /usr/lib/gcc-lib/amd64-unknown-openbsd4.0/3.3.5/specs
Configured with:
Thread model: single
gcc version 3.3.5 (propolice)


Stefan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Heikki Linnakangas

Chris Browne wrote:

If one locks certain vital system resources, as part of that PREPAREd
transaction, that evidently causes some problems, alas...

The right resolution to this is not, a priori, evident yet.


It's not? I agree with Tom here; this is just one of the numerous things 
you can do to screw up your database as a superuser. Why would you LOCK 
the pg_auth table, or any other system table for that matter, in the 
first place? Let alone in a distributed transaction.


FWIW, deleting the files from pg_twophase is safe when the system is 
shut down.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] PQescapeBytea* version for parameters

2007-07-11 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Shouldn't it escape ' as \' and not '' if standard_conforming_strings is
> false?

No.  That's always worked and there's no reason to change it.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] minor compiler warning on OpenBSD

2007-07-11 Thread Tom Lane
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> while looking at some other stuff I noticed that we have the following
> compiler warning on OpenBSD 4.0/amd64 with the OS supplied compiler:
> initdb.c:2187: warning: `%x' yields only last 2 digits of year in some
> locales

Yeah, mine complains about that too.  Peter's response was "get a newer
compiler" --- apparently the gcc folks thought better of this warning
after the Y2K panic subsided.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] doubt

2007-07-11 Thread Joshua D. Drake

Narasimha Rao P.A wrote:

Does postgreSQL support distributive query processing


No.



Get the freedom to save as many mails as you wish. Click here to know 
how. 
 




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] PQescapeBytea* version for parameters

2007-07-11 Thread Bruce Momjian
Tom Lane wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> > Shouldn't it escape ' as \' and not '' if standard_conforming_strings is
> > false?
> 
> No.  That's always worked and there's no reason to change it.

'' is more standard than \' so we always use ''.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Need help with autoconf

2007-07-11 Thread Magnus Hagander
Hi!

I'm trying to write an autoconf macro to figure out if the function
krb5_free_unparsed_name exists (because it exists in MIT but not Heimdal),
to fix a rather nasty bug in our Kerberos implementation.

However, I'm failing :(

I'm simply using
AC_CHECK_FUNC([krb5_free_unparsed_name])

which works fine on unix, but breaks on win32. Because autoconf tries the
function with no parameters, which doesn't work due to win32 decorations.
The function is declared in krb5.h - is there some way to make autoconf
load that header file and use the declaration from there?

(If I manually set HAVE_KRB5_FREE_UNPARSED_NAME, I can perfectly well *use*
the function as long as I put the correct number of arguments in there)

//Magnus


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Need help with autoconf

2007-07-11 Thread Tom Lane
Magnus Hagander <[EMAIL PROTECTED]> writes:
> I'm simply using
> AC_CHECK_FUNC([krb5_free_unparsed_name])
> which works fine on unix, but breaks on win32. Because autoconf tries the
> function with no parameters, which doesn't work due to win32 decorations.

Doesn't work why?  We have dozens of other functions we check for
without needing any special windoze hacks.  Is it a macro?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Andrew Sullivan
On Wed, Jul 11, 2007 at 04:44:12PM +0100, Heikki Linnakangas wrote:
> FWIW, deleting the files from pg_twophase is safe when the system is 
> shut down.

Is it safe for the PREPAREd TRANSACTIONs?  I assume not.  That is, in
Peter's presumably experimental case, it might be ok to delete the
files, but on a production system, you'd violate the semantics of 2PC
by doing this?

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."
--Damien Katz

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Heikki Linnakangas

Andrew Sullivan wrote:

On Wed, Jul 11, 2007 at 04:44:12PM +0100, Heikki Linnakangas wrote:
FWIW, deleting the files from pg_twophase is safe when the system is 
shut down.


Is it safe for the PREPAREd TRANSACTIONs?  I assume not.  That is, in
Peter's presumably experimental case, it might be ok to delete the
files, but on a production system, you'd violate the semantics of 2PC
by doing this?


It's effectively the same as manually issuing a ROLLBACK PREPARED. It 
will brake the atomicity of the global transaction, if some branches of 
that global transaction in other resource managers have already been 
committed.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Need help with autoconf

2007-07-11 Thread Magnus Hagander
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> I'm simply using
>> AC_CHECK_FUNC([krb5_free_unparsed_name])
>> which works fine on unix, but breaks on win32. Because autoconf tries the
>> function with no parameters, which doesn't work due to win32 decorations.
> 
> Doesn't work why?  We have dozens of other functions we check for
> without needing any special windoze hacks.  Is it a macro?

No, it actually depends on how the library is compiled. Functions can
either be exported decorated or not. This one is exported decorated.
Also, if it just checks for a function with zero arguments, decorated
and non-decorated look the same.

//magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Need help with autoconf

2007-07-11 Thread Tom Lane
Magnus Hagander <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Magnus Hagander <[EMAIL PROTECTED]> writes:
>>> I'm simply using
>>> AC_CHECK_FUNC([krb5_free_unparsed_name])
>>> which works fine on unix, but breaks on win32. Because autoconf tries the
>>> function with no parameters, which doesn't work due to win32 decorations.
>> 
>> Doesn't work why?  We have dozens of other functions we check for
>> without needing any special windoze hacks.  Is it a macro?

> No, it actually depends on how the library is compiled. Functions can
> either be exported decorated or not. This one is exported decorated.

It's still not apparent to me how this function is different from every
other one we check for; but I'd suggest you write a check that looks
like the ones we use for functions that might be macros, eg sigsetjmp.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Andrew Sullivan
On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:
> It's effectively the same as manually issuing a ROLLBACK PREPARED. It 
> will brake the atomicity of the global transaction, if some branches of 
> that global transaction in other resource managers have already been 
> committed.

But how do you know which file to delete?  Is it keyed to the
transaction identifier or something?

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
However important originality may be in some fields, restraint and 
adherence to procedure emerge as the more significant virtues in a 
great many others.   --Alain de Botton

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Peter Eisentraut
Heikki Linnakangas wrote:
> It's not? I agree with Tom here; this is just one of the numerous
> things you can do to screw up your database as a superuser. Why would
> you LOCK the pg_auth table, or any other system table for that
> matter, in the first place? Let alone in a distributed transaction.

Well, my test case arose from a real application scenario, not an 
attempt to destroy my database system.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Heikki Linnakangas

Peter Eisentraut wrote:

Heikki Linnakangas wrote:

It's not? I agree with Tom here; this is just one of the numerous
things you can do to screw up your database as a superuser. Why would
you LOCK the pg_auth table, or any other system table for that
matter, in the first place? Let alone in a distributed transaction.


Well, my test case arose from a real application scenario, not an 
attempt to destroy my database system.


Why does the application LOCK pg_auth?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Peter Eisentraut wrote:
>> Heikki Linnakangas wrote:
>>> It's not? I agree with Tom here; this is just one of the numerous
>>> things you can do to screw up your database as a superuser. Why would
>>> you LOCK the pg_auth table, or any other system table for that
>>> matter, in the first place? Let alone in a distributed transaction.
>> 
>> Well, my test case arose from a real application scenario, not an 
>> attempt to destroy my database system.

> Why does the application LOCK pg_auth?

Even if there is a reason for a lock, surely it's not necessary to use
AccessExclusiveLock.  A lesser lock would synchronize whatever the heck
it's doing without locking out readers.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Tom Lane
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:
>> It's effectively the same as manually issuing a ROLLBACK PREPARED.

> But how do you know which file to delete?

You don't.  In extremis you could probably throw together some
inspection tool that could look though the 2PC records to find out which
file mentioned an exclusive lock on pg_authid ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Peter Eisentraut
Heikki Linnakangas wrote:

> Why does the application LOCK pg_auth?

It does it with NOWAIT to determine if some other connection had already 
locked it (because it was modifying some roles) in order not to lock up 
the program.  This (or something like it, because this doesn't work, 
after all) is unfortunately necessary because schema changes don't obey 
ordinary snapshot rules.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Heikki Linnakangas

Andrew Sullivan wrote:

On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:
It's effectively the same as manually issuing a ROLLBACK PREPARED. It 
will brake the atomicity of the global transaction, if some branches of 
that global transaction in other resource managers have already been 
committed.


But how do you know which file to delete?  Is it keyed to the
transaction identifier or something?


The xid is encoded in the filename. If you can't start up the database 
and look at pg_locks, you can't do much other than guess.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Andrew Sullivan
On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote:
> 
> The xid is encoded in the filename. If you can't start up the database 
> and look at pg_locks, you can't do much other than guess.

So then in this sort of case, it isn't _really_ safe to delete those
files, because the commitment you made before crash when you accepted
a PREPARE TRANSACTION is going to be gone, which violates the 2PC
rules.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
When my information changes, I alter my conclusions.  What do you do sir?
--attr. John Maynard Keynes

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] minor compiler warning on OpenBSD

2007-07-11 Thread Stefan Kaltenbrunner
Tom Lane wrote:
> Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
>> while looking at some other stuff I noticed that we have the following
>> compiler warning on OpenBSD 4.0/amd64 with the OS supplied compiler:
>> initdb.c:2187: warning: `%x' yields only last 2 digits of year in some
>> locales
> 
> Yeah, mine complains about that too.  Peter's response was "get a newer
> compiler" --- apparently the gcc folks thought better of this warning
> after the Y2K panic subsided.

hmm ok - but at least on openbsd we will have to accept that warning for
 a few years to go (4.1 shipped with 3.3.5 and it seems that the
upcoming 4.2 is not getting an upgraded compiler either)


Stefan

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] "Running readonly queried on PITR slaves" statusupdate

2007-07-11 Thread Simon Riggs
On Tue, 2007-07-10 at 02:41 +0200, Florian G. Pflug wrote:

> After struggling with understanding xlog.c and friends far enough to
> be able to refactor StartupXLOG to suit the needs of concurrent recovery,
> I think I've finally reached a workable (but still a bit hacky) solution.

Sounds like great progress.

> My design is centered around the idea of a bgreplay process that
> takes over the role of the bgwriter in readonly mode, and continously
> replays WALs as they arrive. But since recovery during startup is
> still necessary (We need to bring a filesystem-level backup into a
> consistent state - past minRecoveryLoc - before allowing connections),
> this means doing recovery in two steps, from two different processes.
> 
> I've changed StartupXLOG to only recover up to minRecoveryLoc in readonly
> mode, and to skip all steps that are not required if no writes to
> the database will be done later (Especially creating a checkpoint at
> the end of recovery). Instead, it posts the pointer to the last recovered
> xlog record to shared memory.

The split of processing sounds correct. I think we need to consider
whether the startup process should stay around longer or we go for
bgreplay. Having said that, it isn't important right now, so lets bypass
that and carry on with the investigation into other unknowns.

> bgreplay than uses that pointer for an initial call to ReadRecord to
> setup WAL reading for the bgreplay process. Afterwards, it repeatedly
> calls ReplayXLOG (new function), which always replays at least
> one record (If there is one, otherwise it returns false), until
> it reaches a safe restart point.

Sounds good.

> Currently, in my test setup, I can start a slave in readonly mode and
> it will do initial recovery, bring postgres online, and continously
> recover from inside bgreplay. There isn't yet any locking between
> wal replay and queries.

I think we need a way of signalling to backends that the recovery is
still in progress or not. New transactions should then be readonly or
readwrite as appropriate. I think maybe you've thought of this already
from your earlier posts?

> I'll add that locking during the new few days, which should result
> it a very early prototype. The next steps will then be finding a way
> to flush backend caches after replaying code that modified system
> tables, and (related) finding a way to deal with the flatfiles.

Seems like the replay should do that and then send out a signal when
replay comes to a halt. Sounds like the major area of complexity, so
stay away from the minefield.

> I'd appreciate any comments on this, especially those pointing
> out problems that I overlooked.

Right now, we're looking to uncover problems not solve them.

You're blazing a trail here, so let's isolate problems and return to
them later. The faster we get to the point that we can run a real SELECT
query the better this will be. That is the half-way point, so get there
as fast as you can and then we can re-evaluate the issues that causes.
Much better to achieve the goal with a long list of caveats than to fall
short, yet have solved a number of smaller problems. We may need to
start again from scratch for the final version. PITR as originally
committed was version 5, and we're probably around version 20 now.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Heikki Linnakangas

Andrew Sullivan wrote:

On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote:
The xid is encoded in the filename. If you can't start up the database 
and look at pg_locks, you can't do much other than guess.


So then in this sort of case, it isn't _really_ safe to delete those
files, because the commitment you made before crash when you accepted
a PREPARE TRANSACTION is going to be gone, which violates the 2PC
rules.


Yes, though if neither you nor the transaction manager can connect, you 
don't have much choice.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Simon Riggs
On Tue, 2007-07-10 at 10:41 -0400, Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > The following command sequence appears to lock up the database system:
> > BEGIN;
> > LOCK pg_authid;
> > PREPARE TRANSACTION 'foo';
> > \q
> 
> > After that you can't connect anymore, even in single-user mode.  The
> > only way I could find is to clear out the pg_twophase directory, but
> > I'm not sure whether it is safe to do that.
> 
> > Should this be prevented somehow, and is there a better recovery path?
> 
> AFAICS this is just one of many ways in which a superuser can shoot
> himself in the foot; I'm not eager to try to prevent it.
> 
> Right offhand, clearing pg_twophase while the system is stopped should
> be safe enough.

Safe from the perspective of the rest of the system. The prepared
transactions will clearly be lost and that might be worth millions.

I'm concerned that this advice will lead to clearing pg_twophase every
time that the system won't start properly.

I'd be much more comfortable if LOCK TABLE caused a message to the log
if it is executed on any system table. I can't really see a reason to
allow a user the ability to explicitly lock out a system table and would
prefer if that were banned completely. It's DoS if nothing else. A
simple check on LOCK TABLE won't cost much in the normal execution path.

There seems like a number of ways that unresolved prepared transactions
can cause problems. We really need to have startup mention how many
prepared transactions there are, so we have some chance of understanding
and resolving potential problems. Without such a message we might well
experience downtimes of many hours before somebody thinks to check
pg_twophase and that runs against our goal of higher availability.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Simon Riggs
On Wed, 2007-07-11 at 22:33 +0100, Heikki Linnakangas wrote:
> Andrew Sullivan wrote:
> > On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote:
> >> The xid is encoded in the filename. If you can't start up the database 
> >> and look at pg_locks, you can't do much other than guess.
> > 
> > So then in this sort of case, it isn't _really_ safe to delete those
> > files, because the commitment you made before crash when you accepted
> > a PREPARE TRANSACTION is going to be gone, which violates the 2PC
> > rules.
> 
> Yes, though if neither you nor the transaction manager can connect, you 
> don't have much choice.

True, but I'm worried that this discussion will lead, via Google, to the
impression that if you are having connection problems the best thing to
do is to delete everything in pg_twophase. There are hundreds of other
issues that might prevent connection and it would require significant
expertise to isolate this as the error. I would prefer to explicitly
avoid this kind of error, so that we can return to the idea that
removing pg_twophase is never a requirement.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Count(*) throws error

2007-07-11 Thread Simon Riggs
On Wed, 2007-07-11 at 17:42 -0400, Tom Lane wrote:
> "Jasbinder Singh Bali" <[EMAIL PROTECTED]> writes:
> > I'm using the following statement in my plpgsql function
> 
> > SELECT INTO no_rows COUNT(*) FROM tbl_concurrent;
> 
> > I have decalred no_rows int4 and initialized it to zero
> 
> > Running the function throws the following error:
> 
> > ERROR:  syntax error at or near "(" at character 13
> > QUERY:  SELECT   $1 (*) FROM tbl_concurrent
> 
> I'll bet a nickel you have a local variable named "count" in that
> function, and plpgsql is blindly trying to substitute its value into
> the SQL query.  The replacement of "COUNT" by " $1 " in the query
> text is the tip-off.

I came across a boat load of these the other day. Seems fairly naff that
we substitute variables blindly.

Seems like we could be slightly more friendly without too much bother:
at least only substitute after the VALUES clause in INSERT. We really
shouldn't substitute "var = var" to "$n = $n" either; am I right in
thinking the latter would happen silently and cause potential error?

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> I'd be much more comfortable if LOCK TABLE caused a message to the log
> if it is executed on any system table.

Enabled by "set training_wheels = on", perhaps?

This is really pretty silly to be getting worked up about.  The command
in question wouldn't have been allowed at all except to a superuser,
and there are plenty of ways to catastrophically destroy your database
when you are superuser; most of which we will never consider blocking
for the same reasons that Unix systems have never tried to block root
from doing "rm -rf /".  I'd say the real design flaw in Peter's
referenced application is that they're running it as superuser.

> There seems like a number of ways that unresolved prepared transactions
> can cause problems. We really need to have startup mention how many
> prepared transactions there are, so we have some chance of understanding
> and resolving potential problems.

While I have no particular objection to such a log entry, I doubt it
will fix anything; how many people will really think to look in the
postmaster log?  In any case, most of the problems I've personally run
into with prepared xacts have nothing to do with crashes and so nothing
like that would ever get emitted.  (The typical way I get bitten is to
interrupt the regression tests because I changed my mind about
something, and manage to do this just while the prepared_xacts test has
some open prepared xacts.)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Count(*) throws error

2007-07-11 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> Seems like we could be slightly more friendly without too much bother:
> at least only substitute after the VALUES clause in INSERT.

Surely you jest.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] minor compiler warning on OpenBSD

2007-07-11 Thread Alvaro Herrera
Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
> > Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> >> while looking at some other stuff I noticed that we have the following
> >> compiler warning on OpenBSD 4.0/amd64 with the OS supplied compiler:
> >> initdb.c:2187: warning: `%x' yields only last 2 digits of year in some
> >> locales
> > 
> > Yeah, mine complains about that too.  Peter's response was "get a newer
> > compiler" --- apparently the gcc folks thought better of this warning
> > after the Y2K panic subsided.
> 
> hmm ok - but at least on openbsd we will have to accept that warning for
>  a few years to go (4.1 shipped with 3.3.5 and it seems that the
> upcoming 4.2 is not getting an upgraded compiler either)

My local manpage for strftime says that we can get around this warning
by overloading it with something like

size_t
my_strftime(char *s, size_t max, const char *fmt,
const struct tm *tm)
{
return strftime(s, max, fmt, tm);
}


-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] doubt

2007-07-11 Thread Greg Smith

On Wed, 11 Jul 2007, Narasimha Rao P.A wrote:


Does postgreSQL support distributive query processing


Not internally.  It's possible in some situations to split queries up 
across multiple nodes using add-on software.  pgpool-II, available at 
http://pgfoundry.org/projects/pgpool/ provides an implementation of 
distributed queries if your table has a type of key such that you split 
across it, but it's relatively immature software and you would have to 
look at it very carefully to see if that parallel query implementation 
could fit your needs.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] minor compiler warning on OpenBSD

2007-07-11 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> My local manpage for strftime says that we can get around this warning
> by overloading it with something like

> size_t
> my_strftime(char *s, size_t max, const char *fmt,
>   const struct tm *tm)
> {
>   return strftime(s, max, fmt, tm);
> }

Hey, that works nicely.  On my version of gcc, it suppresses the warning
even if my_strftime is marked "static inline", which should mean that
there's no runtime penalty.

I've committed the patch to HEAD --- Stefan, would you check if it
silences your version of gcc?

Now if we could only get rid of those flex-induced warnings in ecpg...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Count(*) throws error

2007-07-11 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> Seems like we could be slightly more friendly without too much bother:

Actually, rather than get into that sort of AI-complete project,
it strikes me that the most useful response is user education.  There
ought to be a section in the plpgsql documentation explaining exactly
how the substitution works and showing a couple examples of the pitfalls.
I'll try to draft something up.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] "tuple concurrently updated" during index deletion

2007-07-11 Thread Joe Conway
On cvs head, I can get "tuple concurrently updated" if two separate 
transactions are both trying to drop the same index:


8<

  contrib_regression=# create table t(f1 int);
  CREATE TABLE
  contrib_regression=# create index idx1 on t(f1);
  CREATE INDEX
  contrib_regression=# begin;
  BEGIN
  contrib_regression=# drop index idx1;
  DROP INDEX


  contrib_regression=# drop index idx1;


  contrib_regression=# commit;
  COMMIT


  ERROR:  tuple concurrently updated
8<


The backtrace for session #2 looks like:
8<
#0  errfinish (dummy=0) at elog.c:293
#1  0x006fb15d in elog_finish (elevel=20, fmt=0x73bda1 "tuple 
concurrently updated") at elog.c:937
#2  0x0046c25f in simple_heap_delete (relation=0x2aac4990, 
tid=0xba8e44) at heapam.c:2006
#3  0x004c15cb in recursiveDeletion (object=0x7fffb9f7e5e0, 
behavior=DROP_RESTRICT, msglevel=18,
callingObject=0x0, oktodelete=0xba90a0, depRel=0x2aac4990, 
alreadyDeleted=0x0) at dependency.c:657
#4  0x004c0be5 in performDeletion (object=0x7fffb9f7e5e0, 
behavior=DROP_RESTRICT) at dependency.c:177
#5  0x00533863 in RemoveIndex (relation=0xba8d78, 
behavior=DROP_RESTRICT) at indexcmds.c:1133
#6  0x006474d5 in ProcessUtility (parsetree=0xb7dbd0, 
queryString=0xba8ce8 "drop index idx1;", params=0x0,
isTopLevel=1 '\001', dest=0xb7dc68, completionTag=0x7fffb9f7eb40 
"") at utility.c:608

#7  0x00645bf7 in PortalRunUtility (portal=0xbb0d08,
 [...]
8<

The comments for simple_heap_delete say:

/*
 *  simple_heap_delete - delete a tuple
 *
 * This routine may be used to delete a tuple when concurrent updates of
 * the target tuple are not expected (for example, because we have a
 * lock on the relation associated with the tuple).  Any failure is
 * reported via ereport().
 */

So the question is -- is the comment wrong, or is this a bug, or am I 
not just plain misunderstanding something ;-)


The reason I ask is that someone contacted me who is seeing this on a 
production system, and the abort on session #2 is rolling back a large 
bulkload.


Thanks,

Joe

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] "tuple concurrently updated" during index deletion

2007-07-11 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> On cvs head, I can get "tuple concurrently updated" if two separate 
> transactions are both trying to drop the same index:

This seems related to the discussions we had awhile back about how
deletion needs to take locks *before* it starts doing anything.
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00937.php
http://archives.postgresql.org/pgsql-bugs/2007-03/msg00143.php

Notice that recursiveDeletion() tries to clean out pg_depend before
it actually deletes the target object, and in the current code that
object-specific subroutine is the only thing that takes any sort of lock.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] "tuple concurrently updated" during index deletion

2007-07-11 Thread Jonah H. Harris

On 7/11/07, Tom Lane <[EMAIL PROTECTED]> wrote:

Notice that recursiveDeletion() tries to clean out pg_depend before
it actually deletes the target object, and in the current code that
object-specific subroutine is the only thing that takes any sort of lock.


In the past 4-6 months, we've seen 4 cases that look like logical
catalog corruption in the area of dependencies and, likely, concurrent
activity.  I don't recall the exact releases off-hand, but 1 case was
on 8.1 and the others were on 8.2.  In two of the cases, there were
DDL changes while running Slony (which we originally attributed to
Slony).  The other two cases were normal DDL including ALTER TABLE ADD
COLUMN, CREATE OR REPLACE VIEW, and REINDEX.

Unfortunately, we were not able to acquire a forensic copy of the
databases for further analysis.  In all cases, the hardware behaved
and tested fine.  But, it looks dependency-related as in all cases,
dependency records existed for views and indexes which no longer
existed in pg_class (the files and records were gone)

Sorry I don't have more information available right now, but this is
an area that requires further investigation.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] 2PC-induced lockup

2007-07-11 Thread Florian G. Pflug

Tom Lane wrote:

"Simon Riggs" <[EMAIL PROTECTED]> writes:

I'd be much more comfortable if LOCK TABLE caused a message to the log
if it is executed on any system table.


Enabled by "set training_wheels = on", perhaps?

This is really pretty silly to be getting worked up about.  The command
in question wouldn't have been allowed at all except to a superuser,
and there are plenty of ways to catastrophically destroy your database
when you are superuser; most of which we will never consider blocking
for the same reasons that Unix systems have never tried to block root
from doing "rm -rf /".  I'd say the real design flaw in Peter's
referenced application is that they're running it as superuser.


Yeah.. though "lock pg_auth; prepare" looks quite innocent, much more
than say "delete from pg_database" or "rm -rf whatever".
At least to the untrained eye.

I fully agree that that special-casing this particular way to shoot yourself
in the foot is not worth it - but maybe pursuing a more general solution
would be worthwile? Maybe superuser-connections could e.g. ignore
any errors that occur while reading a system table, together with
a big, fat warning, but still allow a logon? That of course depends
on the assumption that basic authentication is possible using just
the information from the flatfiles and pg_hba.conf, which I'm not
sure about.

greetings, Florian Pflug


---(end of broadcast)---
TIP 6: explain analyze is your friend