Re: [HACKERS] small parallel restore optimization

2009-03-09 Thread Magnus Hagander
Alvaro Herrera wrote:
 Tom Lane wrote:
 
 Worst case, we could say that parallel restore isn't supported on
 mingw.  I'm not entirely sure why we bother with that platform at all...
 
 I think you're confusing it with cygwin ...

Yeah. Much as I hate working around the quirks of mingw, I definitely
think we need to keep that platform.

(yes, cygwin is another story, but let's not repeat that discussion now)

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-09 Thread Magnus Hagander
Andrew Dunstan wrote:
 
 
 Alvaro Herrera wrote:
 Andrew Dunstan wrote:

  
 I have found the source of the problem I saw. dumputils.c:fmtId()
 uses a  static PQExpBuffer which it initialises the first time it's
 called. This  gets clobbered by simultaneous calls by Windows threads.

 I could just make it auto and set it up on each call, but that could 
 result in a non-trivial memory leak ... it's probably called a great 
 many times. Or I could provide a parallel version where we pass in a 
 PQExpBuffer that we create, one per thread, and is used by anything 
 called by the parallel code. That seems like a bit of a potential 
 footgun, though.
 

 Could you use a different static PQExpBuffer on each thread?
 pthread_getspecific sort of thing, only to be used on Windows.
   
 
 For MSVC we could declare it with _declspec(thread) and it would be
 allocated in thread-local storage, but it looks like this isn't
 supported on Mingw.

Yeah, _declspec(thread) would be the easy way to do it. But you can also
use the TlsAlloc(), TlsSetValue() and friends functions directly. We do
this to use TLS in ecpg.

It requires some coding around, but for ecpg we did a macro that makes
it behave like the posix functions (see
src/interfaces/ecpg/include/ecpg-pthread-win32.h)

//Magnus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-09 Thread Andrew Dunstan



Magnus Hagander wrote:

Andrew Dunstan wrote:
  

Alvaro Herrera wrote:


Andrew Dunstan wrote:

 
  

I have found the source of the problem I saw. dumputils.c:fmtId()
uses a  static PQExpBuffer which it initialises the first time it's
called. This  gets clobbered by simultaneous calls by Windows threads.

I could just make it auto and set it up on each call, but that could 
result in a non-trivial memory leak ... it's probably called a great 
many times. Or I could provide a parallel version where we pass in a 
PQExpBuffer that we create, one per thread, and is used by anything 
called by the parallel code. That seems like a bit of a potential 
footgun, though.



Could you use a different static PQExpBuffer on each thread?
pthread_getspecific sort of thing, only to be used on Windows.
  
  

For MSVC we could declare it with _declspec(thread) and it would be
allocated in thread-local storage, but it looks like this isn't
supported on Mingw.



Yeah, _declspec(thread) would be the easy way to do it. But you can also
use the TlsAlloc(), TlsSetValue() and friends functions directly. We do
this to use TLS in ecpg.

It requires some coding around, but for ecpg we did a macro that makes
it behave like the posix functions (see
src/interfaces/ecpg/include/ecpg-pthread-win32.h)


  



Yeah, we'll just have to call TlsAlloc to set the thread handle 
somewhere near program start, but that shouldn't be too intrusive.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-08 Thread Andrew Dunstan



Tom Lane wrote:
I've seen a recent error that suggests we are clobbering memory 
somewhere in the parallel code, as well as Olivier Prennant's reported 
error that suggests the same thing, although I'm blessed if I can see 
where it might be. Maybe some more eyeballs on the code would help.



Can you put together even a weakly reproducible test case?  Something
that only fails every tenth or hundredth time would still help.


  


I have found the source of the problem I saw. dumputils.c:fmtId() uses a 
static PQExpBuffer which it initialises the first time it's called. This 
gets clobbered by simultaneous calls by Windows threads.


I could just make it auto and set it up on each call, but that could 
result in a non-trivial memory leak ... it's probably called a great 
many times. Or I could provide a parallel version where we pass in a 
PQExpBuffer that we create, one per thread, and is used by anything 
called by the parallel code. That seems like a bit of a potential 
footgun, though.


Has anyone got a better plan?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-08 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I have found the source of the problem I saw. dumputils.c:fmtId() uses a 
 static PQExpBuffer which it initialises the first time it's called. This 
 gets clobbered by simultaneous calls by Windows threads.

Ugh.  But that doesn't explain the original trouble report on Unixware.

 I could just make it auto and set it up on each call, but that could 
 result in a non-trivial memory leak ... it's probably called a great 
 many times. Or I could provide a parallel version where we pass in a 
 PQExpBuffer that we create, one per thread, and is used by anything 
 called by the parallel code. That seems like a bit of a potential 
 footgun, though.

I think we should try hard to keep this localized to fmtId(), rather
than changing the callers --- the latter would be a huge readability
hit.  Is there a reasonable way to have fmtId use thread-local storage
for its PQExpBuffer pointer on Windows?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-08 Thread Alvaro Herrera
Andrew Dunstan wrote:

 I have found the source of the problem I saw. dumputils.c:fmtId() uses a  
 static PQExpBuffer which it initialises the first time it's called. This  
 gets clobbered by simultaneous calls by Windows threads.

 I could just make it auto and set it up on each call, but that could  
 result in a non-trivial memory leak ... it's probably called a great  
 many times. Or I could provide a parallel version where we pass in a  
 PQExpBuffer that we create, one per thread, and is used by anything  
 called by the parallel code. That seems like a bit of a potential  
 footgun, though.

Could you use a different static PQExpBuffer on each thread?
pthread_getspecific sort of thing, only to be used on Windows.

BTW does fmtQualifiedId have the same problem?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-08 Thread Andrew Dunstan



Alvaro Herrera wrote:

Andrew Dunstan wrote:

  
I have found the source of the problem I saw. dumputils.c:fmtId() uses a  
static PQExpBuffer which it initialises the first time it's called. This  
gets clobbered by simultaneous calls by Windows threads.


I could just make it auto and set it up on each call, but that could  
result in a non-trivial memory leak ... it's probably called a great  
many times. Or I could provide a parallel version where we pass in a  
PQExpBuffer that we create, one per thread, and is used by anything  
called by the parallel code. That seems like a bit of a potential  
footgun, though.



Could you use a different static PQExpBuffer on each thread?
pthread_getspecific sort of thing, only to be used on Windows.
  


For MSVC we could declare it with _declspec(thread) and it would be 
allocated in thread-local storage, but it looks like this isn't 
supported on Mingw.



BTW does fmtQualifiedId have the same problem?

  


Yes, but it's not called in threaded code - it's only in pg_dump and 
only pg_restore is threaded.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-08 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Alvaro Herrera wrote:
 Could you use a different static PQExpBuffer on each thread?
 pthread_getspecific sort of thing, only to be used on Windows.

 For MSVC we could declare it with _declspec(thread) and it would be 
 allocated in thread-local storage, but it looks like this isn't 
 supported on Mingw.

Some googling suggests that thread-local storage is supported on latest
release (not clear if it's exactly the same syntax though :-().

Worst case, we could say that parallel restore isn't supported on
mingw.  I'm not entirely sure why we bother with that platform at all...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-08 Thread Alvaro Herrera
Tom Lane wrote:

 Worst case, we could say that parallel restore isn't supported on
 mingw.  I'm not entirely sure why we bother with that platform at all...

I think you're confusing it with cygwin ...

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-07 Thread Tom Lane
o...@pyrenet.fr writes:
 the only thing I could come  with is a calloc(1,12) that seems to alloc 
 mem for filename, in that case sdewitte.dmp; so  the alloc is not counting 
 the null char at the end.

Where do you see that?  The memtool dump you sent doesn't show it AFAICS.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-07 Thread Andrew Dunstan



Tom Lane wrote:

o...@pyrenet.fr writes:
  
the only thing I could come  with is a calloc(1,12) that seems to alloc 
mem for filename, in that case sdewitte.dmp; so  the alloc is not counting 
the null char at the end.



Where do you see that?  The memtool dump you sent doesn't show it AFAICS.


  


And the only copying of the filename that I can find is done with strdup().

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-06 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Here's a little optimization for the parallel restore code, that 
 inhibits reopening the archive file unless the worker will actually need 
 to read from the file (i.e. a data member). It seems to work OK on both 
 Linux and Windows, and I propose to apply it in a day or two.

I think you should close the file immediately at fork if you're not
going to reopen it --- otherwise it's a foot-gun waiting to fire.
IOW, not this, but something more like

if (te-section == SECTION_DATA)
(AH-ReopenPtr) (AH);
else
(AH-ClosePtr) (AH);

... worker task ...

if (te-section == SECTION_DATA)
(AH-ClosePtr) (AH);

 I've seen a recent error that suggests we are clobbering memory 
 somewhere in the parallel code, as well as Olivier Prennant's reported 
 error that suggests the same thing, although I'm blessed if I can see 
 where it might be. Maybe some more eyeballs on the code would help.

Can you put together even a weakly reproducible test case?  Something
that only fails every tenth or hundredth time would still help.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-06 Thread Guillaume Smet
On Fri, Mar 6, 2009 at 6:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Can you put together even a weakly reproducible test case?  Something
 that only fails every tenth or hundredth time would still help.

It seems that Olivier can reproduce the problem at will on Unixware. I
don't know if it's easy to find useful information to debug the
problem on this platform though.

See http://archives.postgresql.org/pgsql-hackers/2009-03/msg00201.php

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small parallel restore optimization

2009-03-06 Thread ohp

On Fri, 6 Mar 2009, Guillaume Smet wrote:


Date: Fri, 6 Mar 2009 18:58:58 +0100
From: Guillaume Smet guillaume.s...@gmail.com
To: Tom Lane t...@sss.pgh.pa.us
Cc: Andrew Dunstan and...@dunslane.net,
PostgreSQL-development pgsql-hackers@postgresql.org, o...@pyrenet.fr
Subject: Re: [HACKERS] small parallel restore optimization

On Fri, Mar 6, 2009 at 6:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Can you put together even a weakly reproducible test case?  Something
that only fails every tenth or hundredth time would still help.

not sure, none of my tests did fail at the same place.
the only thing I could come  with is a calloc(1,12) that seems to alloc 
mem for filename, in that case sdewitte.dmp; so  the alloc is not counting 
the null char at the end.

not sure it could explain everything though
  

It seems that Olivier can reproduce the problem at will on Unixware. I
don't know if it's easy to find useful information to debug the
problem on this platform though.

See http://archives.postgresql.org/pgsql-hackers/2009-03/msg00201.php




--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges+33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: o...@pyrenet.fr
--
Make your life a dream, make your dream a reality. (St Exupery)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers