Re: [HACKERS] Transaction-controlled robustness for replication

2008-07-23 Thread Markus Wanner

Hi,

Jens-Wolfhard Schicke wrote:

* Does WAL get forced to disk on primary at commit time?
* Does WAL get forced across link to standby at commit time?
* Does WAL get forced to disk on standby at commit time?

* Does WAL get applied [and synced] to disk on standby at commit time?


I think that's what Simon means by his question no. 3; it wouldn't make 
much sense to me otherwise.


I'm assuming the standby node has its own physical format, so the 
changes from the remote WAL need to be transformed to a local WAL, which 
then needs to be written to disk. For Postgres, this pretty much means 
applying the changes and committing them. You never need to store the 
remote WAL on physical storage; what would that be good for?



I think that questions 2 and 3 are trivially bundled together. Once the
user can specify 2, implementing 3 should be trivial and vice versa.


That might well be, yes. The code to collect changes from a transaction 
and then apply them remotely is pretty much the same, no matter when it 
is being executed. But it certainly makes a difference in the balance 
between performance and availability, which is a decision the user 
should be able to make for his specific application (or even better, per 
transaction, as proposed here and in Postgres-R).



I am not even convinced that these need to be two different parameters.


Consider a standby heavily loaded (i/o) with some OLAP queries. Why 
should the master wait until the standby has written anything to disk 
for him?



Also please note that an answer of yes to 3 means that 2 must also
be answered yes.


Agreed. There's no 'AS' mode possible, only 'SS', 'SA' and 'AA'.


How about creating named modes? This would give the user the ability to
define more fine-grained control, especially in larger clusters of
fail-over/read-only servers, without totally clogging the parameter space
and application code. Whether this should be done SQL-style or in some
config file is not so clear to me, although I'd prefer SQL-style, like

CREATE SYNCHRONIZING MODE immediate_readonly AS
  LOCAL SYNCHRONOUS APPLY
  192.168.0.10 SYNCHRONOUS APPLY  -- read-only slave
  192.168.0.11 SYNCHRONOUS APPLY  -- read-only slave
  192.168.0.20 SYNCHRONOUS SHIP   -- backup-server
  192.168.0.21 SYNCHRONOUS SHIP   -- backup-server
  192.168.0.30 SYNCHRONOUS FSYNC  -- backup-server with fast disks
;


Hm.. that's an interesting idea. Especially considering the number of 
options that arise with more than two or three nodes, where you maybe 
also want to specify how many nodes must have written the changes to 
disk before confirming the commit.


In Postgres-R, I've added a TRANSACTION REPLICATION LEVEL, which can be 
either SYNC, EAGER or LAZY. Maybe that's not quite sufficient. On the 
other hand, I don't think any other option here makes any sense. (Above, 
you yourself doubt that sync is different enough from eager).


Regards

Markus


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]-hash index improving

2008-07-23 Thread Simon Riggs

On Wed, 2008-07-23 at 10:57 +0800, Xiao Meng wrote:
 Well, I'll do it after I finish my second patch.
 Hash index should be more efficient than btree when N is big enough.
 It seems meaningful to find how big N is in an experimental way.

Agreed.

We should also examine the basic thinking of the index.

My understanding is that it dynamically resizes the hash as the index grows.
If we already believe the only benefit would come when the index is
large, having special handling for small tables seems like a waste of
time because we will never use it in those contexts. 

So starting the hash at a fairly large size makes more sense than it
might otherwise seem to.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-23 Thread Simon Riggs

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
 1. Issues with applying the patch to CVS HEAD:

Sounds awful. Thanks for the review, will fix.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Postgres-R: internal messaging

2008-07-23 Thread Markus Wanner

Hi,

As you certainly know by now, Postgres-R introduces an additional
manager process. That one is forked from the postmaster, as are all
backends, no matter whether they are processing local or remote
transactions. That led to a communication problem, which was originally
(i.e. around Postgres-R for 6.4) solved by using Unix pipes. I didn't
like that approach for various reasons: first, AFAIK there are
portability issues; second, it eats file descriptors; and third, it
involves copying the messages around several times. As the replication
manager needs to talk to the backends, but both need to be forked from
the postmaster, pipes would also have to go through the postmaster
process.

Trying to be as portable as Postgres itself while still wanting an
efficient messaging system, I came up with this imessages stuff, which
I've already posted to -patches before [1]. It uses shared memory to
store and 'transfer' the messages, and signals to notify other processes
(the so far unused SIGUSR2, IIRC). Of course this implies a hard limit
on the total size of messages waiting to be delivered, due to the fixed
size of the shared memory area.
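
To make that concrete, here is a minimal, compilable sketch of the idea.
All names (IMessage, imsg_send, IMSG_AREA_SIZE) are hypothetical
illustrations rather than the actual Postgres-R API, and a static buffer
stands in for the shared memory area:

    /* Hypothetical sketch of the imessages scheme, not the real API. */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define IMSG_AREA_SIZE 8192             /* hard limit on undelivered data */

    typedef struct IMessage
    {
        pid_t  sender_pid;                  /* who wrote the message */
        int    type;                        /* application-defined type */
        size_t size;                        /* payload size in bytes */
    } IMessage;

    static char   imsg_area[IMSG_AREA_SIZE];  /* stand-in for the shmem area */
    static size_t imsg_used = 0;

    /* Append header plus payload; fails when the fixed-size area is full,
     * which is exactly the hard limit mentioned above. */
    static int
    imsg_send(pid_t recipient, int type, const void *payload, size_t size)
    {
        IMessage hdr = { getpid(), type, size };

        if (imsg_used + sizeof(hdr) + size > IMSG_AREA_SIZE)
            return -1;                      /* sender must wait or give up */
        memcpy(imsg_area + imsg_used, &hdr, sizeof(hdr));
        memcpy(imsg_area + imsg_used + sizeof(hdr), payload, size);
        imsg_used += sizeof(hdr) + size;
        kill(recipient, SIGUSR2);           /* wake the consumer */
        return 0;
    }

    int
    main(void)
    {
        signal(SIGUSR2, SIG_IGN);           /* so the self-signal is harmless */
        imsg_send(getpid(), 1, "changeset", sizeof("changeset"));
        printf("queued %zu bytes\n", imsg_used);
        return 0;
    }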

Besides the communication between the replication manager and the
backends, which is currently done by using these imessages, the
replication manager also needs to communicate with the postmaster: it
needs to be able to request new helper backends and it wants to be
notified upon termination (or crash) of such a helper backend (and other 
backends as well...). I'm currently doing this with imessages as well, 
which violates the rule that the postmaster may not touch shared 
memory. I didn't look into ripping that out, yet. I'm not sure it can be 
done with the existing signaling of the postmaster.


Let's take a simple example: consider a local transaction which changes 
some tuples. Those are collected into a change set, which gets written 
to the shared memory area as an imessage for the replication manager. 
The backend then signals the manager, which awakes from its select(), 
checks its imessage queue and processes the message, delivering it to 
the GCS. It then removes the imessage from the shared memory area again.
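
The manager side of that example might look roughly like this;
imsg_pending() and imsg_consume_and_forward() are assumed helpers
standing in for the real queue-check and GCS-delivery code:

    #include <errno.h>
    #include <stdbool.h>
    #include <sys/select.h>

    extern bool imsg_pending(void);             /* assumed queue check */
    extern void imsg_consume_and_forward(void); /* deliver to GCS, then free */

    /* Sleep on the GCS socket; a backend's SIGUSR2 interrupts select()
     * with EINTR, after which the imessage queue is drained. */
    static void
    replication_manager_loop(int gcs_sock)
    {
        for (;;)
        {
            fd_set rfds;

            FD_ZERO(&rfds);
            FD_SET(gcs_sock, &rfds);
            if (select(gcs_sock + 1, &rfds, NULL, NULL, NULL) < 0 &&
                errno != EINTR)
                break;                          /* real error: bail out */
            while (imsg_pending())
                imsg_consume_and_forward();     /* remove from shmem again */
        }
    }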


My initial design features only a single doubly linked list as the 
message queue, holding all messages for all processes. An imessages lock 
blocks concurrent write access. That's still what's in there, but I 
realize that's not enough. Each process had better have its own 
queue, and the single lock needs to vanish to avoid contention on that 
lock. However, that would require dynamically allocatable shared memory...


As another side note: I've had to write methods similar to those in 
libpq, which serialize and deserialize integers or strings. The libpq 
functions were not appropriate because they cannot write to shared 
memory; instead they are designed to flush to a socket, if I understand 
correctly. Maybe these could be extended or modified to be usable there 
as well? I've been hesitant and instead implemented separate methods in 
src/backend/storage/ipc/buffer.c.


Comments?

Regards

Markus Wanner

[1]: last time I published IMessage stuff on -patches, WIP:
http://archives.postgresql.org/pgsql-patches/2007-01/msg00578.php


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres-R: internal messaging

2008-07-23 Thread Alexey Klyukin
Markus Wanner wrote:
 Besides the communication between the replication manager and the
 backends, which is currently done by using these imessages, the
 replication manager also needs to communicate with the postmaster: it
 needs to be able to request new helper backends and it wants to be
 notified upon termination (or crash) of such a helper backend (and other  
 backends as well...). I'm currently doing this with imessages as well,  
 which violates the rule that the postmaster may not touch shared  
 memory. I didn't look into ripping that out, yet. I'm not sure it can be  
 done with the existing signaling of the postmaster.

In Replicator we avoided the need for the postmaster to read/write a
backend's shmem data by using it as a signal forwarder. When a backend
wants to inform a special process (e.g. the queue monitor) about a
replication-related event (such as a commit), it sends SIGUSR1 to the
postmaster with a related reason flag, and the postmaster, upon
receiving this signal, forwards it to the destination process.
Termination of backends and special processes is handled by the
postmaster itself.
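
A rough sketch of that forwarding scheme, with illustrative names (this
is not Replicator's actual code; a static variable stands in for the
shared-memory reason flag):

    #include <signal.h>
    #include <sys/types.h>

    typedef enum { REASON_NONE, REASON_COMMIT, REASON_ABORT } ForwardReason;

    /* In reality this flag would live in shared memory; sig_atomic_t
     * keeps the handler async-signal-safe. */
    static volatile sig_atomic_t forward_reason = REASON_NONE;
    static pid_t queue_monitor_pid;     /* learned at process startup */

    /* Backend side: record the reason and poke the postmaster. */
    static void
    notify_via_postmaster(pid_t postmaster_pid, ForwardReason reason)
    {
        forward_reason = reason;
        kill(postmaster_pid, SIGUSR1);
    }

    /* Postmaster side: the SIGUSR1 handler only relays the signal; it
     * never reads or writes any message payload in shared memory. */
    static void
    postmaster_sigusr1(int signo)
    {
        (void) signo;
        if (forward_reason != REASON_NONE)
            kill(queue_monitor_pid, SIGUSR1);   /* forward to destination */
    }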


 Let's have a simple example: consider a local transaction which changes  
 some tuples. Those are being collected into a change set, which gets  
 written to the shared memory area as an imessage for the replication  
 manager. The backend then also signals the manager, which then awakes  
 from its select(), checks its imessages queue and processes the message,  
 delivering it to the GCS. It then removes the imessage from the shared  
 memory area again.

Hm... what would happen with new data under heavy load, when the queue
eventually fills with messages: would the relevant transactions be
aborted, or would they wait for the manager to release the queue space
occupied by already-processed messages? ISTM that having a fixed-size
buffer limits the maximum transaction rate.


 My initial design features only a single doubly linked list as the  
 message queue, holding all messages for all processes. An imessages lock  
 blocks concurrent write access. That's still what's in there, but I  
 realize that's not enough. Each process had better have its own  
 queue, and the single lock needs to vanish to avoid contention on that  
 lock. However, that would require dynamically allocatable shared 
 memory...

What about keeping the per-process message queue in the local memory of
the process, and exporting only the queue head to the shmem, thus having
only one message per process there? When the queue manager gets a
message from the process, it may signal that process to copy the next
message from the process's local memory into the shmem. To keep a
correct ordering of queue messages, an additional shared memory queue of
pid_t can be maintained, containing one pid per message.
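
Sketched in code, that proposal could look as follows; all types and
names (ExportedSlot, SharedOrdering, next_sender) are hypothetical:

    #include <sys/types.h>

    #define MAX_PROCS  64
    #define MAX_QUEUED (MAX_PROCS * 8)

    typedef struct ExportedSlot
    {
        int  occupied;                 /* is a head message exported? */
        char head[1024];               /* the one per-process shmem message */
    } ExportedSlot;

    typedef struct SharedOrdering
    {
        ExportedSlot slots[MAX_PROCS];  /* one slot per process */
        pid_t        order[MAX_QUEUED]; /* one pid per queued message */
        int          head, tail;        /* ring-buffer cursors */
    } SharedOrdering;

    /* Manager side: pop the pid of the globally next message, then read
     * that process's slot and signal it to export its next local message. */
    static pid_t
    next_sender(SharedOrdering *so)
    {
        pid_t pid;

        if (so->head == so->tail)
            return 0;                  /* nothing queued */
        pid = so->order[so->head];
        so->head = (so->head + 1) % MAX_QUEUED;
        return pid;
    }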

-- 
Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] plproxy v2

2008-07-23 Thread Marko Kreen
On 7/22/08, Marko Kreen [EMAIL PROTECTED] wrote:
 On 7/22/08, Tom Lane [EMAIL PROTECTED] wrote:
   Marko Kreen [EMAIL PROTECTED] writes:
  Also, plproxy does
 _nothing_ with passwords.  That means the password for remote
 connection must be in postgres user's .pgpass,
  
  
   That seems *exactly* backwards, because putting the password in postgres
user's .pgpass is as good as disabling password auth altogether.
Consider that it would also hand all the keys to the kingdom over to
someone who had access to dblink on the same machine (not even the same
cluster, so long as it was run by the same postgres user!).

 Good point.

I happened to take a look at dblink and I'm not convinced anymore.

Any untrusted PL can be used for remote calls, and in all cases,
including PL/Proxy, the superuser must create the function that
specifies both the connect string and the query.  To allow a regular
user to specify either the connect string or the query, the superuser
must take explicit action by coding the function that way.

But for dblink, the *normal* mode of operation is that a regular user
can (heh, *must*) specify both the connect string and the query...
And the require-password hack for dblink only makes things worse,
as it obfuscates the security implications of using it.

So indeed, there is a hole a truck can drive through, and
it's called 'dblink'.  And I don't think I should make life miserable
for PL/Proxy users just because core has merged an insecure module.

Currently PL/Proxy acts the same as any other libpq-using PL;
using .pgpass is quite natural for them, and I don't see any good
reason to lock it down further.

I know you had a fiasco with dblink security already, but PL/Proxy
is not in dblink camp.  It's in untrusted-PL camp.

So now we have a clear technical argument for an immediate merge:
as a replacement for dblink, since it's more secure and better designed.


And the only thing PL/Proxy really needs is more documentation: there needs
to be a list of the various ways to set it up and the security implications of each.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] \ef function in psql

2008-07-23 Thread Abhijit Menon-Sen
At 2008-07-17 18:28:19 -0400, [EMAIL PROTECTED] wrote:

 It wouldn't take a whole lot to convince me that a pg_get_functiondef
 would be useful, although I don't foresee either of those applications
 wanting to use it because of their backward-compatibility constraints.

If the code lives in psql (as with my patch), then it has some chance of
working with older servers, but if you're happy with pg_get_functiondef,
then I'm happy enough to use it to get \ef working. I agree that pg_dump
wouldn't want to use it, of course, but I guess it doesn't matter very
much if \ef doesn't work on older servers.

What would the function return? CREATE OR REPLACE FUNCTION ...? Would
that be good enough for everyone who might want to call it?

(BTW, psql from 8.3 is already somewhat broken with 8.1:

archiveopteryx=> \d access_keys
ERROR:  column i.indisvalid does not exist

And 8.2 as well:

archiveopteryx=> \d access_keys
ERROR:  column t.tgconstraint does not exist
LINE 3: WHERE t.tgrelid = '16847' AND t.tgconstraint = 0
  ^
Oh, I see they've both been fixed in CVS. Sorry for the noise.)

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres-R: internal messaging

2008-07-23 Thread Markus Wanner

Hi Alexey,

thanks for your feedback, these are interesting points.

Alexey Klyukin wrote:

In Replicator we avoided the need for postmaster to read/write backend's
shmem data by using it as a signal forwarder. When a backend wants to
inform a special process (i.e. queue monitor) about replication-related
event (such as commit) it sends SIGUSR1 to Postmaster with a related
reason flag and the postmaster upon receiving this signal forwards
it to the destination process. Termination of backends and special
processes are handled by the postmaster itself.


Hm.. how about larger data chunks, like change sets? In Postgres-R, 
those need to travel between the backends and the replication manager, 
which then sends them to the GCS.


Hm... what would happen with new data under heavy load, when the queue
eventually fills with messages: would the relevant transactions be
aborted, or would they wait for the manager to release the queue space
occupied by already-processed messages? ISTM that having a fixed-size
buffer limits the maximum transaction rate.


That's why the replication manager is a very simple forwarder, which 
does not block messages, but consumes them immediately from shared 
memory. It already features a message cache, which holds messages it 
cannot currently forward to a backend, because all backends are busy.


And it takes care to only send change sets to helper backends which are 
not busy and can process the remote transaction immediately. 
That way, I don't think the limit on shared memory is the bottleneck. 
However, I didn't measure.


WRT waiting vs. aborting: I think at the moment I don't handle this 
situation gracefully. I've never encountered it. ;-)  But I think the 
simpler option is letting the sender wait until there is enough room in 
the queue for its message. To avoid deadlocks, each process should 
consume its messages before trying to send one. (Which is done 
correctly only for the replication manager ATM, not for the backends, 
IIRC.) A sketch of that rule follows.
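
A minimal sketch of that consume-before-send rule (hypothetical helper
names, building on the imsg_send() sketch earlier in the thread):

    #include <stdbool.h>
    #include <sys/types.h>
    #include <unistd.h>

    extern int  imsg_send(pid_t recipient, int type,
                          const void *payload, size_t size);
    extern bool imsg_pending(void);
    extern void imsg_consume_and_forward(void);

    /* Block until the message fits, but drain our own queue first on
     * every retry so two mutually-sending processes cannot deadlock. */
    static void
    imsg_send_blocking(pid_t recipient, int type,
                       const void *payload, size_t size)
    {
        while (imsg_send(recipient, type, payload, size) < 0)
        {
            while (imsg_pending())
                imsg_consume_and_forward();  /* free the space we hold */
            usleep(1000);                    /* brief back-off, then retry */
        }
    }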



What about keeping the per-process message queue in the local memory of
the process, and exporting only the queue head to the shmem, thus having
only one message per process there?


The replication manager already does that with its cache. No other 
process needs to send (large enough) messages which cannot be consumed 
immediately. So such a local cache does not make much sense for any 
other process.


Even for the replication manager, I find it dubious to require such a 
cache, because it introduces an unnecessary copying of data within memory.



When the queue manager gets a
message from the process, it may signal that process to copy the next
message from the process's local memory into the shmem. To keep a
correct ordering of queue messages, an additional shared memory queue of
pid_t can be maintained, containing one pid per message.


The replication manager takes care of the ordering for cached messages.

Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-23 Thread Simon Riggs

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

 1. Issues with applying the patch to CVS HEAD:

For me, the patch applies cleanly to CVS HEAD.

I do notice that there are two files standby.sgml and
pgstandby.sgml. I can't see where standby.sgml comes from, but I
haven't created it; perhaps it is a relic of the SGML build process.
I've recreated my source tree since I wrote the patch also. Weird.

I'll redo the patch so it points at pgstandby.sgml, which is the one
that's listed as being in the main source tree.

 2. Missing description for new command-line options in pgstandby.sgml
 
 - no description of the proposed new command-line options -h and -p?

These are done. The patch-application issues must have dropped those hunks.

 3. No coding style issues seen
 
 Just one comment: the logic that selects the actual restore command to
 be used has moved from CustomizableInitialize() to main() -- a matter
 of personal taste, perhaps.  But in my view:
 + the #ifdef WIN32/HAVE_WORKING_LINK logic has become easier to read

Thanks

 4. Issue: missing break in switch, silent override of '-l' argument?
 
 This behaviour has been in there before 

Well spotted. I don't claim to test this for Windows.

 5. Minor wording issue in usage message on new '-p' option
 
 I was wondering if the "always" in the usage text
  fprintf(stderr, "  -p   always uses GNU compatible 'cp' command on all platforms\n");
 is too strong, since multiple restore command options overwrite each
 other, e.g. -p -c applies Windows's copy instead of GNU's cp.

I was assuming you don't turn the switch off again immediately
afterwards.

 6. Minor code comment suggestion
 
 Unrelated to this patch, I wonder if the code comments on all four
 time-related vars better read "seconds" instead of "amount of time":
 int sleeptime = 5;      /* amount of time to sleep between file checks */
 int holdtime = 0;       /* amount of time to wait once file appears full */
 int waittime = -1;      /* how long we have been waiting, -1 no wait yet */
 int maxwaittime = 0;    /* how long are we prepared to wait for? */

As you say, unrelated to the patch.

 7. Question: benefits of separate holdtime option from sleeptime?
 
 Simon Riggs wrote:
   * provide holdtime delay, default 0 (on all platforms)
 
 Going back over the hackers+patches emails and parsing the code
 comments, I'm sorry if I missed it, but I'm not sure I've understood
 the exact tuning benefits that the new holdtime option
 provides over using the existing sleeptime, as has been the case
 (on Win32 only).

This is central to the patch, since the complaint was about the delay
introduced by doing that previously.

 8. Unresolved question of implementing now/later a cp replacement

The patch implements what's been agreed. 

I'm not rewriting cp, for reasons already discussed.



Not a comment to you, Martin, but it's fairly clear that I'm not
maintaining this correctly for Windows. I've never claimed to have
tested this on Windows, and only included Windows related items as
requested by others. I need to make it clear that I'm not going to
maintain it at all, for Windows. If others wish to report Windows issues
then they can suggest appropriate fixes and test them also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Teodor Sigaev

Let me know of any other changes that are needed.


Looks ready to commit, but documentation is needed.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Oleg Bartunov

btw, is it intentional to have '...' in the headline?

=# select ts_headline('1 2 3 4 5 1 2 3 1','1&4'::tsquery,'MaxFragments=1');
   ts_headline
-------------------------
 ... <b>4</b> 5 <b>1</b>



Oleg

On Wed, 23 Jul 2008, Teodor Sigaev wrote:


Let me know of any other changes that are needed.


Looks ready to commit, but documentation is needed.




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Teodor Sigaev

once, for regular VACUUM I think you really have to call it within
each bulkdelete operation.  


Exactly what I did in last patch.


There's probably no point in optimizing
it away in VACUUM FULL either, since surely it'll be fast to call
index_cleanup when there's nothing in the pending list?


Sure, with an empty pending list, insertcleanup will just lock/unlock the metapage.


Yeah, I was going to complain about that next :-).  Autovacuum isn't
going to trigger as a result of INSERT operations; somehow we have
to teach it what to do for GIN indexes.  I remember we discussed this
at PGCon but I don't think we decided exactly what to do...
So maybe we just move the insertcleanup call to ginbulkdelete/ginvacuumcleanup 
but leave the aminsertcleanup field in pg_proc for the future.



I've already made a number of changes to the patch; let me keep working
on it and send it back to you later.

ok

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WITH RECURSIVE patches 0723

2008-07-23 Thread Tatsuo Ishii
 Hi,
 
 Here are the latest WITH RECURSIVE patches against 2008/07/17 CVS (CVS
 HEAD won't compile for me).
 
 This version includes regression tests and is almost ready for commit
 IMO.

I pulled a fresh CVS HEAD and it seems the problem is gone.  Here are the
latest WITH RECURSIVE patches against CVS HEAD.

Reviewers, please let me know if you find problems with the
patches. If none, I would like to commit this weekend.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


recursive_query.patch.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-07-23 Thread Peter Eisentraut
On Tuesday, 22 July 2008, Tom Lane wrote:
 Zdenek Kotala [EMAIL PROTECTED] writes:
  Yeah, I looked deeply on our solaris build script and problem is with
  following configure setup:
 
  ./configure --prefix=/tmp/pg --with-tcl --with-tclconfig=/usr/sfw/lib
 
  It found tclconfig, but not tclsh

 Ah.  So actually there is a bug in our configure: if you've set
 --with-tcl, and it fails to find tclsh, it should error out instead
 of allowing an incorrect path to be substituted into the pltcl_*mod
 scripts.  The configure code is assuming that the only thing it
 really needs tclsh for is to find tclConfig.sh, but that's not so.

Yeah, the configure code was originally set up to find Tcl linking information, 
and it does so either by running tclsh or taking the tclConfig.sh file.  That 
was all; no tclsh was actually necessary.

The fact that the pltcl_*mod programs use the discovered tclsh setting as well 
was most likely an afterthought that was not made fully robust in the face of 
all the ways that configure could be called.

By the way, these programs start with

package require Pgtcl

but we don't provide that library.  Should that bother us?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-07-23 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 By the way, these programs start with
 package require Pgtcl
 but we don't provide that library.  Should that bother us?

Hmm.  The scripts actually depend on both pltcl and Pgtcl, so just
pushing them out to the Pgtcl package wouldn't really improve matters.
I think it's fine to leave them where they are ... though we should
document the dependency.

Actually it looks like it's been a very long time since these scripts
got any love anyway.  There's no reason anymore to split modules into
multiple rows (not since TOAST...) and they're not schema-safe either.
Anybody feel like cleaning them up?  Or should we leave 'em as-is
for compatibility reasons?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-23 Thread Peter Eisentraut
On Wednesday, 23 July 2008, Tom Lane wrote:
 As soon as a Fedora release happens, I'm
 constrained by compatibility issues as to what I can put into that
 branch.  RHEL releases ten times more so.  I gather that Debian, for
 instance, is even more paranoid than Red Hat about upstream version
 bumps.

Debian and Ubuntu have backport repositories that users can selectively refer 
to.  SUSE has the openSUSE build service, which serves a similar function.  
So for these platforms, the infrastructure is there, and given infinite 
packaging hands (which we would need under any scheme, of course), all the 
packages in all the necessary versions can be provided through the right 
channels (defined as, where a user of the environment would look).  So I 
don't think having our own repository is a problem or even desirable for 
these OS/distributions.

And for Red Hat, we have pgsqlrpms.org, which already covers what you 
describe.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] \ef function in psql

2008-07-23 Thread Tom Lane
Abhijit Menon-Sen [EMAIL PROTECTED] writes:
 At 2008-07-17 18:28:19 -0400, [EMAIL PROTECTED] wrote:
 It wouldn't take a whole lot to convince me that a pg_get_functiondef
 would be useful, although I don't foresee either of those applications
 wanting to use it because of their backward-compatibility constraints.

 What would the function return? CREATE OR REPLACE FUNCTION ...? Would
 that be good enough for everyone who might want to call it?

I think I'd go with CREATE FUNCTION for simplicity.  It would be easy
enough for something like \ef to splice in OR REPLACE before shipping
the command back to the server.
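
The splice itself is trivial; a sketch of what \ef could do with the
returned text (illustrative only, not the eventual psql code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Turn "CREATE FUNCTION ..." into "CREATE OR REPLACE FUNCTION ...";
     * anything of an unexpected shape is passed through untouched. */
    static char *
    splice_or_replace(const char *def)
    {
        static const char prefix[] = "CREATE FUNCTION ";
        char *out;

        if (strncmp(def, prefix, sizeof(prefix) - 1) != 0)
            return strdup(def);
        out = malloc(strlen(def) + sizeof(" OR REPLACE"));
        if (out == NULL)
            return NULL;
        sprintf(out, "CREATE OR REPLACE FUNCTION %s",
                def + sizeof(prefix) - 1);
        return out;
    }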

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH RECURSIVE patches 0723

2008-07-23 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 Reviewers, please let me know if you find problems with the
 patches. If none, I would like to commit this weekend.

Has this patch actually been reviewed yet?  The only reports I've
seen are from testing; nothing from anyone actually reading the
code.  I know I've not looked at it yet.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] \ef function in psql

2008-07-23 Thread Marko Kreen
On 7/23/08, Tom Lane [EMAIL PROTECTED] wrote:
 Abhijit Menon-Sen [EMAIL PROTECTED] writes:
  At 2008-07-17 18:28:19 -0400, [EMAIL PROTECTED] wrote:
   It wouldn't take a whole lot to convince me that a pg_get_functiondef
   would be useful, although I don't foresee either of those applications
   wanting to use it because of their backward-compatibility constraints.

  What would the function return? CREATE OR REPLACE FUNCTION ...? Would
   that be good enough for everyone who might want to call it?

 I think I'd go with CREATE FUNCTION for simplicity.  It would be easy
  enough for something like \ef to splice in OR REPLACE before shipping
  the command back to the server.

Please make it use fully qualified names (schema.name) for both the
function name and the result types.  The current search_path juggling
that pg_dump does is a major PITA.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres-R: internal messaging

2008-07-23 Thread Tom Lane
Alexey Klyukin [EMAIL PROTECTED] writes:
 Markus Wanner wrote:
 I'm currently doing this with imessages as well,  
 which violates the rule that the postmaster may not touch shared  
 memory. I didn't look into ripping that out, yet. I'm not sure it can be  
 done with the existing signaling of the postmaster.

 In Replicator we avoided the need for postmaster to read/write backend's
 shmem data by using it as a signal forwarder.

You should also look at the current code for communication between
autovac launcher and autovac workers.  That seems to be largely a
similar problem, and it's been solved in a way that seems to be
safe enough with respect to the postmaster vs shared memory issue.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 So maybe we just move the insertcleanup call to
 ginbulkdelete/ginvacuumcleanup
 but leave the aminsertcleanup field in pg_proc for the future.

I'd be inclined not to add the extra AM call if we aren't going to use
it now.  There's no very good reason to think that a definition we
settled on today would be exactly the right thing for whatever future
need might appear.  Better to wait till we have a concrete example to
design around.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-07-23 Thread Zdenek Kotala

Tom Lane wrote:

Peter Eisentraut [EMAIL PROTECTED] writes:

By the way, these programs start with
package require Pgtcl
but we don't provide that library.  Should that bother us?


Hmm.  The scripts actually depend on both pltcl and Pgtcl, so just
pushing them out to the Pgtcl package wouldn't really improve matters.
I think it's fine to leave them where they are ... though we should
document the dependency.

Actually it looks like it's been a very long time since these scripts
got any love anyway.  There's no reason anymore to split modules into
multiple rows (not since TOAST...) and they're not schema-safe either.
Anybody feel like cleaning them up?  Or should we leave 'em as-is
for compatibility reasons?


Just a dumb question: do we need this functionality? Does anybody use it?

Zdenek


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] \ef function in psql

2008-07-23 Thread Tom Lane
Marko Kreen [EMAIL PROTECTED] writes:
 [ re pg_get_functiondef ]

 Please make it use full qualified names (schema.name) for both
 function name and result types.  Current search_path juggling
 the pg_dump does is major PITA.

Qualifying the function name seems like a good idea, but I'd advise
against tinkering with the datatype references.  It'll be hard to
do correctly and it will make things very substantially uglier.
Do you really want to show, eg, pg_catalog.int4 rather than integer?

If you let the backend code do what it wants to do here, the only
way there would be a problem is if someone changed their
search_path in between pg_get_functiondef and trying to re-load the
function definition.  Which certainly ain't gonna happen for \ef,
and it seems a bit implausible for any other use-case either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-07-23 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Actually it looks like it's been a very long time since these scripts
 got any love anyway.  There's no reason anymore to split modules into
 multiple rows (not since TOAST...) and they're not schema-safe either.
 Anybody feel like cleaning them up?  Or should we leave 'em as-is
 for compatibility reasons?

 Just a dumb question: do we need this functionality? Does anybody use it?

Well, autoloading Tcl scripts is an extremely standard thing to do in
the Tcl world.  It makes sense to me for pltcl to provide a way of
autoloading code out of the database instead of some random search path
or other --- particularly for trusted pltcl, which shouldn't allow
access to the server filesystem at all.

Whether these particular scripts are the best possible implementation of
the concept is another argument, of course.  But I wouldn't agree with
just ripping 'em out.  Note that my complaints above don't bear on
functionality, at least not unless someone is working in an environment
where the search_path varies a lot.  So the lack of maintenance effort
doesn't indicate that they're not getting used.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-23 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 Do you want Tom to
 a) spend a month improving the optimizer
 b) get him to review already working code so we can package things

Actually, if the alternative is having the pieces outside of core where
Tom never sees them, I'd vote for (b), as the optimizer already kicks ass
but having Tom review other code is pretty invaluable.

Code outside of core is, in reality, less reviewed, less likely to work
well with recent PG versions, and more likely to cause problems. It's also
less likely to be found by people, less likely to be used by people, and
less likely to be included by distros. Not to say that everything should get
shoved into core, of course, but there are strong arguments for both sides.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200807231145
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8




-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkiHUlgACgkQvJuQZxSWSshURACg2MIfdH0cJOTf75HmuGEzlxo6
OBQAn21sqZ+rBEel1cf2dAIYpoWPHwW5
=Pj7J
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]-hash index improving

2008-07-23 Thread Kenneth Marshall
On Tue, Jul 22, 2008 at 08:36:34PM -0700, Dann Corbit wrote:
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
  [EMAIL PROTECTED] On Behalf Of Xiao Meng
  Sent: Tuesday, July 22, 2008 7:57 PM
  To: Simon Riggs
  Cc: pgsql-hackers@postgresql.org
  Subject: Re: [HACKERS] [PATCH]-hash index improving
  
  Well, I'll do it after I finish my second patch.
  Hash index should be more efficient than btree when N is big enough.
   It seems meaningful to find how big N is in an experimental way.
 
 The savings will depend on many factors.  Another thing (besides volume) 
 which is important is the sort of key data being indexed.
 
 Consider a unique key on six varchar(40) fields:
 
 1.  Country
 2.  State/Province
 3.  City
 4.  Division
 5.  Branch
 6.  Office
 
 Typically, a key such as this will have lots of clumps of similar data, only 
 being differentiated with the final segment.  This sort of index is often 
 used for reporting purposes.  To determine a unique entry, it is not unlikely 
 that more than 200 characters will be traversed.  A hash index gets a special 
 boost here because a much smaller data signature is stored.  Even a 64 bit 
 hash occupies only 8 bytes.  On the other hand, consider an index on a field 
 consisting of a single character.  Here, the pages of the b-tree will have a 
 huge volume of entries per page, requiring fewer pages to search, and the 
 hash index is many times larger and hence more pages will have to be loaded.
 
 These things make a hash index desirable:
 1. Unique index
 2. Long keys
 3. Huge data cardinality
 4. Equality search
 
 These things make a hash index undesirable:
 1.  Non-unique index
 2.  Short keys
 3.  Small data sets
 
I mentioned in a previous e-mail adding some additional informational settings
that can be used, like fill-factor, to improve the layout and performance of a
hash index. They are roughly:
   - number of elements
   - maximum number of elements
   - multiplicity - estimate of element repetition in a non-unique index
Knowing the number of elements in advance allows the index to be pre-created
in the optimal layout and disk footprint. For every multiple of 256, you can
reduce the space needed for the stored hash value by 8 bits. For large indexes
you can store a 64-bit hash in the same space as the 32-bit hash in a small
index. This allows the use of larger hash values, which results in better data
distribution between the buckets and more O(1)-like behavior.
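
My reading of that space claim, as a back-of-the-envelope illustration
(this is my own sketch, not code from the patch): if the bucket number
already encodes the low bits of an entry's hash, those bits need not be
stored with the entry, so every 256-fold increase in bucket count shaves
another 8 bits off the stored hash:

    #include <stdio.h>

    int
    main(void)
    {
        int shift;

        /* Bits implied by the bucket number need not be stored. */
        for (shift = 0; shift <= 24; shift += 8)
            printf("%8lu buckets: store %2d of 64 hash bits per entry\n",
                   1UL << shift, 64 - shift);
        return 0;
    }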

 These things render a hash index as worthless (except in COMBINATION with a 
 b-tree type index):
 1.  Need for range searches like BETWEEN
 2.  Need for ORDER BY on the column(s)
 
 As an aside:
 I guess it will also be nice if you can CLUSTER both index and data values on 
 the hash.  It may need a different algorithm than a b-tree clustering 
 concept.  I know that Rdb uses different kinds of storage areas for hashed 
 indexes verses b-tree indexes.
 
Clustering a hash index will allow for much smaller indexes through the use
of prefix compression of the common heap tuple ids. Now an entry in the hash
index needs sizeof(hash) + sizeof(heap tuple id), which is 4 + 6 = 10 bytes
before clustering. After clustering, and for large indexes, this could drop to
4 bytes per entry plus a constant value.

 This effort to create hashed indexes is very valuable.  Because it becomes 
 more and more dominant as the data scales up, right at the point when things 
 get tougher is when it becomes the most helpful.  If you have a tiny table, 
 it does not even matter if you index it, because (for instance) 10 rows will 
 probably always stay in memory and iteration will find what is needed 
 instantly.  But if you have hundreds of millions of rows or billions of rows, 
 now is when performance really matters.  So when the data scales to 
 preposterous size (which it has an uncanny ability to do) the boost of 
 performance becomes even more valuable.

Although there is a clear theoretical benefit from the O(1) lookup for large
indexes, I think that the cross-over point between btree and hash indexes may
occur for smaller indexes than might be expected, due to the possibly smaller
memory footprint needed for the hash index. Of course, this will all need to
be tested.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: DTrace probes (merged version)

2008-07-23 Thread Peter Eisentraut
This patch looked OK to me, but the commit fest comment says that it does not 
include comments from reviewdemo.postgresql.org.  But I don't find any 
comments there.  The latest version of the patch there appears to be the same 
as here.  Zdenek, could you clarify?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane
I wrote:
 Yeah, I was going to complain about that next :-).  Autovacuum isn't
 going to trigger as a result of INSERT operations; somehow we have
 to teach it what to do for GIN indexes.  I remember we discussed this
 at PGCon but I don't think we decided exactly what to do...

One simple idea is to call aminsertcleanup (probably renamed to
something else like amanalyzehook) during ANALYZE.  This seems a bit
grotty, but it has the very attractive property that we don't need to
give the autovacuum control logic any special knowledge about GIN
indexes.  Either inserts or updates will lead it to trigger either
auto-ANALYZE or auto-VACUUM, and either way GIN gets a cleanup
opportunity.

A possible argument against this is that if we later fix things so
that VACUUM and ANALYZE can happen concurrently on the same table,
amanalyzehook could get called concurrently with ambulkdelete or
other vacuum-support operations.  So the AM author would have to
take care to interlock that safely.  But this doesn't seem like
a big deal to me --- interlocks against regular inserts/updates
are probably a harder problem anyway.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Sushant Sinha
I guess it is more readable to add the cover separator at the end of a fragment
than at the front. Let me know what you think and I can update it.

I think the right place for the cover separator is in the structure
HeadlineParsedText, just like startsel and stopsel. This will enable users to
specify their own cover separators. But it will require changes to the
structure as well as to the generateHeadline function. This option will also
not play well with the default headline generation function.

The default MaxWords = 35 seems a bit high for this headline generation
function and 20 seems to be more reasonable. Any thoughts?

-Sushant.

On Wed, Jul 23, 2008 at 7:44 AM, Oleg Bartunov [EMAIL PROTECTED] wrote:

 btw, is it intentional to have '...' in the headline?

 =# select ts_headline('1 2 3 4 5 1 2 3 1','1&4'::tsquery,'MaxFragments=1');
    ts_headline
 -------------------------
  ... <b>4</b> 5 <b>1</b>



 Oleg

 On Wed, 23 Jul 2008, Teodor Sigaev wrote:

  Let me know of any other changes that are needed.


 Looks ready to commit, but documentation is needed.



Regards,
Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-23 Thread Tom Lane
Greg Sabino Mullane [EMAIL PROTECTED] writes:
 Code outside of core, is, in reality, less reviewed, less likely to work
 well with recent PG versions, and more likely to cause problems. It's also
 less likely to be found by people, less likely to be used by people, and
 less likely to be included by distros. Not to say that everything should get
 shoved into core, of course, but there are strong arguments for both sides.

These are all true statements, of course, but ISTM they should be looked
on as problems to be solved.  Pushing stuff into core instead of solving
these problems is not a scalable long-term answer.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-07-23 Thread Peter Eisentraut
On Tuesday, 22 July 2008, Zdenek Kotala wrote:
 In my opinion the main problem is in the build process, which does not fail,
 and also the dependency on tclsh is hidden by the exec command.

Fixed.  Now, configure will fail if no tcl shell is found.  You can specify 
one with the TCLSH variable.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-23 Thread Matthew T. O'Connor

Tom Lane wrote:

Greg Sabino Mullane [EMAIL PROTECTED] writes:

Code outside of core, is, in reality, less reviewed, less likely to work
well with recent PG versions, and more likely to cause problems. It's also
less likely to be found by people, less likely to be used by people, and
less likely to be included by distros. Not to say that everything should get
shoved into core, of course, but there are strong arguments for both sides.


These are all true statements, of course, but ISTM they should be looked
on as problems to be solved.  Pushing stuff into core instead of solving
these problems is not a scalable long-term answer.


A few random thoughts...

The application that comes to mind first for me when you talk plugins is 
Firefox.  They make it very easy to browse for plugins and to install, 
update, and remove them.  Their plug-in system also tries to account for 
Firefox version and OS platform, which we would need to do also.


Perhaps one thing that would help PostgreSQL plug-ins is a nice GUI 
plug-in browser and management application.  The logical place to add 
this, IMHO, is pgAdmin, since it is GUI, already talks to the DB, and is 
cross-platform.  I'm not saying a GUI should be required to manage 
plug-ins; a fully CLI option should be made available too.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres-R: internal messaging

2008-07-23 Thread Markus Wanner

Hi,

Tom Lane wrote:

You should also look at the current code for communication between
autovac launcher and autovac workers.  That seems to be largely a
similar problem, and it's been solved in a way that seems to be
safe enough with respect to the postmaster vs shared memory issue.


Oh yeah, thanks for reminding me. Back when it was added I thought I 
might find some helpful insights in there. But I didn't ever take the 
time to read through it...


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-23 Thread Marko Kreen
On 7/23/08, Greg Sabino Mullane [EMAIL PROTECTED] wrote:
   Do you want Tom to
   a) spend a month improving the optimizer
   b) get him to review already working code so we can package things

 Actually, if the alternative is having the pieces outside of core where
  Tom never sees them, I'd vote for (b), as the optimizer already kicks ass
  but having Tom review other code is pretty invaluable.

  Code outside of core, is, in reality, less reviewed, less likely to work
  well with recent PG versions, and more likely to cause problems. It's also
  less likely to be found by people, less likely to be used by people, and
  less likely to be included by distros. Not to say that everything should get
  shoved into core, of course, but there are strong arguments for both sides.

Agreed.

But PL/Proxy has one additional problem.

First, it's a small & simple clustering solution that plays very well
to Postgres's strengths - transparent plan cache in functions, transactional
DDL, etc.  It allows significantly higher uptime and queries-per-sec than
any plain SQL solution.

But it has a serious weakness - it is not transparent; the user needs to
change his coding style to a function-based one.  (This is related to the
small-and-simple property.)

So, for it to be useful, users need to be aware of it as early in
development as possible.  And the idea of turning pgfoundry into CPAN
is pointless.  A user may be willing to throw random modules at his
random perl script, but not at his whole db architecture.  So it needs
to be either in the main distro or nearby it.

OTOH, the most serious argument against a PL/Proxy merge is that when we
do remote tables/views based on SQL/MED, it's quite natural to extend
that to functions, thus making plproxy redundant.

OTOH^2, there has not been any serious thinking in that direction AFAICS,
except how dbi-link can push down a WHERE clause.  This suggests it won't
appear before 2011...  Not that it's an argument for merge, but maybe
pushing it to an all-presentable-extensions package and having a proper
review done would be a good idea.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-23 Thread Heikki Linnakangas

Simon Riggs wrote:

On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:

8. Unresolved question of implementing now/later a cp replacement


The patch implements what's been agreed. 


I'm not rewriting cp, for reasons already discussed.

Not a comment to you Martin, but it's fairly clear that I'm not
maintaining this correctly for Windows. I've never claimed to have
tested this on Windows, and only included Windows related items as
requested by others. I need to make it clear that I'm not going to
maintain it at all, for Windows. If others wish to report Windows issues
then they can suggest appropriate fixes and test them also.


Hmm. I just realized that replacing the cp command within pg_standby 
won't help at all. The problem is with the command that copies the files 
*to* the archivelocation that pg_standby polls, not with the copy 
pg_standby does from archivelocation to pg_xlog. And we don't have much 
control over that.


We really need a more reliable way of detecting that a file has been 
fully copied. One simple improvement would be to check the xlp_magic 
field of the last page, though it still wouldn't be bullet-proof.
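
A sketch of such a check (assumption-heavy: the constants below are
placeholders, and a real implementation would take XLOG_SEG_SIZE,
XLOG_BLCKSZ and the page-magic value from access/xlog_internal.h rather
than hard-coding them):

    #include <stdint.h>
    #include <stdio.h>

    #define XLOG_SEG_SIZE (16 * 1024 * 1024) /* default WAL segment size */
    #define XLOG_BLCKSZ   8192
    #define XLP_MAGIC     0xD062             /* placeholder; version-specific */

    /* Heuristic: a fully copied segment has a valid xlp_magic (the first
     * 16 bits of the page header) on its last page.  Not bullet-proof. */
    static int
    last_page_looks_complete(const char *path)
    {
        FILE    *f = fopen(path, "rb");
        uint16_t magic = 0;

        if (f == NULL)
            return 0;
        if (fseek(f, XLOG_SEG_SIZE - XLOG_BLCKSZ, SEEK_SET) != 0 ||
            fread(&magic, sizeof(magic), 1, f) != 1)
            magic = 0;
        fclose(f);
        return magic == XLP_MAGIC;
    }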


Do the commands that preallocate the space keep the file exclusively 
locked during the copy? If they do, shouldn't we get an error when trying 
to run the restore copy command, and retry after the 1s sleep in 
RestoreWALFileForRecovery? Though if the archive location is a Samba 
mount or something, I guess we can't rely on Windows-style exclusive 
locking.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: DTrace probes (merged version)

2008-07-23 Thread Zdenek Kotala

Peter Eisentraut wrote:
This patch looked OK to me, but the commit fest comment says that it does not 
include comments from reviewdemo.postgresql.org.  But I don't find any 
comments there.  The latest version of the patch there appears to be the same 
as here.  Zdenek, could you clarify?


I'm sorry, I forgot to publish them :( (new tool). It is fixed now. I have also 
uploaded the latest patch version and will review it tomorrow.


Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-07-23 Thread Zdenek Kotala

Peter Eisentraut wrote:

On Tuesday, 22 July 2008, Zdenek Kotala wrote:

In my opinion the main problem is in the build process, which does not fail,
and also the dependency on tclsh is hidden by the exec command.


Fixed.  Now, configure will fail if no tcl shell is found.  You can specify 
one with the TCLSH variable.


Thanks. Is it fixed only on head or do you plan to backport it to older 
branches as well?


Thanks Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-23 Thread Dimitri Fontaine

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi,

As a potential user of the solution, I'd very much like to have  
plproxy in -core if possible and sensible. Knowing nothing about the  
sensible part, I'd vote for inclusion.


But whether -core votes for or against inclusion, I'd appreciate having  
a module or package notion in PostgreSQL and a tool to easily  
install existing extensions, which would build on the PGXS extension  
system to build, on the fly, code versions compatible with the current  
major PG version:

  pg_pkg add-mirror http://packages.postgresql.org/
  pg_pkg list [remote | available]
  pg_pkg add plproxy prefix citext
  pg_pkg install plproxy mydatabase
  pg_pkg uninstall [--force] plproxy mydatabase
  ..

Of course details about PostgreSQL module/package management belong  
in some other thread; I'll try to browse our archives to see where we  
are on this point and to propose a summary and some ideas if  
necessary. Any reader willing to share good starting points? :)


I think having something to easily manage PostgreSQL modules/packages  
(including contrib ones) would change the matter here. If it were easy  
to fetch a list of -core-reviewed or -supported extensions and to  
install them on one's databases, having plproxy not included in -core  
would be an *easy* decision to make.


On 23 July 2008, at 19:54, Marko Kreen wrote:

appear before 2011...  Not that it's an argument for merge, but maybe
pushing it to an all-presentable-extensions package and having a
proper review done would be a good idea.


Now, it seems to me we already have a place to distribute  
reviewed code, maintained by non-core hackers and integrated into  
the distributions and documentation of PostgreSQL: contrib.


Maybe contrib (any plans to get a better name? extra, extension,  
anything less remote than the current naming) would fit the bill here  
as a good compromise?


Sorry to raise unwanted subjects, please do not feed the trolls (in  
this thread at least) :)

--
Dimitri Fontaine



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-07-23 Thread Zdenek Kotala

Tom Lane napsal(a):

Zdenek Kotala [EMAIL PROTECTED] writes:

Tom Lane napsal(a):

Actually it looks like it's been a very long time since these scripts
got any love anyway.  There's no reason anymore to split modules into
multiple rows (not since TOAST...) and they're not schema-safe either.
Anybody feel like cleaning them up?  Or should we leave 'em as-is
for compatibility reasons?



Just a dumb question: do we need this functionality? Does anybody use it?


Well, autoloading Tcl scripts is an extremely standard thing to do in
the Tcl world.  It makes sense to me for pltcl to provide a way of
autoloading code out of the database instead of some random search path
or other --- particularly for trusted pltcl, which shouldn't allow
access to the server filesystem at all.


I see.


Whether these particular scripts are the best possible implementation of
the concept is another argument, of course.  But I wouldn't agree with
just ripping 'em out.  Note that my complaints above don't bear on
functionality, at least not unless someone is working in an environment
where the search_path varies a lot.  So the lack of maintenance effort
doesn't indicate that they're not getting used.


I understand. However, I have another dumb idea/question: it seems to me that 
this is client code. I think it should be integrated into the psql command. It 
has several advantages: it removes the dependency on tclsh, removes three 
commands, and works fine on systems where Tcl is not present.


thanks Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-23 Thread Kevin Grittner
 Heikki Linnakangas [EMAIL PROTECTED] wrote: 
 We really need a more reliable way of detecting that a file has been
 fully copied.
 
In our scripts we handle this by copying to a temp directory on the
same mount point as the archive directory and doing a mv to the
archive location when the copy is successfully completed.  I think
that this even works on Windows.  Could that just be documented as a
strong recommendation for the archive script?
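 
As a minimal sketch of that copy-then-rename pattern in C (hypothetical
names; it assumes the temp file and the archive directory are on the same
filesystem, which is what makes rename() atomic):
 
    #include <stdio.h>
 
    /* Copy src to a temp name on the archive mount point, then rename()
     * it into place.  rename() is atomic within one POSIX filesystem, so
     * a reader polling the archive directory never sees a partial file. */
    static int archive_file(const char *src, const char *tmp, const char *dst)
    {
        FILE *in = fopen(src, "rb");
        FILE *out = in ? fopen(tmp, "wb") : NULL;
        char buf[8192];
        size_t n;
        int ok = (in != NULL && out != NULL);
 
        while (ok && (n = fread(buf, 1, sizeof buf, in)) > 0)
            ok = (fwrite(buf, 1, n, out) == n);
        if (in)
            fclose(in);
        if (out && fclose(out) != 0)    /* write errors can surface here */
            ok = 0;
        return ok ? rename(tmp, dst) : -1;  /* atomic move into the archive */
    }
 
On Windows a MoveFileEx-style call would be needed instead, which is
exactly the part that wants testing.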
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 Updated: http://www.sigaev.ru/misc/fast_insert_gin-0.9.gz

Here is the GIN fast-insert patch back again.  Changes:

* Sync with CVS HEAD
* Clean up documentation and some of the code comments
* Fix up custom reloptions code
* Suppress some compiler warnings

I didn't get much further than that because I got discouraged after
looking at the locking issues around the pending-insertions list.
It's a mess:

* shiftList() holds an exclusive lock on metapage throughout its run,
which means that it's impossible for two of them to run concurrently.
So why bother with concurrent deletion detection?

* shiftList does LockBufferForCleanup, which means that it can be blocked
for an indefinitely long time by a concurrent scan, and since it's holding
exclusive lock on metapage no new scans or insertions can start meanwhile.
This is not only horrid from a performance standpoint but it very probably
can result in deadlocks --- which will be deadlocks on LWLocks and thus
not even detected by the system.

* GIN index scans release lock and pin on one pending-list page before
acquiring pin and lock on the next, which means there's a race condition:
shiftList could visit and delete the next page before we get to it,
because there's a window where we're holding no buffer lock at all.
I think this isn't fatal in itself, since presumably the data in the next
page has been moved into the main index and we can scan it later, but the
scan code isn't checking whether the page has been deleted out from under
it.

* It seems also possible that once a list page has been marked
GIN_DELETED, it could be re-used for some other purpose before a
scan-in-flight reaches it -- reused either as a regular index page or as a
new list page.  Neither case is being defended against.  It might be that
the new-list-page case isn't a problem, or it might not.

* There is a bigger race condition, which is that after a scan has
returned a tuple from a pending page, vacuum could move the index entry
into the main index structure, and then that same scan could return that
same index entry a second time.  This is a no-no, and I don't see any easy
fix.

I haven't really finished reviewing this code, but I'm going to bounce it
back to you to see if you can solve the locking problems.  Unless that can
be made safe there is no point doing any more work on this patch.

regards, tom lane



bineMnFgM4K6g.bin
Description: fast_insert_gin-0.10.patch.gz

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Research/Implementation of Nested Loop Join optimization

2008-07-23 Thread Manoel Henrique
Hi! I'm a researcher from PUC-Rio (Brazil) and we're studying joins, and
we'd like to implement an optimization on the Nested Loop Join: while
scanning the inner table, the iteration would go top-down, then backwards
(bottom-up), to take advantage of the buffer pages in memory; a toy sketch
follows.
We'd work with MaterialScan and only NestedLoop (we're dropping all indexes
and keys to make it this way). The research objective is to show some
students how a DBMS works.
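
As a toy sketch of that back-and-forth iteration (plain C, not PostgreSQL
executor code; the inner relation is reduced to an in-memory array
standing in for a materialized scan):

    /* Each rescan flips the direction, so the tuples most recently
     * touched by the previous pass are revisited first while their
     * pages are still likely to be buffered. */
    typedef struct {
        int *tuples;    /* materialized inner relation */
        int  ntuples;
        int  pos;       /* current position */
        int  dir;       /* +1 = forward, -1 = backward */
    } InnerScan;

    static void rescan(InnerScan *s)
    {
        s->dir = -s->dir;                   /* ping-pong on every rescan */
        s->pos = (s->dir == 1) ? 0 : s->ntuples - 1;
    }

    static int next_tuple(InnerScan *s, int *out)
    {
        if (s->pos < 0 || s->pos >= s->ntuples)
            return 0;                       /* end of this pass */
        *out = s->tuples[s->pos];
        s->pos += s->dir;
        return 1;
    }

With dir initialized to -1 and rescan() called before each pass, the
first pass runs forward.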

Does PostgreSQL already work this way?
Is it possible to implement such a thing? Is it easy? How hard?


Thank you in advance,
Manoel Henrique Souza.


Re: [HACKERS] pltcl_*mod commands are broken on Solaris 10

2008-07-23 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 I understand. However, I have another dumb idea/question: it seems to me
 that this is client code. I think it should be integrated into the psql
 command.

That doesn't seem like a particularly appropriate thing to do ... nor
do I see the argument for calling it client-side code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Research/Implementation of Nested Loop Join optimization

2008-07-23 Thread Dann Corbit
When you install the source tree (e.g. in folder \postgresql-8.3.x) you
will want to examine nodeMergejoin.c typically found in a path similar
to this:

\postgresql-8.3.x\src\backend\executor\nodeMergejoin.c

 

Here are the comments from the version on my machine:

 

/*
 * INTERFACE ROUTINES
 *      ExecMergeJoin           mergejoin outer and inner relations.
 *      ExecInitMergeJoin       creates and initializes run time states
 *      ExecEndMergeJoin        cleans up the node.
 *
 * NOTES
 *
 *      Merge-join is done by joining the inner and outer tuples satisfying
 *      join clauses of the form ((= outerKey innerKey) ...).
 *      The join clause list is provided by the query planner and may contain
 *      more than one (= outerKey innerKey) clause (for composite sort key).
 *
 *      However, the query executor needs to know whether an outer
 *      tuple is "greater/smaller" than an inner tuple so that it can
 *      "synchronize" the two relations. For example, consider the following
 *      relations:
 *
 *              outer: (0 ^1 1 2 5 5 5 6 6 7)   current tuple: 1
 *              inner: (1 ^3 5 5 5 5 6)         current tuple: 3
 *
 *      To continue the merge-join, the executor needs to scan both inner
 *      and outer relations till the matching tuples 5. It needs to know
 *      that currently inner tuple 3 is "greater" than outer tuple 1 and
 *      therefore it should scan the outer relation first to find a
 *      matching tuple and so on.
 *
 *      Therefore, rather than directly executing the merge join clauses,
 *      we evaluate the left and right key expressions separately and then
 *      compare the columns one at a time (see MJCompare).  The planner
 *      passes us enough information about the sort ordering of the inputs
 *      to allow us to determine how to make the comparison.  We may use the
 *      appropriate btree comparison function, since Postgres' only notion
 *      of ordering is specified by btree opfamilies.
 *
 *      Consider the above relations and suppose that the executor has
 *      just joined the first outer "5" with the last inner "5". The
 *      next step is of course to join the second outer "5" with all
 *      the inner "5's". This requires repositioning the inner "cursor"
 *      to point at the first inner "5". This is done by "marking" the
 *      first inner 5 so we can restore the "cursor" to it before joining
 *      with the second outer 5. The access method interface provides
 *      routines to mark and restore to a tuple.
 *
 *      Essential operation of the merge join algorithm is as follows:
 *
 *      Join {
 *          get initial outer and inner tuples              INITIALIZE
 *          do forever {
 *              while (outer != inner) {                    SKIP_TEST
 *                  if (outer < inner)
 *                      advance outer                       SKIPOUTER_ADVANCE
 *                  else
 *                      advance inner                       SKIPINNER_ADVANCE
 *              }
 *              mark inner position                         SKIP_TEST
 *              do forever {
 *                  while (outer == inner) {
 *                      join tuples                         JOINTUPLES
 *                      advance inner position              NEXTINNER
 *                  }
 *                  advance outer position                  NEXTOUTER
 *                  if (outer == mark)                      TESTOUTER
 *                      restore inner position to mark      TESTOUTER
 *                  else
 *                      break   // return to top of outer loop
 *              }
 *          }
 *      }
 *
 *      The merge join operation is coded in the fashion
 *      of a state machine.  At each state, we do something and then
 *      proceed to another state.  This state is stored in the node's
 *      execution state information and is preserved across calls to
 *      ExecMergeJoin. -cim 10/31/89
 */

 


Re: [HACKERS] Postgres-R: internal messaging

2008-07-23 Thread Markus Wanner

Hi,

what follows are some comments after trying to understand how the 
autovacuum launcher works and thoughts on how to apply this to the 
replication manager in Postgres-R.


The initial comments in autovacuum.c say:


If the fork() call fails in the postmaster, it sets a flag in the shared
memory area, and sends a signal to the launcher.  


I note that the shmem area that the postmaster is writing to is pretty 
static and not dependent on any other state stored in shmem. That 
certainly makes a difference compared to my imessages approach, where a 
corruption in the shmem for imessages could also confuse the postmaster.


Reading on, the 'can_launch' flag in the launcher's main loop makes sure 
that only one worker is requested concurrently, so that the launcher 
doesn't miss a failure or success notice from either the postmaster or 
the newly started worker. The replication manager currently shamelessly 
requests as many helper backends as it wants. I think I can change that 
without much trouble. Would certainly make sense.
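
A toy model of that rule (names invented, nothing from the actual
launcher code):

    #include <stdbool.h>

    /* Only one worker request may be outstanding at a time: the flag is
     * cleared on request and set again once the launcher hears back,
     * either from the postmaster (fork failed) or from the new worker. */
    static bool can_launch = true;

    static void request_worker(void)
    {
        if (!can_launch)
            return;             /* a request is already in flight */
        can_launch = false;
        /* ... signal the postmaster to fork a worker ... */
    }

    static void on_worker_outcome(void)
    {
        can_launch = true;      /* ready for the next request */
    }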


What remains is notifying the replication manager after termination or 
crashes of a helper backend. Upon normal errors (i.e. elog(ERROR...)), the 
backend processes themselves should take care of notifying the 
replication manager. But crashes are more difficult. IMO the replication 
manager needs to stay alive during this reinitialization, to keep the 
GCS connection. However, it can easily detach from shared memory 
temporarily (the imessages stuff is the only shmem place it touches, 
IIRC). However, a more difficult aspect is: it must be able to tell if a 
backend has applied its transaction *before* it died or not. Thus, after 
all backends have been killed, the postmaster needs to wait with 
reinitializing shared memory, until the replication manager has consumed 
all its messages. (Otherwise we would risk losing local transactions, 
probably also remote ones).


So, yes, after thinking about it, detaching the postmaster from shared 
memory seems doable for Postgres-R (in the sense that the postmaster does 
not rely on possibly corrupted data in shared memory). Reinitialization 
needs some more thoughts, but in general that seems like the way to go.


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Oleg Bartunov

On Wed, 23 Jul 2008, Sushant Sinha wrote:


I guess it is more readable to add the cover separator at the end of a fragment
than at the front. Let me know what you think and I can update it.


FragmentsDelimiter should *separate* fragments, and that says it all. 
It's not a very difficult algorithmic problem; it's like Perl's

join(FragmentsDelimiter, @array)



I think the right place for the cover separator is in the structure
HeadlineParsedText, just like startsel and stopsel. This will enable users to
specify their own cover separators. But this will require changes to the
structure as well as to the generateHeadline function. This option will also
not play well with the default headline generation function.


As soon as we introduce FragmentsDelimiter we should make it
configurable.



The default MaxWords = 35 seems a bit high for this headline generation
function and 20 seems to be more reasonable. Any thoughts?


I think we should not change the default value because it could change the
behaviour of existing applications. I'm not sure if it'd be useful and
possible to define default values in CREATE TEXT SEARCH PARSER.



-Sushant.

On Wed, Jul 23, 2008 at 7:44 AM, Oleg Bartunov [EMAIL PROTECTED] wrote:


btw, is it intentional to have '...' in the headline ?

=# select ts_headline('1 2 3 4 5 1 2 3 1','1&4'::tsquery,'MaxFragments=1');
       ts_headline
-------------------------
 ... <b>4</b> 5 <b>1</b>



Oleg

On Wed, 23 Jul 2008, Teodor Sigaev wrote:

 Let me know of any other changes that are needed.




Looks like it's ready to commit, but documentation is needed.




   Regards,
   Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], 
http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Alvaro Herrera
Tom Lane wrote:
 Teodor Sigaev [EMAIL PROTECTED] writes:

 I didn't get much further than that because I got discouraged after
 looking at the locking issues around the pending-insertions list.
 It's a mess:

These are rather severe problems.  Maybe there's a better solution, but
perhaps it would be good enough to lock out concurrent access to the
index while the bulkinsert procedure is working.




-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Research/Implementation of Nested Loop Join optimization

2008-07-23 Thread Manoel Henrique
The nodeMergejoin.c is the code for the Merge Join, isn't it? I am trying to
find a way to change the Nested Loop Join. It would be more like
nodeNestloop.c: when rescanning the inner plan (second time scanning the
inner plan and so on), it would change the scan direction. If the scan
direction was from first tuple to last tuple it would go backwards; if it
was from last to first it would go forward... The code I'm looking at atm
is from 8.3.1; it seems to have some kind of direction manager, but it
doesn't seem to be in use.

--Manoel

On Wed, Jul 23, 2008 at 5:23 PM, Dann Corbit [EMAIL PROTECTED] wrote:

  When you install the source tree (e.g. in folder \postgresql-8.3.x) you
 will want to examine nodeMergejoin.c typically found in a path similar to
 this:

 \postgresql-8.3.x\src\backend\executor\nodeMergejoin.c



  Here are the comments from the version on my machine:
  [nodeMergejoin.c comment block snipped; quoted in full in the previous
  message]

Re: [HACKERS] Postgres-R: internal messaging

2008-07-23 Thread Tom Lane
Markus Wanner [EMAIL PROTECTED] writes:
 ... crashes are more difficult. IMO the replication 
 manager needs to stay alive during this reinitialization, to keep the 
 GCS connection. However, it can easily detach from shared memory 
 temporarily (the imessages stuff is the only shmem place it touches, 
 IIRC). However, a more difficult aspect is: it must be able to tell if a 
 backend has applied its transaction *before* it died or not. Thus, after 
 all backends have been killed, the postmaster needs to wait with 
 reinitializing shared memory, until the replication manager has consumed 
 all its messages. (Otherwise we would risk losing local transactions, 
 probably also remote ones).

I hope you're not expecting the contents of shared memory to still be
trustworthy after a backend crash.  If the manager is working strictly
from its own local memory, then it would be reasonable to operate
as above.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-23 Thread Andrew Dunstan



Kevin Grittner wrote:
 Heikki Linnakangas [EMAIL PROTECTED] wrote: 
 We really need a more reliable way of detecting that a file has been
 fully copied. 
 
 In our scripts we handle this by copying to a temp directory on the
 same mount point as the archive directory and doing a mv to the
 archive location when the copy is successfully completed.  I think
 that this even works on Windows.  Could that just be documented as a
 strong recommendation for the archive script?


Needs testing at least. If it does in fact work then we can just adjust 
the docs and be done - or maybe provide a .bat file or perl script that 
would work as an archive_command on Windows.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PostgreSQL extensions packaging

2008-07-23 Thread Dimitri Fontaine


Hi,

I promised to have an in-depth look at the archives before spending  
time on my ideas for $subject, but failed to do so. Here are the ideas  
(user-level design if you will) :)


As a PostgreSQL extension developer (see http://pgfoundry.org/projects/prefix) 
 I'd like to benefit from a dump and restore facility. We don't offer  
any easy way for the DBA to restore a dump which happens to depend on  
external modules. This proposal tries to solve this by providing a  
notion of a package.


A module is currently, as far as I understand it, a .so file installed  
into some superuser (postgres) owned filesystem place. I'm unsure if  
the related .sql file (needed to expose the module functions) is a  
part of the module notion, but I don't think so.


A package is a namespace much comparable to a schema, to be found at  
the same level (many packages can live in a database), and allowed to  
have any SQL object under it. A package can also host variables, whose  
visibility is package-global: any SQL in the package can refer  
directly to package variables.

And a package can host module dependencies (not the .so code itself).

Let's try with an example of an imaginary package declaration:

 create or replace package prefix_range
   with (version = 0.3, parameter = value, ...)
   as $pkg$
 declare
   prefix_range_global_var text := 'init value';
   prefix_range_syntax_error exception;
 module
  prefix.so;
 begin
   create schema prefix;

   create or replace function prefix_range_in(cstring) ...;
   create or replace function prefix_range_out(prefix_range) ...;
   create type prefix_range;
   create function ...

   create operator ...
   create operator class ...

   -- private hidden things?
   create role prefix nologin;
   create schema prefix_private owner to prefix;
   create table prefix_private.relname ...;
   revoke all privileges on prefix_private to public;

   -- private stuff ...
   -- create table, create index, etc ...
   --
   -- need some though as to how to allow this from SQL objects
   -- declared into the package *only*
 end;
 $pkg$;

The parameters in the with clause are visible inside the package body  
and allow the package installer to tune the installation: we could use  
this for tablespace creation needs, e.g., and at least version should  
be displayed by the associated \dP command (is this one free?).


This package creation SQL command would fail if any contained SQL is  
wrong, of course, but also if one of the declared modules was not  
registered/known/found by the server.


We would certainly want to add a package scope construct to existing  
CREATE commands, in order to be able to add a function to a package  
without re-running the entire create package command, but this could  
come at a later date:

  CREATE FUNCTION ... PACKAGE prefix ...

Given this infrastructure, pg_dump would (have to) be able to dump the  
SQL, and pg_restore to complain when the module dependencies are not  
met, erroring out with a list of modules to install.


Now, what would be really good to have would be this pg_pkg command I  
was dreaming about in another -hackers mail:

 pg_pkg add-mirror http://packages.postgresql.org/
 pg_pkg list [remote | available]
 pg_pkg add plproxy prefix citext
 pg_pkg install plproxy mydatabase
 pg_pkg uninstall [--force] plproxy mydatabase
 pg_pkg remove package ...
 ...

First, we could have a packages.postgresql.org infrastructure providing  
source code packages per PostgreSQL major version.  
Those packages would be known to have received code review and -core  
acceptance, so they would be as trustworthy as PostgreSQL itself is.
And ideally, any developer could prepare his own PostgreSQL packaging  
facility to propose his own packages, this time outside -core  
acceptance, but still integrated into the extension system.


pg_pkg add package ... would fetch a source code archive (the last  
version available, or maybe a given version with pg_pkg add prefix=0.3  
if we really want this feature) and compile and install it with PGXS.  
So you would need to have server development support installed to  
benefit from package distribution...


Then pg_pkg install would install the given package into the given  
database, running its CREATE OR REPLACE PACKAGE SQL script, responsible  
for package object creation and for variable, table, etc. initialisation.


The uninstall command would get rid of the package, producing errors  
if some object existing in the target database has a dependency on  
the package; the --force option would force a DROP PACKAGE pkgname  
CASCADE;


The remove command would get rid of the installed files (modules and .sql),  
only when the package is no longer in use in any database of the cluster.


With this command set and pg_restore giving a list of missing modules  
for each package of a given dump file, it would become easy to restore  
a database containing extensions.

  $ pg_restore ...
  ERROR: 

Re: [HACKERS] [PATCHES] GIN improvements

2008-07-23 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 It's a mess:

 These are rather severe problems.  Maybe there's a better solution, but
 perhaps it would be good enough to lock out concurrent access to the
 index while the bulkinsert procedure is working.

Ugh...

The idea I was toying with was to not allow GIN scans to stop on
pending-insertion pages; rather, they should suck out all the matching
tuple IDs into backend-local memory as fast as they can, and then return
the TIDs to the caller one at a time from that internal array.  Then,
when the scan is later visiting the main part of the index, it could
check each matching TID against that array to see if it'd already
returned the TID.  (So it might be an idea to sort the TID array after
gathering it, to make those subsequent checks fast via binary search.)

This would cost in backend-local memory, of course, but hopefully not
very much.  The advantages are the elimination of the deadlock risk
from scan-blocks-insertcleanup-blocks-insert, and fixing the race
condition when a TID previously seen in the pending list is moved to
the main index.  There were still a number of locking issues to fix
but I think they're all relatively easy to deal with.
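
The shape of that idea as a hedged sketch (simplified TID type, plain C,
not actual GIN code):

    #include <stdlib.h>
    #include <stdint.h>
    #include <stdbool.h>

    typedef struct { uint32_t block; uint16_t offset; } Tid;

    static int tid_cmp(const void *a, const void *b)
    {
        const Tid *x = a, *y = b;
        if (x->block != y->block)
            return x->block < y->block ? -1 : 1;
        if (x->offset != y->offset)
            return x->offset < y->offset ? -1 : 1;
        return 0;
    }

    /* sort once after draining the pending pages ... */
    static void remember_pending(Tid *tids, size_t ntids)
    {
        qsort(tids, ntids, sizeof(Tid), tid_cmp);
    }

    /* ... then, while scanning the main index, skip already-seen TIDs */
    static bool already_returned(const Tid *tids, size_t ntids, Tid t)
    {
        return bsearch(&t, tids, ntids, sizeof(Tid), tid_cmp) != NULL;
    }

Sorting once keeps each later check at O(log n); the cost is the
backend-local array itself.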

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Research/Implementation of Nested Loop Join optimization

2008-07-23 Thread Dann Corbit
 

From: Manoel Henrique [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 23, 2008 1:47 PM
To: Dann Corbit
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Research/Implementation of Nested Loop Join
optimization

 

The nodeMergejoin.c is the code for the Merge Join, isn't it? I am trying
to find a way to change the Nested Loop Join. It would be more like
nodeNestloop.c: when rescanning the inner plan (second time scanning the
inner plan and so on), it would change the scan direction. If the scan
direction was from first tuple to last tuple it would go backwards; if it
was from last to first it would go forward... The code I'm looking at atm
is from 8.3.1; it seems to have some kind of direction manager, but it
doesn't seem to be in use.

 

You are right.  You want file: nodeNestloop.c

 

 



Re: [HACKERS][PATCHES] odd output in restore mode

2008-07-23 Thread Simon Riggs

On Wed, 2008-07-23 at 21:38 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Tue, 2008-07-22 at 17:19 -0700, Martin Zaun wrote:
  8. Unresolved question of implementing now/later a cp replacement
  
  The patch implements what's been agreed. 
  
  I'm not rewriting cp, for reasons already discussed.
  
  Not a comment to you Martin, but it's fairly clear that I'm not
  maintaining this correctly for Windows. I've never claimed to have
  tested this on Windows, and only included Windows related items as
  requested by others. I need to make it clear that I'm not going to
  maintain it at all, for Windows. If others wish to report Windows issues
  then they can suggest appropriate fixes and test them also.
 
 Hmm. I just realized that replacing the cp command within pg_standby 
 won't help at all. The problem is with the command that copies the files 
 *to* the archivelocation that pg_standby polls, not with the copy 
 pg_standby does from archivelocation to pg_xlog. And we don't have much 
 control over that.
 
 We really need a more reliable way of detecting that a file has been 
 fully copied. One simple improvement would be to check the xlp_magic 
 field of the last page, though it still wouldn't be bullet-proof.
 
 Do the commands that preallocate the space keep the file exclusively 
 locked during the copy? If they do, shouldn't we get an error in trying 
 to run the restore copy command, and retry after the 1s sleep in 
 RestoreWALFileForRecovery? Though if the archive location is a samba 
 mount or something, I guess we can't rely on Windows-style exclusive 
 locking.

With respect, I need to refer you back to my last paragraph above.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH RECUSIVE patches 0723

2008-07-23 Thread Tatsuo Ishii
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  Reviewers, please let me know if you find problems with the
  patches. If none, I would like to commit this weekend.
 
 Has this patch actually been reviewed yet?  The only reports I've
 seen are from testing; nothing from anyone actually reading the
 code.  I know I've not looked at it yet.

The reviewer registered at the Wiki is David Fetter, and I believe he
is reading the patches. Michael Meskes has contributed the ecpg
part, so apparently he knows the ecpg part at least.

I know the patch is huge. Reviewers, please let me know if you have
any questions about the code. I would like to do anything to help
the review.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL extensions packaging

2008-07-23 Thread Tom Dunstan
Hi!

On Wed, Jul 23, 2008 at 5:08 PM, Dimitri Fontaine
[EMAIL PROTECTED] wrote:
 I promised to have an in-depth look at the archives before spending time
 on my ideas for $subject, but failed to do so.

I guess that means you missed both the original discussion at
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php and
my initial patch in that direction and subsequent discussion at
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php then
:(.

There were two core components to my idea of modules/packages:
 - Separation of installation at an OS level (RPM/yum, deb/dpkg, MSI
installer etc) and installation into a database. The intention was a)
to standardize package installation generally so that users didn't
have to read n different sets of installation instructions for n
different packages, and b) so that a db owner could install into their
own database any module that had been installed on the system, even if
that might include e.g. C functions that they otherwise would not be
able to install without being a superuser.

- Have dependency tracking so that pg_dump could emit e.g. LOAD
MODULE foo; rather than all the different instructions to recreate
the module.

So the proposed installation procedure would be more along the lines of:

yum install postgresql-module-postgis
echo load module postgis | psql mydb

My intention was to use whatever native package manager was
appropriate for your distro rather than trying to recreate CPAN,
although some people in the original discussion wanted to go down that
route.

The patch that I provided didn't do any of the dependency stuff yet -
I had been investigating various ways to do it automagically, although
I haven't worked on it for a little while. It may be that the
straightforward explicit declaration that you have here is a better way
to do it.

I didn't have versioning and interdependencies between modules yet,
although it's an obvious extension to the idea.

 A package can also host variables, whose visibility is package-global:
 any SQL in the package can refer directly to package variables.

That was way out of scope for my more modest suggestion - I certainly
wasn't going to change pl/pgsql semantics. For example, how do those
variables behave upon a transaction rollback?

 Now, what would be really good to have would be this pg_pkg command I was
 dreaming about in another -hacker mail:

This turns into recreating CPAN. I like the idea of a blessed set of
packages, but would rather not require all postgresql users to have a
full build environment (especially on windows) and have to replace
their native packaging solution. It seems that you agree that
fetching/installing should be separate from loading/installing into
the database. Good. Some posters on the original thread were
suggesting that the fetch/install step should somehow do the database
installation as well, which sounded like a huge can of worms.


I think that we can come up with a package/module format that allows
installation at the OS level without demanding a whole set of download
/ build machinery. If someone then wants to build that and have it
install packages, then fine, but we definitely should not require it
to be able to install stuff.

Look forward to your comments

Cheers

Tom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] WITH RECUSIVE patches 0723

2008-07-23 Thread David Fetter
On Wed, Jul 23, 2008 at 10:59:20AM -0400, Tom Lane wrote:
 Tatsuo Ishii [EMAIL PROTECTED] writes:
  Reviewers, please let me know if you find problems with the
  patches. If none, I would like to commit this weekend.
 
 Has this patch actually been reviewed yet?  The only reports I've
 seen are from testing; nothing from anyone actually reading the
 code.  I know I've not looked at it yet.

I've read the code, for what that's worth, which isn't much.  I just
tried out this patch on a fresh checkout of CVS TIP and found:

EXPLAIN WITH RECURSIVE t(i) AS (VALUES(1::int) UNION ALL SELECT i+1 FROM t 
WHERE i < 5) SELECT * FROM t AS t1 JOIN t AS t2 USING(i);
                                  QUERY PLAN 
-------------------------------------------------------------------------------
 Hash Join  (cost=0.08..0.16 rows=2 width=4)
   Hash Cond: (t1.i = t2.i)
   ->  Recursion on t1  (cost=0.00..0.06 rows=2 width=4)
         ->  Append  (cost=0.00..0.04 rows=2 width=4)
               ->  Values Scan on *VALUES*  (cost=0.00..0.01 rows=1 width=4)
               ->  Recursive Scan on t  (cost=0.00..0.00 rows=1 width=4)
                     Filter: (i < 5)
   ->  Hash  (cost=0.06..0.06 rows=2 width=4)
         ->  Recursion on t2  (cost=0.00..0.06 rows=2 width=4)
               ->  Append  (cost=0.00..0.04 rows=2 width=4)
                     ->  Values Scan on *VALUES*  (cost=0.00..0.01 rows=1 width=4)
                     ->  Recursive Scan on t  (cost=0.00..0.00 rows=1 width=4)
                           Filter: (i < 5)
(13 rows)

When I try to execute the query without the EXPLAIN, having attached a debugger
to the back-end, I get.

(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x08192dcd in ExecQual (qual=0xa183824, econtext=0xa183230, resultForNull=0 
'\0') at execQual.c:4513
4513expr_value = ExecEvalExpr(clause, econtext, isNull, NULL);
(gdb) i s
#0  0x08192dcd in ExecQual (qual=0xa183824, econtext=0xa183230, resultForNull=0 
'\0') at execQual.c:4513
#1  0x08199b23 in ExecScan (node=0xa1831a4, accessMtd=0x81a6bb0 
RecursivescanNext) at execScan.c:131
#2  0x081a6ba9 in ExecRecursiveScan (node=0xa1831a4) at nodeRecursivescan.c:48
#3  0x08192320 in ExecProcNode (node=0xa1831a4) at execProcnode.c:380
#4  0x081a6923 in RecursionNext (node=0xa17fe18) at nodeRecursion.c:68
#5  0x08199a83 in ExecScan (node=0xa17fe18, accessMtd=0x81a6840 
RecursionNext) at execScan.c:68
#6  0x081a6839 in ExecRecursion (node=0xa17fe18) at nodeRecursion.c:116
#7  0x081923e0 in ExecProcNode (node=0xa17fe18) at execProcnode.c:339
#8  0x081a1c13 in MultiExecHash (node=0xa17fcc4) at nodeHash.c:94
#9  0x081a28b9 in ExecHashJoin (node=0xa17b654) at nodeHashjoin.c:159
#10 0x081922d8 in ExecProcNode (node=0xa17b654) at execProcnode.c:395
#11 0x081901db in standard_ExecutorRun (queryDesc=0xa15c334, 
direction=ForwardScanDirection, count=0) at execMain.c:1271
#12 0x08240dc4 in PortalRunSelect (portal=0xa15631c, forward=1 '\001', count=0, 
dest=0xa1733d8) at pquery.c:937
#13 0x082420e6 in PortalRun (portal=0xa15631c, count=2147483647, isTopLevel=1 
'\001', dest=0xa1733d8, altdest=0xa1733d8, 
completionTag=0xbfcacaea ) at pquery.c:793
#14 0x0823d0a7 in exec_simple_query (
query_string=0xa12fc9c "WITH RECURSIVE t(i) AS (VALUES(1::int) UNION ALL 
SELECT i+1 FROM t WHERE i < 5) SELECT * FROM t AS t1 JOIN t AS t2 USING(i);") 
at postgres.c:977
#15 0x0823e84c in PostgresMain (argc=4, argv=0xa0d0dac, username=0xa0d0d7c 
shackle) at postgres.c:3559
#16 0x0820957f in ServerLoop () at postmaster.c:3238
#17 0x0820a4e0 in PostmasterMain (argc=3, argv=0xa0ced50) at postmaster.c:1023
#18 0x081b69d6 in main (argc=3, argv=0xa0ced50) at main.c:188

What other information could help track down this problem?

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] WITH RECUSIVE patches 0723

2008-07-23 Thread Tatsuo Ishii
 On Wed, Jul 23, 2008 at 10:59:20AM -0400, Tom Lane wrote:
  Tatsuo Ishii [EMAIL PROTECTED] writes:
   Reviewers, please let me know if you find problems with the
   patches. If none, I would like to commit this weekend.
  
  Has this patch actually been reviewed yet?  The only reports I've
  seen are from testing; nothing from anyone actually reading the
  code.  I know I've not looked at it yet.
 
 I've read the code, for what that's worth, which isn't much.  I just
 tried out this patch on a fresh checkout of CVS TIP and found:
 
 EXPLAIN WITH RECURSIVE t(i) AS (VALUES(1::int) UNION ALL SELECT i+1 FROM t 
 WHERE i < 5) SELECT * FROM t AS t1 JOIN t AS t2 USING(i);
 [plan snipped]
 
 When I try to execute the query without the EXPLAIN, having attached a
 debugger to the back-end, I get.

Thanks for the report. We will look into this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL extensions packaging

2008-07-23 Thread Tom Dunstan
Oops, sent with wrong from header...


-- Forwarded message --
From: Tom Dunstan [EMAIL PROTECTED]
To: Dimitri Fontaine [EMAIL PROTECTED]
Date: Wed, 23 Jul 2008 19:40:30 -0400
Subject: Re: [HACKERS] PostgreSQL extensions packaging
[body identical to the preceding message, snipped]

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] issues/experience with building postgres on Windows

2008-07-23 Thread Martin Zaun


Hi,

trying to build postgres on Windows, I ran into a number of
problems not covered by the FAQs or the PostgreSQL 8.3.3
documentation:
  http://www.postgresql.org/docs/8.3/interactive/install-win32-full.html

In short, it's not been an easy experience so far:
- build errors with latest Visual Studio Express C++ 2008
- ok with VS Express 2005 but integration issues with Platform SDK
- issues locating the 14 required software packages:
  - no luck getting Bison 1.875 or 2.2 Windows binaries
  - linkage errors with the latest ActivePerl/TCL libraries
  - locating an OSSP-UUID binary package for Windows
(.dll but no uuid.h with Hiroshi Saito's binary package?)

Below are more details / questions / suggestions with the goal to
help for a better build experience on Windows.

Any hints or comments welcome!

Regards,
Martin


1) Build errors with Visual Studio Express C++ 2008

   For building with Visual Studio Express 2005, the Microsoft
   Platform SDK needs to be downloaded, installed, and integrated
   into the INCLUDE/LIB/PATHs.

   However, since Visual Studio Express 2008 already comes with
   the integrated SDK, I tried this one first.  Ran into a number
   of VCBuild errors:

vcbuild.exe : error VCBLD0010: Project 'G:\olsen\pg\pgsql\postgres.vcproj'
requires upgrade. Use 'vcbuild /upgrade' or 'devenv /upgrade' to upgrade
the project.

   Couldn't get past these errors by executing vcbuild /upgrade
   on the generated .vcproj files.  The VS Release Notes and
   other sites describe this issue a bit, but were not much help.

   == Anyone knowing how to build postgres with VC++ 2008?


2) Back to Visual Studio Express C++ 2005  MS Platform SDK

   The postgres documentation wasn't specific, but I installed
 Microsoft Platform SDK for Windows Server 2003 R2
 
http://www.microsoft.com/downloads/details.aspx?FamilyID=0baf2b35-c656-4969-ace8-e4c0c0716adb&DisplayLang=en
   as I guessed this was the basis for the Postgres documentation.

   There's a newer version, though, now called Windows SDK:
 Windows SDK for Windows Server 2008 and .NET Framework 3.5
 
http://www.microsoft.com/downloads/details.aspx?FamilyID=e6e1c3df-a74f-4207-8586-711ebe331cdc&DisplayLang=en

   == Anyone tried building PG with latest Windows SDK (2008)?


3) Issues adding Platform SDK paths to VC++ 2005

   The README in pgsql/src/tools/msvc/ informs about the steps
   needed to integrate the Platform SDK with VC++ 2005 by editing
   the Include, Library and Path tags in the files
 vcprojectengine.dll.express.config
 vcprojectengine.dll.config
   under the VC++ directory vc\vcpackages.

   The README's following assertion is not correct, I think:
 This should work for both GUI and commandline builds, but a
 restart may be necessary.

   While the Include/Lib/Path changes appear to be in effect for
   the GUI they are NOT in the Visual Studio 2005 Command Prompt!
   (even after restart)

   The following MS page
 http://www.microsoft.com/express/2005/platformsdk/default.aspx
   describes additional steps, notably to also delete the file
 vccomponents.dat
   located in
 %USERPROFILE%\Local Settings\Application Data\Microsoft\VCExpress\8.0
   before restarting VC++ Express Edition.

   But this didn't change the paths in the VS 2005 Command Prompt
   either.  So I had to change the user INCLUDE/LIB/PATH
   environment manually.

   Actually, I'm using the Command Prompt from the MS Platform
   SDK's Build Environment selection, not the VS 2005 start menu,
   for other important environment settings (target OS, 32/64,
   debug/release etc).

   == Does the README's assertion need to be changed/corrected?


4) Issues with locating the 14 required software packages

   - ActiveState Perl, ActiveState TCL
 Had to click around to find out that I don't want the
 ActivePerl Pro Studio Free Trial software but the
 ActivePerl.  Same with ActiveTcl.

 Problem: found out later that PG doesn't build with the
 latest ActivePerl/TCL language packs!

   - Bison, Flex
 The PG documentation states: only Bison 1.875 or versions
 2.2 and later will work.

 Problem: The GnuWin32 website only offers Bison 2.1!

   - Diff, Gettext
 No problems.

   - MIT Kerberos
 No problems but unsure how to answer some of the
 installation questions.

   - libxml2, libxslt, iconv
 No problems.

   - openssl
 No problems but unsure how to answer some of the
 installation questions.

   - ossp-uuid
 Downloaded source from http://www.ossp.org/pkg/lib/uuid/.

 Problem: from where to get a ossp-uuid Windows binary?

   - Python
 No problems.

   - zlib
 No problems.

   == Can't we provide a package for download with all/most of
  the libraries/tools needed for building PG on Windows?


5) Configuring pgsql/src/tools/msvc/config.pl, buildenv.pl

   A typo in my config.pl had the build aborting 

Re: [HACKERS] issues/experience with building postgres on Windows

2008-07-23 Thread Andrew Dunstan



Martin Zaun wrote:


Hi,

trying to build postgres on Windows, I ran into a number of
problems not covered by the FAQs or the PostgreSQL 8.3.3
documentation:
  http://www.postgresql.org/docs/8.3/interactive/install-win32-full.html

In short, it's not been an easy experience so far:
- build errors with latest Visual Studio Express C++ 2008


Nobody ever said this was supported AFAIK. VC++ Express 2005 is the 
supported compiler so far.



- ok with VS Express 2005 but integration issues with Platform SDK


Right.


- issues locating the 14 required software packages:
  - no luck getting Bison 1.875 or 2.2 Windows binaries


bison 1.875 is available here: 
http://sourceforge.net/project/showfiles.php?group_id=23617&package_id=22822



  - linkage errors with the latest ActivePerl/TCL libraries


Ugh.

What would be helpful would be a build log, something like: 
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=mastodon&dt=2008-07-24%20010001&stg=make 
or an extract from it.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] WITH RECUSIVE patches 0723

2008-07-23 Thread Tatsuo Ishii
 On Wed, Jul 23, 2008 at 10:59:20AM -0400, Tom Lane wrote:
  Tatsuo Ishii [EMAIL PROTECTED] writes:
   Reviewers, please let me know if you find problems with the
   patches. If none, I would like to commit this weekend.
  
  Has this patch actually been reviewed yet?  The only reports I've
  seen are from testing; nothing from anyone actually reading the
  code.  I know I've not looked at it yet.
 
 I've read the code, for what that's worth, which isn't much.  I just
 tried out this patch on a fresh checkout of CVS TIP and found:
 
 EXPLAIN WITH RECURSIVE t(i) AS (VALUES(1::int) UNION ALL SELECT i+1 FROM t 
 WHERE i < 5) SELECT * FROM t AS t1 JOIN t AS t2 USING(i);
 [plan snipped]
 
 When I try to execute the query without the EXPLAIN, having attached a
 debugger to the back-end, I get.
 
 (gdb) continue
 Continuing.
 
 Program received signal SIGSEGV, Segmentation fault.

Thanks for the report. Here are the new patches from Yoshiyuki. It
appeared that addRangeTableEntryForRecursive() needs to do a deep copy
of the subquery and ref in the RangeTblEntry, to avoid a double-free
bug (remember that your example is a self-join case).
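
In outline, the bug class and the fix look like this toy sketch (plain C,
not the actual patch; in the backend the copy is done on the RangeTblEntry
fields):

    #include <stdlib.h>
    #include <string.h>

    /* Two range table entries sharing one subquery pointer get freed
     * twice on a self-join; a deep copy gives each entry its own tree. */
    typedef struct Subquery { char *text; } Subquery;

    static Subquery *deep_copy(const Subquery *src)
    {
        Subquery *dst = malloc(sizeof *dst);
        dst->text = strdup(src->text);  /* own copy, not a shared pointer */
        return dst;
    }

    static void free_subquery(Subquery *q)
    {
        free(q->text);
        free(q);
    }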

Also I added your query to the regression test case with minor
modifications.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


recursive_query.patch.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Research/Implementation of Nested Loop Join optimization

2008-07-23 Thread Tom Lane
Manoel Henrique [EMAIL PROTECTED] writes:
 The nodeMergejoin.c is the code for the Merge Join, isn't it? I am trying to
 find a way to change the Nested Loop Join. It would be more like
 nodeNestloop.c: when rescanning the inner plan (second time scanning the
 inner plan and so on), it would change the scan direction. If the scan
 direction was from first tuple to last tuple it would go backwards; if it
 was from last to first it would go forward... The code I'm looking at atm
 is from 8.3.1; it seems to have some kind of direction manager, but it
 doesn't seem to be in use.

I find this a bit dubious.  If the inner rel is small enough to fit in
memory then it buys nothing.  If not, then you win only to the extent
that a pretty large fraction of the inner rel fits in memory.  In any
case you are relying on the assumption that backwards scan is just as
efficient as forward scan, which seems to me to be a pretty large
assumption --- we expect forward seqscans to get a performance boost
from kernel readahead, but I'd be surprised if the kernel recognized
what was happening in a backwards scan.

Note also that backwards scan doesn't work at all in some plan
node types (cf ExecSupportsBackwardScan).  You'd need to check
what the inner input node was before trying this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers