Re: [HACKERS] [ADMIN] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Marc G. Fournier
--On Thursday, January 17, 2008 01:12:54 -0500 Tom Lane <[EMAIL PROTECTED]>
wrote:

> "Marc G. Fournier" <[EMAIL PROTECTED]> writes:
>> [EMAIL PROTECTED] (Mischa Sandberg) writes:
>>> Unfortunately, with multiple jails running PG servers and (due to app
>>> limitations) all servers having same PGPORT, you get the situation that
>>> when jail#2 (,jail#3,...) server comes up, it:
>>> - detects that there is a shm seg with ipc key 5432001
>>> - checks whether the associated postmaster process exists (with kill -0)
>>> - overwrites the segment created and being used by jail #1
>
>> Easiest fix: change the UID of the user running the postmaster (ie. pgsql) so
>> that each runs as a distinct UID (instead of distinct PGPORT) ... been doing
>> this since moving to FreeBSD 6.x ... no patches required ...
>
> Sure, but in the spirit of "belt and suspenders too", I'd think that
> doing that *and* something like Mischa's proposal wouldn't be bad.

No arguments here, just pointing out that changing PGPORT isn't/wasn't the only 
way of addressing this problem ... if we can do something more 'internal', it 
would definitely make life a lot easier ...

-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype . hub.org   ICQ . 7615664


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [ADMIN] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Tom Lane
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] (Mischa Sandberg) writes:
>> Unfortunately, with multiple jails running PG servers and (due to app
>> limitations) all servers having same PGPORT, you get the situation that
>> when jail#2 (,jail#3,...) server comes up, it:
>> - detects that there is a shm seg with ipc key 5432001
>> - checks whether the associated postmaster process exists (with kill -0)
>> - overwrites the segment created and being used by jail #1

> Easiest fix: change the UID of the user running the postmaster (ie. pgsql) so
> that each runs as a distinct UID (instead of distinct PGPORT) ... been doing 
> this since moving to FreeBSD 6.x ... no patches required ...

Sure, but in the spirit of "belt and suspenders too", I'd think that
doing that *and* something like Mischa's proposal wouldn't be bad.

(BTW, as far as I saw the original post only went to -hackers
... there's something messed up about your reply.)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Marc G. Fournier
[EMAIL PROTECTED] (Mischa Sandberg) writes:

>Unfortunately, with multiple jails running PG servers and (due to app
>limitations) all servers having same PGPORT, you get the situation that
>when jail#2 (,jail#3,...) server comes up, it:
>- detects that there is a shm seg with ipc key 5432001
>- checks whether the associated postmaster process exists (with kill -0)
>- overwrites the segment created and being used by jail #1

Easiest fix: change the UID of the user running the postmaster (ie. pgsql) so 
that each runs as a distinct UID (instead of distinct PGPORT) ... been doing 
this since moving to FreeBSD 6.x ... no patches required ...
-- 

Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype . hub.org   ICQ . 7615664

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Postgresql Materialized views

2008-01-16 Thread Andreas Pflug

Simon Riggs wrote:
> My thinking was if you load a 1000 rows and they all have the same key
> in your summary table then you'll be doing 1000 updates on a single row.

This is true because the statement level triggers are still rudimentary, 
with no OLD and NEW support. A single AFTER statement trigger execution 
could maintain the summary table with much less effort.


Regards,
Andreas




Re: [HACKERS] Renaming a constraint's index

2008-01-16 Thread Andrew Dunstan



Tom Lane wrote:
> There was some discussion last week on -bugs about how renaming an index
> that belongs to a unique or primary key constraint is allowed, but can
> lead to situations that can't be dumped/restored properly.  This isn't
> really pg_dump's fault, IMHO.  We should rather make the backend enforce
> that the index's name stays in sync with the constraint's name.  (Well,
> I guess we could imagine making pg_dump deal with this by issuing
> ALTER TABLE ADD CONSTRAINT and then ALTER INDEX RENAME, but ... ick.)
>
> There seem to be three things we could do:
>
> 1. Make ALTER INDEX RENAME fail if the index belongs to a constraint.
> This is trivial code-wise, but doesn't seem especially helpful to users.
>
> 2. Make ALTER INDEX RENAME automatically rename the constraint, too.
> This would take a few dozen lines of code but is certainly not hard.
>
> 3. Invent an ALTER TABLE RENAME CONSTRAINT command, and have it also
> rename the underlying index.  This would take more code than would be
> reasonable to add to 8.3 at this late date, I think, but it would
> add more functionality since you could also rename constraints of
> other types.
>
> Now, doing either #1 or #2 today would not foreclose doing #3 later
> (actually, we *must* do either #1 or #2 together with #3 in order to
> meet the goal of not letting the names diverge).
>
> I'm thinking about doing #2 for 8.3 and leaving #3 as a TODO item.
> Comments?

+1

cheers

andrew



[HACKERS] Renaming a constraint's index

2008-01-16 Thread Tom Lane
There was some discussion last week on -bugs about how renaming an index
that belongs to a unique or primary key constraint is allowed, but can
lead to situations that can't be dumped/restored properly.  This isn't
really pg_dump's fault, IMHO.  We should rather make the backend enforce
that the index's name stays in sync with the constraint's name.  (Well,
I guess we could imagine making pg_dump deal with this by issuing
ALTER TABLE ADD CONSTRAINT and then ALTER INDEX RENAME, but ... ick.)

There seem to be three things we could do:

1. Make ALTER INDEX RENAME fail if the index belongs to a constraint.
This is trivial code-wise, but doesn't seem especially helpful to users.

2. Make ALTER INDEX RENAME automatically rename the constraint, too.
This would take a few dozen lines of code but is certainly not hard.

3. Invent an ALTER TABLE RENAME CONSTRAINT command, and have it also
rename the underlying index.  This would take more code than would be
reasonable to add to 8.3 at this late date, I think, but it would
add more functionality since you could also rename constraints of
other types.

Now, doing either #1 or #2 today would not foreclose doing #3 later
(actually, we *must* do either #1 or #2 together with #3 in order to
meet the goal of not letting the names diverge).

I'm thinking about doing #2 for 8.3 and leaving #3 as a TODO item.
Comments?

regards, tom lane



Re: [HACKERS] VACUUM FULL out of memory

2008-01-16 Thread Simon Riggs
On Wed, 2008-01-16 at 21:53 +0100, Michael Omotayo Akinde wrote:

> As far as I can see, the ulimits are set up as they should; and on a
> 64-bit machine with 16GB RAM, I don't see there should be a problem
> with allocating 2 GB maintenance work memory. In any case, I have
> serious difficulty believing that the ulimits can be the problem since
> PostgreSQL reacts the same even with maintenance work memory set very
> low.
> 
> Basically, all of the tests we've run over the past couple of weeks
> end the same. VACUUM FULL on the tables runs out of memory and
> crashes. Ordinary VACUUM runs fine (albeit slowly) - but recommends
> that one runs VACUUM FULL. 

Not surprising: AFAIK VACUUM FULL does not limit itself to
maintenance_work_mem. Or put another way, VF ignores both kinds of
work_mem completely.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] Postgresql Materialized views

2008-01-16 Thread Simon Riggs
On Wed, 2008-01-16 at 11:16 +, Heikki Linnakangas wrote:
> Merlin Moncure wrote:
> > On Jan 12, 2008 4:19 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> >> On Sat, 2008-01-12 at 13:27 +0100, Jean-Michel Pouré wrote:
> >>
> >>> Please pick up this important issue for developers. There is no need to
> >>> concentrate on complex issues, when handling materialized views could
> >>> boost some web apps by a factor of 10 or more.
> >> It's more complex than you think, but the main reason was that HOT was a
> >> prerequisite for making summary tables work efficiently, which is only
> >> now just about to go live into 8.3
> > 
> > +1
> 
> I don't quite agree with that. HOT certainly speeds up UPDATEs on small 
> tables, like your summary table, but there's a lot of use cases like 
> data warehousing, where the summary tables are not updated often enough 
> for the updates to become a bottleneck.

I should have said that was *my* reason for not doing it sooner.

My thinking was if you load a 1000 rows and they all have the same key
in your summary table then you'll be doing 1000 updates on a single row.
With HOT that will cause the block to fill up and then we wipe out the
previously updated rows, since they are inserted and deleted in same
transaction. So all 1000 updates can occur without going beyond that
single block. Much faster, no mess. 

Before HOT, large loads were worse, since there was no chance of
VACUUMing the table between updates. 

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] to_char incompatibility

2008-01-16 Thread Bruce Momjian
Peter Eisentraut wrote:
> Am Donnerstag, 10. Januar 2008 schrieb Roberts, Jon:
> > > On PostgreSQL:
> > >
> > > select to_date('31-DEC-200700:00:00', 'dd-mon-yyyy hh24:mi:ss');
> > >    to_date
> > > --------------
> > >  200700-12-31
> 
> > Oracle removes all white spaces in the date you pass in and the date
> > format.
> 
> I don't have a strong opinion on the whitespace handling, but then I wonder
> 
> 1. If I put four Y's, why does it create a six-digit year?
> 

If we didn't print 4+ digits for YYYY we would need new patterns for >4
digit years, like 5 and 6-digit years.  Our documentation is at least
clear:

	YYYY	year (4 and more digits)

> 2. If it does create a six digit year, the rest of the pattern doesn't match 
> anymore, so it should error.
> 
> A further example shows that to_date seems to have little error checking 
> altogether:
> 
> select to_date('17.12.1978', 'YYYY-MM-DD');
>   to_date
> ------------
>  0017-12-19

Yea, I can't find any way to suppress those leading zeros, except by
using the proper number of Y's.
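The greedy "4 or more digits" behavior being discussed can be illustrated with a small sketch. This mimics a year field that simply grabs every digit it can see; it is not PostgreSQL's actual formatting code, and `greedy_digit_run` is a made-up helper name:

```c
#include <ctype.h>
#include <stddef.h>

/* Return the length of the leading digit run at s, i.e. how many digits
 * a greedy "4 and more digits" year field would consume before hitting a
 * non-digit separator. */
static size_t
greedy_digit_run(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char) s[n]))
        n++;
    return n;
}
```

Against the input '31-DEC-200700:00:00', the year field sees the run "200700" and consumes all six digits (stopping only at the ':'), which is why the example yields year 200700.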

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] COPY encoding

2008-01-16 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Yes, it is.  Not sure if it's adequately documented.

> Will this cover the case?

Text looks OK.  I think it might fit better a bit further up, adjacent
to the para about DateStyle which is a somewhat comparable
consideration.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] to_char incompatibility

2008-01-16 Thread Bruce Momjian
Andrew Dunstan wrote:
> 
> 
> Tom Lane wrote:
> > Peter Eisentraut <[EMAIL PROTECTED]> writes:
> >   
> >> A further example shows that to_date seems to have little error checking 
> >> altogether:
> >> 
> >
> > Yeah, that's been one of the main knocks on that code since day one.
> > Somebody needs to spend a whole lot of time on it, and the original
> > author has left the project ...
> >
> > 
> >   
> 
> TODO list item?

We have something on hold for 8.4:

http://momjian.us/mhonarc/patches_hold/msg00319.html

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Greg Smith

On Wed, 16 Jan 2008, Kevin Grittner wrote:

> I haven't seen any benchmarks on the list or in our environment
> where the separate spindles gave more than a 1% increase in
> performance when using a good-quality BBC controller.

Well, even 1% isn't nothing, which is the main point I was making--it 
doesn't completely remove the gain, just reduce it a lot.  If you wanted 
to see a bigger difference you could simulate a workload with lots of 
clients doing short transactions.

The biggest gain in having a separate WAL isn't as obvious in gross 
performance measurements.  It's what happens to worst-case performance for 
the transactions just after a checkpoint, when there is a burst of more 
full page writes (they normally settle down as the most popular pages get 
written).  That's the spot where you're most likely to run into a WAL 
bottleneck that just having a BBC doesn't completely eliminate.

> Do you have results that show more of a difference?  Can you share them?

I wasn't trying to quantify this particular number and it would take a bit 
just to figure out what I could and couldn't share.  Expanding on the 
above, though, if you look some of the recent public benchmarks like 
http://www.kaltenbrunner.cc/blog/index.php?/archives/21-8.3-vs.-8.2-a-simple-benchmark.html 
you'll see Stefan was able to hit around 4000 TPS on that test system. 
Now, if you had a typical 256MB BBC (it's 512MB there) and full pages 
writes are dumping 8K each, that means you can fit 32768 of them before 
you blow your cache and the disks really have to keep up--and the WAL 
doesn't get the whole cache to itself.  The first 10-20 seconds after a 
checkpoint on such a system are kind of interesting to zoom in on.  If the 
WAL has to fight for seek time with database reads during that period (DB 
writes will still be mostly cached by the OS just after a checkpoint) it 
can be messy compared to what you get with a dedicated WAL.  But that will 
average out to a minimal effect on TPS over the course of the test.
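The 32768 figure above is just the cache size divided by the full-page-image size; a quick back-of-the-envelope check (numbers are from the post, not from any new measurement):

```c
/* How many full-page images of page_kb kilobytes fit in a write cache of
 * cache_mb megabytes.  Both operands are converted to kilobytes. */
static long
full_pages_in_cache(long cache_mb, long page_kb)
{
    return cache_mb * 1024L / page_kb;
}
```

With a 256 MB cache and 8 kB pages this gives 32768; the 512 MB controller in Stefan's test would hold twice that.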


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] VACUUM FULL out of memory

2008-01-16 Thread Michael Omotayo Akinde
Just to conclude on the issue we had here.

As far as I can see, the ulimits are set up as they should; and on a
64-bit machine with 16GB RAM, I don't see there should be a problem with
allocating 2 GB maintenance work memory. In any case, I have serious
difficulty believing that the ulimits can be the problem since
PostgreSQL reacts the same even with maintenance work memory set very low.

Basically, all of the tests we've run over the past couple of weeks end
the same. VACUUM FULL on the tables runs out of memory and crashes.
Ordinary VACUUM runs fine (albeit slowly) - but recommends that one runs
VACUUM FULL.

Unfortunately, we will not have resources to run further test runs on
this for a while. If we get time next month, I may try to create a small
test case that demonstrates the problem (shouldn't be too hard to do - I
expect throwing in some gigabytes of objects should do the trick), if
anyone is interested.

Our solution to the issue will have to be simply to warn users/adopters
of our system to never run VACUUM FULL on the largeobject table.

Regards,

Michael Akinde
Database Architect, Met.no

 Michael wrote: 
Thanks for the explanation on the ulimits; I can see how that could turn
out a problem in some cases.

Following Tom's suggestion, here is the startup script I used:
#!/bin/sh
ulimit -a > $PGHOST/server.ulimit
pg_ctl start -l $PGHOST/server.log

The ulimits seem to be the same, though:
$> cat server.ulimit
core file size  (blocks, -c) 1
data seg size   (kbytes, -d) unlimited
max nice(-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) unlimited
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

Regards,

Michael A.

Tom Lane wrote:
> Andrew Sullivan <[EMAIL PROTECTED]> writes:
>   
>> On Tue, Jan 08, 2008 at 05:27:16PM +0100, Michael Akinde wrote:
>> 
>>> Those are the ulimits of the db_admin account (i.e., the user that set 
>>> up and runs the DB processes). Is Postgres limited by other settings?
>>>   
>> On one system I used many years ago, /bin/sh wasn't what I thought it was,
>> and so the ulimit that I got when logged in was not what the postmaster was
>> starting under.  Took me many days to figure out what was up.
>> 
>
> The only thing I find convincing is to insert "ulimit -a >someplace"
> into the script that starts the postmaster, adjacent to where it does
> so, and then reboot.  There are too many systems on which daemons are
> launched under settings different from what interactive shells use
> (a policy that's often a good one, too).
>
>   regards, tom lane
>   



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Heikki Linnakangas

Alvaro Herrera wrote:
> Heikki Linnakangas escribió:
>> For more usefulness, we'd need to keep databases more separate from each
>> other than we do now. Databases would need to have their own transaction
>> counters, for example.
>
> Hmm, why?  Perhaps you are right but I don't see the reason.


If each database was stand-alone, you would need only one base backup 
and WAL per database to restore, instead of base backup and WAL of the 
database, and base backup and WAL of shared stuff. You could backup one 
database in cluster, restore it somewhere else, and later copy it back 
to the original cluster. You could back up one database at a time, and 
restore the whole cluster from the N per-database backups.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Alvaro Herrera
Heikki Linnakangas escribió:

> I don't think it's going to work too well, though, not without major 
> changes at least. What would happen when you restore a PITR backup of just 
> one database? Would the other databases still be there in the restored 
> cluster? What state would they be in? After restoring one database, and 
> doing some stuff on it, could you ever "merge" those changes with the rest 
> of the cluster?

Well, a PITR slave, after you change it, cannot be brought in sync with
the master.  This is not different.

If you replicate a single database's stream, the other databases should
not be there.  My idea is that a slave could request multiple databases'
streams.  The ability to do it is needed anyway, to follow both the
basic database stream and the shared stream.

> Mind you, there's more things shared between databases than the shared 
> catalogs. clog for example.

Sure --- my original proposal mentioned the use of the shared WAL stream
for global objects (though I didn't mention pg_clog, but surely it had
better be there).

> For more usefulness, we'd need to keep databases more separate from each 
> other than we do now. Databases would need to have their own transaction 
> counters, for example.

Hmm, why?  Perhaps you are right but I don't see the reason.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> I've got a couple of concerns about this-

> #1: Having the shared memory be global is a rather large problem when it
> comes to something like PG which can have a fair bit of data going
>   through that area that could be sensitive.

Well, you'd have to talk to the FreeBSD kernel hackers about changing
that, but I imagine it's still true that userid permissions checking
applies.  Whether to run the postmasters that are in different jails
under different userids is a separate question.

> #3: At least in the linux-equivalent to jails (linux-vservers, imv
>   anyway), they started w/o an init process and eventually decided it
>   made sense to have one, so I'm not sure that this test will always
>   work and the result might catch someone by surprise at some later
>   date.  Is there a better/more explicit test?

We could just leave out the kill(1,0) part.  In fact I wonder whether
we shouldn't do something like this on all platforms not only FreeBSD.
Quite aside from any considerations of jails, it seems like a pretty
bad idea to try to zap a shmem segment that has any attached processes.

Consider a system that normally runs multiple postmasters, in which one
postmaster has died but left orphaned backends behind, and we are trying
to start an unrelated postmaster.  The current code seems capable of
deciding to zap the segment with those orphaned backends attached.
This'd require a shmem key collision which seems pretty improbable given
our key assignments, but not quite impossible.  If it did happen then
the net effect would be to clear the segment's ID (since it can't
actually go away till the connected processes do).  The bad thing about
that is that if the dead postmaster were then restarted, it wouldn't
recognize the segment as being its own, and would happily start up
despite the orphaned backends.  Result: exactly the kind of conflicts
and data corruption that all these interlocks are trying to prevent.

So unless I'm missing something here, adding a check for nattch = 0
is a good idea, quite aside from making FreeBSD jails safer.

I think the worrisome question that follows on from Stephen's is really
whether FreeBSD will ever decide to lie about nattch (ie, exclude
processes in other jails from that count).

regards, tom lane



Re: [HACKERS] COPY encoding

2008-01-16 Thread Andrew Dunstan



Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> In helping someone on IRC it has become apparent that unless I am
>> mistaken "COPY foo from 'filename'" is reading the file according to the
>> client encoding.
>> Is that the expected behaviour?
>
> Yes, it is.  Not sure if it's adequately documented.
Will this cover the case?

diff -c -r1.80 copy.sgml
*** copy.sgml	18 Apr 2007 02:28:22 -0000	1.80
--- copy.sgml	16 Jan 2008 20:44:02 -0000
***************
*** 363,368 ****
--- 363,376 ----
 happened well into a large copy operation. You might wish to invoke
 VACUUM to recover the wasted space.

+
+
+ Input data is interpreted according to the current client encoding,
+ and output data is encoded in the current client encoding, even
+ if the data does not pass through the client but is read from or
+ written to a file.
+
+
  
 
  



cheers

andrew





Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Guillaume Smet
On Jan 16, 2008 7:41 PM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> I don't think it's going to work too well, though, not without major
> changes at least.

Well, I know it's really not doable with the current behaviour of WAL.
I just wanted to point this feature request because we had it a few
times and having one cluster per database is not really smart and it
wasn't too far from the subject.

> What would happen when you restore a PITR backup of
> just one database? Would the other databases still be there in the
> restored cluster?

In a perfect world, you should be able to trigger the PITR for only
one database of the cluster with the other databases still running.
It's especially interesting on a shared database server.

--
Guillaume



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Gokulakannan Somasundaram
>
>
> For more usefulness, we'd need to keep databases more separate from each
> other than we do now. Databases would need to have their own transaction
> counters, for example. Shared relations would obviously need major
> changes for that to work. If we ultimately could separate databases so
> that you could take a filesystem copy of a single database, and restore
> it to another cluster, then per-database WAL and PITR would work.
>
I agree to the fact that we can't have a separate WAL per database. Looks
like it makes more sense to create a separate database cluster, instead of
adding one more database, if we want to make better use of available
horsepower and if we don't have cross database queries.

Thanks,
Gokul.


Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Heikki Linnakangas

Guillaume Smet wrote:
> On Jan 16, 2008 6:12 PM, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
>> Tom Lane escribió:
>>> Possibly true, but if that's the underlying hardware then there's no
>>> performance benefit in breaking WAL up at all, no?
>>
>> Selective PITR shipping.
>
> If it was possible to launch a PITR only on a given database, that
> could be a great feature too. We have at least one customer who runs
> every database in a separate cluster to be able to do PITR on only one
> database if needed (for example if someone executed a DROP TABLE by
> mistake).

Yeah, it sure would be nice.

I don't think it's going to work too well, though, not without major 
changes at least. What would happen when you restore a PITR backup of 
just one database? Would the other databases still be there in the 
restored cluster? What state would they be in? After restoring one 
database, and doing some stuff on it, could you ever "merge" those 
changes with the rest of the cluster?

Mind you, there's more things shared between databases than the shared 
catalogs. clog for example.

It might be useful for creating read-only copies of a master database, 
but I don't see it being very useful/possible in general.


For more usefulness, we'd need to keep databases more separate from each 
other than we do now. Databases would need to have their own transaction 
counters, for example. Shared relations would obviously need major 
changes for that to work. If we ultimately could separate databases so 
that you could take a filesystem copy of a single database, and restore 
it to another cluster, then per-database WAL and PITR would work.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Kevin Grittner
>>> On Wed, Jan 16, 2008 at 11:40 AM, in message
<[EMAIL PROTECTED]>, Greg Smith
<[EMAIL PROTECTED]> wrote: 
> On Wed, 16 Jan 2008, Alvaro Herrera wrote:
> 
>> Keep in mind that there are claims that a write-cache-enabled
>> battery-backed RAID controller negates the effect of a separate spindle.
> 
> "Negates" is a bit strong; there's still some performance advantage on 
> systems that write a serious amount of data.  It's certainly true that a 
> BBC controller greatly reduces the need for a separate spindle.
 
I haven't seen any benchmarks on the list or in our environment
where the separate spindles gave more than a 1% increase in
performance when using a good-quality BBC controller.  Do you have
results that show more of a difference?  Can you share them?
 
-Kevin
 





Re: [HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Stephen Frost
* Mischa Sandberg ([EMAIL PROTECTED]) wrote:
> Here (@sophos.com) we run machine cluster tests using FreeBSD jails. A
> jail is halfway between a chroot and a VM. Jails blow a number of
> assumptions about a unix environment: sysv ipc's are global to all
> jails; but a process can only "see" other processes also running in the
> jail. In fact, the quickest way to tell whether you're running in a jail
> is to test for process 1.

I've got a couple of concerns about this-

#1: Having the shared memory be global is a rather large problem when it
comes to something like PG which can have a fair bit of data going
through that area that could be sensitive.
#2: Isn't there already a uid check that's done?  Wouldn't this make
more sense anyway (and hopefully minimize the impact of a bad person
getting control of the PG database/user in a given jail)?
#3: At least in the linux-equivalent to jails (linux-vservers, imv
anyway), they started w/o an init process and eventually decided it
made sense to have one, so I'm not sure that this test will always
work and the result might catch someone by surprise at some later
date.  Is there a better/more explicit test?
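For reference, the heuristic Mischa described (probe process 1 with signal 0 and look for ESRCH) can be sketched as below. As noted in #3, it is fragile: if the jail ever runs its own init, the test stops working. `probably_in_jail` is an illustrative name, not anything in the PostgreSQL tree:

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>

/* Heuristic jail test: inside a FreeBSD jail a process cannot "see"
 * processes outside the jail, including init, so probing PID 1 with
 * signal 0 fails with ESRCH.  Outside a jail the probe either succeeds
 * or fails with EPERM (init exists but belongs to root). */
static bool
probably_in_jail(void)
{
    if (kill(1, 0) == 0)
        return false;       /* init is visible and signalable */
    return errno == ESRCH;  /* EPERM means init is visible: not a jail */
}
```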

Thanks,

Stephen




Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Greg Smith

On Wed, 16 Jan 2008, Alvaro Herrera wrote:

> Keep in mind that there are claims that a write-cache-enabled
> battery-backed RAID controller negates the effect of a separate spindle.

"Negates" is a bit strong; there's still some performance advantage on 
systems that write a serious amount of data.  It's certainly true that a 
BBC controller greatly reduces the need for a separate spindle.


It can be handy to keep it separate anyway because it makes it trivial to 
track WAL I/O vs. database I/O.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Tom Lane
Mischa Sandberg <[EMAIL PROTECTED]> writes:
> Quoting Tom Lane <[EMAIL PROTECTED]>:
>> Mischa Sandberg <[EMAIL PROTECTED]> writes:
>>> +   if (kill(1,0) && errno == ESRCH && PGSharedMemoryIsInUse(0,NextShmemSegID))
>>> +   continue;
>> 
>> Isn't the last part of that test backward?  If it isn't, I don't
>> understand what it's for at all.

> Serious blush here. Yes.

Actually, after re-reading what PGSharedMemoryIsInUse does, I don't
think you want to use it: it goes to considerable lengths to avoid
returning a false positive, whereas in this context I believe we
*do* need to avoid segments that belong to other data directories.
So you probably need a separate chunk of code that only does the
nattch test.

regards, tom lane



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Guillaume Smet
On Jan 16, 2008 6:12 PM, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> Tom Lane escribió:
> > Possibly true, but if that's the underlying hardware then there's no
> > performance benefit in breaking WAL up at all, no?
>
> Selective PITR shipping.

If it was possible to launch a PITR only on a given database, that
could be a great feature too. We have at least one customer who runs
every database in a separate cluster to be able to do PITR on only one
database if needed (for example if someone executed a DROP TABLE by
mistake).

--
Guillaume



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> Heikki Linnakangas escribió:
>>> Another issue is that reading WAL is inherently not very scalable. There's 
>>> only one WAL for the whole cluster, and it needs to be read sequentially, 
>>> so it can easily become a bottleneck on large systems.
>
>> I have wondered why we do it this way.  Is there a problem with
>> having one WAL per database, and another for general operations?  This
>> last WAL would have changes to shared tables, as well as global stuff
>> like "create database" or "create tablespace".
>
> It would only be useful to have one per spindle-dedicated-to-WAL, so
> tying the division to databases doesn't seem like it'd be a good idea.

I think one-per-database would help if you had a very particular type of
application which had a lot of equally busy databases. In general to eliminate
the bottleneck I think you would need to be able to break them up by process.
So two processes writing to the same table would be able to write to different
WAL logs.

That sounds hard but I'm not sure. It may not be as bad as it sounds.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Mischa Sandberg
Quoting Tom Lane <[EMAIL PROTECTED]>:

> Mischa Sandberg <[EMAIL PROTECTED]> writes:
> > +   /* In a FreeBSD jail, you can't "kill -0" a postmaster
> > +* running in a different jail, so the shm seg might
> > +* still be in use. Safer to test nattch ?
> > +*/
> > +   if (kill(1,0) && errno == ESRCH && PGSharedMemoryIsInUse(0,NextShmemSegID))
> > +   continue;
> 
> Isn't the last part of that test backward?  If it isn't, I don't
> understand what it's for at all.

Serious blush here. Yes.




Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Alvaro Herrera
Tom Lane escribió:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Tom Lane escribió:
> >> It would only be useful to have one per spindle-dedicated-to-WAL, so
> >> tying the division to databases doesn't seem like it'd be a good idea.
> 
> > Keep in mind that there are claims that a write-cache-enabled
> > battery-backed RAID controller negates the effect of a separate spindle.
> 
> Possibly true, but if that's the underlying hardware then there's no
> performance benefit in breaking WAL up at all, no?

Selective PITR shipping.

> > My point, rather, is that with this sort of setup it would be easier to
> > do per-database PITR shipping, and one database's WAL activity would not
> > affect another's (thus hosting providers are happier -- high-rate
> > customer A need not affect low-budget customer B).
> 
> You won't get far with that because of the shared catalogs.  In
> particular, most DDL operations these days touch pg_shdepend ...

That's why you log shared activity to another WAL stream, and ship that
to everyone, while the other databases' WAL streams are shipped only to
the interested slaves.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Tom Lane
Mischa Sandberg <[EMAIL PROTECTED]> writes:
> +   /* In a FreeBSD jail, you can't "kill -0" a postmaster
> +* running in a different jail, so the shm seg might
> +* still be in use. Safer to test nattch ?
> +*/
> +   if (kill(1,0) && errno == ESRCH && !PGSharedMemoryIsInUse(0,NextShmSegID))
> +   continue;

Isn't the last part of that test backward?  If it isn't, I don't
understand what it's for at all.

regards, tom lane



Re: [HACKERS] to_char incompatibility

2008-01-16 Thread Andrew Dunstan



Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > A further example shows that to_date seems to have little error checking
> > altogether:
>
> Yeah, that's been one of the main knocks on that code since day one.
> Somebody needs to spend a whole lot of time on it, and the original
> author has left the project ...

TODO list item?

cheers

andrew



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane escribió:
>> It would only be useful to have one per spindle-dedicated-to-WAL, so
>> tying the division to databases doesn't seem like it'd be a good idea.

> Keep in mind that there are claims that a write-cache-enabled
> battery-backed RAID controller negates the effect of a separate spindle.

Possibly true, but if that's the underlying hardware then there's no
performance benefit in breaking WAL up at all, no?

> My point, rather, is that with this sort of setup it would be easier to
> do per-database PITR shipping, and one database's WAL activity would not
> affect another's (thus hosting providers are happier -- high-rate
> customer A need not affect low-budget customer B).

You won't get far with that because of the shared catalogs.  In
particular, most DDL operations these days touch pg_shdepend ...

regards, tom lane



Re: [HACKERS] Transaction Snapshot Cloning

2008-01-16 Thread Tom Lane
"Florian G. Pflug" <[EMAIL PROTECTED]> writes:
> I seem to remember there being some pushback to the idea of changing the 
> semantics of "set transaction isolation read only" from "soft" to "hard" 
> semantics though - on the basis that it might break existing 
> applications. If that has changed (or my memory tricks me ;-) ) I'd 
> volunteer to create a patch for 8.4 to make "set transaction read only" 
> a hard constraint.

AFAICT it would violate the SQL spec as well as breaking backward
compatibility.  SQL99 4.32 saith

 An SQL-transaction has an access mode that is either read-only
 or read-write. The access mode may be explicitly set by a <set
 transaction statement> before the start of an SQL-transaction or
 by the use of a <start transaction statement> to start an SQL-
 transaction; otherwise, it is implicitly set to the default access
 mode for the SQL-session before each SQL-transaction begins. If no
 <set session characteristics statement> has set the default access
 mode for the SQL-session, then the default access mode for the SQL-
 session is read-write. The term read-only applies only to viewed
 tables and persistent base tables.

That last sentence is the basis for exempting temp tables from the
read-only restriction.

I'm not sure what the most convenient user API would be for an on-demand
hard-read-only mode, but we can't use SET TRANSACTION READ ONLY for it.
It'd have to be some other syntax.  Maybe just use a GUC variable
instead of bespoke syntax?  SET TRANSACTION is really just syntactic
sugar for GUC SET operations anyway ...

regards, tom lane



[HACKERS] postgresql in FreeBSD jails: proposal

2008-01-16 Thread Mischa Sandberg
Here (@sophos.com) we run machine cluster tests using FreeBSD jails. A
jail is halfway between a chroot and a VM. Jails blow a number of
assumptions about a unix environment: sysv ipc's are global to all
jails; but a process can only "see" other processes also running in the
jail. In fact, the quickest way to tell whether you're running in a jail
is to test for process 1.

PGSharedMemoryCreate chooses/reuses an ipc key in a reasonable way to
cover previous postmasters crashing and leaving a shm seg behind,
possibly with some backends still running.

Unfortunately, with multiple jails running PG servers and (due to app
limitations) all servers having same PGPORT, you get the situation that
when jail#2 (,jail#3,...) server comes up, it:
- detects that there is a shm seg with ipc key 5432001
- checks whether the associated postmaster process exists (with kill -0)
- overwrites the segment created and being used by jail #1

There's a workaround (there always is) other than this patch, involving
NAT translation so that the postmasters listen on different ports, but
the outside world sees them each listening on 5432. But that seems
somewhat circuitous.

I've hacked sysv_shmem.c (in PG 8.0.9) to handle this problem. Given the
trouble that the postmaster goes to to stop shm seg leakage, I'd like to
solicit any opinions on the wisdom of this edge case. If this patch IS
useful, what would be the right level of compile-time restriction
("#ifdef __FreeBSD__" ???)?

@@ -319,7 +319,8 @@
 
 		if (makePrivate)	/* a standalone backend shouldn't do this */
 			continue;
-
+		/* In a FreeBSD jail, you can't "kill -0" a postmaster
+		 * running in a different jail, so the shm seg might
+		 * still be in use. Safer to test nattch ?
+		 */
+		if (kill(1,0) && errno == ESRCH &&
+		    !PGSharedMemoryIsInUse(0,NextShmSegID))
+			continue;
 		if ((memAddress = PGSharedMemoryAttach(NextShmemSegID, &shmid)) == NULL)
 			continue;	/* can't attach, not one of mine */
 
End of Patch.




Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Alvaro Herrera
Tom Lane escribió:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Heikki Linnakangas escribió:
> >> Another issue is that reading WAL is inherently not very scalable. There's 
> >> only one WAL for the whole cluster, and it needs to be read sequentially, 
> >> so it can easily become a bottleneck on large systems.
> 
> > I have wondered why we do it this way.  Is there a problem with
> > having one WAL per database, and another for general operations?  This
> > last WAL would have changes to shared tables, as well as global stuff
> > like "create database" or "create tablespace".
> 
> It would only be useful to have one per spindle-dedicated-to-WAL, so
> tying the division to databases doesn't seem like it'd be a good idea.

Keep in mind that there are claims that a write-cache-enabled
battery-backed RAID controller negates the effect of a separate spindle.
Also, these days people are recommending keeping WAL on a mirrored disk,
so you would have to have a mirrored pair for every WAL stream, which
starts to sound unworkable.

My point, rather, is that with this sort of setup it would be easier to
do per-database PITR shipping, and one database's WAL activity would not
affect another's (thus hosting providers are happier -- high-rate
customer A need not affect low-budget customer B).

A totally separate consideration is that of LWLock contention.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Heikki Linnakangas escribió:
>> Another issue is that reading WAL is inherently not very scalable. There's 
>> only one WAL for the whole cluster, and it needs to be read sequentially, 
>> so it can easily become a bottleneck on large systems.

> I have wondered why we do it this way.  Is there a problem with
> having one WAL per database, and another for general operations?  This
> last WAL would have changes to shared tables, as well as global stuff
> like "create database" or "create tablespace".

It would only be useful to have one per spindle-dedicated-to-WAL, so
tying the division to databases doesn't seem like it'd be a good idea.

regards, tom lane



Re: [HACKERS] Password policy

2008-01-16 Thread Roberts, Jon

> -Original Message-
> From: D'Arcy J.M. Cain [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 16, 2008 9:39 AM
> To: Andrew Dunstan
> Cc: Roberts, Jon; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Password policy
> 
> On Wed, 16 Jan 2008 08:32:12 -0500
> Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> > >> I need to set a basic password policy for accounts but I don't see
> any
> > > Look at my chkpass type in contrib.  There is a function to verify the
> > > password.  It is just a placeholder now but you can modify it to do
> all
> > > your checking.
> >
> > I assumed he was asking about Postgres level passwords rather than
> > passwords maintained by an application. chkpass is only for the latter.
> 
> Could be.  I saw "accounts" and thought Unix shell or ISP accounts.
> 

I was referring to PostgreSQL accounts.


Jon



Re: [HACKERS] Transaction Snapshot Cloning

2008-01-16 Thread Florian G. Pflug

Tom Lane wrote:

Chris Browne <[EMAIL PROTECTED]> writes:

Note that we required that the "provider transaction" have the
attributes IsXactIsoLevelSerializable and XactReadOnly both being
true, so we have the mandates that the resultant backend process:



a) Is in read only mode, and
b) Is in serializable mode.


If XactReadOnly were a "hard" read only constraint, that argument
might be worth the electrons it's written on.  I quote TFM:

Now I think someone was looking into a "hard" read only mode for
use in doing read-only queries against a PITR slave; if that
ever happens it might be adaptable to serve this purpose too.
But we haven't got it today.


That would have been me then ;-)

I think that lazy xid assignment actually got us 90% of the way towards
a "hard" transaction read-only constraint - nearly all data-modifying 
operations surely depend on the xact having an xid assigned, no? (The 
only exception might be nextval() and friends).


I seem to remember there being some pushback to the idea of changing the 
semantics of "set transaction isolation read only" from "soft" to "hard" 
semantics though - on the basis that it might break existing 
applications. If that has changed (or my memory tricks me ;-) ) I'd 
volunteer to create a patch for 8.4 to make "set transaction read only" 
a hard constraint.


regards, Florian Pflug



Re: [HACKERS] Password policy

2008-01-16 Thread D'Arcy J.M. Cain
On Wed, 16 Jan 2008 08:32:12 -0500
Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> >> I need to set a basic password policy for accounts but I don't see any
> > Look at my chkpass type in contrib.  There is a function to verify the
> > password.  It is just a placeholder now but you can modify it to do all
> > your checking.
> 
> I assumed he was asking about Postgres level passwords rather than 
> passwords maintained by an application. chkpass is only for the latter.

Could be.  I saw "accounts" and thought Unix shell or ISP accounts.

> ( Slightly OT - chkpass uses crypt(). Maybe that should be upgraded to 
> use md5 or some more modern hashing function. )

Yes, I have said many times that other encryption types could easily be
dropped in. It could even be changed to handle either as long as there
was some way to set the default.  However, these things haven't yet
been a requirement for me so I have not bothered yet.

-- 
D'Arcy J.M. Cain <[EMAIL PROTECTED]> |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [HACKERS] WAL logging of hash indexes

2008-01-16 Thread Kenneth Marshall
On Tue, Jan 15, 2008 at 07:18:17PM -0800, John Smith wrote:
> http://archives.postgresql.org/pgsql-hackers/2007-10/msg01468.php,
> 
> "... I very much want to encourage authors of new Resource Managers and it
> looks like we may be getting at least 3 new RMs that produce WAL
> records: hash indexes (currently not WAL-logged), bitmap indexes and
> clustered indexes for 8.4. We should be realistic that new bugs probably
> will occur in recovery code for existing and new RMs."
> 
> 
> 
> Can someone clarify if WAL-logging of hash indexes will be supported
> in the 8.4 release?
> 
> - John
> 
John,

You can check the mailing list archives to follow recent discussions
about addressing the performance problems of hash indexes in PostgreSQL.
I think that we have some good ideas that may pan out. If the performance
problems can be fixed, it is my goal to have support for WAL logged,
unique hash indexes ready for 8.4. But as the thread describes, we need
the performance fixes first; doing the rest makes no sense if btree indexes
work better.

Cheers,
Ken



Re: [HACKERS] to_char incompatibility

2008-01-16 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> A further example shows that to_date seems to have little error checking 
> altogether:

Yeah, that's been one of the main knocks on that code since day one.
Somebody needs to spend a whole lot of time on it, and the original
author has left the project ...

regards, tom lane



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Gokulakannan Somasundaram
>
>
> I haven't been paying close attention to this thread, but there is a
> couple general issues with using the WAL for this kind of things. First
> of all, one extremely cool feature of PostgreSQL is that transaction
> size is not limited by WAL space, unlike on many other DBMSs. I think
> many of the proposed ideas of reading WAL would require us to keep all
> WAL available back to the beginning of the oldest running transaction.


Initially I thought this may be required. But the current idea is that Vacuum
is going to maintain a DSM per relation and update it once the WAL
segment is switched. So if the WAL logging is happening in segment 2, then
the first segment will be scanned to update the DSM.

>
> Another issue is that reading WAL is inherently not very scalable.
> There's only one WAL for the whole cluster, and it needs to be read
> sequentially, so it can easily become a bottleneck on large systems.


Let me try to understand what would become a problem here. We are going to
have only one process, which would open this WAL (one segment at a time) and
update the DSMs. The limitation would be that we should have completed
reading the log before the WAL segment is recycled. What else do you think
would be a problem?

Thanks,
Gokul.


[HACKERS] Thick indexes - a look at count(1) query

2008-01-16 Thread Gokulakannan Somasundaram
Hi,
   I have submitted a new patch against thick indexes (indexes with snapshot):
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php.

I took a close look at improving the performance of count(1) queries. It
worked well when we were selecting a subset of the result set. But when I
did a query like "select count(1) from table", it showed an improvement in
response time, but not to the extent I wanted. Let's have a look
at the stats.

gokul=# explain analyze select count(1) from dd;
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!   0.277208 elapsed 0.275457 user 0.000148 system sec
!   [1.128422 user 0.004976 sys total]
!   0/0 [0/0] filesystem blocks in/out
!   0/0 [0/0] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent
!   0/39 [5/160] voluntary/involuntary context switches
! buffer usage stats:
!   Shared blocks:   1024 Logical Reads,  0 Physical Reads,  0 written, buffer hit rate = 100.00%
!   Local  blocks:  0 read,  0 written, buffer hit rate = 0.00%
!   Direct blocks:  0 read,  0 written
STATEMENT:  explain analyze select count(1) from dd;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=7323.10..7323.11 rows=1 width=0) (actual time=276.838..276.838 rows=1 loops=1)
   ->  Index Only Scan using idx on dd  (cost=0.00..6741.42 rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1)
 Total runtime: 276.928 ms
(3 rows)

gokul=# set enable_indexscan=off;
SET

gokul=# explain analyze select count(1) from dd;
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!   0.331441 elapsed 0.258903 user 0.067953 system sec
!   [1.906069 user 0.211479 sys total]
!   0/0 [0/0] filesystem blocks in/out
!   0/0 [0/0] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent
!   0/39 [9/312] voluntary/involuntary context switches
! buffer usage stats:
!   Shared blocks:   5223 Logical Reads,   4391 Physical Reads,  0 written, buffer hit rate = 15.93%
!   Local  blocks:  0 read,  0 written, buffer hit rate = 0.00%
!   Direct blocks:  0 read,  0 written
STATEMENT:  explain analyze select count(1) from dd;
                               QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=8131.39..8131.40 rows=1 width=0) (actual time=331.075..331.076 rows=1 loops=1)
   ->  Seq Scan on dd  (cost=0.00..7549.71 rows=232671 width=0) (actual time=0.042..203.958 rows=232679 loops=1)
 Total runtime: 331.167 ms
(3 rows)

I have run the query multiple times and it shows a response time around
what is shown here.
The table is just a multiple copy of the pg_class table (approx 20 rows). As
can be seen, the Logical Reads show a ratio of 1:5, but the response
time is not in the same ratio. I tried to profile and couldn't find anything
significant. Even though it shows 4391 physical reads, that's from the OS
cache, since I have already run the query multiple times.

One more disadvantage of select count(1) using an index scan is that
it pollutes the shared memory, unlike full-table scans. But something can be
done in that regard.



Any thoughts?

Thanks,
Gokul.


Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Alvaro Herrera
Heikki Linnakangas escribió:

> Another issue is that reading WAL is inherently not very scalable. There's 
> only one WAL for the whole cluster, and it needs to be read sequentially, 
> so it can easily become a bottleneck on large systems.

I have wondered why we do it this way.  Is there a problem with
having one WAL per database, and another for general operations?  This
last WAL would have changes to shared tables, as well as global stuff
like "create database" or "create tablespace".

Of course, it means a lot more files, and a PITR setup is a bit more
complex.

One obvious problem is that it is no longer true that you have a "no
seek" disk head.  But is there much use of that these days?  People
have either a big RAID on which the WAL resides along with all the data;
or, on low-cost systems, the whole thing is on a single disk or a small RAID.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Password policy

2008-01-16 Thread Andrew Dunstan



D'Arcy J.M. Cain wrote:
> On Tue, 15 Jan 2008 16:11:16 -0600
> "Roberts, Jon" <[EMAIL PROTECTED]> wrote:
> > I need to set a basic password policy for accounts but I don't see any
> > documentation on how to do it.  I'm assuming there is a way to do this,
> > maybe even with a trigger.
> >
> > The policy would be something like this:
> > 1.  Must contain letters and numbers
> > 2.  Must be at least 8 characters long
> > 3.  Must contain one special character (#,@,$,%,!, etc)
> > 4.  Password (not the account) must expire after 90 days
> > 5.  Must warn users 10 days before the expire to change the password
>
> Look at my chkpass type in contrib.  There is a function to verify the
> password.  It is just a placeholder now but you can modify it to do all
> your checking.

I assumed he was asking about Postgres level passwords rather than 
passwords maintained by an application. chkpass is only for the latter.


( Slightly OT - chkpass uses crypt(). Maybe that should be upgraded to 
use md5 or some more modern hashing function. )


cheers

andrew



Re: [HACKERS] Declarative partitioning grammar

2008-01-16 Thread Zeugswetter Andreas ADI SD
 
> > Personally I find the automatic partition idea intriguing, where you
> > only have to choose an expression that equates to one value (value
> > group) per partition (and possibly a way to derive a partition name).
> 
> IMO, better go right to a fully automated approach. Or why would you 
> need partition names in such a case?

Yes, I tend to agree on that, with the exception that I think the
partition borders should be declarative expressions (equal in syntax to
a GROUP BY clause). 

Names are only for a short and crisp way to identify the partition for
the following operations:
- drop/detach data in a partition (as opposed to a normal delete)
- move to/create in other tablespace (I don't value that one, but others
have)
- reorg, create index

The drop can probably be done straight from a normal delete (a problem
is RI and triggers, though).
The normal delete would need to have the smarts to see that a delete covers a
whole partition, and thus mark the whole partition dead instead of its
individual rows.

A detach would need some extra syntax, but could also be based on a
where clause that specifies which partitions are to be detached. I am
not so sure about how to usefully do the reorg part with where clauses.

Andreas



Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Heikki Linnakangas

Gokulakannan Somasundaram wrote:


Well, one of the principal arguments for having VACUUM at all is that it
off-loads required maintenance effort from foreground transaction code
paths.  I'm not really going to be in favor of solutions that put more
work into the transaction code paths (HOT already did more of that than
I would like :-().  OTOH, I agree that scanning the WAL log doesn't
really sound like something well-matched to this problem either.



Tom, don't you like the idea of building some more structures around WAL,
like asynchronous materialized views? Indexes, if implemented as stated,
would remove the HOT code in the path of the transaction (as you may know).
I am also slightly doubtful of the argument that doing full-table scans and
full index scans for Vacuum is efficient. Can you please advise me on why we
should not use a read-only operation on the WAL log?


I haven't been paying close attention to this thread, but there is a 
couple general issues with using the WAL for this kind of things. First 
of all, one extremely cool feature of PostgreSQL is that transaction 
size is not limited by WAL space, unlike on many other DBMSs. I think 
many of the proposed ideas of reading WAL would require us to keep all 
WAL available back to the beginning of the oldest running transaction.


Another issue is that reading WAL is inherently not very scalable. 
There's only one WAL for the whole cluster, and it needs to be read 
sequentially, so it can easily become a bottleneck on large systems.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Postgresql Materialized views

2008-01-16 Thread Heikki Linnakangas

Merlin Moncure wrote:

On Jan 12, 2008 4:19 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:

On Sat, 2008-01-12 at 13:27 +0100, Jean-Michel Pouré wrote:


Please pick-up this important issue for developpers. There is no need to
concentrate on complex issues, when handling materialized views could
boost somme web apps. by a factor of 10 or more.

It's more complex than you think, but the main reason was that HOT was a
prerequisite for making summary tables work efficiently, which is only
now just about to go live into 8.3


+1


I don't quite agree with that. HOT certainly speeds up UPDATEs on small 
tables, like a summary table, but there's a lot of use cases, like 
data warehousing, where the summary tables are not updated often 
enough for the updates to become a bottleneck.



If you know how to write triggers, materialization techniques aren't
all that difficult.  The real technical limitation was not lack of
materialization techniques (write triggers), but dealing with the
MVCC penalty.  Prior to HOT, for summary tables I would redirect
the trigger to insert into a 'roll up' table and move the data to the
summary on cron or from an application event.

Materialized views are syntax sugar (but still very sweet).


There's two things involved in materialized views:

1. Automatically updating the materialized view, when the tables change. 
This can be done with triggers, right now, but requires quite a bit of 
manual work to set up, especially with more complex views.
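To give an idea of the manual work involved, here is a minimal sketch of the trigger approach for the simplest possible case, a materialized "SELECT count(*) FROM foo". All names (foo_count, foo_count_trig, foo_count_maint) are made up for illustration, not taken from the thread:

```sql
-- Hypothetical sketch: hand-maintained materialization of
-- "SELECT count(*) FROM foo". Names are illustrative only.
CREATE TABLE foo_count (cnt bigint NOT NULL);
INSERT INTO foo_count SELECT count(*) FROM foo;

CREATE FUNCTION foo_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE foo_count SET cnt = cnt + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE foo_count SET cnt = cnt - 1;
    END IF;
    RETURN NULL;  -- AFTER trigger; the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_count_maint
    AFTER INSERT OR DELETE ON foo
    FOR EACH ROW EXECUTE PROCEDURE foo_count_trig();
```

Even this trivial aggregate needs a table, a function and a trigger; a view involving joins or GROUP BY needs considerably more trigger logic, which is exactly the setup work meant above.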


2. Using the materialized views to speed up existing queries. For 
example, if you have a materialized view on "SELECT COUNT(*) FROM foo", 
and someone issues the query "SELECT COUNT(*) FROM foo", the planner 
should automatically use the view to satisfy that.


1 is syntactic sugar, but 2 isn't.

These are orthogonal features. Implementing just 1 without 2 would still 
be very useful, and in fact that seems to be what most people mean by 
materialized views.
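To illustrate the difference between the two (again with hypothetical names): with only feature 1, the application must know about the materialization and query it explicitly; feature 2 would let the planner do that substitution transparently:

```sql
-- With only feature 1, the application queries the materialized
-- table directly (hypothetical name):
SELECT cnt FROM foo_count;

-- With feature 2, the planner would recognize that this query can be
-- answered from foo_count and avoid scanning foo at all:
SELECT count(*) FROM foo;
```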


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] to_char incompatibility

2008-01-16 Thread Peter Eisentraut
Am Donnerstag, 10. Januar 2008 schrieb Roberts, Jon:
> > On PostgreSQL:
> >
> > select to_date('31-DEC-200700:00:00', 'dd-mon-yyyy hh24:mi:ss');
> >to_date
> > --
> >  200700-12-31

> Oracle removes all white spaces in the date you pass in and the date
> format.

I don't have a strong opinion on the whitespace handling, but then I wonder

1. If I put four Y's in the pattern, why does it create a six-digit year?

2. If it does create a six-digit year, the rest of the pattern doesn't match 
anymore, so it should raise an error.

A further example shows that to_date seems to have hardly any error checking 
at all:

select to_date('17.12.1978', 'YYYY-MM-DD');
  to_date

 0017-12-19

That can't possibly be a good idea, considering the robustness of the 
applications built on this.

select to_date('whatever', 'foobar');
to_date
---
 0001-01-01 BC

Yah.
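For what it's worth, to_date does give the intended result when the format string actually matches the input, so the practical workaround is to keep the two in sync (an illustrative example, not from the original mail):

```sql
-- When the format string matches the input, to_date behaves as expected:
select to_date('17.12.1978', 'DD.MM.YYYY');
--   to_date
-- ------------
--  1978-12-17
```

The complaint above is precisely that nothing warns you when they do not match.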

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Array behavior oddities

2008-01-16 Thread Jens-Wolfhard Schicke
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Bruce Momjian wrote:
>   An array subscript expression will return null if either the array itself
>   or any of the subscript expressions are null.  Also, null is returned
>   if a subscript is outside the array bounds (this case does not raise an
>   error).  For example, if schedule currently has the
>   dimensions [1:3][1:2] then referencing
>   schedule[3][3] yields NULL.  Similarly, an array reference
>   with the wrong number of subscripts yields a null rather than an error.
> 
>   An array slice expression likewise yields null if the array itself or
>   any of the subscript expressions are null.  However, in other corner
>   cases such as selecting an array slice that is completely outside the
>   current array bounds, a slice expression yields an empty
>   (zero-dimensional) array instead of null.  If the requested slice
>   partially overlaps the array bounds, then it is silently reduced to just
>   the overlapping region.
> 
> Is there a reason out-of-bounds array accesses behave differently for
> slices and non-slices?
> 
> Having slices and non-slices behave differently is very confusing to me.
I think the case of partially out-of-bounds slices is a good reason to have
this difference:

fastgraph=# select ('{foo,bar}'::text[])[1:2];
   text
- ---
 {foo,bar}
(1 row)

fastgraph=# select ('{foo,bar}'::text[])[2:3];
 text
- ---
 {bar}
(1 row)

fastgraph=# select ('{foo,bar}'::text[])[3:4];
 text
- --
 {}
(1 row)

We cannot return an empty array in the case of an unsliced out-of-bounds
access, because the expression must yield a value of the element type, and an
empty array wouldn't match that type at all.
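For comparison with the slice examples above (an illustrative session in the same style), the unsliced out-of-bounds subscript yields NULL instead:

```sql
-- An out-of-bounds plain subscript yields NULL of the element type,
-- since an empty array would not be a valid text value:
select ('{foo,bar}'::text[])[3];
--  text
-- ------
--
-- (1 row)
```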


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHjdsOzhchXT4RR5ARAvBvAKCGVxgl6u2ZUcB/Bvl2jPN2/p6hzACdFXE3
9w01URr/xPYukzHhD5qhudE=
=iZxq
-END PGP SIGNATURE-



Re: [HACKERS] WAL logging of hash indexes

2008-01-16 Thread Gokulakannan Somasundaram
>
> There is absolutely 0 value in tackling that until someone can fix
> hash's performance problems.  If there is no real-world scenario for
> using it ... which there really isn't ... then adding WAL support
> still leaves you with no real-world scenario for using it.
>
> This is not to suggest that I wouldn't like to see all of that fixed;
> I would.  But let's concentrate on the showstoppers first, rather than
> expending effort that might ultimately be a waste.
>
>
I agree on that. I think working on hash clusters would add more use-cases
than working on hash indexes.

Thanks,
Gokul.


Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Gokulakannan Somasundaram
>
>
> Well, one of the principal arguments for having VACUUM at all is that it
> off-loads required maintenance effort from foreground transaction code
> paths.  I'm not really going to be in favor of solutions that put more
> work into the transaction code paths (HOT already did more of that than
> I would like :-().  OTOH, I agree that scanning the WAL log doesn't
> really sound like something well-matched to this problem either.
>

Tom, don't you like the idea of building some more structures around WAL,
like asynchronous materialized views? Indexes, if implemented as stated,
would remove the HOT code from the transaction path (as you may know).
I am also slightly doubtful of the argument that doing full-table scans and
full index scans for vacuum is efficient. Can you please advise me on why we
should not use a read-only operation on the WAL log?

Thanks,
Gokul.


Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Gokulakannan Somasundaram
One more application of the same idea is asynchronous materialized views. I
hope you agree that asynchronous materialized views can only be kept up to
date through the WAL. If the WAL can be used for that purpose, why can't we
multiplex it?

Thanks,
Gokul.


Re: [HACKERS] Some ideas about Vacuum

2008-01-16 Thread Gokulakannan Somasundaram
Hi,

Please find my answers inline

>
> Do you have evidence of that contention being so worse, that it
> justifies the additional WAL reading from disk? (Assuming no WAL
> archiving).

In a broader sense, the DSM is a bitmap index with some optimizations placed
to make updates more effective. As you may know, the design of bitmap
indexes doesn't scale very well under concurrency. If you put more
information into a little space, then I feel it might affect concurrency.
Let us discuss it in detail.
The DSM, I believe, plans to achieve the following objectives:
a) to find out the blocks which are to be vacuumed
b) to find out the blocks where freezing is required
c) to find out the blocks which are visible to everyone

The DSM might get split into multiple maps, like visibility maps (already
proposed by Heikki), vacuum maps and freeze maps.  When inserts happen, the
map has to be extended, and the block has to be locked in order to extend
the map. Say a DSM block corresponds to some 60K data blocks; then any
updates / deletes happening on those blocks have to wait for that time.
This is just one example I can think of off-hand. Maybe the people who are
implementing it can throw more light on the synchronization points.

>
> IMO we can get about any granularity we want for DSM update locking,
> depending on how we arrange the DSM bits.

I don't quite understand what you mean here.


> > Since Vacuum process is going to
> > have much more information on what has happened in the database,
>
> Why should that be? IMO, collecting the information at transaction time
> can give you exactly the same information, if not more or better
> information.


My argument is: if we have already collected that information in the WAL, why
should we collect it again and again?


> > it is
> > possible for some new structures. For example i have been thinking of
> > changing our current index structure in such a way, it won't hold any
> > duplicate tuples for different versions of data. Whenever there is a
> > update, only the indexes relevant to the columns changed will get
> > updated. The Vacuum has to play the role of changing the tid, the index
> > tuple points to, whenever it vacuums a older version.
>
> Huh? The index would then point to the old tuple only, until a VACUUM
> comes by, right. How are following transactions expected to find the new
> tuple before that VACUUMing?

You are right. We have already discussed this. In the vacuum approach, we
travel forward in time: we catch the oldest version and follow the ctid in
the old tuple to reach the newer ones. In the undo-log approach, it is the
reverse: we go to the latest version and travel back in time. It's
interesting to see how the theory of relativity has found its way into
database science, right?

So say we have 'n' versions of the same data in the index. Right now we have
'n' index tuples which point to 'n' blocks in the heap; we read all 'n'
index tuples and go to all the versions of the data in the table. If this
changes, there will be one index tuple, which points to the oldest heap
tuple, and from there we navigate to all the newer tuples. The advantage
is obvious: the index will be smaller, and updates will not touch the
indexes unless the indexed data has actually changed.

Hope I was clear. Please get back to me in case I am not.

Thanks,
Gokul.