Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs wrote:

On Fri, 2007-10-12 at 07:17 +0100, Simon Riggs wrote:
On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:

Michael Paesold wrote:
> Simon Riggs wrote:
> Hmm, I am not sure we are there, yet. Autovacuum does take extra care to
> vacuum tables nearing xid wrap-around, right? It even does so when
> autovacuum is disabled in the configuration. So in case a vacuum is needed
> for that very reason, the vacuum should *not* be canceled, of course. So we
> don't really need the information whether the AV worker is doing VACUUM or
> ANALYZE, but whether it is critical against xid wrap-around. Could that be
> done as easily as in Alvaro's patch for distinguishing vacuum/analyze?
> Alvaro?

Alvaro Herrera wrote:
> Yes, I think it is easy to mark the "is for xid wraparound" bit in the
> WorkerInfo struct and have the cancel work only if it's off. However, what
> I think should happen is that the signal handler for SIGINT in a worker for
> xid wraparound should not cancel the current vacuum. Instead turn it into a
> no-op, if possible. That way we also disallow a user from cancelling
> vacuums for xid wraparound. I think he can do that with pg_cancel_backend,
> and it could be dangerous.

I think that is dangerous too, because the user may have specifically turned AV off. That anti-wraparound vacuum might spring up right in a busy period and start working its way through many tables, all of which cause massive writes to occur. That's about as close to us causing an outage as I ever want to see. We need a way through that to allow the user to realise his predicament and find a good time to VACUUM. I never want to say to anybody "nothing you can do, just sit and watch, your production system will be working again in no time. Restart? No, that won't work either."

I think the best way to handle this is to have two limits. The first limit triggers an autovacuum that can be cancelled. When we hit the second limit, sometime later, the autovacuum cannot be cancelled. That would give us a breathing space if we need it.
Just a few thoughts:

1) In postgresql.conf you can define whether you use autovacuum. Add a parameter that states a time of day. If an autovacuum is cancelled once and not performed manually before that time, then it executes at that time (or just after the next system restart after that time). That way you ensure that it isn't delayed indefinitely, and you execute it at a time when the database is normally not under heavy load. As a default value you could take 2am or so.

2) If you can cancel an autovacuum that way, could you also prevent it from starting in the first place with one statement, and then restart execution with another? There are a few situations where vacuuming is entirely pointless. Example:
a) Everyone logs out, upgrade procedure of the db is started
b) drop indexes
c) add tables/change tables/add columns/change columns
d) convert data
e) drop tables/drop columns
f) add indexes
g) vacuum full analyze
h) Everyone starts the new app

BTW: I like pg83, already looking at deploying it when it hits the shelves...
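The two-limit idea discussed above can be sketched in a few lines of C. This is purely illustrative - the names, types and thresholds are made up for the sketch, not PostgreSQL code:

```c
#include <stdint.h>

/* Below a soft limit no anti-wraparound vacuum is forced; between the
 * soft and hard limits the vacuum runs but may be cancelled; past the
 * hard limit it can no longer be cancelled. */
typedef enum { AV_NOT_NEEDED, AV_CANCELLABLE, AV_UNCANCELLABLE } AvUrgency;

static AvUrgency
wraparound_urgency(uint32_t xid_age, uint32_t soft_limit, uint32_t hard_limit)
{
    if (xid_age < soft_limit)
        return AV_NOT_NEEDED;
    if (xid_age < hard_limit)
        return AV_CANCELLABLE;      /* breathing space: cancel allowed */
    return AV_UNCANCELLABLE;        /* too close to xid wraparound */
}
```

The point of the second threshold is exactly the "breathing space" Simon asks for: the DBA gets a window in which to cancel and reschedule, before safety takes over.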
Re: [HACKERS] Release notes introductory text
On Fri, Oct 12, 2007 at 07:51:13AM +0100, Simon Riggs wrote:
> On Thu, 2007-10-11 at 17:46 -0400, Tom Lane wrote:
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > Kevin Grittner wrote:
> > >> If the goal is to provide fair warning of a higher-than-usual-risk
> > >> release, you've got it covered.
> > > No, that was not the intent. The intent was to say we got a lot done in
> > > one year. You have a suggestion?
> > Yeah: take the entire paragraph out again. I concur with Neil that
> > it's nothing but marketing fluff, and inaccurate fluff at that.
>
> I think it is important that we are upbeat about what we have achieved,
> but perhaps we should avoid opinions in the release notes.
> Perhaps it should be part of the press release?

It certainly sounds a lot more like a press release than release notes to me...

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Some questions about mammoth replication
Hannu Krosing wrote:
> > We have hooks in the executor calling our own collecting functions, so we
> > don't need the trigger machinery to launch replication.
>
> But where do you store the collected info - in your own replication_log
> table, or do you reuse data in the WAL, extracting it on the master before
> replication to the slave (or on the slave after moving the WAL)?

We use neither a log table in the database nor the WAL. The data to replicate is stored in disk files, one per transaction. As Joshua said, the WAL is used to ensure that only those transactions that are recorded as committed in the WAL are sent to slaves.

> > > Do you make use of snapshot data, to make sure what parts of the WAL log
> > > are worth migrating to slaves, or do you just apply everything in the WAL
> > > in separate transactions and abort if you find out that the original
> > > transaction aborted?
> >
> > We check if a data transaction is recorded in WAL before sending
> > it to a slave. For an aborted transaction we just discard all data
> > collected from that transaction.
>
> Do you duplicate postgresql's MVCC code for that, or will this happen
> automatically via using MVCC itself for collected data?

Every transaction command that changes data in a replicated relation is stored on disk. PostgreSQL MVCC code is used on a slave in a natural way when transaction commands are replayed there.

> How do you handle really large inserts/updates/deletes, which change say 10M
> rows in one transaction?

We produce really large disk files ;). When a transaction commits, a special queue lock is acquired and the transaction is enqueued to a sending queue. Since the locking mode for that lock is exclusive, a commit of a very large transaction would delay commits of other transactions while the lock is held. We are working on minimizing the time this lock is held in the new version of Replicator.
> > > Do you extract / generate full sql DML queries from data in WAL logs, or
> > > do you apply the changes at some lower level ?
> >
> > We replicate the binary data along with a command type. Only the data
> > necessary to replay the command on a slave are replicated.
>
> Do you replay it as SQL insert/update/delete commands, or directly on
> heap/indexes ?

We replay the commands directly using heap/index functions on a slave.

Regards,
-- Alexey Klyukin http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.
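The commit-path serialization Alexey describes (a single exclusive queue lock taken at commit to enqueue the transaction's spill file) can be sketched roughly as below. The names and the pthread mutex are illustrative stand-ins, not Replicator's actual code:

```c
#include <pthread.h>

/* A single exclusive "queue lock" serializes enqueueing committed
 * transactions for sending, so a very large transaction delays other
 * commits for as long as it holds the lock. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int queue_len = 0;

static void
enqueue_committed_transaction(void)
{
    pthread_mutex_lock(&queue_lock);   /* exclusive: other commits wait */
    /* ... append this transaction's spill file to the sending queue ... */
    queue_len++;
    pthread_mutex_unlock(&queue_lock);
}
```

This shows why the hold time matters: everything inside the critical section is added to every concurrent committer's latency, which is the cost the Replicator team says it is working to minimize.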
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote:
> Last two mails were sent by mistake without completion. I couldn't
> curse my system any further :-)
> a) Dead Space, if it is successful in its implementation of what it claims,
> will have the means to point out that all the tuples of certain chunks are
> frozen for registered relations and registered chunks. There would be lots
> of blocks which won't fall under this category.
> i) For example, if the records are newly inserted, that block
> can't be marked as containing all frozen tuples.

If records have just been inserted into a block, it is in cache. Therefore hitting that block to check visibility isn't going to cost much. There might be some middle ground where a tuple was inserted a while ago, so that the block has already been evicted from cache, but the transaction hasn't yet been committed, but that's a pretty narrow use case.

Note that we can flag a page in the DSM not only in VACUUM, but from any backend as soon as all tuples are visible to everyone. You do have to scan the tuple headers on the page to determine that, but that's not much overhead, and it could be offloaded to the bgwriter.

> ii) Imagine the case where there is a batch job / huge select
> query running in an enterprise for more than 6 hrs. All the blocks which
> have got inserted into the tables during this period might not be able to
> get the advantage of DSM

Yep, true. A long-running transaction like that is problematic anyway, because we can't vacuum away any dead rows generated during that period.

> iii) Imagine the case for which I am actually proposing the
> index with snapshot infos: partitioned tables. Every time a new table gets
> created, it has to get registered into the dead space. This requires more
> maintenance on the DBA side

Why do you think that the DBA needs to register tables to the DSM manually? Surely that would happen automatically.
> iv) I understand the DeadSpaceLock to be a global lock (if one
> transaction is updating the dead space for any unregistered chunk, no one
> else can query the dead space). If my statement is right, then partitioned
> tables might not be able to benefit from DSM. We have to remember that for
> tables with daily partitions, this would prove to be a nightmare

The patch submitted for 8.3 did use a global lock, and a fixed-size shared memory area, but those were exactly the reasons the patch was rejected. It will have to be reworked for 8.4.

> Other than that, there are lots of advantages I foresee with including the
> indexes with snapshots:
> i) Vacuuming of these indexes need not be done with super-exclusive locks.
> It is possible to design a strategy to vacuum these indexes with exclusive
> locks on pages

I'm not convinced that's true. We only need super-exclusive locks on index pages for interlocking index and heap accesses with non-MVCC snapshots, IOW system tables. And since the lock is only held for a short time and infrequently, it hasn't been a problem at all.

> ii) The above would mean that the index can be in operation while the
> vacuum is happening

Huh? VACUUM hasn't locked out other access since version 7.2!

> iii) As we have already stated, it provides an efficient clustering of
> related data.

Sorry, I missed that part. What's that?

> iv) The reverse mapping index, if present, provides an efficient solution
> to the retail vacuum problem. So HOT can be improved further, with no need
> to place the restriction that the updated tuple should be in the same page
> v) Updates on tables with primary keys (where the primary key is not
> updated) will be able to resolve the unique constraint faster. This is a
> minor advantage.
>
> The complexity of the reverse mapping index will only be there for
> user-defined functional indexes.

The *run-time* complexity of that will only be there for UDF indexes, but the *code* complexity will always be there.
Sorry, I think the probability of a reverse mapping index being accepted to Postgres is very close to zero.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
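Heikki's point above, that any backend can flag a DSM page once every tuple on it is visible to everyone, might look roughly like this. The tuple layout and the "oldest xmin" cutoff are simplified stand-ins for the real structures, not PostgreSQL code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified tuple header: inserting transaction id plus a
 * committed hint, standing in for the real HeapTupleHeader. */
typedef struct { uint32_t xmin; bool xmin_committed; } TupleHdrSketch;

/* Scan the tuple headers of a page; return true only when every tuple
 * was inserted by a committed transaction older than the oldest xmin
 * any open snapshot could still care about. */
static bool
page_all_visible(const TupleHdrSketch *tuples, int ntuples,
                 uint32_t oldest_xmin)
{
    for (int i = 0; i < ntuples; i++)
    {
        if (!tuples[i].xmin_committed || tuples[i].xmin >= oldest_xmin)
            return false;   /* some snapshot may not see this tuple yet */
    }
    return true;            /* safe to flag the page in the DSM */
}
```

Because the check only reads tuple headers already in the buffer, it is cheap enough to piggyback on any backend that touches the page, or on the bgwriter, as suggested in the thread.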
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs wrote:
> I think the best way to handle this is to have two limits. First limit
> attempts to autovacuum, but can be cancelled. When we hit second limit,
> sometime later, then autovacuum cannot be cancelled. That would give us
> a breathing space if we need it.

Sounds quite reasonable.

Best Regards
Michael Paesold
Re: [HACKERS] Including Snapshot Info with Indexes
Will $SUBJECT make it possible for count(*) to use an index? This is a much wanted feature.

-- Andreas Joseph Krogh <[EMAIL PROTECTED]> Senior Software Developer / Manager, OfficeNet AS, Oslo, Norway
Re: [HACKERS] pg_restore oddity?
Mario Weilguni wrote:
> I cannot use "-1" for performance, because some gist stuff has changed
> and the restore fails. But there seems to be no option for pg_restore to
> use transactions for data restore, so it's very very slow (one million
> records, each obviously in its own transaction - because a separate
> session "select count(1) from logins" shows a growing number).

By default, pg_dump/pg_restore uses a COPY command for each table, and each COPY executes as a single transaction, so you shouldn't see the row count growing like that. Is the dump file in --inserts format?

> It would be nice to use transactions for the data stuff itself, but not
> for schema changes or functions. I know I can use separate pg_restore
> runs for schema and data, but it's complicated IMHO.

pg_restore -s foo
pg_restore -a -1 foo

doesn't seem too complicated to me. Am I missing something?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Locales and Encodings
On Fri, Oct 12, 2007 at 02:03:47PM +0100, Gregory Stark wrote:
> This approach doesn't address that but I don't think it makes the problems
> there any worse either. That is, I think we already have these problems
> around shared tables.

Or we could just set up encodings/locales per column and the problem goes away entirely. Most of the code's already been written, it's not even terribly difficult. Where we're stuck is that we can't agree on a source of locale data. People don't want the ICU or glibc data and there's no other source as readily available. Perhaps we should fix that problem, rather than making more workarounds.

Have a nice day,
-- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] Locales and Encodings
"Peter Eisentraut" <[EMAIL PROTECTED]> writes:
> On Friday, 12 October 2007, Gregory Stark wrote:
>> . when creating a new database from a template the new locale and encoding
>> must be identical to the template database's encoding and locale, unless
>> the template is template0, in which case we rebuild all indexes after
>> copying.
>
> Why would you restrict the index rebuilding only to this particular case?
> It could be done for any database.

Well, there's no guarantee there isn't 8-bit data in other databases which would be invalid in the new encoding. I think it's reasonable to assume there's only 7-bit ASCII in template0, however.

An alternative would be introducing an ASCII7 encoding which template0 would use; any other database in that encoding could then be used as a template for any encoding. However, that would still require index rebuilds, which could potentially take a long time.

Another alternative would be recoding all the data from the template database's encoding to the new encoding and throwing an error if a non-encodable character is found. I think it's a lot simpler to just declare it a non-problem by saying there won't be any non-ASCII text in template0.

> The other issue are shared catalogs.

This approach doesn't address that, but I don't think it makes the problems there any worse either. That is, I think we already have these problems around shared tables.

. If you have two databases with locales that don't agree, then the indexes on those tables won't function properly.
. What happens if you create a user while connected to a latin1 database with an é in his username and then connect to a UTF8 database? That username is now an invalidly encoded UTF8 string.

Perhaps we should be using pattern_ops for the indexes on the shared tables? Or using bytea with UTF8-encoded strings instead of name and text?
That actually sounds reasonable now that we have convert() functions which take and generate bytea, at least for the text fields like in pltemplate -- less so for the name columns.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] pg_restore oddity?
Heikki Linnakangas wrote:
> Mario Weilguni wrote:
>> I cannot use "-1" for performance, because some gist stuff has changed
>> and the restore fails. But there seems to be no option for pg_restore to
>> use transactions for data restore, so it's very very slow (one million
>> records, each obviously in its own transaction - because a separate
>> session "select count(1) from logins" shows a growing number).
>
> By default, pg_dump/pg_restore uses a COPY command for each table, and
> each COPY executes as a single transaction, so you shouldn't see the row
> count growing like that. Is the dump file in --inserts format?
>
>> It would be nice to use transactions for the data stuff itself, but not
>> for schema changes or functions. I know I can use separate pg_restore
>> runs for schema and data, but it's complicated IMHO.
>
> pg_restore -s foo
> pg_restore -a -1 foo
>
> doesn't seem too complicated to me. Am I missing something?

Doesn't pg_restore create the indices *after* loading the data if you let it restore the schema *and* the data in one step? The above workaround would disable that optimization, thereby making the data-restore phase much more costly.

Now that I think about it, I remember that I've often wished that we not only had --schema-only and --data-only, but also --schema-unconstrained-only and --constraints-only.

regards, Florian Pflug
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> * Just remove the above-quoted lines. Superusers should be allowed to
>>> shoot themselves in the foot. (I'm not actually sure that there would
>>> be any bad consequences from putting an ordinary table into pg_global
>>> anyway.
>
>> Is there ever *any* reason for doing this?
>
> Probably not a good one, and I suspect there would be some funny
> misbehaviors if you were to clone the database containing the table.
> The table would be physically shared but logically not.

yuck.

> What I'm inclined to do about it is to adopt my suggestion #2 (move the
> location of the defense), since "permission denied" for a superuser is
> a pretty unhelpful error message anyway.

Ok. Works for me.

>>> * Decide that we should allow anyone to do pg_tablespace_size('pg_global')
>>> and put in a special wart for that in dbsize.c. This wasn't part of
>>> the original agreement but maybe there's a case to be made for it.
>
>> That's pretty much the same thing, right?
>
> Well, no, I was suggesting that we might want to special-case pg_global
> as a tablespace that anyone (superuser or no) could get the size of.
> This is actually independent of whether we change the aclmask behavior.

Oh, ok, I see. Then my vote is for the other solution = not allowing anybody to do this.

//Magnus
Re: [HACKERS] pg_tablespace_size()
Magnus Hagander <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> * Just remove the above-quoted lines. Superusers should be allowed to
>> shoot themselves in the foot. (I'm not actually sure that there would
>> be any bad consequences from putting an ordinary table into pg_global
>> anyway.
> Is there ever *any* reason for doing this?

Probably not a good one, and I suspect there would be some funny misbehaviors if you were to clone the database containing the table. The table would be physically shared but logically not.

What I'm inclined to do about it is to adopt my suggestion #2 (move the location of the defense), since "permission denied" for a superuser is a pretty unhelpful error message anyway.

>> * Decide that we should allow anyone to do pg_tablespace_size('pg_global')
>> and put in a special wart for that in dbsize.c. This wasn't part of
>> the original agreement but maybe there's a case to be made for it.
> That's pretty much the same thing, right?

Well, no, I was suggesting that we might want to special-case pg_global as a tablespace that anyone (superuser or no) could get the size of. This is actually independent of whether we change the aclmask behavior.

regards, tom lane
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote:
> I wrote:
>> [ squint... ] There is something wrong here, because a superuser should
>> certainly pass the aclcheck test. I don't know where the bug is but
>> this is not the correct fix.
>
> OK, after looking, the issue is this wart in pg_tablespace_aclmask():
>
>     /*
>      * Only shared relations can be stored in global space; don't let even
>      * superusers override this
>      */
>     if (spc_oid == GLOBALTABLESPACE_OID && !IsBootstrapProcessingMode())
>         return 0;
>
>     /* Otherwise, superusers bypass all permission checking. */

Yup, that was my point.

> There are a number of ways that we could deal with this:
>
> * Just remove the above-quoted lines. Superusers should be allowed to
> shoot themselves in the foot. (I'm not actually sure that there would
> be any bad consequences from putting an ordinary table into pg_global
> anyway. I think I wrote the above code in fear that some parts of the
> system would equate reltablespace = pg_global with relisshared, but
> AFAICS that's not the case now.)

Is there ever *any* reason for doing this? If there isn't, I don't think we should provide just that foot-gun. But if there is any case where it makes sense to do that, then the superuser should probably be allowed to do it.

> * Remove the above lines and instead put a defense into heap_create.
> This might be better design anyway since a more specific error could
> be reported.
>
> * Leave aclchk.c as-is and apply Magnus' patch to allow superusers
> to bypass the check in pg_tablespace_size.

See foot-gun above. If we want to keep the check, I think that my patch is fine. If we don't, then taking out that code is better.

> * Decide that we should allow anyone to do pg_tablespace_size('pg_global')
> and put in a special wart for that in dbsize.c. This wasn't part of
> the original agreement but maybe there's a case to be made for it.

That's pretty much the same thing, right? Since the acl check will check for pg_global, and if it's anything else, let superuser in.
It's gotta be easier to read if it's just a plain superuser check, I think.

//Magnus
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 13:51 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Can you explain further what you meant by "don't disable manual
> > cancels".
>
> I meant that pg_cancel_backend() should still work on autovac workers,
> contrary to Alvaro's suggestion that autovac workers should sometimes
> ignore SIGINT.
>
> Basically the implementation vision I have is that the SIGINT catcher in
> an autovac worker should remain stupid, and any intelligence involved
> should be on the side where we're deciding whether to send a signal or
> not. This probably does involve exposing more state in PGPROC but I see
> nothing much wrong with that. (It might be time to merge inVacuum,
> isAutovacuum, and the additional state into a bitwise vacuumFlags field.)

Gotcha

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Release notes introductory text
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> With respect to you Kevin, your managers should wait. You don't
>> install .0 releases of "any" software into production without "months"
>> of testing. At which point, normally a .1 release has come out anyway.
>
> How exactly do you expect the software to get from a .0 to a .1 release,
> or to have addressed the bugs that might bite you when it does get to .1,
> if you aren't helping to test it?

In most environments I've seen, developer and QA systems don't hesitate to move to .0 releases (or even beta). I agree with Joshua that it's nerve-wracking to move _production_ systems to .0 releases from most software vendors.

> Now I realize that you did say "test" above, but way too often I see
> this sort of argument as a justification for doing nothing and expecting
> somebody else to fix it.
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs <[EMAIL PROTECTED]> writes:
> Can you explain further what you meant by "don't disable manual
> cancels".

I meant that pg_cancel_backend() should still work on autovac workers, contrary to Alvaro's suggestion that autovac workers should sometimes ignore SIGINT.

Basically the implementation vision I have is that the SIGINT catcher in an autovac worker should remain stupid, and any intelligence involved should be on the side where we're deciding whether to send a signal or not. This probably does involve exposing more state in PGPROC but I see nothing much wrong with that. (It might be time to merge inVacuum, isAutovacuum, and the additional state into a bitwise vacuumFlags field.)

regards, tom lane
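Tom's suggestion of merging inVacuum and isAutovacuum into a bitwise vacuumFlags field, with the sending side holding all the intelligence, could look roughly like this. The flag names are illustrative, not necessarily what ends up committed:

```c
#include <stdint.h>

/* Candidate flag bits for a merged PGPROC vacuumFlags field. */
#define PROC_IN_VACUUM              0x01    /* running a lazy vacuum */
#define PROC_IS_AUTOVACUUM          0x02    /* is an autovacuum worker */
#define PROC_VACUUM_FOR_WRAPAROUND  0x04    /* anti-wraparound vacuum */

/* Sender-side decision: the worker's SIGINT handler stays stupid, and
 * we only signal a backend that is an autovacuum worker not vacuuming
 * to prevent xid wraparound. */
static int
should_send_cancel(uint8_t vacuumFlags)
{
    return (vacuumFlags & PROC_IS_AUTOVACUUM) &&
           !(vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND);
}
```

Manual pg_cancel_backend() remains untouched by this; the test only gates the automatic cancels the conflict-detection path would send.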
Re: [HACKERS] Locales and Encodings
Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
>> On Friday, 12 October 2007, Gregory Stark wrote:
>>> It would make Postgres inconsistent and less integrated with the rest of
>>> the OS. How do you explain that Postgres doesn't follow the system's
>>> configurations and the collations don't agree with the system collations?
>> We already have our own encoding support (for better or worse), and I
>> don't think having one's own locale support would be that much different.
>
> Well, yes it would be, because encodings are pretty well standardized;
> there is not likely to be any user-visible difference between one
> platform's idea of UTF8 and another's. This is very very far from being
> the case for locales. See for instance the recent thread in which we found
> out that "en_US" locale has utterly different sort orders on Linux and OS X.

For me, this paragraph is more of an argument *in favour* of having our own locale support. At least for me, consistency between PG running on different platforms would bring more benefits than consistency between PG and the platform it runs on.

At the company I used to work for, we had all our databases running with encoding=utf-8 and locale=C, because I didn't want our applications to depend on platform-specific locale issues. Plus, some of the applications supported multiple languages, making a cluster-global locale unworkable anyway - a restriction which would go away if we went with ICU.

regards, Florian Pflug
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > That seemed more complex when I thought about that, but if we just use
> > SIGUSR2 for automatic cancels then this would be very simple.
>
> Why not SIGINT?

I must be missing something. How would I tell the difference between manual and automatic cancels if we use SIGINT for both cases? Or did you think to put another flag on the PGPROC to allow us to tell whether it isAvoidingWraparound?

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Locales and Encodings
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> On Friday, 12 October 2007, Gregory Stark wrote:
>> It would make Postgres inconsistent and less integrated with the rest of
>> the OS. How do you explain that Postgres doesn't follow the system's
>> configurations and the collations don't agree with the system collations?
> We already have our own encoding support (for better or worse), and I
> don't think having one's own locale support would be that much different.

Well, yes it would be, because encodings are pretty well standardized; there is not likely to be any user-visible difference between one platform's idea of UTF8 and another's. This is very very far from being the case for locales. See for instance the recent thread in which we found out that the "en_US" locale has utterly different sort orders on Linux and OS X.

regards, tom lane
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> Attached is a patch I want to apply for this. Tom's message at
>> http://archives.postgresql.org/pgsql-hackers/2007-10/msg00448.php makes me
>> bring it up here before I apply it.
>
> [ squint... ] There is something wrong here, because a superuser should
> certainly pass the aclcheck test. I don't know where the bug is but
> this is not the correct fix.

Are you sure? pg_tablespace_aclmask() has:

    /*
     * Only shared relations can be stored in global space; don't let even
     * superusers override this
     */
    if (spc_oid == GLOBALTABLESPACE_OID && !IsBootstrapProcessingMode())
        return 0;

And this is what causes the failure. I certainly didn't want to take out that check, so short-circuiting it in the way I did seemed right..

//Magnus
Re: [HACKERS] Locales and Encodings
On Fri, Oct 12, 2007 at 03:28:26PM +0100, Gregory Stark wrote:
> Fix the problem by making ICU a smaller less complex dependency?

How? It's 95% data, you can't reduce that. glibc also has 10MB of locale data. The actual code is much smaller than postgres and doesn't depend on any other non-system libraries.

> I think realistically we're basically waiting for strcoll_l to become
> standardized by POSIX so we can depend on it.

I think we could be waiting forever then. It's supported by Win32, Mac OS X and glibc. The systems that don't support it tend not to support multibyte collation anyway. Patches have been created to use this and rejected because not enough platforms support it...

> Personally I think we should just implement our own strcoll_l as a wrapper
> around setlocale-strcoll-setlocale and use strcoll_l if it's available and
> our, possibly slow, wrapper if not. If we ban direct use of strcoll and
> other lc_collate sensitive functions in Postgres we could also remember the
> last locale used and not do unnecessary setlocales so existing use cases
> aren't slowed down at all.

Been done also. As I recall it was *really* slow, not just a little bit.

Have a nice day,
-- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
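The wrapper Gregory proposes, emulating strcoll_l() with setlocale() plus strcoll() while caching the last locale to skip redundant setlocale() calls, is roughly this shape. This is a sketch of the idea under discussion, not the patch Martijn refers to; error handling and thread safety are omitted:

```c
#include <locale.h>
#include <string.h>

/* Remember the last LC_COLLATE locale we set, so comparisons that stay
 * in one locale avoid the (expensive) setlocale() round trip. */
static char last_collate_locale[64] = "";

static int
my_strcoll_l(const char *a, const char *b, const char *locale_name)
{
    if (strcmp(last_collate_locale, locale_name) != 0)
    {
        setlocale(LC_COLLATE, locale_name);
        strncpy(last_collate_locale, locale_name,
                sizeof(last_collate_locale) - 1);
    }
    return strcoll(a, b);
}
```

The caching helps the common single-locale case, but as Martijn notes, any workload that alternates locales still pays two setlocale() calls per switch, which is where the earlier attempt reportedly got very slow.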
Re: [HACKERS] Locales and Encodings
Michael Glaesemann wrote:
> On Oct 12, 2007, at 10:19, Gregory Stark wrote:
>> It would make Postgres inconsistent and less integrated with the rest of
>> the OS. How do you explain that Postgres doesn't follow the system's
>> configurations and the collations don't agree with the system collations?
>
> How is this fundamentally different from PostgreSQL using a separate
> users/roles system than the OS?

Even more, eliminating dependencies on an OS's correct implementation of locale stuff appears A Good Thing to me. I wonder if a compile-time option to use ICU in 8.4 should be considered, regarding all those lengthy threads about encoding/locale/collation problems.

Regards, Andreas
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs <[EMAIL PROTECTED]> writes: > That seemed more complex when I thought about that, but if we just use > SIGUSR2 for automatic cancels then this would be very simple. Why not SIGINT? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote: > Dave Page <[EMAIL PROTECTED]> writes: > > select pg_tablespace_size('pg_global'); > > ERROR: permission denied for tablespace pg_global > > > Can this be fixed please? > > What's to be fixed? That's exactly the result I'd expect given the > previously-agreed-to behavior for pg_tablespace_size. I know we said tablespace requires CREATE privs, but certainly the super-user should be able to use the function too. -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] pg_tablespace_size()
Magnus Hagander <[EMAIL PROTECTED]> writes: > Attached is a patch I want to apply for this. Toms message at > http://archives.postgresql.org/pgsql-hackers/2007-10/msg00448.php makes me > bring it up here before I apply it. [ squint... ] There is something wrong here, because a superuser should certainly pass the aclcheck test. I don't know where the bug is but this is not the correct fix. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Including Snapshot Info with Indexes
Andreas Joseph Krogh <[EMAIL PROTECTED]> writes: > Will $SUBJECT make it possible for count(*) to use index? This is a much > wanted feature. If you mean "count(*) will become instantaneous", no it won't. It would get faster, but probably not by more than a factor of 10 or so, corresponding to the size ratio between index and table. Remember that the proposal is going to bloat indexes pretty badly: a simple index on an int4 column will double in size, more or less. So that ratio will be much less favorable than it is today. Personally I think the bloat problem will doom this entire approach. The distributed costs of that will outweigh the speedups that can be achieved for specific queries. The OP is free to try to prove this fear wrong, but no amount of discussion will constitute such a proof; only a testable implementation. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Locale + encoding combinations
On Fri, Oct 12, 2007 at 06:03:52AM -0700, Trevor Talbot wrote: > On 10/12/07, Dave Page <[EMAIL PROTECTED]> wrote: > > Tom Lane wrote > > > That still leaves us with the problem of how to tell whether a locale > > > spec is bad on Windows. Judging by your example, Windows checks whether > > > the code page is present but not whether it is sane for the base locale. > > > What happens when there's a mismatch --- eg, what encoding do system > > > messages come out in? > > > > I'm not sure how to test that specifically, but it seems that accented > > characters simply fall back to their undecorated equivalents if the > > encoding is not appropriate, eg: > > > > [EMAIL PROTECTED]:~$ ./setlc French_France.1252 > > Locale: French_France.1252 > > The date is: sam. 01 of août 2007 > > [EMAIL PROTECTED]:~$ ./setlc French_France.28597 > > Locale: French_France.28597 > > The date is: sam. 01 of aout 2007 > > > > (the encodings used there are WIN1252 and ISO8859-7 (Greek)). > > > > I'm happy to test further is you can suggest how I can figure out the > > encoding actually output. > > The encoding output is the one you specified. Keep in mind, > underneath Windows is mostly working with Unicode, so all characters > exist and the locale rules specify their behavior there. The encoding > is just the byte stream it needs to force them all into after doing > whatever it does to them. As you've seen, it uses some sort of > best-fit mapping I don't know the details of. (It will drop accent > marks and choose characters with similar shape where possible, by > default.) > > I think it's a bit more complex for input/transform cases where you > operate on the byte stream directly without intermediate conversion to > Unicode, which is why UTF-8 doesn't work as a codepage, but again I > don't have the details nearby. I can try to do more digging if > needed. Just so the non-windows-savvy people get it.. When Windows documentation or users refer to Unicode, they mean UTF-16. 
//Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] pg_tablespace_size()
On Fri, Oct 12, 2007 at 12:20:08PM +0100, Dave Page wrote: > Following on from Hiroshi's earlier post: > http://archives.postgresql.org/pgsql-hackers/2007-10/msg00324.php > > Per http://archives.postgresql.org/pgsql-hackers/2007-08/msg01018.php, > superusers should be able to use pg_tablespace_size() on any tablespace, > however when logged in as postgres on a beta1 server, I get: > > -- Executing query: > select pg_tablespace_size('pg_global'); > > ERROR: permission denied for tablespace pg_global > > Can this be fixed please? Attached is a patch I want to apply for this. Tom's message at http://archives.postgresql.org/pgsql-hackers/2007-10/msg00448.php makes me bring it up here before I apply it. I think the correct behaviour is that superusers should be able to do it, and that's in the thread that Dave referred to. But since Tom said it's "by design" in the message above, I'd like confirmation that this is ok. Since I think that allowing superusers to do it is the only sensible thing, I will apply this patch unless someone objects :-) //Magnus

Index: src/backend/utils/adt/dbsize.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/dbsize.c,v
retrieving revision 1.14
diff -c -r1.14 dbsize.c
*** src/backend/utils/adt/dbsize.c	29 Aug 2007 17:24:29 -0000	1.14
--- src/backend/utils/adt/dbsize.c	12 Oct 2007 14:38:32 -0000
***************
*** 164,171 ****
  	/*
  	 * User must have CREATE privilege for target tablespace, either explicitly
  	 * granted or implicitly because it is default for current database.
  	 */
! 	if (tblspcOid != MyDatabaseTableSpace)
  	{
  		aclresult = pg_tablespace_aclcheck(tblspcOid, GetUserId(), ACL_CREATE);
  		if (aclresult != ACLCHECK_OK)
--- 164,175 ----
  	/*
  	 * User must have CREATE privilege for target tablespace, either explicitly
  	 * granted or implicitly because it is default for current database.
+ 	 *
+ 	 * Specifically check for superuser permissions here, since
+ 	 * pg_tablespace_aclcheck() will deny access to pg_global even for
+ 	 * superusers.
  	 */
! 	if (tblspcOid != MyDatabaseTableSpace && !superuser())
  	{
  		aclresult = pg_tablespace_aclcheck(tblspcOid, GetUserId(), ACL_CREATE);
  		if (aclresult != ACLCHECK_OK)

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Locales and Encodings
On Friday, 12 October 2007, Martijn van Oosterhout wrote: > Where we're stuck is that we can't agree on a > source of locale data. People don't want the ICU or glibc data and > there's no other source as readily available. What were the objections to ICU? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Locales and Encodings
"Martijn van Oosterhout" <[EMAIL PROTECTED]> writes: > People don't want the ICU or glibc data and there's no other source as > readily available. > > Perhaps we should fix that problem, rather than making more > workarounds. Fix the problem by making ICU a smaller less complex dependency? Or fix the problem that glibc isn't everyone's libc? I think realistically we're basically waiting for strcoll_l to become standardized by POSIX so we can depend on it. Personally I think we should just implement our own strcoll_l as a wrapper around setlocale-strcoll-setlocale and use strcoll_l if it's available and our, possibly slow, wrapper if not. If we ban direct use of strcoll and other lc_collate sensitive functions in Postgres we could also remember the last locale used and not do unnecessary setlocales so existing use cases aren't slowed down at all. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
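For reference, the setlocale-strcoll-setlocale wrapper Gregory sketches (and which Martijn recalls was tried before and found to be very slow) would look roughly like this. This is a minimal sketch, not any patch from the archives: the function name, buffer size, and fallback behaviour are our own assumptions, and a real version would cache the last locale used to skip redundant setlocale() calls, as Gregory suggests.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical fallback for platforms without strcoll_l(): temporarily
 * switch LC_COLLATE around a plain strcoll() call, then restore it. */
int pg_strcoll_in_locale(const char *a, const char *b, const char *locale)
{
    char        saved[64];
    const char *cur = setlocale(LC_COLLATE, NULL);
    int         result;

    /* setlocale() returns a pointer into static storage, so copy it
     * before the next setlocale() call can clobber it */
    strncpy(saved, cur ? cur : "C", sizeof(saved) - 1);
    saved[sizeof(saved) - 1] = '\0';

    if (setlocale(LC_COLLATE, locale) == NULL)
        return strcoll(a, b);       /* locale unavailable: plain compare */

    result = strcoll(a, b);
    setlocale(LC_COLLATE, saved);   /* restore previous collation locale */
    return result;
}
```

The two extra setlocale() calls per comparison are exactly why this approach measured so badly; caching only helps when consecutive comparisons use the same locale.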
Re: [HACKERS] Locale + encoding combinations
On 10/12/07, Dave Page <[EMAIL PROTECTED]> wrote: > Tom Lane wrote > > That still leaves us with the problem of how to tell whether a locale > > spec is bad on Windows. Judging by your example, Windows checks whether > > the code page is present but not whether it is sane for the base locale. > > What happens when there's a mismatch --- eg, what encoding do system > > messages come out in? > > I'm not sure how to test that specifically, but it seems that accented > characters simply fall back to their undecorated equivalents if the > encoding is not appropriate, eg: > > [EMAIL PROTECTED]:~$ ./setlc French_France.1252 > Locale: French_France.1252 > The date is: sam. 01 of août 2007 > [EMAIL PROTECTED]:~$ ./setlc French_France.28597 > Locale: French_France.28597 > The date is: sam. 01 of aout 2007 > > (the encodings used there are WIN1252 and ISO8859-7 (Greek)). > > I'm happy to test further is you can suggest how I can figure out the > encoding actually output. The encoding output is the one you specified. Keep in mind, underneath Windows is mostly working with Unicode, so all characters exist and the locale rules specify their behavior there. The encoding is just the byte stream it needs to force them all into after doing whatever it does to them. As you've seen, it uses some sort of best-fit mapping I don't know the details of. (It will drop accent marks and choose characters with similar shape where possible, by default.) I think it's a bit more complex for input/transform cases where you operate on the byte stream directly without intermediate conversion to Unicode, which is why UTF-8 doesn't work as a codepage, but again I don't have the details nearby. I can try to do more digging if needed. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs wrote: On Fri, 2007-10-12 at 11:44 +0200, Michael Paesold wrote: Simon Riggs wrote: On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote: Yes, I think it is easy to mark the "is for xid wraparound" bit in the WorkerInfo struct and have the cancel work only if it's off. However, what I think should happen is that the signal handler for SIGINT in a worker for xid wraparound should not cancel the current vacuum. Instead turn it into a no-op, if possible. That way we also disallow a user from cancelling vacuums for xid wraparound. I think he can do that with pg_cancel_backend, and it could be dangerous. I think that is dangerous too because the user may have specifically turned AV off. That anti-wraparound vacuum might spring up right in a busy period and start working its way through many tables, all of which cause massive writes to occur. That's about as close to us causing an outage as I ever want to see. We need a way through that to allow the user to realise his predicament and find a good time to VACUUM. I never want to say to anybody "nothing you can do, just sit and watch, your production system will be working again in no time. Restart? no that won't work either." You are probably right that VACUUM going full-steam is a bad idea in most situations. Except for anti-wraparound vacuum, cancellation seems the most reasonable thing to do. Because autovacuum will usually pick up the table in time again. Yeh, if we do have to do the second emergency anti-wraparound, then that should be at full speed, since there's nothing else to do at that point. The only problem I would see is if someone has an application that does a lot of schema changes (doesn't sound like a good idea anyway). In that case they would better issue manual vacuums on such tables. I can't see a use case for regular DDL as part of an application, on an otherwise integral table (lots of updates and deletes). As part of an application there's no use. 
As part of an upgrade between two different versions of that application there is. And that's exactly the kind of situation where temporarily disabling autovacuum could become handy.
[HACKERS] pg_restore oddity?
There's IMO a problem with pg_restore; it should be easy to fix (I hope - and I could try to fix it and send a patch). * I've a dump taken from an 8.1 database * I'm using gist and ltree * I'm restoring to an 8.2 database Problem: I cannot use "-1" for performance, because some gist stuff has changed and the restore fails. But there seems to be no option for pg_restore to use transactions for data restore, so it's very very slow (one million records, each obviously in its own transaction - because a separate session "select count(1) from logins" shows a growing number). It would be nice to use transactions for the data stuff itself, but not for schema changes or functions. I know I can use separate pg_restore runs for schema and data, but it's complicated IMHO. I see several options: * Use transactions for data, maybe with a separate command line option * Use transactions every time, and place savepoints to recover from errors? Any ideas what I could do? Regards Mario ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Locales and Encodings
On Friday, 12 October 2007, Gregory Stark wrote: > . when creating a new database from a template the new locale and encoding > must be identical to the template database's encoding and locale. Unless > the template is template0 in which case we rebuild all indexes after > copying. Why would you restrict the index rebuilding only to this particular case? It could be done for any database. The other issue is shared catalogs. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/12/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote: > > Gokulakannan Somasundaram wrote: > If records have just been inserted to a block, it is in cache. Therefore > hitting that block to check visibility isn't going to cost much. There > might be some middle-ground where a tuple has been inserted a while ago, > so that the block has already been evicted from cache, but the > transaction hasn't yet been committed, but that's a pretty narrow use > case. > > Note that we can flag a page in the DSM not only by VACUUM, but by any > backend as soon as all tuples are visible to everyone. You do have to > scan the tuple headers on the page to determine that, but that's not so > much overhead, and it could be offloaded to the bgwriter. The first step in any database tuning of course is to reduce Random I/Os. But then the next step is to reduce the logical I/Os, Whether the I/O happens from the Disk/Memory, we should try to reduce the access to a shared resource as much as possible. So even if the newly inserted tuples are in memory / disk, restricting the access to it would improve the overall performance of the system(That place can be taken over by other blocks). If we see the overall picture, scalability of the database gets increased, as we reduce the number of locks being taken. Yep, true. A long-running transaction like that is problematic anyway, > because we can't vacuum away any dead rows generated during that period. It is not problematic for the Indexes with snapshot. They will be working as usual. And i think one reason why timestamp based databases got an advantage over locking based databases is that batch jobs can run together with OLTP transactions. In order to maintain that advantage in PostgreSQL, Indexes with snapshot helps. Why do you think that the DBA needs to register tables to the DSM > manually? Surely that would happen automatically. Accepted. 
The patch submitted for 8.3 did use a global lock, and a fixed size > shared memory area, but those were exactly the reasons the patch was > rejected. It will have to be reworked for 8.4. Ok, then the best case rework to my understanding would be to include the bitmap DSM block number into the locking attributes. But still one DSM block would be mapping to 8192 * 8 = 65536 Table blocks. So if you have to add a unregistered chunk of a newly created partition, then any access into the 65536 blocks will have to get stalled, which may not be acceptable under the OLTP performance requirements. This becomes a performance overhead for people maintaining daily partitions, as they create a new table everyday and the space would be increasing from morning till evening and the same table would be queried the most. I'm not convinced that's true. We only need super-exclusive locks on > index pages for interlocking index and heap accesses with non-MVCC > snapshots, IOW system tables. And since the lock is only held for a > short time and infrequently, it hasn't been a problem at all. For a heap with no indexes, we don't take super-exclusive lock to do Vacuum on it. We just need to take Exclusive lock on each block and do the Vacuum on it. That's because the table contains the necessary visibility information. But with indexes, we may need to refer to the table in order to do Vacuum. In the mean-while we don't want any page splits to happen. That's why we take a super exclusive lock on all the leaf pages (no body should even have a pin on one of them Ref : README file in nbtree directory) But if we have the snapshot info into the indexes, then we can just do a index scan(similar to the heap scan described before) by taking Exclusive lock on pages one by one and Vacuuming them. > ii) The above would mean that index can be in operation while the vacuum > is > > happening > > Huh? VACUUM hasn't locked out other access since version 7.2! I might have missed something here. 
If we need a Super-Exclusive lock on all leaf pages in the index to do the Vacuum (no-one should even have a PIN on it), then how are we able to allow index scans while the Vacuum is happening? In my case, the index will have the snapshot information, so no super exclusive locks, so other leaf pages can be accessed. If there is an explanation, it might also be useful to update the README file in the nbtree directory > iii) As we have already stated, it provides a efficient clustering of > > related data. > > Sorry, I missed that part. What's that? Say I create an index on SalespersonId, Date, Customer Name, Details of the Transaction on a table. For a query like select Customer Name, Details of the transaction from table where salespersonid='xyz' order by date. The entire query gets satisfied by the Index. We will not go to the table. In short the index acts like an IOT in Oracle / clustered indexes in other databases. The necessary information is obtained from one place, since snapshot is stored in the index. It will be very useful, especially when the query is goi
Re: [HACKERS] Some questions about mammoth replication
Hannu Krosing wrote: > > We don't use either a log table in database or WAL. The data to > > replicate is stored in disk files, one per transaction. > > Clever :) > > How well does it scale ? That is, at what transaction rate can your > replication keep up with database ? This depends on the number of concurrent transactions (the data is collected by every backend process), max transaction size etc. I don't have numbers here, sorry. > > > As Joshua said, > > the WAL is used to ensure that only those transactions that are recorded > > as committed in WAL are sent to slaves. > > How do you force correct commit order of applying the transactions ? The first transaction that is committed in PostgreSQL is the first transaction placed into the queue, and the first that is restored by the slave. > > > > > > > > > Do you make use of snapshot data, to make sure, what parts of WAL log > > > > > are worth migrating to slaves , or do you just apply everything in WAL > > > > > in separate transactions and abort if you find out that original > > > > > transaction aborted ? > > > > > > > > We check if a data transaction is recorded in WAL before sending > > > > it to a slave. For an aborted transaction we just discard all data > > > > collected > > > > from that transaction. > > > > > > Do you duplicate postgresql's MVCC code for that, or will this happen > > > automatically via using MVCC itself for collected data ? > > > > Every transaction command that changes data in a replicated relation is > > stored on disk. PostgreSQL MVCC code is used on a slave in a natural way > > when transaction commands are replayed there. > > Do you replay several transaction files in the same transaction on > slave ? > Can you replay several transaction files in parallel ? No, we have plans for concurrent restore of replicated data, but currently we have a single slave process responsible for restoring data received from MCP. 
> > > > How do you handle really large inserts/updates/deletes, which change say > > > 10M > > > rows in one transaction ? > > > > We produce really large disk files ;). When a transaction commits - a > > special queue lock is acquired and transaction is enqueued to a sending > > queue. > > Since the locking mode for that lock is exclusive a commit of a > > very large transaction would delay commits of other transactions until > > the lock is held. We are working on minimizing the time of holding this > > lock in the new version of Replicator. > > Why does it take longer to queue a large file ? dou you copy data from > one file to another ? Yes, currently the data is copied from the transaction files into the queue (this doesn't apply to dump transactions). However, we have recently changed this, the new code will acquire the queue lock only to record transaction as committed in replication log without moving the data. > > > Do you replay it as SQL insert/update/delete commands, or directly on > > > heap/indexes ? > > > > We replay the commands directly using heap/index functions on a slave. > > Does that mean that the table structures will be exactly the same on > both master slave ? Yes, the table structure on the slaves should match the table structure on master. > That is, do you replicate a physical table image > (maybe not including transaction ids on master) ? Yes, we call this 'full dump', and it is fired automatically for every replicated table. We replicate only data however, not DDL commands to create/alter table or sequence. > > Or you just use lower-level versions on INSERT/UPDATE/DELETE ? > > - > Hannu > > > Regards, -- Alexey Klyukin http://www.commandprompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
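The commit-time queue lock Alexey describes can be modeled in miniature. Replicator's actual structures are not shown in this thread, so every name below is made up; the toy only illustrates why an exclusive lock taken while enqueueing a committed transaction serializes committers, and why a very large transaction (a big file to copy into the queue) delays everyone else holding off on that lock.

```c
#include <pthread.h>

#define TXQUEUE_MAX 64

/* Toy model of a sending queue protected by one exclusive lock.
 * All names are hypothetical; this is not Replicator code. */
typedef struct TxQueue
{
    pthread_mutex_t lock;
    int             entries[TXQUEUE_MAX];
    int             count;
} TxQueue;

void txqueue_init(TxQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->count = 0;
}

void txqueue_enqueue(TxQueue *q, int txid)
{
    /* Exclusive: every other committing backend blocks here until the
     * enqueue (in the real system, copying the transaction file) is done */
    pthread_mutex_lock(&q->lock);
    if (q->count < TXQUEUE_MAX)
        q->entries[q->count++] = txid;
    pthread_mutex_unlock(&q->lock);
}
```

The fix mentioned later in the thread, recording the transaction as committed in the replication log without moving its data, amounts to shrinking the work done inside that critical section.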
[HACKERS] Locales and Encodings
It seems like the root of the problems we're butting our heads against with encoding and locale is all the same issue: it's nonsensical to take the locale at initdb time per-cluster and then allow user-specified encoding per-database. If anything it would make more sense to go the other way around. But actually it seems to me we could allow changing both on a per-database basis with certain restrictions: . template0 is always SQL_ASCII with locale C . when creating a new database you can specify the encoding and locale and we check that they're compatible. . when creating a new database from a template the new locale and encoding must be identical to the template database's encoding and locale. Unless the template is template0 in which case we rebuild all indexes after copying. We could liberalize this last restriction if we created a new encoding like SQL_ASCII but which enforces 7-bit ascii. But then the index rebuild step could take a long time. This would make the whole locale/encoding issue much more transparent. In database listings you would see both listed alongside, you wouldn't be bound by any initdb environment choices, and errors when running create database would be able to tell you exactly what you're doing wrong and what you have to do to avoid the problem. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
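The template rule in this proposal is simple enough to state as a predicate. This is only a sketch of the rule as written above, with hypothetical names; a real createdb() check would compare encoding and locale as separate catalog fields rather than one string, and would also have to deal with the shared-catalog issue Peter raises.

```c
#include <string.h>

/* Sketch of the proposed CREATE DATABASE compatibility rule:
 * settings must match the template's, unless the template is
 * template0 (in which case all indexes are rebuilt after copying).
 * Names and the single-string locale comparison are simplifications. */
int createdb_settings_ok(const char *template_name,
                         const char *template_locale,
                         const char *new_locale)
{
    if (strcmp(template_name, "template0") == 0)
        return 1;               /* anything goes; pay for an index rebuild */
    return strcmp(template_locale, new_locale) == 0;
}
```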
[HACKERS] pg_tablespace_size()
Following on from Hiroshi's earlier post: http://archives.postgresql.org/pgsql-hackers/2007-10/msg00324.php Per http://archives.postgresql.org/pgsql-hackers/2007-08/msg01018.php, superusers should be able to use pg_tablespace_size() on any tablespace, however when logged in as postgres on a beta1 server, I get: -- Executing query: select pg_tablespace_size('pg_global'); ERROR: permission denied for tablespace pg_global Can this be fixed please? Regards, Dave ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Some questions about mammoth replication
On Fri, 2007-10-12 at 12:39, Alexey Klyukin wrote: > Hannu Krosing wrote: > > > > We have hooks in executor calling our own collecting functions, so we > > > don't need the trigger machinery to launch replication. > > > > But where do you store the collected info - in your own replication_log > > table, or do reuse data in WAL you extract it on master befor > > replication to slave (or on slave after moving the WAL) ? > > We don't use either a log table in database or WAL. The data to > replicate is stored in disk files, one per transaction. Clever :) How well does it scale ? That is, at what transaction rate can your replication keep up with database ? > As Joshua said, > the WAL is used to ensure that only those transactions that are recorded > as committed in WAL are sent to slaves. How do you force correct commit order of applying the transactions ? > > > > > > Do you make use of snapshot data, to make sure, what parts of WAL log > > > > are worth migrating to slaves , or do you just apply everything in WAL > > > > in separate transactions and abort if you find out that original > > > > transaction aborted ? > > > > > > We check if a data transaction is recorded in WAL before sending > > > it to a slave. For an aborted transaction we just discard all data > > > collected > > > from that transaction. > > > > Do you duplicate postgresql's MVCC code for that, or will this happen > > automatically via using MVCC itself for collected data ? > > Every transaction command that changes data in a replicated relation is > stored on disk. PostgreSQL MVCC code is used on a slave in a natural way > when transaction commands are replayed there. Do you replay several transaction files in the same transaction on slave ? Can you replay several transaction files in parallel ? > > How do you handle really large inserts/updates/deletes, which change say > > 10M > > rows in one transaction ? > > We produce really large disk files ;). 
When a transaction commits - a > special queue lock is acquired and transaction is enqueued to a sending > queue. > Since the locking mode for that lock is exclusive a commit of a > very large transaction would delay commits of other transactions until > the lock is held. We are working on minimizing the time of holding this > lock in the new version of Replicator. Why does it take longer to queue a large file ? do you copy data from one file to another ? > > > > Do you extract / generate full sql DML queries from data in WAL logs, or > > > > do you apply the changes at some lower level ? > > > > > > We replicate the binary data along with a command type. Only the data > > > necessary to replay the command on a slave are replicated. > > > > Do you replay it as SQL insert/update/delete commands, or directly on > > heap/indexes ? > > We replay the commands directly using heap/index functions on a slave. Does that mean that the table structures will be exactly the same on both master and slave ? That is, do you replicate a physical table image (maybe not including transaction ids on master) ? Or do you just use lower-level versions of INSERT/UPDATE/DELETE ? - Hannu ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] ECPG regression tests
On Tue, Oct 09, 2007 at 05:44:22PM -0400, Andrew Dunstan wrote: > > > Andrew Dunstan wrote: > > > > > >Magnus Hagander wrote: > >>>(Hint: if you don't put the PlatformSDK directories first in the > >>>INCLUDE and LIB lists bad and inexplicable things can happen.) > >>> > >>>Pick up the latest version of run_build.pl in CVS if you want to run > >>>this in your buildfarm animal now. > >>> > >>>A release will be forthcoming very soon. > >>> > >> > >>I put it in, but it doesn't work. It works when running ecpg tests > >>manual, > >>but from run_build I get: > >>http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=skylark&dt=2007-10-09%20090814&stg=ecpg-check > >> > >> > >> > >>which seems similar to what you had before. How did you fix that > >>one? Is > >>that the one requiring a reorder? > >> > >> > >> > > > >Yes. compare its build_env INCLUDE and LIB and possibly PATH values > >with those of red_bat: > >http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=red_bat&dt=2007-10-09%2004:30:04 > > > > > > > >BTW, following some advice I found on the net those PlatformSDK > >directories were copied manually from the SDK install, back when I > >installed MSVC. > > You also appear to have simply a configuration bug - there is a missing > semicolon or two in: > > LIB' => 'D:\\Program Files (x86)\\Microsoft Visual Studio > 8\\VC\\ATLMFC\\LIB;D:\\Program Files (x86)\\Microsoft Visual Studio > 8\\VC\\LIBD:\\Program Files (x86)\\Microsoft Visual Studio > 8\\VC\\PlatformSDK\\lib;D:\\Program Files (x86)\\Microsoft Visual Studio > 8\\SDK\\v2.0\\libD:\\Program Files (x86)\\Microsoft Visual Studio .NET > 2003\\SDK\\v1.1\\Lib\\;C:\\Program Files\\SQLXML 4.0\\bin\\', Gah. That was it, I had . instead of , in the config file :-( Fixing that made the ecpg regression tests pass, didn't have to reorder or change anything at all. //Magnus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Including Snapshot Info with Indexes
Andreas Joseph Krogh wrote: > Will $SUBJECT make it possible for count(*) to use index? This is a much > wanted feature. Yes, both the DSM approach and the approach proposed by Gokul. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Including Snapshot Info with Indexes
On Friday 12 October 2007 11:49:17 Heikki Linnakangas wrote: > Andreas Joseph Krogh wrote: > > Will $SUBJECT make it possible for count(*) to use index? This is a much > > wanted feature. > > Yes, both the DSM approach and the approach proposed by Gokul. Good. -- Andreas Joseph Krogh <[EMAIL PROTECTED]> Senior Software Developer / Manager +-+ OfficeNet AS| The most difficult thing in the world is to | Karenslyst Allé 11 | know how to do a thing and to watch | PO. Box 529 Skøyen | somebody else doing it wrong, without | 0214 Oslo | comment.| NORWAY | | Tlf:+47 24 15 38 90 | | Fax:+47 24 15 38 91 | | Mobile: +47 909 56 963 | | +-+ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs wrote: On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote: Yes, I think it is easy to mark the "is for xid wraparound" bit in the WorkerInfo struct and have the cancel work only if it's off. However, what I think should happen is that the signal handler for SIGINT in a worker for xid wraparound should not cancel the current vacuum. Instead turn it into a no-op, if possible. That way we also disallow a user from cancelling vacuums for xid wraparound. I think he can do that with pg_cancel_backend, and it could be dangerous. I think that is dangerous too because the user may have specifically turned AV off. That anti-wraparound vacuum might spring up right in a busy period and start working its way through many tables, all of which cause massive writes to occur. That's about as close to us causing an outage as I ever want to see. We need a way through that to allow the user to realise his predicament and find a good time to VACUUM. I never want to say to anybody "nothing you can do, just sit and watch, your production system will be working again in no time. Restart? no that won't work either." You are probably right that VACUUM going full-steam is a bad idea in most situations. Except for anti-wraparound vacuum, cancellation seems the most reasonable thing to do. Because autovacuum will usually pick up the table in time again. The only problem I would see is if someone has an application that does a lot of schema changes (doesn't sound like a good idea anyway). In that case they would better issue manual vacuums on such tables. Best Regards Michael Paesold ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Timezone database changes
On Thursday, 11 October 2007, Gregory Stark wrote:
> Thinking of it as UTC is the wrong way to think about it. A birth occurred
> at a specific moment in time. You want to record that precise moment, not
> what it happened to show on the clock at the time. If the clock turns out
> to have been in the wrong timezone the birth isn't going to move.
>
> The use case for storing a local timestamp with a timezone attached is for
> things like appointments. If the time zone rules change you would want the
> appointment to move with them, not to stay at the same moment in time.

The difference here is that one occurred in the past and one is planned for
the future. Appointments in the past will still stay at the same time even
if the time zone rules change afterwards.

The supercorrect way to handle this would likely be to introduce some sort
of time-zone rules changeset that describes "as of point in time X, the time
zone designation ABC changes in the following way", which would then fix up
all data items past point X in the database in some clever way. Obviously
this is quite a bit too much for us to manage.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs <[EMAIL PROTECTED]> writes: > On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote: >> Why not SIGINT? > I must be missing something. How would I tell the difference between > manual and automatic cancels if we use SIGINT for both cases? Why do you need to? I thought the plan was that DeadlockCheck would only try to signal autovac workers. regards, tom lane
Re: [HACKERS] pg_tablespace_size()
I wrote:
> [ squint... ] There is something wrong here, because a superuser should
> certainly pass the aclcheck test. I don't know where the bug is but
> this is not the correct fix.

OK, after looking, the issue is this wart in pg_tablespace_aclmask():

    /*
     * Only shared relations can be stored in global space; don't let even
     * superusers override this
     */
    if (spc_oid == GLOBALTABLESPACE_OID && !IsBootstrapProcessingMode())
        return 0;

    /* Otherwise, superusers bypass all permission checking. */

There are a number of ways that we could deal with this:

* Just remove the above-quoted lines. Superusers should be allowed to shoot
themselves in the foot. (I'm not actually sure that there would be any bad
consequences from putting an ordinary table into pg_global anyway. I think
I wrote the above code in fear that some parts of the system would equate
reltablespace = pg_global with relisshared, but AFAICS that's not the case
now.)

* Remove the above lines and instead put a defense into heap_create. This
might be better design anyway since a more specific error could be
reported.

* Leave aclchk.c as-is and apply Magnus' patch to allow superusers to
bypass the check in pg_tablespace_size.

* Decide that we should allow anyone to do pg_tablespace_size('pg_global')
and put in a special wart for that in dbsize.c. This wasn't part of the
original agreement but maybe there's a case to be made for it.

Thoughts?

regards, tom lane
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 11:44 +0200, Michael Paesold wrote:
> Simon Riggs wrote:
> > On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:
> >> Yes, I think it is easy to mark the "is for xid wraparound" bit in the
> >> WorkerInfo struct and have the cancel work only if it's off.
> >>
> >> However, what I think should happen is that the signal handler for
> >> SIGINT in a worker for xid wraparound should not cancel the current
> >> vacuum. Instead turn it into a no-op, if possible. That way we also
> >> disallow a user from cancelling vacuums for xid wraparound. I think he
> >> can do that with pg_cancel_backend, and it could be dangerous.
> >
> > I think that is dangerous too because the user may have specifically
> > turned AV off. That anti-wraparound vacuum might spring up right in a
> > busy period and start working its way through many tables, all of which
> > cause massive writes to occur. That's about as close to us causing an
> > outage as I ever want to see. We need a way through that to allow the
> > user to realise his predicament and find a good time to VACUUM. I never
> > want to say to anybody "nothing you can do, just sit and watch, your
> > production system will be working again in no time. Restart? no that
> > won't work either."
>
> You are probably right that VACUUM going full-steam is a bad idea in most
> situations. Except for anti-wraparound vacuum, cancellation seems the most
> reasonable thing to do. Because autovacuum will usually pickup the table in
> time again.

Yeh, if we do have to do the second emergency anti-wraparound, then that
should be at full speed, since there's nothing else to do at that point.

> The only problem I would see is if someone has an application that does a
> lot of schema changes (doesn't sound like a good idea anyway). In that case
> they would better issue manual vacuums on such tables.

I can't see a use case for regular DDL as part of an application, on an
otherwise integral table (lots of updates and deletes).

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [HACKERS] Including Snapshot Info with Indexes
I agree with that. I will go ahead and do a test implementation and share the results with everyone. Thanks, Gokul. On 10/12/07, Tom Lane <[EMAIL PROTECTED]> wrote: > > Andreas Joseph Krogh <[EMAIL PROTECTED]> writes: > > Will $SUBJECT make it possible for count(*) to use index? This is a much > > wanted feature. > > If you mean "count(*) will become instantaneous", no it won't. It would > get faster, but probably not by more than a factor of 10 or so, > corresponding to the size ratio between index and table. Remember that > the proposal is going to bloat indexes pretty badly: a simple index on > an int4 column will double in size, more or less. So that ratio will > be much less favorable than it is today. > > Personally I think the bloat problem will doom this entire approach. > The distributed costs of that will outweigh the speedups that can be > achieved for specific queries. The OP is free to try to prove this > fear wrong, but no amount of discussion will constitute such a proof; > only a testable implementation. > > regards, tom lane
Re: [HACKERS] Locales and Encodings
On Oct 12, 2007, at 10:19 , Gregory Stark wrote: It would make Postgres inconsistent and less integrated with the rest of the OS. How do you explain that Postgres doesn't follow the system's configurations and the collations don't agree with the system collations? How is this fundamentally different from PostgreSQL using a separate users/roles system than the OS? Michael Glaesemann grzm seespotcode net
Re: [HACKERS] pg_tablespace_size()
Dave Page <[EMAIL PROTECTED]> writes: > In the message I quoted at the end of the long discussion on the topic > you assured me it would work for superusers. > Is there any reason the superuser shouldn't see the size of pg_global? Oh, you were superuser? Then something's broken, agreed. regards, tom lane
Re: [HACKERS] pg_tablespace_size()
Tom Lane wrote: > Dave Page <[EMAIL PROTECTED]> writes: >> select pg_tablespace_size('pg_global'); >> ERROR: permission denied for tablespace pg_global > >> Can this be fixed please? > > What's to be fixed? That's exactly the result I'd expect given the > previously-agreed-to behavior for pg_tablespace_size. In the message I quoted at the end of the long discussion on the topic you assured me it would work for superusers. Is there any reason the superuser shouldn't see the size of pg_global? Regards, Dave.
Re: [HACKERS] Locale + encoding combinations
Tom Lane wrote:
> That still leaves us with the problem of how to tell whether a locale
> spec is bad on Windows. Judging by your example, Windows checks whether
> the code page is present but not whether it is sane for the base locale.
> What happens when there's a mismatch --- eg, what encoding do system
> messages come out in?

I'm not sure how to test that specifically, but it seems that accented
characters simply fall back to their undecorated equivalents if the
encoding is not appropriate, eg:

[EMAIL PROTECTED]:~$ ./setlc French_France.1252
Locale: French_France.1252
The date is: sam. 01 of août 2007

[EMAIL PROTECTED]:~$ ./setlc French_France.28597
Locale: French_France.28597
The date is: sam. 01 of aout 2007

(the encodings used there are WIN1252 and ISO8859-7 (Greek)). I'm happy to
test further if you can suggest how I can figure out the encoding actually
output.

Regards, Dave.
Re: [HACKERS] pg_restore oddity?
Heikki Linnakangas wrote:
> Mario Weilguni wrote:
> > I cannot use "-1" for performance, because some gist stuff has changed
> > and the restore fails. But there seems to be no option for pg_restore
> > to use transactions for data restore, so it's very very slow (one
> > million records, each obviously in its own transaction - because a
> > separate session "select count(1) from logins" shows a growing number).
>
> By default, pg_dump/pg_restore uses a COPY command for each table, and
> each COPY executes as a single transaction, so you shouldn't see the row
> count growing like that. Is the dump file in --inserts format?

You are right, it was my fault. I was confused about the pg_dump syntax and
used "-d" (because pg_restore needs it for the destination database, not
the dump itself), so it was using "--inserts". Everything is working fine.

I've done dump/restore cycles hundreds of times, and now such a mistake. I
can't believe it. Seems like I need to take some vacations.

Thanks for the help!
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
Simon Riggs <[EMAIL PROTECTED]> writes:
> I think the best way to handle this is to have two limits.
> First limit attempts to autovacuum, but can be cancelled.
> When we hit second limit, sometime later, then autovacuum cannot be
> cancelled.

This seems like uselessly complex overdesign. Remember that we still have
the 8.1-era mechanism of warning messages and eventual shutdown of normal
backends as the wrap point gets closer. A DBA who persists in cancelling
antiwraparound vacuums will start getting nasty warnings, and if he still
persists will eventually be forced to do the vacuuming manually in a
standalone backend (which will go at full speed btw).

I concur with Simon's original argument that it's a terrible idea to make a
background vacuum cancel-proof. Pretty much the entire argument for having
vacuum at all (as opposed to say Oracle-style rollback) is that it's good
to be able to postpone maintenance overhead to a time of the DBA's
choosing. I don't see a reason why that doesn't apply to anti-wraparound
vacuuming too. If he postpones it too long, the recovery will get
unpleasant, but that's his choice.

An independent reason for minimizing complexity in this area is that it's a
corner case that (I trust) will never be hit by most installations, which
means it'll never get much field testing. Poorly tested complicated code is
dangerous.

In short: put in the automatic lock cancel for regular vacuums, disable it
for antiwraparound vacuums, but don't disable manual cancels; and
definitely don't invent a complicated new set of behaviors around that.

regards, tom lane
Re: [HACKERS] Locale + encoding combinations
Trevor Talbot wrote: > The encoding output is the one you specified. OK. > Keep in mind, > underneath Windows is mostly working with Unicode, so all characters > exist and the locale rules specify their behavior there. The encoding > is just the byte stream it needs to force them all into after doing > whatever it does to them. As you've seen, it uses some sort of > best-fit mapping I don't know the details of. (It will drop accent > marks and choose characters with similar shape where possible, by > default.) Right, that makes sense. The codepages used by setlocale etc. are just translation tables to/from the internal unicode representation. > I think it's a bit more complex for input/transform cases where you > operate on the byte stream directly without intermediate conversion to > Unicode, which is why UTF-8 doesn't work as a codepage, but again I > don't have the details nearby. I can try to do more digging if > needed. It does (sort of) work as a codepage, it just doesn't have the NLS file to define how things like UPPER() and LOWER() should work. Regards, Dave
Re: [HACKERS] pg_tablespace_size()
Dave Page <[EMAIL PROTECTED]> writes: > select pg_tablespace_size('pg_global'); > ERROR: permission denied for tablespace pg_global > Can this be fixed please? What's to be fixed? That's exactly the result I'd expect given the previously-agreed-to behavior for pg_tablespace_size. regards, tom lane
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 10:19 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I think the best way to handle this is to have two limits. > > First limit attempts to autovacuum, but can be cancelled. > > When we hit second limit, sometime later, then autovacuum cannot be > > cancelled. > > This seems like uselessly complex overdesign. OK > In short: put in the automatic lock cancel for regular vacuums, disable > it for antiwraparound vacuums, but don't disable manual cancels; and > definitely don't invent a complicated new set of behaviors around that. That seemed more complex when I thought about that, but if we just use SIGUSR2 for automatic cancels then this would be very simple. Or did you have another design? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [HACKERS] Locales and Encodings
"Peter Eisentraut" <[EMAIL PROTECTED]> writes:
> On Friday, 12 October 2007, Martijn van Oosterhout wrote:
>> Where we're stuck is that we can't agree on a
>> source of locale data. People don't want the ICU or glibc data and
>> there's no other source as readily available.
>
> What were the objections to ICU?

It's introducing a new dependency to do something fundamental to Postgres,
one that's larger than all of Postgres. It would make Postgres inconsistent
and less integrated with the rest of the OS. How do you explain that
Postgres doesn't follow the system's configurations and the collations
don't agree with the system collations?

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Locales and Encodings
On Friday, 12 October 2007, Gregory Stark wrote:
> It would make Postgres inconsistent and less integrated with the rest of
> the OS. How do you explain that Postgres doesn't follow the system's
> configurations and the collations don't agree with the system collations?

We already have our own encoding support (for better or worse), and I don't
think having one's own locale support would be that much different.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] Locales and Encodings
Martijn van Oosterhout <[EMAIL PROTECTED]> writes:
> On Fri, Oct 12, 2007 at 03:28:26PM +0100, Gregory Stark wrote:
>> I think realistically we're basically waiting for strcoll_l to become
>> standardized by POSIX so we can depend on it.

> I think we could be waiting forever then.

strcoll is only a small fraction of the problem anyway. The <ctype.h> and
<wctype.h> functions are another chunk of it, and then there's the issues
of system message spellings, LC_MONETARY info, etc etc.

regards, tom lane
Re: [HACKERS] First steps with 8.3 and autovacuum launcher
On Fri, 2007-10-12 at 12:42 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote:
> >> Why not SIGINT?
>
> > I must be missing something. How would I tell the difference between
> > manual and automatic cancels if we use SIGINT for both cases?
>
> Why do you need to? I thought the plan was that DeadlockCheck would
> only try to signal autovac workers.

...thinks...

On Fri, 2007-10-12 at 10:19 -0400, Tom Lane wrote:
> In short: put in the automatic lock cancel for regular vacuums, disable
> it for antiwraparound vacuums, but don't disable manual cancels; and
> definitely don't invent a complicated new set of behaviors around that.

So you mean "don't allow automatic cancels of manually submitted VACUUMs".
I thought you meant don't disable manually-requested cancels of
autovacuums. Can you explain further what you meant by "don't disable
manual cancels"?

(laughs) wish I had a pound for every time we'd misunderstood each other,
or at least, a pound every time I misunderstood you. :-)

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
[HACKERS] fmgr_info_cxt_security() screwing memory ?
Hi, I'm calling an arbitrary user-defined function from the backend.
Although I can do it via FunctionCallInvoke, I have a weird problem when
calling fmgr_info. The call results in an argument variable (eventType)
being cleared. A gdb hardware watch says that the variable is modified by
fmgr_info_cxt_security (fmgr.c:185): "finfo->fn_extra = NULL"; Any ideas of
what is happening?

For simplicity, in the following snippet I have hardcoded foid to some
(PGPSQL) function id and removed the rest of the call-related statements.

static void
execEventTypeFunc(char *eventType, Oid funcId)
{
    FmgrInfo   *finfo;
    FunctionCallInfoData fcinfo;
    Datum       result;
    Oid         foid = 17283;

    finfo = (FmgrInfo *) palloc0(sizeof(FmgrInfo));
    fmgr_info(foid, finfo);
    ...
}

Thanks,
Luis Vargas
Re: [HACKERS] fmgr_info_cxt_security() screwing memory ?
Luis Vargas <[EMAIL PROTECTED]> writes: > Hi, I'm calling an arbitrary user-defined function from the backend. > Although I can do it via FunctionCallInvoke, I have a weird problem when > calling fmgr_info. The call results in a argument variable (eventType) > being cleared. A gdb hardware watch says that the variable is modified by > fmgr_info_cxt_security (fmgr.c:185): "finfo->fn_extra = NULL"; Any ideas of > what is happening? One wonders whether eventType points at already-pfreed memory. If not, the problem must lie somewhere in the code you removed from the example. regards, tom lane
[HACKERS] Seems we need a post-beta1 initdb already
As Martin Pitt pointed out in this pgsql-bugs thread
http://archives.postgresql.org/pgsql-bugs/2007-10/msg00089.php
we have managed to create an ABI break between 8.2 and 8.3 libpq by
renumbering encoding IDs in pg_wchar.h. Although perhaps this does not hurt
any third-party clients, it does break our own initdb and psql programs,
which turn out to be dependent on libpq and the backend having the same
numbering. Going forward we should try to fix things so that the exact
values of those IDs need not be considered part of libpq's ABI, but that
doesn't get us out of the fact that 8.2 psql will fail if you try to use it
with CVS HEAD libpq.so.

It seems that we are faced with a choice of two evils:

1. Accept that there's an ABI break and increment libpq.so's major version
number for 8.3. This will be a PITA for packagers, who will have to carry a
compatibility package to provide 8.2 libpq.so.

2. Renumber 8.3's encoding IDs to preserve compatibility with the 8.2
values. It turns out that we can do that, but we will have to force initdb
because the contents of pg_database.encoding will change.

I'm of the opinion that #2 is the lesser evil, but maybe I'm overly
influenced by my Red Hat packaging responsibilities --- I'll personally
have to spend time on a compatibility package if we go with #1. Other
opinions out there?

Also, if we do #2 it means that we have the option to resolve the
contrib/txid mess by pushing txid into the core backend before beta2. Any
votes pro or con on that?

regards, tom lane
Re: [HACKERS] Seems we need a post-beta1 initdb already
Tom Lane wrote: I'm of the opinion that #2 is the lesser evil, but maybe I'm overly influenced by my Red Hat packaging responsibilities --- I'll personally have to spend time on a compatibility package if we go with #1. Other opinions out there? Also, if we do #2 it means that we have the option to resolve the contrib/txid mess by pushing txid into the core backend before beta2. Any votes pro or con on that? #2 by all means. Post-beta initdb is mildly unfortunate; compatibility packages are to be avoided far more. +1 to put txid in core, where it belongs. cheers andrew
Re: [HACKERS] Seems we need a post-beta1 initdb already
On Friday 12 October 2007 15:41:58 Tom Lane wrote: > As Martin Pitt pointed out in this pgsql-bugs thread > http://archives.postgresql.org/pgsql-bugs/2007-10/msg00089.php > we have managed to create an ABI break between 8.2 and 8.3 libpq > by renumbering encoding IDs in pg_wchar.h. Although perhaps this > does not hurt any third-party clients, it does break our own initdb > and psql programs, which turn out to be dependent on libpq and the > backend having the same numbering. Going forward we should try to > fix things so that the exact values of those IDs need not be considered > part of libpq's ABI, but that doesn't get us out of the fact that 8.2 > psql will fail if you try to use it with CVS HEAD libpq.so. > > It seems that we are faced with a choice of two evils: > > 1. Accept that there's an ABI break and increment libpq.so's major > version number for 8.3. This will be a PITA for packagers, who will > have to carry a compatibility package to provide 8.2 libpq.so. > > 2. Renumber 8.3's encoding IDs to preserve compatibility with the > 8.2 values. It turns out that we can do that, but we will have to > force initdb because the contents of pg_database.encoding will change. > > I'm of the opinion that #2 is the lesser evil, but maybe I'm overly > influenced by my Red Hat packaging responsibilities --- I'll personally > have to spend time on a compatibility package if we go with #1. > Other opinions out there? > > Also, if we do #2 it means that we have the option to resolve the > contrib/txid mess by pushing txid into the core backend before beta2. > Any votes pro or con on that? I vote that adding txid to the backend now will be of better value for all parties involved than not, given it's coming without any particular pain. (though I may be a bit biased on this). By having it in core it's one less thing that skytools/slony needs to detect and error out on with an message saying install contrib/txid during their install phases. 
Not to mention this makes replication that much more likely to be available
in shared hosting environments, whose operators by all reports loathe
installing contrib for their customers.

> regards, tom lane
Re: [HACKERS] Seems we need a post-beta1 initdb already
On Oct 12, 2007, at 17:41 , Tom Lane wrote: Also, if we do #2 it means that we have the option to resolve the contrib/txid mess by pushing txid into the core backend before beta2. Any votes pro or con on that? +1 Michael Glaesemann grzm seespotcode net
Re: [HACKERS] Seems we need a post-beta1 initdb already
Hi,

On Fri, 2007-10-12 at 18:41 -0400, Tom Lane wrote:
> I'm of the opinion that #2 is the lesser evil, but maybe I'm overly
> influenced by my Red Hat packaging responsibilities --- I'll personally
> have to spend time on a compatibility package if we go with #1.
> Other opinions out there?
>
> Also, if we do #2 it means that we have the option to resolve the
> contrib/txid mess by pushing txid into the core backend before beta2.
> Any votes pro or con on that?

I am also for #2 (and yes, I want to avoid yet another compat rpm
problem...)

Regards,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Re: [HACKERS] Hash index todo list item
On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote:
> On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
> > Dear PostgreSQL Hackers:
> >
> > After following the hackers mailing list for quite a while,
> > I am going to start investigating what will need to be done
> > to improve hash index performance. Below are the pieces of
> > this project that I am currently considering:
> >
> > 1. Characterize the current hash index implementation against
> >    the BTree index, with a focus on space utilization and
> >    lookup performance against a collection of test data. This
> >    will give a baseline performance test to evaluate the impact
> >    of changes. I initially do not plan to bench the hash creation
> >    process since my initial focus will be on lookup performance.
>
> Here are very basic results for a table with 1.6m entries:
>
> postgres=# CREATE TABLE dict (word varchar(100));
> CREATE TABLE
> postgres=# COPY dict FROM '/tmp/words';
> COPY 1648379
> postgres=# select count(*) from dict;
>   count
> ---------
>  1648379
> (1 row)
> Time: 11187.418 ms
> postgres=# select count(*) from dict;
>   count
> ---------
>  1648379
> (1 row)
> Time: 6040.912 ms
> postgres=# CREATE INDEX wordhash ON dict USING hash (word);
> CREATE INDEX
> Time: 11108707.160 ms
> postgres=# select * from dict where word = 'avatar';
>   word
> --------
>  avatar
> (1 row)
> Time: 79.823 ms
> postgres=# select * from dict where word = 'zebra';
>  word
> -------
>  zebra
> (1 row)
> Time: 9.864 ms
> postgres=# select * from dict where word = 'turkey';
>   word
> --------
>  turkey
> (1 row)
> Time: 18.418 ms
> Time: 1.045 ms
> Time: 1.257 ms
> Time: 1.080 ms
> postgres=# CREATE INDEX wordbtree ON dict USING btree (word);
> CREATE INDEX
> Time: 25438.884 ms
> postgres=# select * from dict where word = 'avatar';
>   word
> --------
>  avatar
> (1 row)
> Time: 13.400 ms
> postgres=# select * from dict where word = 'zebra';
>  word
> -------
>  zebra
> (1 row)
> Time: 1.173 ms
> postgres=# select * from dict where word = 'turkey';
>   word
> --------
>  turkey
> (1 row)
> Time: 1.186 ms
> Time: 1.103 ms
> Time: 1.099 ms
> Time: 1.108 ms
>
> Size of table = 87556096
> Size of hash index = 268451840
> Size of btree index = 53510144
>
> From my very small sample on an unloaded machine, a hash index lookup
> took the least amount of time. It had a much larger initial time which
> could be attributable to cache population effects. The size is 5X that
> of the Btree index. I will continue to improve the test suite as more
> granularity is needed. If anyone has a good data generator, please let
> me know. Otherwise I will just roll my own.
>
> Regards,
> Ken

I have just re-run the previous benchmark against 8.3beta1 (versus 8.2),
including the hash index sorted build patch, and the results are below:

test=# CREATE TABLE dict (word varchar(100));
CREATE TABLE
Time: 24.463 ms
test=# COPY dict FROM '/tmp/cracklib-words';
LOG:  checkpoints are occurring too frequently (21 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
COPY 1648379
Time: 44470.235 ms
test=# select count(*) from dict;
  count
---------
 1648379
(1 row)
Time: 4553.924 ms
test=# CREATE INDEX wordhash ON dict USING hash (word);
CREATE INDEX
Time: 85905.960 ms
test=# select * from dict where word = 'avatar';
  word
--------
 avatar
(1 row)
Time: 592.906 ms
test=# select * from dict where word = 'zebra';
 word
-------
 zebra
(1 row)
Time: 21.458 ms
test=# select * from dict where word = 'turkey';
  word
--------
 turkey
(1 row)
Time: 22.532 ms
Time: 1.224 ms
Time: 1.200 ms
Time: 1.264 ms
test=# CREATE INDEX wordbtree ON dict USING btree (word);
CREATE INDEX
Time: 33714.874 ms
test=# select * from dict where word = 'avatar';
  word
--------
 avatar
(1 row)
Time: 24.296 ms
test=# select * from dict where word = 'zebra';
 word
-------
 zebra
(1 row)
Time: 1.400 ms
test=# select * from dict where word = 'turkey';
  word
--------
 turkey
(1 row)
Time: 28.302 ms
Time: 1.284 ms
Time: 1.313 ms

Size of table = 69877760
Size of hash index = 268451840
Size of btree index = 48570368

I just wanted to have this baseline for future patches since 8.3 is just
around the bend. As expected, the sorted build process is a huge
improvement over the unsorted original.

Regards,
Ken Marshall
Re: [HACKERS] Another Idea: Try Including snapshot with TOAST (was: Including Snapshot Info with Indexes)
Hi Simon/Hannu,

a) I agree with storing the ctid in place of the oid. That would definitely
improve the performance. I guess it would also remove the restriction on
the size of TOASTABLE ATTRIBUTES. I guess different chunks have to be
linked together, without the index.

b) But I don't understand how storing the visibility info in the TOAST
table would help you. In that case you may need to update it for every
delete/update happening in the main table. The only thing it would help
with is if someone wants to do a full table scan on the TOAST table alone.
Maybe vacuum of TOAST tables can be done independently. But do you think it
is worth the loss of performance in update/delete?

Thanks,
Gokul.

On 10/8/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-10-08 at 14:58 +0300, Hannu Krosing wrote:
> > 1. get rid of indexes for TOAST tables
> >
> > instead of TOAST tuple id store CTID of first TOAST block directly, and
> > use something like skip lists inside the TOAST block headers to access
> > next TOAST tuples.
>
> It should be possible to optimise TOAST for when there is just a single
> chunk that is toasted. Since we often (and by default) compress data
> stored in TOAST tables this would be a frequently used optimisation.
>
> Instead of storing the TOAST OID, which is then looked-up in the index
> to find the TOAST tid, we can just store the tid directly in the toast
> pointer on the main heap. That way we'll get faster read and write
> access for a good proportion of rows by avoiding the toast index and
> going straight to the toast heap.
>
> We'd need a different kind of toast pointer which would be 2 bytes
> longer than the normal pointer. I think we have a spare flag bit to
> indicate this.
>
> That's just a rough sketch, I've not checked the details on this.
>
> --
> Simon Riggs
> 2ndQuadrant  http://www.2ndQuadrant.com
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/12/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> Andreas Joseph Krogh <[EMAIL PROTECTED]> writes:
> > Will $SUBJECT make it possible for count(*) to use index? This is a much
> > wanted feature.
>
> If you mean "count(*) will become instantaneous", no it won't. It would
> get faster, but probably not by more than a factor of 10 or so,
> corresponding to the size ratio between index and table. Remember that
> the proposal is going to bloat indexes pretty badly: a simple index on
> an int4 column will double in size, more or less. So that ratio will
> be much less favorable than it is today.

I accept that the indexes will be bigger with this approach. You might
need more disk space and more memory to accommodate the same amount of
information. But I think disk and memory costs have come down a lot;
people can afford to buy more disk and memory. The one fact that remains
true is that the disk, the last mechanical device, is slow at random
I/O. So this proposal is aimed at spending more memory and disk space to
reduce random I/Os.

Say we are accommodating 200 tuples per index page in today's index (for
a typical configuration); as you said, in the worst case (only one
indexed field) we would be occupying 100 tuples per index page. But we
would take away probably 100 random I/Os (say with a bitmap scan it
reduces to 75). 1 GB of memory is around $100 and 1 GB of disk is around
$1, but one hour of a performance tuner would cost around $200 :)). So
that's the trade-off for enterprises, if they want to choose between the
two kinds of indexes.

> Personally I think the bloat problem will doom this entire approach.
> The distributed costs of that will outweigh the speedups that can be
> achieved for specific queries. The OP is free to try to prove this
> fear wrong, but no amount of discussion will constitute such a proof;
> only a testable implementation.
>
>         regards, tom lane
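Gokul's trade-off above can be put into rough numbers (a back-of-the-envelope sketch using his example figures of 200 vs. 100 entries per page; real per-page overheads and key widths are ignored):

```python
import math

PAGE_SIZE = 8192  # default PostgreSQL block size

# Example figures from the mail: ~200 index entries per page today,
# halved to ~100 once each entry also carries snapshot info.
entries_per_page_today = 200
entries_per_page_bloated = 100

matches = 100  # rows satisfying the query, assumed adjacent in the index

# Extra index pages read because of the bloat.
pages_today = math.ceil(matches / entries_per_page_today)
pages_bloated = math.ceil(matches / entries_per_page_bloated)
extra_index_ios = pages_bloated - pages_today

# Random heap visits avoided if visibility can be checked in the index
# alone (the mail estimates ~75 saved under a bitmap scan).
heap_ios_saved = matches

print(extra_index_ios, heap_ios_saved)
```

On these illustrative numbers the extra sequential index reads are tiny next to the random heap fetches avoided, which is exactly the bet being argued over: Tom's objection is that the bloat cost is paid by *every* index user, while the saving only shows up for queries that would otherwise do many random heap visits.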
Re: [HACKERS] Another Idea: Try Including snapshot with TOAS (was: Including Snapshot Info with Indexes)
On Sat, 2007-10-13 at 11:12 +0530, Gokulakannan Somasundaram wrote:
> b) But i don't understand how storing the visibility info in TOAST
> table would help you. In that case you may need to update it for every
> delete/ update happening in the main table. Only thing, it would help
> is if someone wants to do a full table scan on TOAST table alone. May
> be Vacuum of TOAST tables can be done independently. But do you think
> it is worth the loss of performance in Update/Delete?

No

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com