Re: [HACKERS] Covering Indexes

2012-07-06 Thread Csaba Nagy
Hi all,

 On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler da...@justatheory.com 
 wrote:
 I don't see the virtue of this in this case.  Since the index is not
 unique, why not just put the index on (a,b,c,d) and be done with it?
 Is there some advantage to be had in inventing a way to store c and d
 in the index without having them usable for indexing?

Why not restrict it to UNIQUE indexes?

For non-unique indexes it has no advantage (the index could in fact be built
on all columns anyway as an implementation detail).

For the unique case, the problem of many identical entries mentioned by
Tom is not relevant, so the additional data will only bloat the index
without otherwise affecting its performance.

Would this get close enough to an index-covered table? _That_ would be
interesting - I have a really big table (table/index size: 370G/320G,
~8*10^9 rows) which basically uses double the space because its primary
key covers all columns of the table.
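
For reference, a minimal sketch of what such a unique covering index could
look like, using the INCLUDE syntax that later landed in PostgreSQL 11
(table and column names are invented for illustration):

-- a, b identify the row; c, d are payload columns
CREATE TABLE t (a int, b int, c int, d int);

-- unique only on (a, b); c and d are stored in the index
-- but take no part in uniqueness or searching
CREATE UNIQUE INDEX t_a_b_covering ON t (a, b) INCLUDE (c, d);

-- can be answered from the index alone via an index-only scan
SELECT c, d FROM t WHERE a = 1 AND b = 2;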

Cheers,
Csaba.





Re: [HACKERS] estimating # of distinct values

2011-01-20 Thread Csaba Nagy
Hi Tomas,

On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote:
 No, the multi-column statistics do not require constant updating. There
 are cases where a sampling is perfectly fine, although you may need a
 bit larger sample. Generally if you can use a multi-dimensional
 histogram, you don't need to scan the whole table.

In the cases where sampling is enough, you can apply it to the updates
too: sample the changes, so that only every Nth change makes it to the
estimator. If you can also tune N dynamically - growing it as the
statistics stabilize, and lowering it if you detect high variance - even
better.
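
As a rough sketch of the every-Nth-change idea in SQL (a real
implementation would live inside the server; the table some_table, the
column some_col and the staging table changes_sample are invented here):

-- staging table fed with a small sample of the changes
CREATE TABLE changes_sample (tab regclass, val int, seen timestamptz DEFAULT now());

CREATE OR REPLACE FUNCTION sample_change() RETURNS trigger AS $$
BEGIN
  -- forward roughly 1 change in 100 to the estimator's input
  IF random() < 0.01 THEN
    INSERT INTO changes_sample (tab, val) VALUES (TG_RELID::regclass, NEW.some_col);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sample_changes
  AFTER INSERT OR UPDATE ON some_table
  FOR EACH ROW EXECUTE PROCEDURE sample_change();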

If the analyze process could be decoupled from the backends, and maybe
just get the data passed over to be processed asynchronously, that could
be a feasible way to have always up-to-date statistics when the
bottleneck is IO and there is CPU power to spare. If that then leads to
better plans, it could really be a win exceeding the overhead.

If this analyze process (or more of them) could also just get the data
from the modified buffers in a cyclic way, so that backends need nothing
extra to do, then I don't see any performance disadvantage other than
possible extra locking contention on the buffers and non-determinism of
the actual time when a change makes it to the statistics. Then you just
need to get more CPU power and higher memory bandwidth to pay for the
accurate statistics.

Cheers,
Csaba.





Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Csaba Nagy
On Tue, 2011-01-11 at 07:14 -0500, Noah Misch wrote:
 On Tue, Jan 11, 2011 at 09:24:46AM +, Simon Riggs wrote:
  I have a concern that by making the ALTER TABLE more complex that we
  might not be able to easily tell if a rewrite happens, or not.

What about adding EXPLAIN support to it? Then whoever wants to know what
it does can just run EXPLAIN on it. I have no idea how hard that would be
to implement, or whether it makes sense at all, but I sure would like to
see a plan for DDL too, to estimate how long it would take.
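
As a purely hypothetical illustration of the proposed usage (EXPLAIN does
not accept DDL in any released version; table and column are invented):

-- hypothetical: not valid syntax today
EXPLAIN ALTER TABLE orders ALTER COLUMN amount TYPE numeric(12,2);
-- desired output: whether a full table rewrite (and index rebuilds)
-- would be needed, plus a rough cost/row estimate for it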

Cheers,
Csaba.





Re: [HACKERS] estimating # of distinct values

2011-01-10 Thread Csaba Nagy
On Fri, 2011-01-07 at 12:32 +0100, t...@fuzzy.cz wrote:
 the problem is you will eventually need to drop the results and rebuild
 it, as the algorithms do not handle deletes (ok, Florian mentioned an
 algorithm L_0 described in one of the papers, but I'm not sure we can use
 it).

Yes, but even then you start with much better cards if you already
have an estimate of what the data looks like, thanks to having updated
it continuously. For example, you'll have a pretty good estimate of the
bounds on the number of distinct values, while if you really start from
scratch you have nothing to go on and must cope with the complete range
between "all values are distinct" and "there are only a few of them".

 I'm not sure a constantly running background process is a good idea. I'd
 prefer storing an info about the modified tuples somewhere, and starting
 analyze only when a given threshold is reached. I'm not sure how to do
 that, though.
 
 Another thing I'm not sure about is where to store those intermediate
 stats (used to get the current estimate, updated incrementally).

The forks implementation proposed in other responses is probably the
best idea, if usable. It would also solve the problem of memory
limitations, at the expense of more resources used if the structure
grows too big, but it would still probably be fast enough if it stays
small and in cache.

Cheers,
Csaba.





Re: [HACKERS] estimating # of distinct values

2011-01-07 Thread Csaba Nagy
On Thu, 2010-12-30 at 21:02 -0500, Tom Lane wrote:
 How is an incremental ANALYZE going to work at all?

How about a kind of continuous analyze?

Instead of analyzing just once and then dropping the intermediate results,
keep them on disk for all tables and piggyback on the background
writer (or a dedicated process if that's not algorithmically
feasible): before writing out buffers, update the statistics based on
the values found in the modified buffers. It could probably take a random
sample of buffers to minimize overhead, but if it is done by a
background process the overhead could be minimal anyway on multi-core
systems.

Not sure this makes sense at all, but if it does, it would deliver the
most up-to-date statistics you can think of.

Cheers,
Csaba.





Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2010-12-15 Thread Csaba Nagy
On Wed, 2010-12-15 at 10:39 +, Simon Riggs wrote:
 Perhaps a more useful definition would be
 
 EXCHANGE TABLE target WITH source;
 
 which just swaps the heap and indexes of each table.
 You can then use TRUNCATE if you want to actually destroy data.

Yes please, that's exactly what I would have needed on many occasions.

But one problem would be when the replaced table is the _parent_ of a
foreign key relationship. I don't think you can have that constraint
pre-verified on the replacement table, and simply replacing the content
could leave the child relations with orphans.
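
For what it's worth, a minimal sketch of how the swap is usually emulated
today with a rename dance inside one transaction (names follow the quoted
proposal). Note that this swaps the whole tables, and that dependent
objects such as views and foreign keys keep following the original table
by OID rather than by name, so it is not a full EXCHANGE equivalent:

BEGIN;
-- both renames take ACCESS EXCLUSIVE locks, but only for the swap itself
ALTER TABLE target RENAME TO target_old;
ALTER TABLE source RENAME TO target;
COMMIT;

-- destroy the old data separately, as suggested above
TRUNCATE target_old;   -- or DROP TABLE target_old;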

Cheers,
Csaba.





Re: [HACKERS] ALTER TABLE ... REPLACE WITH

2010-12-15 Thread Csaba Nagy
On Tue, 2010-12-14 at 14:36 -0500, Robert Haas wrote:
  Well, you have to do that for DROP TABLE as well, and I don't see any
  way around doing it for REPLACE WITH.
 
 Sure, but in Simon's proposal you can load the data FIRST and then
 take a lock just long enough to do the swap.  That's very different
 from needing to hold the lock during the whole data load.

Except that Simon's original proposal has this line in it:

* new_table is TRUNCATEd.

I guess Simon mixed up new_table and old_table: the one which
should get truncated is the replaced table, not the replacement,
otherwise it doesn't make sense to me.

BTW, I too would have used such a feature on multiple occasions in the
past, and I expect I will in the future as well.

Cheers,
Csaba.





Re: [HACKERS] Another proposal for table synonyms

2010-12-01 Thread Csaba Nagy
Hi all,

On Tue, 2010-11-30 at 12:05 -0800, Josh Berkus wrote:
 Can you explain, for our benefit, the use case for this?  Specifically,
 what can be done with synonyms which can't be done with search_path and
 VIEWs?

I had a few cases where synonyms for user/database names would have
helped me slightly (not something I couldn't work around).

The actual use case was when I wanted to change the names of a database
and its users (just a configuration coherency thing) - using a synonym
I could have done it without downtime by creating the synonym first,
then reconfiguring each application machine individually (they are
redundant, and can be restarted transparently). Without the synonyms, I
had to wait for the next full downtime (which we do quite rarely) and
reconfigure the DB and all application boxes at the same time.

Ideally the user/DB name synonym would be like a hard link, equivalent
in all respects to the original name, so that you can delete the
original name and the synonym continues to work.

pg_hba.conf would likely still need to distinguish between the
DB/user names and their synonyms - I'm not sure whether that would be
useful or a PITA.

Of course this has nothing to do with table synonyms - there I
haven't yet had any reason to use one...

Cheers,
Csaba.





Re: [HACKERS] DELETE with LIMIT (or my first hack)

2010-11-30 Thread Csaba Nagy
Hi all,

The workaround recommended some time ago by Tom is:

DELETE FROM residents_of_athens
 WHERE ctid = any(array(SELECT ctid
                          FROM residents_of_athens
                         ORDER BY ostracism_votes DESC
                         LIMIT 1));

It is about as efficient as the requested feature would be, just uglier
to write down. I use it all the time when batch-deleting something large
(to avoid long-running transactions and to not break Slony). It also
helps to vacuum frequently if you do that on a large amount of data...

Cheers,
Csaba.

On Tue, 2010-11-30 at 00:05 -0500, Robert Haas wrote:
 On Mon, Nov 29, 2010 at 11:25 PM, Andrew Dunstan and...@dunslane.net wrote:
 
 
  On 11/29/2010 10:19 PM, Robert Haas wrote:
 
  For example, suppose we're trying to govern an ancient Greek
  democracy:
 
  http://en.wikipedia.org/wiki/Ostracism
 
  DELETE FROM residents_of_athens ORDER BY ostracism_votes DESC LIMIT 1;
 
  I'm not sure this is a very good example. Assuming there isn't a tie, I'd do
  it like this:
 
  DELETE FROM residents_of_athens
  WHERE ostracism_votes >= 6000
 and ostracism_votes =
  (SELECT max(ostracism_votes)
   FROM residents_of_athens);
 
 That might be a lot less efficient, though, and sometimes it's not OK
 to delete more than one record.  Imagine, for example, wanting to
 dequeue the work item with the highest priority.  Sure, you can use
 SELECT ... LIMIT to identify one and then DELETE it by some other key,
 but DELETE .. ORDER BY .. RETURNING .. LIMIT would be cool, and would
 let you do it with just one scan.
 
  I can't say I'd be excited by this feature. In quite a few years of writing
  SQL I don't recall ever wanting such a gadget.
 
 It's something I've wanted periodically, though not badly enough to do
 the work to make it happen.
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 




Re: [HACKERS] DELETE with LIMIT (or my first hack)

2010-11-30 Thread Csaba Nagy
Hi Robert,

On Tue, 2010-11-30 at 09:19 -0500, Robert Haas wrote:
 That's a very elegant hack, but not exactly obvious to a novice user
 or, say, me.  So I think it'd be nicer to have the obvious syntax
 work.

I fully agree - but you first have to convince the core hackers that this
is not just a foot-gun. This was discussed many times in the past, and
patches were also offered (perhaps not complete ones, but proving that
there is an itch to be scratched):
http://archives.postgresql.org/pgsql-patches/2002-09/msg00255.php

The reaction:

http://archives.postgresql.org/pgsql-patches/2002-09/msg00256.php

There are other discussions too; if I remember correctly, Tom once
admitted that implementing the feature would mostly consist of
letting it work, as the infrastructure to do it is there but is
actively disabled. I can't find the mail now, though.

So it is really an ideological thing, not a lack of demand or of
implementation attempts... I myself can't write working C code
anyway, so I've made my peace with the workaround - I wish you good luck
arguing with Tom :-)

Cheers,
Csaba.





Re: [HACKERS] Configuring synchronous replication

2010-09-23 Thread Csaba Nagy
Hi all,

Some time ago I was also interested in this feature, and at that time I
also thought about the possibility of a complete setup via postgres
connections, meaning the transfer of the files and all configuration/slave
registration would be done through normal backend connections.

In the meantime, our DBs are not able to keep in sync via WAL
replication - that would need some kind of parallel WAL restore on the
slave, I guess, or I'm not able to configure it properly - so in any case
we now use Slony, which is working. In fact the way Slony does its
configuration could be a good place to look...

On Wed, 2010-09-22 at 13:16 -0400, Robert Haas wrote:
  I guarantee you there is a way around the cascade slave problem.
 
 And that would be...?

 * restrict the local file configuration to a replication ID;

 * make all configuration refer to the replica ID;

 * keep all configuration in a shared catalog: it can be kept exactly
the same on all replicas, as each replication node will only care
about the configuration concerning its own replica ID;

 * added advantage: after take-over the slave will change the configured
master to its own replica ID, and if the old master ever connects
again, it can easily notice that and give up;

Cheers,
Csaba.





Re: [HACKERS] Configuring synchronous replication

2010-09-23 Thread Csaba Nagy
On Thu, 2010-09-23 at 12:02 +0300, Heikki Linnakangas wrote:
 On 23/09/10 11:34, Csaba Nagy wrote:
  In the meantime our DBs are not able to keep in sync via WAL
  replication, that would need some kind of parallel WAL restore on the
  slave I guess, or I'm not able to configure it properly - in any case
  now we use slony which is working.
 
 It would be interesting to debug that case a bit more. Was bottlenecked 
 by CPU or I/O, or network capacity perhaps?

Unfortunately it was quite a long time ago that we last tried, and I don't
remember exactly what was bottlenecked. Our application is quite
write-intensive; the ratio of writes to reads which actually reach the
disk is about 50-200% (according to the disk stats - yes, sometimes we
write more to the disk than we read, probably due to the relatively
large RAM installed). If I remember correctly, the standby was about the
same as the master regarding IO/CPU power, but it was not able to
process the WAL files as fast as they were coming in, which rules out at
least the network as a bottleneck. What I suppose actually happens is
that the single process applying the WAL on the slave is not able to
match the full IO the master can generate with all its processors.

If you're interested, I could set up another try, but it would be
on 8.3.7 (that's what we still run). 9.x would also be interesting,
but that would be a test system and I can't possibly reproduce there the
load we have in production...

Cheers,
Csaba.





Re: [HACKERS] Configuring synchronous replication

2010-09-23 Thread Csaba Nagy
On Thu, 2010-09-23 at 16:18 +0300, Heikki Linnakangas wrote:
 There's a program called pg_readahead somewhere on pgfoundry by NTT that 
 will help if it's the single-threadedness of I/O. Before handing the WAL 
 file to the server, it scans it through and calls posix_fadvise for all 
 the blocks that it touches. When the server then replays it, the data 
 blocks are already being fetched by the OS, using the whole RAID array.

That sounds useful, thanks for the hint!

But couldn't this also be built directly into the WAL recovery process? It
would probably help a lot for recovering from a crash too. We recently
had a crash and it took hours to recover.

I will try it out as soon as I get the time to set it up...

[searching pgfoundry] 

Unfortunately I can't find it, and Google is also not very helpful. Do
you happen to have a link to it?

Cheers,
Csaba.





Re: [HACKERS] Configuring synchronous replication

2010-09-23 Thread Csaba Nagy
On Thu, 2010-09-23 at 11:43 -0400, Tom Lane wrote:
  What other problems are there that mean we *must* have a file?
 
 Well, for one thing, how do you add a new slave?  If its configuration
 comes from a system catalog, it seems that it has to already be
 replicating before it knows what its configuration is.

Or the slave gets a connection string to the master, and reads the
configuration from there - it has to connect there anyway...

The ideal bootstrap for slave creation would be: get the parameters to
connect to the master plus the replica ID, and the rest should be done by
connecting to the master and getting everything that is needed from
there, including configuration.

Maybe you see some merit in this idea: it wouldn't hurt to design the
interfaces so that the master could be impersonated by some WAL
repository serving a PITR snapshot, and so that the same WAL repository
could connect as a slave to the master and, instead of recovering the WAL
stream, archive it. Such a WAL repository could connect to
multiple masters and could also take snapshots regularly. This would
provide a nice complement to WAL replication as a PITR solution using the
same protocols as a WAL standby. I have no idea whether this would be easy
to implement or useful for anybody.

Cheers,
Csaba.





Re: [HACKERS] Differential backup

2010-04-27 Thread Csaba Nagy
Hi all,

On Tue, 2010-04-27 at 11:07 -0400, Merlin Moncure wrote:
 The block level case seems pretty much covered by the hot standby feature.

One use case we would have is to dump only the changes since the last
backup of a single table. This table takes 30% of the DB disk space, it
is on the order of ~400GB, and it is only inserted into, never updated;
after ~1 year the old entries are archived. There are ~10M new entries
daily in this table. If the backup were smart enough to read only the
changed blocks (in this case only those with newly inserted records), it
would be a fairly big win...
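
Lacking block-level support, a crude sketch of how such an insert-only
table can be dumped incrementally from plain SQL, assuming it has a
monotonically increasing key (table and column names are invented):

-- 123456789 stands for the highest id included in the previous dump,
-- which the backup script would track externally
COPY (
  SELECT *
    FROM big_log_table
   WHERE id > 123456789
) TO '/backup/big_log_table.incr.copy';
-- (or use psql's \copy to write the file on the client side)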

Cheers,
Csaba.





Re: [HACKERS] [pgadmin-hackers] Feature request: limited deletions

2010-04-08 Thread Csaba Nagy
Hi all,

On Thu, 2010-04-08 at 07:45 -0400, Robert Haas wrote:
  2010/4/8 Thom Brown thombr...@gmail.com:
   So you could write:
  
    DELETE FROM massive_table WHERE id < 4000 LIMIT 1;

 I've certainly worked around the lack of this syntax more than once.
 And I bet it's not even that hard to implement.

The fact that it's not implemented has nothing to do with its
complexity (in fact it is probably just a matter of enabling it) -
you'll have a hard time convincing some old-time hackers on this list
that the non-determinism inherent in this kind of query is
acceptable ;-)

There is a workaround, which in fact works quite well:

delete from massive_table where ctid = any(array(select ctid from
massive_table WHERE id < 4000 LIMIT 1));

Just run an EXPLAIN on it and you'll see it won't get any better. But
beware that it might be less optimal than you think: you will likely be
sequentially scanning the table for each chunk unless you put some
selective where conditions on it too - and even then you'll still scan
the whole already-deleted part, not just the next chunk. The deleted
records won't get out of the way magically; you need to vacuum, and
that's probably a problem too on a big table. So on a massive table it
will most likely help you less than you think: the run time per chunk
will increase with each chunk unless you're able to vacuum efficiently.
In any case you need to balance the chunk size against the scanned
portion of the table so that you get a reasonable run time per chunk and
not too much overhead from the whole chunking process...
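
A rough sketch of the chunked variant with the caveats above (the id <
4000 condition is carried over from the example, the chunk size of 1000
is arbitrary, and the pair of statements is meant to be re-run until the
DELETE reports 0 rows):

-- one batch, to be repeated e.g. from a shell loop or cron
DELETE FROM massive_table
 WHERE ctid = any(array(SELECT ctid
                          FROM massive_table
                         WHERE id < 4000
                         LIMIT 1000));

-- reclaim the dead rows before the next chunk, or each chunk will
-- re-scan ever more dead space
VACUUM massive_table;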

Cheers,
Csaba.





Re: [HACKERS] Getting to beta1

2010-03-19 Thread Csaba Nagy
Hi all,

On Thu, 2010-03-18 at 10:18 -0700, Josh Berkus wrote:
 Or, let's put it another way: I've made my opinion clear in the past
 that I think that we ought to ship with a minimal postgresql.conf with
 maybe 15 items in it.  If we are going to continue to ship with
 postgresql.conf kitchen sick version, however, it should include
 vacuum_defer_cleanup_age.

But considering that these are samples anyway, what is preventing us from
shipping 2 versions, one labeled minimal and one labeled full?

The minimal one would contain only the absolutely must-change items, and
the full version would contain everything. As simple as that. I don't
think there's any value in anything in between...

Cheers,
Csaba.





Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Csaba Nagy
Hi all,

On Mon, 2010-02-22 at 10:29 +, Greg Stark wrote:
 On Mon, Feb 22, 2010 at 8:18 AM, Gokulakannan Somasundaram
 gokul...@gmail.com wrote:
  a) IOT has both table and index in one structure. So no duplication of data
  b) With visibility maps, we have three structures a) Table b) Index c)
  Visibility map. So the disk footprint of the same data will be higher in
  postgres ( 2x + size of the visibility map).
 
 These sound like the same point to me. I don't think we're concerned
 with footprint -- only with how much of that footprint actually needs
 to be scanned. 

For some data the disk footprint actually is important: in our
databases we have one table which has exactly 2 fields, both of them
part of its primary key, and there's no other index. The table is
write-only, never updated and rarely deleted from.

The disk footprint of the table is 30%-50% of the total disk space used
by the DB (depending on the other data). This amounts to about 1.5-2TB
if I count it across all of our DBs, and it has to be fast disk too, as
the table is heavily used... so disk space does matter for some of us.

And yes, I put the older entries in an archive partition on slower
disks, but that just halves the problem - the data is growing
exponentially, and about half of it is always in use. I guess our
developers are just about ready to move this table out of postgres and
into Hadoop...

Cheers,
Csaba.





Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-04 Thread Csaba Nagy
On Wed, 2009-12-30 at 17:16 +0100, Tom Lane wrote:
 I think the cleanest solution to this would be to make ANALYZE
 cheaper, perhaps by finding some way for it to work incrementally.

What if, when inserting/deleting tuples, some random sample of them
were passed into an auto-analyze buffer?

Then a special process (the auto-analyze daemon) would process them and
update the statistics incrementally based on the new values found (which
might or might not be mathematically feasible).

The overhead for each backend process would be kept within limits by the
rate at which you randomly send (or don't send) the change to the
analyze buffer.

The processing overhead would be kept within limits by the processing
rate of the auto-analyze process, which could be made to sleep
periodically, or it could be made to span multiple processes (on
multiprocessor systems).

If the buffer is full, then you skip putting anything in it... so it
could also autotune itself to a sustainable rate.


Of course, as with all my other posts on hackers, this is mostly
hand-waving; I have no clue about the feasibility of all this with
regard to the current state of the code (which I haven't read - I
unfortunately find myself hating to read C code beyond reason, and my
writing of it has so far amounted to copy-paste-modify).

Cheers,
Csaba.




Re: [HACKERS] Deadlock error in INSERT statements

2009-10-07 Thread Csaba Nagy
On Wed, 2009-10-07 at 16:34 +0200, Gnanam wrote:
 NOTE: I've seen deadlock errors in UPDATE statement but why it is throwing
 in INSERT statements.

It is because of the foreign key. Inserting a child row will lock the
corresponding parent row, so if you insert multiple rows with different
parents in the same transaction, and concurrent transactions do the same
but touch the parent rows in a different order, you can get a deadlock.
If you keep in mind that the parent row is locked on the insert of a
child row, you will figure out what's happening...

BTW, I don't think the hackers list is the right one for this kind of
question, better use the general list...

Cheers,
Csaba.





Re: [HACKERS] moving system catalogs to another tablespace

2009-10-06 Thread Csaba Nagy
Hi all,

On Tue, 2009-10-06 at 16:58 +0200, Tom Lane wrote:
 Yeah, I have sometimes thought that pg_largeobject shouldn't be
 considered a system catalog at all.  It's more nearly like a toast
 table, ie, it's storing out of line user data.

pg_largeobject in its current form has serious limitations, the biggest
one being that it can't have triggers, and thus it can't be replicated by
trigger-based replication like Slony.

I ended up rolling my own large object table, modeling exactly the
behavior of pg_largeobject but on the client side, except that I can
replicate it... plus a few other simple things, like easily duplicating
an entry from client-side code and easier control of the large object ID
ranges. BTW, OID is not the best data type for a client-visible primary
key; BIGINT is better. OID is unsigned and in Java, for example, won't
cleanly map to any data type (a Java long is twice as big as needed, and
int is signed and won't work for all OID values - we finally had to use
long, and then BIGINT is a better match). Considering that the postgres
manual says using a user-created table's OID column as a primary key
is discouraged, I don't see why OID should be the primary key for a
table which can potentially outgrow the OID range.
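
For illustration, a minimal sketch of such a client-side large object
table, mirroring pg_largeobject's layout (loid, pageno, data) but with a
BIGINT id; the names are my own:

-- one row per page of a large object, as in pg_largeobject, but with a
-- signed 64-bit id that maps cleanly to a Java long
CREATE TABLE my_largeobject (
    lob_id  bigint  NOT NULL,
    pageno  integer NOT NULL,
    data    bytea   NOT NULL,
    PRIMARY KEY (lob_id, pageno)
);

CREATE SEQUENCE my_largeobject_id_seq;  -- source of new lob_id values

-- being a plain table, triggers (and therefore Slony) and pg_dump just work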

The backup is also no longer a special case: it just dumps the table. I
don't know what the reasons were for special-casing pg_largeobject, but
from a usability POV it is fairly bad.

Cheers,
Csaba.





Re: [HACKERS] Streaming Replication patch for CommitFest 2009-09

2009-09-17 Thread Csaba Nagy
On Thu, 2009-09-17 at 10:08 +0200, Heikki Linnakangas wrote:
 Robert Haas suggested a while ago that walreceiver could be a
 stand-alone utility, not requiring postmaster at all. That would allow
 you to set up streaming replication as another way to implement WAL
 archiving. Looking at how the processes interact, there really isn't
 much communication between walreceiver and the rest of the system, so
 that sounds pretty attractive.

Just a small comment in this direction: what if the archive were itself
a postgres DB which collected the WAL in some special place (together
with some metadata, snapshots, etc.), and a slave could connect to it
just like to any other master? (Except maybe it could specify which
snapshot to start with, and possibly choose between different archived
WAL streams.)

Maybe what I'm saying is completely stupid, but I see the archive as
just another form of postgres server, with the same protocol from the
POV of a slave. While I don't have the skills to implement such a thing,
I thought it might be an interesting idea to raise while discussing the
walsender/walreceiver interface...

Cheers,
Csaba.





Re: [HACKERS] machine-readable explain output v4

2009-08-12 Thread Csaba Nagy
On Tue, 2009-08-11 at 23:58 +0200, Andrew Dunstan wrote:
 Well, I don't think that the fact that we are producing machine-readable 
 output means we can just ignore the human side of it. It is more than 
 likely that such output will be read by both machines and humans. 
 Obviously, we need to make sure that it meets machine processing needs 
 first, but once we have done that we should not ignore the human 
 requirements.

XML is easy to transform into just about anything else. I would vote for
the XML output to focus only on machine readability and completeness, and
then, if needed, provide style sheets to make it human-readable. Then
anybody could look at it in their preferred style. If there were a tool
to format the XML according to some style sheet (to make it easy for
those who don't care about XML and style sheets), I would volunteer
to provide XSL style sheets for different formats on request.

Cheers,
Csaba.





Re: [HACKERS] machine-readable explain output v4

2009-08-12 Thread Csaba Nagy
On Wed, 2009-08-12 at 15:42 +0200, Andrew Dunstan wrote:
 Have you actually looked at a logfile with this in it? A simple 
 stylesheet won't do at all. What you get is not an XML document but a 
 text document with little bits of XML embedded in it. So you would need 
 a program to parse that file and either turn it into a single legal XML 
 document or pass each piece of XML individually to your XSLT processor. 
 Bleah.

I'm pretty sure you will never find a human-readable format which is
easily extracted from the logs by a program. But if you format the XML
in a (very human-unreadable) one-line-without-breaks format, then it will
be a lot easier for a program to extract and format it at will.

 And all this because you pose a false dichotomy between correctness and 
 completeness on one hand and human readability on the other. I don't 
 accept that at all. I think we can and should improve human readability 
 without sacrificing anything on the correctness and completeness front. 
 In fact, that also needs improving, and we can do them both at the same 
 time.

I really, really doubt that. I would go here with the UNIX approach of
piping things through the right tools, each one doing a simple, good job
for its single and well-defined purpose. So let EXPLAIN spit out a line
of XML without much thought about formatting, focusing instead on
completeness and making it easy for tools to get that line, and then let
the tools do the formatting depending on what you want to do with the
information. Each part will be simpler than with a directly
human-readable XML approach (if that's possible at all), which in any
case will not cover all the uses and tastes.

 I want to be able to have machine readable explain output, but I also 
 want to be able to browse the logs without wasting more brain cells than 
 necessary and without having to use external tools other than by 
 standard text browser (less). As Pooh Bear said, Both please!

I argue that a sufficiently complicated explain output will never be
easily navigated in a text browser, however much you would like it to
be. If you have a WHERE clause with 100 nested ANDs (which occasionally
happens here), I don't think you'll be able to show that cleanly in a
text browser - it simply won't fit no matter how you format it. But
using the right GUI tool (even a generic XML one) it would be easy to
just temporarily collapse the whole top-level AND and have a clean view
of the rest.

Cheers,
Csaba.






Re: [HACKERS] machine-readable explain output v4

2009-08-12 Thread Csaba Nagy
On Wed, 2009-08-12 at 16:51 +0200, Csaba Nagy wrote:
 I argue that a sufficiently complicated explain output will never be
 easily navigated in a text browser, however much you would like it to
 be. If you have a WHERE clause with 100 nested ANDs (which occasionally
 happens here), I don't think you'll be able to show that cleanly in a
 text browser - it simply won't fit no matter how you format it. But
 using the right GUI tool (even a generic XML one) it would be easy to
 just temporarily collapse the whole top-level AND and have a clean view
 of the rest.

Just a small note: I don't know what the machine-readable auto-explain
output actually looks like, so I just assumed individual conditions would
have their own XML node. But even if they don't, and the AND condition is
flat in the explain output, the same argument applies to even a few tens
of joins or sub-selects (and we have those too). Sometimes collapsing
parts of the plan would help big time in getting a clean picture, which
(beyond a certain query complexity) you will have a hard time doing in
plain text regardless of the formatting. I have plenty of explain outputs
(from 8.2 postgres) which don't fit on my screen in text form, even
without the further details I gather you can get in the machine-readable
form.

Cheers,
Csaba.





Re: [HACKERS] machine-readable explain output v4

2009-08-12 Thread Csaba Nagy
On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
 That will just make things worse. And it will break if the XML includes 
 any expression that contains a line break.

Then escape the expressions using CDATA or such... I'm sure it would be
possible to make sure it's one line and rely on that. That's part of
being machine readable, being able to rely on getting it at all without
too much parsing magic...

 I repeat, I want to be able to have a log file that is both machine 
 processable and not utterly unreadable by a human. And I do not accept 
 at all that this is impossible. Nor do I accept I should need some extra 
 processing tool to read the machine processable output without suffering 
 brain damage. If we were to adopt your approach I bet you would find 
 that nobody in their right mind would use the machine readable formats.

Then why do you bother calling it machine-readable at all? Would you
really read your auto-explain output on the DB server? I doubt that's
the common usage scenario; I would expect most people to let a
tool extract/summarize it and process it somewhere other than
on the DB machine, with a proper tool set. For ad-hoc explain analysis -
which I think is the typical use case for on-the-DB-server inspection
of text-format explain output - you would surely use something other than
what is called the machine-readable format...

Cheers,
Csaba.





Re: [HACKERS] machine-readable explain output v4

2009-08-12 Thread Csaba Nagy
On Wed, 2009-08-12 at 17:31 +0200, Csaba Nagy wrote:
 On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
  That will just make things worse. And it will break if the XML includes 
  any expression that contains a line break.
 
 Then escape the expressions using CDATA or such... I'm sure it would be
 possible to make sure it's one line and rely on that. That's part of
 being machine readable, being able to rely on getting it at all without
 too much parsing magic...

Looks like today I'm consistently talking nonsense... CDATA of course
won't help in avoiding line breaks, but it would still be possible to
define an XML entity for that. This is getting too complicated anyway,
so it's probably not worth it. In any case I still think that making it
easy for a tool to extract/parse the information should be a higher
priority than human readability - because except for simple queries the
information itself is not really human-readable without external help,
and I doubt the main use case is simple queries.

Cheers,
Csaba.





Re: [HACKERS] machine-readable explain output v4

2009-08-12 Thread Csaba Nagy
On Wed, 2009-08-12 at 17:41 +0200, Andrew Dunstan wrote:
 
 Csaba Nagy wrote:
  Then why you bother calling it machine readable at all ? Would you
  really read your auto-explain output on the DB server ? I doubt that's
  the common usage scenario, I would expect that most people would let a
  tool extract/summarize it and definitely process it somewhere else than
  on the DB machine, with the proper tool set.

 Sure I would. I look at log files almost every day to find out things.  
 Why should I have to wade through a pile of utterly unreadable crap to 
 find it?

I look at log files every day too (not almost, but literally every day),
but I really can't imagine myself looking at explain output directly
in the log file. Even with the output of an 8.2 server, which has less
detail than I think the new formats have, I always copy the text from
the psql prompt to some friendlier tool where I can play around with it,
delete parts of it if needed for the sake of a clear overview, and
easily switch line-wrapping on and off and such. I simply don't
believe that a remotely human presentation of the thing is worth the
slightest compromise in machine readability. That said, I would like to
finish this discussion here because it is getting pointless; I agree to
let us disagree :-)

 Auto-explain lets you have *one* output format. To follow your approach, 
 I will have to change that, and have two log files, one machine 
 processable and one human readable. Triple bleah.

By ad-hoc I didn't mean reading the auto-explain log; that's surely no
ad-hoc operation... I would make that a mandatory daily routine, better
handled by a tool which directly serves you the plans of the
worst-performing queries, sorted by runtime for example, and highlights
the obvious planner mis-estimates. By ad-hoc I mean a query you examine
on the psql command line, and there you can of course expect human-readable
output without consideration for machine readability.

 I have not suggested anything that would break the machine readability. 
 You seem to think that making the machine readable output remotely human 
 friendly is somehow going to detract from its machine processability. 
 But that's just silly, frankly. I do not want and should not have to 
 choose between a format that is machine readable and one that is to some 
 extent human readable.

OK, if you can do that it's fine... but I'm afraid that will lead to
compromises on the machine readability side and will only delay the
whole thing.

Cheers,
Csaba.





Re: [HACKERS] machine-readable explain output v4

2009-08-12 Thread Csaba Nagy
On Wed, 2009-08-12 at 18:07 +0200, Andrew Dunstan wrote:
 
 Csaba Nagy wrote:
  On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
 Well, the right solution would actually be NOT to use CDATA but to 
 replace a literal linefeed with the XML numeric escape #x0a; , but I 
 really don't think it's necessary.

Yes you're right, with the extraction too... 

 The extraction tools will be simple whether or not we put everything on 
 one line.

Well, I normally avoid anything containing caffeine, and today I had my
first coffee of the year and 2 cokes too. I would say ignore the whole
discussion on the grounds of abnormal hyperactivity on my part...

Cheers,
Csaba.





Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax

2009-04-21 Thread Csaba Nagy
On Tue, 2009-04-21 at 11:43 -0400, Robert Haas wrote:
 This doesn't sound like a very good idea, because the planner cannot
 then rely on the overflow table not containing tuples that ought to be
 within some other partition.
 
 The big win that is associated with table partitioning is using
 constraint exclusion to avoid unnecessary partitions scans.

Well, it could always check 2 partitions: the overflow and the one
selected by constraint exclusion. If the overflow is kept empty by
setting up the partitions properly, so that all insertions always go to
one of the active partitions, that would be cheap enough while still
providing a way to catch unexpected data. Then when a new partition is
defined, there's no need to shuffle data around immediately; there could
instead be a maintenance command to clean up the overflow... not to
mention that you could define a trigger to create the new partition once
something lands in the overflow (how cool would that be if it
worked?).
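
A minimal sketch of the idea using the inheritance-based partitioning of
that era (table names invented); a routing trigger on the parent would
direct rows to the right child, falling back to the overflow:

CREATE TABLE measurements (ts date NOT NULL, val int);

-- active partitions carry CHECK constraints usable for constraint exclusion
CREATE TABLE measurements_2009_04 (CHECK (ts >= DATE '2009-04-01'
                                       AND ts <  DATE '2009-05-01'))
    INHERITS (measurements);

-- no CHECK constraint: catches anything not matching an active partition,
-- so the planner always has to include it, but ideally it stays empty
CREATE TABLE measurements_overflow () INHERITS (measurements);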

Cheers,
Csaba.





Re: [HACKERS] Block-level CRC checks

2008-10-01 Thread Csaba Nagy
On Wed, 2008-10-01 at 16:57 +0100, Gregory Stark wrote:
 I wonder if we could do something clever here though. Only one process
 is busy
 calculating the checksum -- it just has to know if anyone fiddles the hint
 bits while it's busy.

What if the hint bits were added to the checksum at the very end, under an
exclusive lock on them? Then the exclusive lock should be short
enough... though it might be deadlock-prone, like any lock upgrade...

Cheers,
Csaba.





Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 09:38 +0100, Simon Riggs wrote:
 If you request a block, we check to see whether there is a lookaside
 copy of it prior to the tuple removals. We then redirect the block
 request to a viewpoint relation's block. Each viewpoint gets a separate
 relfilenode. We do the switcheroo while holding cleanup lock on block.

Wouldn't it make sense to also have a hint bit on the pages which are
copied away? Then instead of looking up a hash table, you would first
check that bit, and if it is not set you wouldn't look up the hash table
at all. Then when you clean up the lookaside copies you clear those bits
too...

That would probably perform somewhat better for reading than always
looking up a potentially big hash table, and the cost of setting the
hint is probably a lot less than copying away the page in the first
place. Resetting the hint bit might be a bit more expensive.

Cheers,
Csaba.





Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 12:31 +0100, Richard Huxton wrote:
 There was a suggestion (Simon - from you?) of a transaction voluntarily
 restricting itself to a set of tables.

While thinking about how easy it would be for the DBA to specify the set
of tables a single query accesses, at first I thought it should be
straightforward enough to look at the query itself. Then I thought: what
about views, rules, triggers, user functions, etc.? All of those have
the potential to access more than you see in the query itself. And then
the actually interesting question: what will the slave do with views,
rules and triggers? I guess executing triggers is out of the question;
what about rules? Probably they must also be ignored... user
functions will probably get errors if they try to update something...
views should probably work correctly.

So in any case the functionality available when querying slaves would be
less than on the primary. This is probably good enough for most
purposes...

Cheers,
Csaba.





Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
 I think that enabling long-running queries this way is both
 low-hanging
 fruit (or at least medium-height-hanging ;) ) and also consistent to
 PostgreSQL philosophy of not replication effort. As an example we trust
 OS's file system cache and don't try to write our own.

Again I have questions (unfortunately questions are usually all I have):

* How will the buffers keep 2 different versions of the same page?
* How will you handle the creation of snapshots? I guess there's no
portable and universal API for that (just guessing), or is there some
POSIX facility which is supported or not depending on the specific FS? If
the FS does not support it, do you skip the snapshot step? And if there's
no universal API, will it be handled by plugins providing a specified API
for snapshotting the FS?

I hope my continuous questioning is not too annoying...

Cheers,
Csaba.





Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 15:08 +0300, Hannu Krosing wrote:
  * how will the buffers keep 2 different versions of the same page ?
 
 As the FS snapshot is mounted as a different directory, it will have
 it's own buffer pages.

Lack of knowledge about this shows my ignorance about the implementation
of the page buffers...

  * how will you handle the creation of snapshots ? 
 
 probably an external command, possibly shell script. 
 similar to current archive_command for wal copying
 
 maybe 'create_fs_snapshot_command' and 'destroy_fs_snapshot_command'
[snip]
 Yes, the simplest one being external command. As FS snapshots are
 supposed to happen not-too often, at least not every second, just having
 external commands may be enough.

You could restrict the creation of snapshots to some minimum amount of
time between them, and maybe also cap the number of concurrent snapshots
allowed. Then, if the time limit (measured from the last open snapshot)
has not yet been reached, any new query could reuse that last snapshot.
The time intervals do not need to be evenly distributed, BTW; the minimum
interval could be a function of the already opened snapshots, e.g.
growing exponentially with their number. That would help map more
long-running queries onto just a few snapshots.

  I hope my continuous questioning is not too annoying...
 
 On the contrary, much appreciated. :)

Ok, then I'll continue :-) I would like to see this feature succeed, but
there's a slim chance I'll ever code well in C...

Cheers,
Csaba.





Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 17:24 +0300, Hannu Krosing wrote:
 On Fri, 2008-09-12 at 17:08 +0300, Heikki Linnakangas wrote:
  Hmm, built-in rsync capability would be cool. Probably not in the first 
  phase, though..
 
 We have it for WAL shipping, in form of GUC archive_command  :)
 
 Why not add full_backup_command ?

I see the current design is all master-push centered, i.e. the master is
in control of everything WAL-related. That makes it hard to create a
slave which is simply pointed at the server and takes all its data from
there...

Why not have a design where the slave is in control of its own data?
I mean the slave could ask for the base files (possibly through a
special function deployed on the master), then ask for the WAL stream,
and so on. That would easily let a slave cascade too, as it could relay
the WAL stream and serve the base backup as well... or you could have a
special WAL repository software with the same interface as a normal
master, but offering a choice of base backups and WAL streams. Plus, a
slave-in-control approach would also allow multiple slaves at the same
time for a given master...

The way it would work would be something like:

* configure the slave with a postgres connection to the master;
* the slave connects and sets up some metadata on the master,
identifying itself and telling the master to keep the WAL needed by this
slave, and also gets some metadata about the master's details if needed;
* the slave calls a special function on the master and asks for the
base backup to be streamed (potentially compressed with special
knowledge of postgres internals);
* once the base backup is streamed, or possibly in parallel, it asks for
the WAL files to be streamed;
* when the base backup is finished, it starts applying the WAL stream,
which was cached in the meantime and whose streaming continues;
* the slave keeps the master updated about its state, so the master
can know if it needs to keep the WAL files which were not yet streamed;
* in case of a network error, the slave connects again and resumes
streaming the WAL from where it left off;
* in case of an extended network outage, the master could decide to
unsubscribe the slave after a certain timeout;
* when the slave finds itself unsubscribed after a longer disconnection,
it could ask for a new base backup based on differences only... some
kind of built-in rsync thingy;

The only downside of this approach is that the slave machine needs a
full postgres superuser connection to the master. That could be a
security problem in certain scenarios. The master-centric approach needs
a connection in the other direction, which might be seen as more secure;
I don't know for sure...

Cheers,
Csaba.





Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-11 Thread Csaba Nagy
On Thu, 2008-09-11 at 15:23 +0300, Heikki Linnakangas wrote:
 I'd imagine that even if applying the WAL on the slave is blocked, it's 
 still streamed from the master to the slave, and in case of failover the 
 slave will fast-forward before starting up as the new master.

Which begs the question: what happens to a query which is running on
the slave at the moment the slave exits recovery mode and
starts up? Should the running queries be canceled if they are blocking
the application of WAL, to allow start-up, or be left to finish?

Cheers,
Csaba.





Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-11 Thread Csaba Nagy
On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote:
 One problem with this, BTW, is that if there's a continuous stream of 
 medium-length transaction in the slave, each new snapshot taken will 
 prevent progress in the WAL replay, so the WAL replay will advance in 
 baby steps, and can fall behind indefinitely.

Why would it fall behind indefinitely? It should only fall behind to
the blocking horizon, which should be the start of the longest
currently running transaction... which should be continually advancing
and not too far in the past if only medium-length transactions are
involved. Isn't normal WAL recovery also doing baby steps, one WAL
record at a time? ;-)

Cheers,
Csaba.





Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-11 Thread Csaba Nagy
On Thu, 2008-09-11 at 16:19 +0300, Heikki Linnakangas wrote:
 Well, yes, but you can fall behind indefinitely that way. Imagine that 
 each transaction on the slave lasts, say 10 minutes, with a new 
 transaction starting every 5 minutes. On the master, there's a table 
 that's being vacuumed (or HOT-updated) frequently, say after each 
 transaction for simplicity. What can happen is that every transaction 
 that finishes on the slave will only let the WAL replay advance by one 
 XID before blocking on the snapshot of the next slave transaction. The 
 WAL replay will advance at a rate of 0.2 TPM, while the master is 
 generating 1.0 TPM.

Aha, now I see where I was mistaken... I was thinking in terms of time
and not transaction IDs. So the time distance between the slave
transactions does not matter at all; only the distance in recovered XIDs
matters for the blocking horizon... and if WAL recovery is blocked, the
blocking horizon is stalled as well, so the next transaction on the
slave will in fact require the same blocking horizon as all currently
running ones. Now I get it... and that means in fact that if you have
continuously overlapping small transactions, the blocking horizon
could even be blocked forever, as there will always be a query running,
and the new queries will always have the snapshot of the currently
running ones because WAL recovery is stalled... or at least that's how I
understand the whole thing...

Cheers,
Csaba.





Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-11 Thread Csaba Nagy
On Thu, 2008-09-11 at 15:33 +0200, Dimitri Fontaine wrote:
 What would forbid the slave to choose to replay all currently lagging WALs 
 each time it's given the choice to advance a little?

Well, now that I think I understand what Heikki meant, I also think the
problem is that there's no choice to advance at all, because the new
queries will simply have the same snapshot as the currently running ones
as long as WAL replay is blocked... further blocking the WAL replay. In
saying this I assume that the snapshot is in fact based on the last
recovered XID, and not on any slave-local XID. In that case, once WAL
recovery is blocked, the snapshot is stalled too, further blocking WAL
recovery, and so on...

Cheers,
Csaba.





Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Csaba Nagy
On Tue, 2008-09-09 at 20:59 +0200, Zeugswetter Andreas OSB sIT wrote:
 All in all a useful streamer seems like a lot of work.

I mentioned some time ago an alternative idea of having the slave
connect through a normal SQL connection and call a function which
streams the WAL file from the point requested by the slave... wouldn't
that be feasible? All the connection infrastructure would already be there,
only the streaming function would need to be implemented. It could even use SSL
connections if needed. Then you would have one normal backend per slave,
and they should access either the files directly or possibly some shared
area where the WAL is buffered for this purpose... the streaming
function could also take care of signaling the up-to-dateness of the
slaves in case of synchronous replication.

There could also be some system table infrastructure to track the
slaves. There could also be some functions to stream the files of the DB
through normal backends, so a slave could be bootstrapped all the way
from copying the files through a simple postgres backend connection...
that would make for the easiest possible setup of a slave: configure a
connection to the master, and hit run... and last but not least the
same interface could be used by a PITR repository client for archiving
the WAL stream and occasional file system snapshots.
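
Just to make this a bit more concrete, here is a minimal sketch of how such
an interface could look at the SQL level. All the names (wal_subscribers,
stream_wal, the text WAL position) are made up for illustration, and the
plpgsql body is only a stub showing the shape of the interface - a real
implementation would be a C function actually reading the WAL:

  CREATE TABLE wal_subscribers (
      slave_name   text PRIMARY KEY,
      last_applied text,                     -- last WAL position confirmed by this slave
      last_seen    timestamptz DEFAULT now()
  );

  CREATE FUNCTION stream_wal(from_position text)
  RETURNS SETOF bytea LANGUAGE plpgsql AS $$
  BEGIN
      -- a real implementation would loop here forever, reading WAL from
      -- from_position onwards and returning each chunk as it is produced
      RETURN;
  END;
  $$;

  -- the slave would open a normal (possibly SSL) backend connection and run:
  --   SELECT chunk FROM stream_wal('0/15D68A0') AS s(chunk);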

Cheers,
Csaba.





Re: [HACKERS] Command execution

2008-07-03 Thread Csaba Nagy
On Thu, 2008-07-03 at 23:15 +1000, Aaron Spiteri wrote:
 Inside foo there was a INSERT and UPDATE, and the INSERT failed but  
 the UPDATE succeeded would the UPDATE be rolled back?

Just to add to the other answers, if the INSERT is before the UPDATE in
the function, the function execution stops when the INSERT fails, and so
the UPDATE will never be executed in the first place...
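
A minimal example of what I mean (table and function names made up):

  CREATE TABLE t (id int PRIMARY KEY, val text);
  INSERT INTO t VALUES (1, 'old');

  CREATE FUNCTION foo() RETURNS void LANGUAGE plpgsql AS $$
  BEGIN
      INSERT INTO t VALUES (1, 'dup');        -- fails: duplicate key
      UPDATE t SET val = 'new' WHERE id = 1;  -- never reached
  END;
  $$;

  SELECT foo();                    -- raises the duplicate key error
  SELECT val FROM t WHERE id = 1;  -- still 'old' (in autocommit mode)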

Cheers,
Csaba.





Re: [HACKERS] Switching between terminals

2008-07-03 Thread Csaba Nagy
On Thu, 2008-07-03 at 19:56 +0530, cinu wrote:
 Could anyone please tell me where I am going wrong and if there is a
 way I can get the same behaviour that I am getting while I am
 executing the through psql prompt.

Your mistake is that you think a transaction is related to your
terminal, but it is in fact tied to the psql session you are running...

Your first example is running one psql instance per terminal, hence one
transaction per terminal, while in your second example the transaction
is terminated each time psql finishes running. Basically what you're
asking for is to keep a transaction opened by one session (the first
psql execution) and connect to it with the second session (the second
psql call) and continue the transaction which was opened by the first
one... which I'm pretty sure is wrong to want. It is likely possible to
do (using PREPARE TRANSACTION), but even likelier that it is a wrong
thing to do in normal circumstances. If you say what you really want
to do, I bet you'll get a lot more useful advice...

Cheers,
Csaba.





Re: [HACKERS] Runtime checking of MCV (Was: ... histogram bucket numdistinct statistics)

2008-06-11 Thread Csaba Nagy
On Tue, 2008-06-10 at 19:03 -0400, Tom Lane wrote:
 Given such an MCV list, the planner will always make the right choice
 of whether to do index or seqscan ... as long as it knows the value
 being searched for, that is.  Parameterized plans have a hard time here,
 but that's not really the fault of the statistics.

This is maybe the best example where multiple (sub)plans could be glued
together with some kind of plan fork node, so that the actual plan to be
executed would be decided based on the parameter values and checking the
statistics at runtime instead of plan time for parameterized plans... so
the planner creates alternative (sub)plans (e.g. seqscan vs index scan)
for the cases where the parameters are MCVs or not, and then places them
in different branches of a runtime check of the parameter values vs the
statistics. Of course the number of branches must be limited, this would
be the challenge of such a feature... to cover the parameter space with
the minimal number of plan branches so that disastrous plans for special
parameter values are avoided. It would also be possible perhaps to
gradually grow the alternative counts as a reaction to the actual
parameter values used by queries, so that only the parameter space
actually in use by queries is covered.
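
Until something like that exists in the planner, the idea can at least be
approximated by hand: check the column statistics at run time and dispatch
accordingly, for example planning with the literal value (dynamic SQL) only
for the frequent values. A rough plpgsql sketch - the orders/customer_id
names are made up and pg_stats is only consulted as a heuristic:

  CREATE TABLE orders (order_id bigint, customer_id int, amount numeric);

  CREATE FUNCTION fetch_orders(p_customer int)
  RETURNS SETOF orders LANGUAGE plpgsql AS $$
  DECLARE
      is_mcv boolean;
  BEGIN
      -- is the parameter among the most common values collected by ANALYZE?
      SELECT p_customer = ANY (most_common_vals::text::int[])
        INTO is_mcv
        FROM pg_stats
       WHERE schemaname = 'public'
         AND tablename  = 'orders'
         AND attname    = 'customer_id';

      IF COALESCE(is_mcv, false) THEN
          -- frequent value: let the planner see the literal and expect many rows
          RETURN QUERY EXECUTE
              'SELECT * FROM orders WHERE customer_id = ' || p_customer;
      ELSE
          -- rare value: the generic parameterized (index scan) plan is fine
          RETURN QUERY SELECT * FROM orders WHERE customer_id = p_customer;
      END IF;
  END;
  $$;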

In fact I would be interested in experimenting with this. Would it be
possible to add new planner behavior as external code ? I would expect
not, as the planner is also in charge of the correctness of the results
and any external code would put that correctness at risk I guess... in
any case, I'll go and check the source.

BTW, there was a discussion about global prepared statements/caching of
query plans - has there been any progress on that? Thorough planning would
make the most sense in that context, possibly by using a special syntax
for the application to signal the need for such planning for the most
problematic (not necessarily the most used though) queries.

Cheers,
Csaba.





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Csaba Nagy
On Wed, 2008-06-04 at 11:13 +0300, Heikki Linnakangas wrote:
 Hmm, WAL version compatibility is an interesting question. Most minor 
 releases hasn't changed the WAL format, and it would be nice to allow 
 running different minor versions in the master and slave in those cases. 
 But it's certainly not unheard of to change the WAL format. Perhaps we 
 should introduce a WAL version number, similar to catalog version?

Would that also cover possible differences in page size, 32bit OS vs.
64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these
things can have an influence on how the data is written and possibly
make the WAL incompatible with other postgres instances, even if the
exact same version...

Cheers,
Csaba.





Fwd: Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Csaba Nagy
[Looks like this mail missed the hackers list on reply to all, I wonder
how it could happen... so I forward it]

On Thu, 2008-05-29 at 17:00 +0100, Dave Page wrote:
 Yes, we're talking real-time streaming (synchronous) log shipping.

Is there any design already how would this be implemented ?

Some time ago I was interested in this subject and thought about having
a normal postgres connection where the slave would issue a query to a
special view which would simply stream WAL records as bytea from a
requested point, without ever finishing. This would have the advantage
(against a custom socket streaming solution) that it can reuse the
complete infrastructure of connection making/managing/security (think
SSL for no sniffing) of the postgres server. It would also be a public
interface, which could be used by other possible tools too (think PITR
management application/WAL stream repository). Another advantage would
be that a PITR solution could be serving as a source for the WAL stream
too, so the slave could either get the real time stream from the master,
or rebuild a PITR state from a WAL repository server, using the same
interface...

Probably some kind of WAL subscription management should be also
implemented, so that the slave can signal the master which WAL records
it already applied and can be recycled on the master, and it would be
nice if there could be multiple subscribers at the same time. Some
subscriber time-out could be also implemented, while marking the
subscription as timed out, so that the slave can know that it has to
rebuild itself...

Cheers,
Csaba.





Re: [HACKERS] constraint exclusion analysis caching

2008-05-09 Thread Csaba Nagy
On Fri, 2008-05-09 at 08:47 -0400, Andrew Dunstan wrote:
 However, I wondered if we couldn't mitigate this by caching the results 
 of constraint exclusion analysis for a particular table + condition. I 
 have no idea how hard this would be, but in principle it seems silly to 
 keep paying the same penalty over and over again.

This would be a perfect candidate for the plan-branch based on actual
parameters capability, in association with globally cached plans
mentioned here:

http://archives.postgresql.org/pgsql-hackers/2008-04/msg00920.php

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
   The hairiness is in the plan dependence (or independence) on parameter  
 values, ideally we only want to cache plans that would be good for all  
 parameter values, only the user knows that precisely. Although it could be  
 possible to examine the column histograms...

If cached plans were implemented, the dependence on parameter values
could be solved too: use special fork nodes in the plan which execute
different sub-plans depending on special parameter values/ranges,
possibly looking up the stats at runtime, so that the plan is in a
compiled state with the decision points wired in.

This of course would mean a lot heavier planning and possibly a lot
bigger plans, but you could afford that if you cache the plan. You could
even have a special command to plan a query this way.

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
On Mon, 2008-04-14 at 16:54 +0300, Heikki Linnakangas wrote:
 Figuring out the optimal decision points is hard, and potentially very 
 expensive. There is one pretty simple scenario though: enabling the use 
 of partial indexes, preparing one plan where a partial index can be 
 used, and another one where it can't. Another such case is col LIKE ? 
 queries, where ? is actually a prefix query, foo%.

Another point is when the cardinality distribution of some key's values
is very skewed, with some values very frequent and the majority of
values being unique. There you could check the stats at execution time
just for deciding to go for the low cardinality plan or the high one...

 As an optimization, we could decide the decision points on the prepare 
 message, and delay actually planning the queries until they're needed. 
 That way we wouldn't waste time planning queries for combinations of 
 parameters that are never used.

... or plan the query with the actual parameter value you get, and also
record the range of the parameter values you expect the plan to be valid
for. If at execution time the parameter happens to be out of that range,
replan, and possibly add a new subplan covering the extra range. This
could still work with prepared queries (where you don't get any
parameter values to start with) by estimating the most probable
parameter range (whatever that could mean), and planning for that.

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
On Mon, 2008-04-14 at 16:10 +0200, Csaba Nagy wrote:
 ... or plan the query with the actual parameter value you get, and also
 record the range of the parameter values you expect the plan to be valid
 for. If at execution time the parameter happens to be out of that range,
 replan, and possibly add a new subplan covering the extra range. This
 could still work with prepared queries (where you don't get any
 parameter values to start with) by estimating the most probable
 parameter range (whatever that could mean), and planning for that.

More on that: recording the presumptions under which the (cached!) plan
is thought to be valid would also facilitate setting up dependencies
against statistics, to be checked when you analyze tables... and if the
key value your query depends on has changed, the analyze
process could possibly replan it in the background.

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
 ... or plan the query with the actual parameter value you get, and also
 record the range of the parameter values you expect the plan to be valid
 for. If at execution time the parameter happens to be out of that range,
 replan, and possibly add a new subplan covering the extra range. This
 could still work with prepared queries (where you don't get any
 parameter values to start with) by estimating the most probable
 parameter range (whatever that could mean), and planning for that.

Another thought: if the cached plans get their own table (as it was
suggested) then you could also start gathering parameter range
statistics meaningfully... and on the next replan you know what to
optimize your planning efforts for.
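
Just to make that concrete, such a table could record something like this
(purely hypothetical, nothing like it exists today):

  CREATE TABLE cached_plan_param_stats (
      plan_id        int,      -- reference to the hypothetical cached plan
      param_position int,      -- which parameter of the statement
      min_seen       text,     -- smallest parameter value seen so far (as text)
      max_seen       text,     -- largest parameter value seen so far
      n_executions   bigint,   -- how often the plan was executed
      PRIMARY KEY (plan_id, param_position)
  );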

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans

2008-04-14 Thread Csaba Nagy
On Mon, 2008-04-14 at 10:55 -0400, Mark Mielke wrote:
 The other ideas about automatically deciding between plans based on 
 ranges and such strike me as involving enough complexity and logic, that 
 to do properly, it might as well be completely re-planned from the 
 beginning to get the most benefit.

... except if you hard-wire the most common alternative plans, you still
get the benefit of a cached plan for a wider range of parameter values.
Not to mention that if you know you'll cache the plan, you can try
harder planning it right, getting possibly better plans for complex
queries... you could argue that complex queries tend not to be repeated,
but we do have here some which are in fact repeated a lot in batches,
then discarded. So I guess a cached plan discard/timeout mechanism would
also be nice.

Cheers,
Csaba.





Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-14 Thread Csaba Nagy
On Mon, 2008-04-14 at 17:08 +0200, PFC wrote:
   Those Decision nodes could potentially lead to lots of decisions 
 (ahem).
   What if you have 10 conditions in the Where, plus some joined ones ? 
 That  
 would make lots of possibilities...

Yes, that's true, but most of them are likely not relevant for the end
result. In any real life query there are a few parameters which are
really important for what plan you should choose... the key here is that
you should spend more time on finding the possibilities for a cached
plan than you do for a one shot query. 

In principle one-shot planning should be the default and caching should
be something the user has to choose deliberately. I would really like a
special command to plan and cache a query without actually executing it,
possibly with a parameter for how hard to try... e.g. you could expend
the extra cycles to eliminate all redundancies from boolean expressions
and IN lists, to get the parse tree into a canonical format - all things which
can make planning easier. All these lose in one-shot queries, but once
you cache you can really do a lot of smarts which were no-no before...

   Consider several types of queries :
 
   - The small, quick query which returns one or a few rows : in this 
 case,  
 planning overhead is large relative to execution time, but I would venture  
 to guess that the plans always end up being the same.

Consider a 'select a where b like $1' - the parameter $1 will
considerably affect the query plan. A query can't go much simpler...

   - The query that takes a while : in this case, planning overhead is nil 
  
 compared to execution time, better replan every time with the params.

I guess these queries are not the ones targeted by this feature. In fact
for these queries it really doesn't matter if you cache or not, except:
if you know you're gonna cache, you'll expend more effort planning
right, and that could still matter for a query which runs long. Note
that if you don't cache, planning harder will lose in the long run, only
once you cache you can afford to plan harder...

   - The complex query that still executes fast because it doesn't process 
 a  
 lot of rows and postgres finds a good plan (for instance, a well optimized  
 search query). Those would benefit from reducing the planning overhead,  
 but those also typically end up having many different plans depending on  
 the search parameters. Besides, those queries are likely to be dynamically  
 generated. So, would it be worth it to add all those features just to  
 optimize those ? I don't know...

We have here dynamically generated queries which are specifically
chunked to be executed in small increments so none of the queries runs
too long (they would block vacuuming vital tables otherwise). Those
chunks would greatly benefit from properly planned and cached plans... 

A really smart system would store canonical plan fragments in response to
(also canonicalized) parse tree fragments, and then assemble the plan
out of those fragments, but that would indeed be really complex (to
design - the resulting code might be simpler than one thinks) ;-)

Cheers,
Csaba.






Re: [HACKERS] Concurrent psql API

2008-04-10 Thread Csaba Nagy
On Thu, 2008-04-10 at 05:03 +0930, Shane Ambler wrote:
 I do think it is useful for more than typo's in the \join command. What 
 about a slip where you forget to \g the command. Or you start a query 
 that seems to be taking too long, background it and look into what is 
 happening. This would be more helpful to those that ssh into a machine 
 then run psql from there.

For interactive use in the above mentioned scenario you can use the
'screen' command and start as many psqls as needed ('man screen' to see
what it can do). I would probably always use screen instead of psql's
multisession capability in interactive use. I do want to instantly see
what is currently running, and a psql screen cluttered with multiple
results will not make that easier. Even a list method of what is running
will only help if it actually shows the complete SQL for all running
sessions and that will be a PITA if the SQLs are many and big. Multiple
screens are much better at that.

So from my POV scripting should be the main case for such a feature...
and there it would be welcome if it would be made easy to synchronize
the different sessions.

Cheers,
Csaba.





Re: [HACKERS][OT] Concurrent psql API

2008-04-10 Thread Csaba Nagy
 I find myself doing this frequently with any long-running command,  
 but currently it's a PITA because I'd doing it at the shell level and  
 firing up a new psql: more work than should be necessary, and psql  
 sometimes gets confused when you resume it from the background in  
 interactive mode (stops echoing characters, though maybe this has  
 been fixed).

I would recommend trying out the 'screen' utility (see my other post
too). And here you can find a nice .screenrc too, which will show you a
status bar of your active sessions; I find it super cool (and it's well
commented if you don't like it as it is):

http://home.insightbb.com/~bmsims1/Scripts/Screenrc.html

The man page has all commands you need, the most used by me:

Ctrl-a Ctrl-c - open a new session;
Ctrl-a A - name the session (will show up with that name in the status
bar; note that the second key is a capital A, not a);
Ctrl-a Ctrl-a - switch to the last viewed session;
Ctrl-a <n> - switch to the nth session, where <n> is a digit 0-9

I usually leave the screen sessions running and detach only the
terminal, and then I can connect again to the already set up sessions
using screen -R. It's a real time saver.

It has many more facilities, and creating a new psql session is just
Ctrl-a Ctrl-c and then type in psql... and you're good to go... I don't
think you can beat that by a large margin with psql-internal commands (you
still need to type in something extra), and you do have added benefits
of clearly separated workflows and a nice overview of it.

Cheers,
Csaba.





Re: [HACKERS] COPY Transform support

2008-04-03 Thread Csaba Nagy
On Thu, 2008-04-03 at 16:44 +0200, PFC wrote:
 CREATE FLATFILE READER mydump (
   id  INTEGER,
  date    TEXT,
   ...
 ) FROM file 'dump.txt'
 (followed by delimiter specification syntax identical to COPY, etc)
 ;

Very cool idea, but why would you need to create a reader object
first? You should be able to use COPY directly with the target table
being omitted, meaning the copy would not pump its result into the target
but be equivalent to a select... and use it in any place where a select
can be used. This would have absolutely no new syntax, just the rules
changed...

Now that I've had a second look, you actually need the field definitions to
meaningfully interpret the file, but then why not use a record
specification instead of the table in the normal COPY command ? I'm not
sure if there's any existing syntax for that but I would guess yes...

In any case, such a feature would help a lot in processing input files
based also on other existing data in the DB.
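
Until then, the closest thing today is the usual workaround: COPY the file
into a temporary table with the right column list and then join/process it
with plain SQL (the file path and the target/existing table names below are
made up):

  CREATE TEMP TABLE dump_in (
      id   integer,
      date text
  );

  -- server-side file; from a client, \copy in psql does the same
  COPY dump_in FROM '/tmp/dump.txt' WITH DELIMITER E'\t';

  -- now the file content can be combined with existing data in the DB
  INSERT INTO target_table (id, created)
  SELECT d.id, d.date::timestamptz
    FROM dump_in d
    JOIN existing_table e USING (id);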

Cheers,
Csaba.





Re: [HACKERS] Permanent settings

2008-02-19 Thread Csaba Nagy
 Are you suggesting we keep appending? So if I set the same parameter 100
 times, it would show up on 100 rows?

What about not touching the config file at all, but writing to a separate
file which is completely under the control of postgres and including that
at the end of the config file? You just said includes are a new feature
which could complicate things, so why not actually use them to your
advantage ;-)

That way disabling the overrides would be as simple as commenting out
the inclusion of the postgres controlled config file. And it would
separate the user writable and machine writable configuration...
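
Roughly like this, using the existing include directive (the file name and
the settings are just examples):

  # at the very end of postgresql.conf:
  include 'postgresql.conf.override'

  # postgresql.conf.override - rewritten by postgres itself, do not edit:
  shared_buffers = 512MB
  work_mem = 64MB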

Cheers,
Csaba.





Re: [HACKERS] Permanent settings

2008-02-19 Thread Csaba Nagy
On Tue, 2008-02-19 at 16:41 +0100, Magnus Hagander wrote:
 The end result wouldn't be as clean as some would expect, but it would
 certainly be easier code-wise. For example, I'm sure someone would get the
 suggestion to go edit postgresql.conf to change a config value, and be
 surprised when it didn't show up as a changed setting because it was
 overridden from another file..

Yes, but at least the override part would be nicely separated in a file,
and could suggestively be named something like
postgresql.conf.override, and hopefully would stick out sufficiently for
those who edit the config file directly to wonder about its purpose...
and of course it would always be editable directly too, so you can easily
manually fix foot-shooting mistakes made from the admin interface. It would
simply be rewritten each time you change something without regard to
the manual changes, and possibly ignored altogether if your manual
changes violate its expected layout.

Cheers,
Csaba.






Re: [HACKERS] Postgresql Materialized views

2008-01-14 Thread Csaba Nagy
On Mon, 2008-01-14 at 09:22 +, Simon Riggs wrote:
 So I support Mark Mielke's views on writing code. Anybody who wants to
 code, can. There's probably a project of a size and complexity that's
 right for your first project. 

The main problem is that usually that initial thing is not what you
desperately need today... so the motivation will be pretty low unless
you just have loads of time to start off playing with the code.

 Apparently the guy that invented the new
 scheduling algorithms for Linux wasn't even a coder, but he sat down and
 worked it out.

 This is Hackers: Write some code today, everybody. You *can*.

Certainly everybody can write code, but the barrier to getting it accepted
is pretty high in the postgres community. So you'd better be a damn good
coder if you expect your code to be accepted... and even then expect a
considerable fight to justify the use case for your feature ;-)

This is all good for a stable product, but it really makes the barrier
between simple users and hackers pretty high.

Cheers,
Csaba.





Re: [HACKERS] Storage Model for Partitioning

2008-01-11 Thread Csaba Nagy
On Fri, 2008-01-11 at 11:34 +, Richard Huxton wrote:
 1. Make an on-disk chunk much smaller (e.g. 64MB). Each chunk is a 
 contigous range of blocks.
 2. Make a table-partition (implied or explicit constraints) map to 
 multiple chunks.
 That would reduce fragmentation (you'd have on average 32MB's worth of 
 blocks wasted per partition) and allow for stretchy partitions at the 
 cost of an extra layer of indirection.

This sounds almost like some kind of clustering index, where the index
contains ranges pointing to blocks of data... if the same index is also
used for inserting (i.e. the free space map is a partial cluster index
on blocks with free space), that would be a coarse clustering solution I
guess...

Cheers,
Csaba.





Re: [HACKERS] Storage Model for Partitioning

2008-01-11 Thread Csaba Nagy
 Which is roughly what Simon's original Dynamic Partitioning would be 
 if it became visible at the planner level (unless I've misunderstood). I 
 was picturing it in my head as a two-dimensional bitmap with 
 value-ranges along one axis and block-ranges along the other. Of course, 
 unlike other indexes it needs visibility information to be of any use.

But why not have it as a normal index of ranges ? I'm not familiar with
the GIST extensions, but this sounds like a set of records (segments in
Simon's terms) indexed by their interval position on a line... isn't
that covered by some GIST index type ?

 Thinking about it, I'm not sure how my thinking would affect a 
 full-table seq-scan. You'd not get blocks back in-order throughout the 
 scan - would that matter?

That could be covered by something like the bitmap scan, just at a coarser
level, with the bits covering segments instead of blocks.

Cheers,
Csaba.





Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps

2008-01-07 Thread Csaba Nagy
On Wed, 2008-01-02 at 17:56 +, Simon Riggs wrote:
 Like it?

Very cool :-)

One additional thought: what about a kind of segment fill factor?
Meaning: each segment has some free space reserved for future
updates/inserts of records in the same range of its partitioning
constraint. And when inserting/updating you put the new record into the
corresponding segment... just like a very coarse clustering. Then you
could vacuum the segments separately to keep the free space from running
out. For active segments you would then fix the partitioning constraint
range once the fill factor is reached, to allow for keeping their
constraint even when heavily updated (heavily vacuuming them too in
response to that), and create a new segment for the unbounded range for
new inserts... this would work fine for tables where the constraint is
based on ever-increasing keys and accidental inserts in old ranges
(which do happen occasionally in real life).

When the change rate of old segments drops, the segments could be
reorganized to have a smaller fill factor, so that you still allow for
accidental updates but keep space usage efficient. This would be a
similar action to clustering, but hopefully not blocking (which might
be a hard thing to do)... and later again you could mark some of the
really old things as read only and put them in special segments with no
wasted space.

One problem would be when a segment's free space runs out, so you must
put records from the same constraint range in multiple segments - but
that could still work, you would just have multiple segments covering
the same range; if the segment fill factor is chosen properly this
should rarely be the case... you could normally maintain a set of
non-overlapping segments in terms of the partitioning constraint.
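
(The closest existing knob is the per-table fillfactor storage parameter,
which reserves free space in each heap page for future updates of the rows
on that page; what I am describing would be a similar idea one level up, at
the segment level instead of the page level. For reference, the page-level
version is just:

  ALTER TABLE big_table SET (fillfactor = 70);  -- keep ~30% of each page free

with big_table being whatever table gets partitioned.)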

Cheers,
Csaba.






Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps

2008-01-07 Thread Csaba Nagy
On Mon, 2008-01-07 at 13:59 +0100, Markus Schiltknecht wrote:
 However, for tables which don't fit the use case of SE, people certainly 
 don't want such a fill factor to bloat their tables.

Sure, but it could be configurable and should only be enabled if the
table is marked as partitioned on some condition... I think it would be
a bad idea anyway if the DB started partitioning on some arbitrary
criteria based on analyzing the table, so the DBA should be the one to
decide on what criteria to partition. In particular it could be a bad
idea on occasions to partition on the clustering criteria for tables
which were clustered once.

Cheers,
Csaba.





Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps

2008-01-07 Thread Csaba Nagy
On Mon, 2008-01-07 at 14:20 +0100, Markus Schiltknecht wrote:
 Why is that? AFAIUI, Segment Exclusion combines perfectly well with 
 clustering. Or even better, with an upcoming feature to maintain 
 clustered ordering. Where do you see disadvantages such an optimization 
 for sequential scans?

Well, as I understood it, this would be some kind of special case of
clustering, where the cluster key is expected to be ever increasing in
time and new rows would not be randomly distributed over the complete
possible range. In theory you could also have each segment in turn be
clustered on some other criteria than the partitioning criteria so
indexed access could also be better on the main selection criteria which
could be different from the partitioning criteria. All this is of course
just speculation - but I guess that's what you expected too in this
discussion :-)

Cheers,
Csaba.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 11:12 +, Simon Riggs wrote:
 Features
 - Read Only Tables
 - Compressed Tablespaces

I wonder if, instead of read-only tables, it wouldn't be better to have
some kind of automatic partitioning which permits having different
chunks of the table data in different tablespaces, and a freeze command
which effectively moves the data from the (normally small) active chunk
to the archive chunk when its transaction id is older than a predefined
threshold?

Then put the active chunk on a high performance file system and the
archive tablespace on a compressed/slow/cheap file system and you're
done. Allow even the archive chunk to be updateable, and put new tuple
data in the active chunk. It would work just fine for cases where the
old data is rarely updated/deleted...
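
Something quite close can already be emulated by hand with inheritance and
tablespaces; a rough sketch (all names and the tablespace location are made
up, and the freeze step is just a manual INSERT plus DELETE):

  CREATE TABLESPACE archive_space LOCATION '/mnt/cheap_compressed_fs/pgdata';

  CREATE TABLE events (                  -- parent, stays empty
      id      bigint,
      created timestamptz,
      payload text
  );
  CREATE TABLE events_active  () INHERITS (events);
  CREATE TABLE events_archive () INHERITS (events) TABLESPACE archive_space;

  -- the freeze step: move rows older than some threshold to the archive chunk
  INSERT INTO events_archive
      SELECT * FROM events_active
       WHERE created < now() - interval '90 days';
  DELETE FROM events_active
   WHERE created < now() - interval '90 days';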

Another advantage I guess would be that active data would more likely
stay in cache, as updated records would stay together and not spread
over the inactive.

Cheers,
Csaba.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 13:44 +0100, Csaba Nagy wrote:
 Another advantage I guess would be that active data would more likely
 stay in cache, as updated records would stay together and not spread
 over the inactive.

And I forgot to mention that vacuum could mostly skip the archive part,
and only vacuum the active part, which would drastically reduce the cost
of vacuuming big active tables.

Cheers,
Csaba.





Re: [HACKERS] WORM and Read Only Tables (v0.1)

2007-12-11 Thread Csaba Nagy
On Tue, 2007-12-11 at 14:58 +0200, Hannu Krosing wrote:
 On one fine day, Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
  Then put the active chunk on a high performance file system and the
  archive tablespace on a compressed/slow/cheap file system and you're
  done. Allow even the archive chunk to be updateable, and put new tuple
  data in the active chunk. It would work just fine for cases where the
  old data is rarely updated/deleted...
 
 You can't update a table on a read-only (write-once) partition, at least
 not with current header structure.

OK, but that's what I'm challenging: why do you need a write-once
partition? Do you mean tapes by that? OK, then it means I was thinking of
completely different usage scenarios...

Cheers,
Csaba.





Re: [HACKERS] Ordered Append Node

2007-11-23 Thread Csaba Nagy
On Fri, 2007-11-23 at 12:36 +, Gregory Stark wrote:
 I also did an optimization similar to the bounded-sort case where we check if
 the next tuple from the same input which last contributed the result record
 comes before the top element of the heap. That avoids having to do an insert
 and siftup only to pull out the same record you just inserted. In theory this
 is not an optimization but in practice I think partitioned tables will often
 contain contiguous blocks of key values and queries will often be joining
 against that key and therefore often want to order by it.

If it is an option, you could also do this with a new method on the heap
which adds the new entry and removes the resulting new head in one atomic
operation. That would work with one single comparison for the 'less than
the current head' case, and it would not need to repeat that comparison
if that check fails. It could also directly remove the head and rebalance
the tree in one go.

Cheers,
Csaba.





Re: [HACKERS] Feature Freeze date for 8.4

2007-10-23 Thread Csaba Nagy
On Tue, 2007-10-23 at 11:00 +0200, Rafael Martinez wrote:
 We are always 1 year back the main release. We are testing and planing
 the move to 8.2 now, and it won't happen until desember. In a 6 month
 cycle we will have to jump over every second release.

We here are also just in the process of upgrading to 8.2, so in
principle I agree... except nobody is forced to go through ALL versions
- skipping over every second release is perfectly OK... but: shorter
releases would allow people to install only releases which actually
bring performance/functionality improvements they really need. All other
releases can be skipped... in fact you could possibly go for installing
a specific release only on some of the DBs based on what you need there.

The main question is whether shorter releases will impact quality - from a
user point of view (and I can only give my opinion in this respect) I
don't see any other possible disadvantage...

Cheers,
Csaba.





Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-09 Thread Csaba Nagy
[snip]
 In the case of User-Defined functions, the user should be defining it
 as Deterministic. 

The user CAN already define his functions as
Deterministic=IMMUTABLE... the problem is that many of us will define
functions as immutable, when in fact they are not. And do that by
mistake... and there's nothing postgres can do about that.
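
For example, nothing stops you from declaring something like this (made-up
example), even though the result obviously changes whenever the table
content changes:

  CREATE TABLE prices (item_id int PRIMARY KEY, price numeric);

  -- wrongly marked IMMUTABLE: it reads from a table
  CREATE FUNCTION current_price(int) RETURNS numeric
  LANGUAGE sql IMMUTABLE AS $$
      SELECT price FROM prices WHERE item_id = $1;
  $$;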

 Can we frame a set of guidelines, or may be some test procedure, which
 can declare a certain function as deterministic? 

You mean postgres should check whether your function is really immutable?
I can't imagine any way to do it correctly in reasonable time :-)
Imagine a function of 10 parameters which returns the sum of the
parameters all the time, except when the parameters are all 1, when it will
randomly return a different value _once in a thousand executions_... please find a generic
algorithm which spots this function as not immutable in reasonable
execution time ;-)
So this example is a bit extreme, but don't underestimate the user ;-)

 I am just saying from the top of my mind. Even otherwise, if we can
 even restrict this indexing to only Built-in deterministic functions.,
 don't you think it would help the cause of a majority? I have just
 made the proposal to create the index with snapshot a optional one. 

Restrictions like this are always confusing for the end user (i.e. why
can I use built-ins here and not my own?). I leave it to the actual coders
to say anything about code maintenance concerns...

Cheers,
Csaba.





Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-09 Thread Csaba Nagy
 I think you're overly pessimistic here ;-) This classification can be done 
 quite 
 efficiently as long as your language is static enough. The trick is not to 
 execute the function, but to scan the code to find all other functions and 
 SQL 
 statements a given function may possibly call. If your function calls no SQL 
 statements, and only other functions already marked IMMUTABLE, then it must 
 be 
 IMMUTABLE itself.

OK, I have a black-box mindset right now due to the problem I'm
currently working on, so I didn't even think about checking the source
code of the function (which is the right thing to do if you have the
source code)... in which case you're right, I was overly pessimistic :-)

Cheers,
Csaba.





Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-09 Thread Csaba Nagy
On Tue, 2007-10-09 at 11:22 -0400, Andrew Dunstan wrote:
 
 Csaba Nagy wrote:
  You mean postgres should check your function if it is really immutable ?
  I can't imagine any way to do it correctly in reasonable time :-)

 I would say that in the general case it's analogous to the halting 
 problem, not solvable at all let alone in any reasonable time.

In the light of Florian's mail, I would say that in the context of a
language which can check whether each of its constructs is immutable or
not, a procedure using only immutable constructs should itself be
immutable... the halting problem is avoided in that you don't really
need to know if/how the procedure works, you only need to know that it
will always work the same ;-) The problem is that in the general case
the languages don't have such checks available for this kind of thing, so
either you restrict the immutability check to simple languages (static
enough, as Florian would say) or you must allow the user to decide if
the function is immutable or not. In the general case I assume the users
will want the power to decide (and potentially be wrong), and will
expect that if they do make a mistake, the result won't be catastrophic. I
guess this is the same conclusion as in previous threads about the
subject...

Cheers,
Csaba.







Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-08 Thread Csaba Nagy
On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote:
 This idea has been discussed to death many times before. Please search
 the archives.

Somewhat related to the visibility in index thing: would it be
possible to have a kind of index-table ? We do have here some tables
which have 2-4 fields, and the combination of them forms the primary key
of the table. There are usually no other indexes on the table, and the
net result is that the PK index duplicates the heap. So it would be cool
if the index would be THE heap somehow...

The most prominent example of this in our DBs is this table:

db \d table_a
      Table public.table_a
 Column |  Type  | Modifiers
--------+--------+-----------
 id_1   | bigint | not null
 id_2   | bigint | not null
Indexes:
    pk_table_a PRIMARY KEY, btree (id_1, id_2)

db select reltuples::bigint, relpages from pg_class where
relname='table_a';
 reltuples | relpages 
-----------+----------
 445286464 |   710090
(1 row)

db select reltuples::bigint, relpages from pg_class where
relname='pk_table_a';
 reltuples | relpages 
-----------+----------
 445291072 |   599848
(1 row)

This postgres install is compiled with 32K page size (for the ones who
wonder about the page counts). In any case, it is clear that the index
basically duplicates the heap...

Cheers,
Csaba.





Re: [HACKERS] GUC variable renaming, redux

2007-09-25 Thread Csaba Nagy
On Mon, 2007-09-24 at 10:55 -0700, Joshua D. Drake wrote:
 IMO, monitor_ seems weird versus track_. To me monitor implies actions
 to be taken when thresholds are met. PostgreSQL doesn't do that.
 PostgreSQL tracks/stores information for application to monitor or
 interact with and those application may doing something based on the
 tracked information.

Not that my opinion would count too much on hackers, but track sounds
better to me too, seems more to the point, not to mention it's shorter
too.

+1 track

Cheers,
Csaba.





Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server

2007-05-21 Thread Csaba Nagy
 In other words, if I can assure that data exported and then imported
 will always, under all circumstances, compare the same to the original,
 would that be enough of a requirement? In other words, if I offer a
 format that is assured of preserving both mantissa and exponent
 precision and range, as well as all extra attributes (+/-Infinity and
 NaN), but does not guarantee that the semantically identical constructs
 are told apart (+0 vs. -0, and the different NaNs), would that format be
 acceptable?

If you care about the +/- for +/-Infinity, you must also care about +/-0
too, so you get the right type of infinity if you divide by 0... so +0
and -0 are far from being semantically identical.

Cheers,
Csaba.





Re: [HACKERS] Feature freeze progress report

2007-05-03 Thread Csaba Nagy
 We have _ample_ evidence that the problem is lack of people able to
 review patches, and yet there is this discussion to track patches
 better.  It reminds me of someone who has lost their keys in an alley,
 but is looking for them in the street because the light is better there.

Bruce, I guess the analogy fails on the fact that you're not looking for
a key, but for people, and I thought better light would attract people
to find you, instead of you needing to search...

I'm an outsider regarding postgres development, so my opinion does not
count, but the gut feeling is that more light attracts better people.
Not to mention it can help people get better...

Cheers,
Csaba.






Re: [HACKERS] Feature freeze progress report

2007-05-03 Thread Csaba Nagy
On Thu, 2007-05-03 at 13:51, Bruce Momjian wrote:
 I believe the problem is not that there isn't enough information, but
 not enough people able to do the work.  Seeking solutions in areas that
 aren't helping was the illustration.

Yes Bruce, but you're failing to see that a more structured information
infrastructure will attract more people to do the work which could
eventually solve the problem you're having in the first place
(contradicting your argument that it won't help), at the cost of some
possible additional work (which I actually think you're overestimating
the amount of - it's probably more like a learning curve than actual
additional work).

The fact that there is enough information is not relevant, it must be in
the right place too - too much information or hard-to-find information
is sometimes worse than no information.

Cheers,
Csaba.





Re: [HACKERS] Last chance to object to MVCC-safe CLUSTER

2007-04-12 Thread Csaba Nagy
On Sat, 2007-04-07 at 18:09, Tom Lane wrote:
 Awhile back Csaba Nagy [EMAIL PROTECTED] wrote:
  Making cluster MVCC-safe will kill my back-door of clustering a hot
  table while I run a full DB backup.
 
 Are we agreed that the TRUNCATE-based workaround shown here
 http://archives.postgresql.org/pgsql-hackers/2007-03/msg00606.php
 is an adequate response to this objection?

That workaround should actually work. It is more work but the desired
goal is achieved.

Cheers,
Csaba.





Re: [HACKERS] CREATE INDEX and HOT - revised design

2007-03-22 Thread Csaba Nagy
 speaking with pavan off list he seems to think that only 'create
 index' is outside transaction, not the other ddl flavors of it because
 they are generally acquiring a excl lock.  so, in that sense it is
 possibly acceptable to me although still a pretty tough pill to
 swallow (thinking, guc time).  It would also preclude ever integrating
 vanilla 'create index' to create table command, fwiw.

Just to signal that it is in use: we did use create index in
transactions occasionally when we had to do DB schema upgrades on
production systems for application upgrades which span multiple versions
of our application (normally we upgrade versions one by one, but we have
some systems which are upgraded rarely). On these occasions it was
riskier than usual to run the cumulative upgrade scripts outside a
transaction block.

But that was mostly a convenience feature, we could always rearrange our
upgrade scripts to do all the rest first and then all the index creation
at the end if all the rest succeeded... but if implicit index creation
fails (e.g. when adding a new field to a table which happens also to be
a primary key) inside the transaction, that would hurt... mostly in more
work/more risk of extended downtime, plus a certain factor of
inconvenience.

Cheers,
Csaba.





Re: [HACKERS] Stats for multi-column indexes

2007-03-21 Thread Csaba Nagy
On Tue, 2007-03-20 at 18:12, Josh Berkus wrote:
 Tom,
 
  Actually, I think you don't particularly need stats for that in most
  cases --- if the planner simply took note that the FK relationship
  exists, it would know that each row of the FK side joins to exactly
  one row of the PK side, which in typical cases is sufficient.
 
 Is it?  What about the other direction?  Currently, doesn't the planner 
 assume that the rowcount relationship is 1 to ( child total rows / 
 parent total rows) ?  That's ok for tables with relatively even 
 distribution, but not for skewed ones.

Wouldn't that be improved if the MCVs/histogram of the FK column are
taken into account ? Considering that the FK part is unique, the
skewness in the relationship is completely determined by the FK parts
histogram. That would give at least a lower/upper bound and MCVs to the
relationship.
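
All the raw material for that is already collected by ANALYZE, e.g.
(table/column names made up):

  SELECT most_common_vals, most_common_freqs, histogram_bounds
    FROM pg_stats
   WHERE tablename = 'child_table'
     AND attname   = 'parent_id';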

Cheers,
Csaba.




Re: [HACKERS] Stats for multi-column indexes

2007-03-21 Thread Csaba Nagy
This should read:

 Considering that the FK part is unique, the
 ^^PK^^
 skewness in the relationship is completely determined by the FK part's
 histogram. That would give at least a lower/upper bound and MCVs to the
 relationship.

Cheers,
Csaba.





Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-15 Thread Csaba Nagy
On Thu, 2007-03-15 at 17:01, A.M. wrote:
 It seems to me that postgresql is especially well-suited to run DDL  
 at runtime, so what's the issue?

The issue is that some applications are not well suited to run DDL at
runtime :-)

As I already mentioned in another post in this thread, our application
also has a requirement of user defined fields in one table. Problem is,
that table is so heavily accessed in peak hours that it is simply impossible to
take an exclusive lock on it without causing an extended perceived
downtime of the application. And guess what, users will always want to
add new fields in peak hours...

We did solve this in our case with some application logic, but a generic
solution would be nice ;-)

Cheers,
Csaba.





Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-14 Thread Csaba Nagy
On Wed, 2007-03-14 at 16:08, [EMAIL PROTECTED] wrote:
 On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote:
  David Fetter [EMAIL PROTECTED] writes:
   CREATE TABLE symptom (
   symptom_id SERIAL PRIMARY KEY, /* See above. */
   ...
   );
  
   CREATE TABLE patient_presents_with (
   patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
   symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
   UNIQUE(patient_id, symptom_id)
   );
  
  I'm just glad I don't have your doctor. I hope mine doesn't think symptoms 
  are
  all boolean values.
 
 Where is the boolean above? It is M:N, with each having whatever data
 is required.

The boolean is assumed in the symptoms table. In any case, even if it's
not a boolean value, even if maybe the symptoms table is a complex one
on its own, it is still one single type for all symptoms of all
patients. The real problem is that in some real world applications you
have a mix of wildly varying types of attributes a user might want to
use, and you can't know what those will be beforehand... the symptoms
thing is simple to solve in the way David did it, but there really are
other situations which a simple m:n can't easily cover. How would you
handle a database of user settings for 10K different applications and
100M different users, where each application must be able to store its
own (type-safe!!) settings in the same structure, and applications come
and go with their own settings ? Come up with a good solution to this
combined with queries like give me all the users who have this set of
settings set to these values running fast, and then you're talking.

Cheers,
Csaba.





Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-14 Thread Csaba Nagy
On Wed, 2007-03-14 at 16:50, David Fetter wrote:
 On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote:
  David Fetter [EMAIL PROTECTED] writes:
  
   CREATE TABLE symptom (
   symptom_id SERIAL PRIMARY KEY, /* See above. */
   ...
   );
  
   CREATE TABLE patient_presents_with (
   patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
   symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
   UNIQUE(patient_id, symptom_id)
   );
  
  I'm just glad I don't have your doctor.  I hope mine doesn't think
  symptoms are all boolean values.
 
 What's in the symptom table is up to the doctor.

OK, and here's the problem: each doctor might want to put something else
in the symptom table. Each doctor might want to do it in a type safe
way, e.g. so that the application enforces an enumeration of
high/moderate/low for the symptom fever (or maybe another doctor wants
it in exact degrees)... you can stuff it all in a string field, but you
know how reliable that will be.

Cheers,
Csaba.





Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread Csaba Nagy
On Tue, 2007-03-13 at 00:43, Richard Huxton wrote:
 Josh Berkus wrote:
  I really don't see any way you could implement UDFs other than EAV that 
  wouldn't be immensely awkward, or result in executing DDL at runtime.
 
 What's so horrible about DDL at runtime? Obviously, you're only going to 
 allow specific additions to specific schemas/tables, but why not?

Well, exclusively locking the table for DDL is not always possible in
production systems. We also briefly had a solution where we added new
columns on the fly, and we had to ditch it... it was simply not working.
The users were usually adding columns in the peak business hours, and in
the same hours it was simply impossible to take an exclusive lock on
that table. I think DDL will actually also exclusively lock parent
tables of FK relationships (I might be mistaken here, but I think I have
seen such a lock), which is even worse.

After it caused extensive perceived downtime for hours, we simply
redesigned the feature so that the users now need to announce beforehand
how many different types of new columns they will need, and we just
create a few extra of them and assign them to the users' needs as they
need them... the unused columns stay null and hopefully don't have too big
an overhead, but it's the simplest solution we found which actually works.
When the user runs out of spare columns, the admin will create some new
spares in quiet hours.
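
To give an idea of the scheme (structure simplified, all names made up): the
table carries a fixed set of typed spare columns, and a small mapping table
records which spare column currently backs which user-defined field, so
adding a field is a plain INSERT instead of an ALTER TABLE:

  CREATE TABLE big_table (
      id           bigint PRIMARY KEY,
      -- ... the real columns ...
      spare_int_1  integer,
      spare_int_2  integer,
      spare_text_1 text,
      spare_text_2 text,
      spare_ts_1   timestamptz
  );

  CREATE TABLE user_field_map (
      field_name   text PRIMARY KEY,    -- the name the user chose
      spare_column name NOT NULL        -- which spare column backs it
  );

  -- assigning a new user-defined field takes no exclusive lock on big_table:
  INSERT INTO user_field_map VALUES ('discount_pct', 'spare_int_1');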

Cheers,
Csaba.





Re: [HACKERS] CLUSTER and MVCC

2007-03-09 Thread Csaba Nagy
On Fri, 2007-03-09 at 12:29, Heikki Linnakangas wrote:
 Csaba, you mentioned recently 
 (http://archives.postgresql.org/pgsql-hackers/2007-03/msg00027.php) that 
 you're actually using the MVCC-violation to clean up tables during a 
 backup. Can you tell us a bit more about that? Would you be upset if we 
 shut that backdoor?

My use case: a queue-like table (in fact a 'task' table) which is very
frequently inserted into, updated and deleted from. This table tends to
bloat in the presence of any long-running transaction... the only
transactional behavior we need from it is that when we insert something
into it in a transaction (possibly together with other actions) and
then commit/rollback, the insert commits/rolls back with it. CLUSTER's
violation of MVCC does not affect this, as CLUSTER will not be able to
lock the table if another transaction has inserted something into it
(the inserting transaction will hold a lock on the table). Selects on
this table are not critical for us; it just doesn't matter which job
processor gets which task and in what order... (actually it does
matter, but CLUSTER won't affect that either).

So what I do is run CLUSTER on this table once every 5 minutes. This
works just fine and keeps the table small even when long-running
transactions are in progress. The DB backup is one such unavoidable
long-running transaction, and I use the table exclusion switch to
exclude this task table from the backup so it won't get locked by it
and CLUSTER can still do its job (I had a rudimentary patch to do this
even before the feature was introduced to pg_dump). The table can be
dumped separately, which is a brief operation, but I would have to
clear it after a crash anyway...
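
For reference, the whole setup is tiny (table and index names are made
up here):

  -- pick the clustering index once
  ALTER TABLE task CLUSTER ON task_pkey;

  -- then the 5-minute cron job only has to run
  CLUSTER task;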

Now I could try disabling the CLUSTER cron job and see if I get
problems; it was last disabled under postgres 7.4, so maybe something
has changed in between... but I can tell you for sure that the last
time I enabled it, it really fixed the load on our DB server...

Wouldn't it be possible to do it like Simon (IIRC) suggested, and add
a parameter to enable/disable the current behavior, using the MVCC
behavior as the default?

Cheers,
Csaba.





Re: [HACKERS] CLUSTER and MVCC

2007-03-09 Thread Csaba Nagy
On Fri, 2007-03-09 at 13:42, Gregory Stark wrote:
 Csaba Nagy [EMAIL PROTECTED] writes:
 
  Wouldn't be possible to do it like Simon (IIRC) suggested, and add a
  parameter to enable/disable the current behavior, and use the MVCC
  behavior as default ?
 
 Doing it in CLUSTER would be weird. However perhaps it would be useful to have
 some sort of stand-alone tool that just bumped all the xmin/xmax's. It would
 have to be super-user-only and carry big warning labels saying it breaks MVCC.

Well, the current behavior of CLUSTER is just perfect for what I'm
using it for. If anything else would do the job, I would be happy to
use it instead...

 But it would be useful any time you have a table that you want to exempt a
 particular table from serializable snapshots. Basically a per-table way to
 force a read-committed snapshot on. Though, actually it's not quite a
 read-committed snapshot is it? Anyone using an old serializable snapshot will
 see what, no tuples at all?

I'm afraid what I need has nothing to do with serializable snapshots...
I still want the table to be completely transactional, except that if
somebody can get an exclusive lock on it, it can be compacted
regardless of other running transactions. I'm not sure how else to
express this... it means something like: no transaction cares about the
content of the table until it takes some kind of lock on it. In other
words, the table's state is not connected to the state of other tables
until I actually do something with it...

Cheers,
Csaba.





Re: [HACKERS] CLUSTER and MVCC

2007-03-09 Thread Csaba Nagy
On Fri, 2007-03-09 at 14:00, Alvaro Herrera wrote:
 But I'm not really seeing the problem here.  Why isn't Csaba's problem
 fixed by the fact that HOT reduces the number of dead tuples in the
 first place?  If it does, then he no longer needs the CLUSTER
 workaround, or at least, he needs it to a much lesser extent.

Is this actually true in the case of HOT + long-running transactions?
I was under the impression that HOT has the same problems in the
presence of long-running transactions...

Cheers,
Csaba.





Re: [HACKERS] CLUSTER and MVCC

2007-03-09 Thread Csaba Nagy
 Hmm. You could use something along these lines instead:
 
 0. LOCK TABLE queue_table
 1. SELECT * INTO queue_table_new FROM queue_table
 2. DROP TABLE queue_table
 3. ALTER TABLE queue_table_new RENAME queue_table
 
 After all, it's not that you care about the clustering of the table, you 
 just want to remove old tuples.

... and then restart the app so all my pooled connections drop their
cached plans ;-)

Seriously, that won't work. If a session tries to insert a new row
after I lock the table to clean it up, I still want it to be able to
insert once the cleanup is finished... if I drop the table it was
trying to insert into, it will fail.

 As a long term solution, it would be nice if we had more fine-grained 
 bookkeeping of snapshots that are in use in the system. In your case, 
 there's a lot of tuples that are not visible to pg_dump because xmin is 
 too new, and also not visible to any other transaction because xmax is 
 too old. If we had a way to recognize situations like that, and vacuum 
 those tuples, much of the problem with long-running transactions would 
 go away.

In the general case that won't work either, in a strict MVCC sense...
if you have an old transaction, you should never clean up a dead tuple
which could still be visible to it.

  Wouldn't be possible to do it like Simon (IIRC) suggested, and add a
  parameter to enable/disable the current behavior, and use the MVCC
  behavior as default ?
 
 I guess we could, but I don't see why should encourage using CLUSTER for 
 that. A more aggressive, MVCC-breaking version of VACUUM would make more 
 sense to me, but I don't like the idea of adding break-MVCC flags to 
 any commands.

Well, if there were any other way to avoid the table bloat, I would
agree.

Cheers,
Csaba.





Re: [HACKERS] Interaction of PITR backups and Bulk operationsavoiding WAL

2007-03-09 Thread Csaba Nagy
On Fri, 2007-03-09 at 17:47, Tom Lane wrote:
 I don't think that people are very likely to need to turn archiving on
 and off on-the-fly.

We did occasionally need to turn archiving on on-the-fly. It did
happen that I started up a new DB machine and did not yet have the log
archive available, so I had to postpone configuring that, but the
machine went on-line before the archive machine was ready... and then
later I had to switch archiving on. It was very convenient that I could
do it without a restart.

It's true that this has been a rare occasion; more often you just need
to change the archive command (e.g. to archive to a different location
if the archive repository goes down).

It's somewhat moot for us now, as we have switched to Slony (which is
a heavy beast, but once it works it's great).

Cheers,
Csaba.






Re: [HACKERS] Revitalising VACUUM FULL for 8.3

2007-03-01 Thread Csaba Nagy
On Thu, 2007-03-01 at 13:02, Simon Riggs wrote:
 I would like to introduce the concept of utility transactions. This is
 any transaction that touches only one table in a transaction and is not
 returning or modifying data. All utility transactions wait until they
 are older than all non-utility transactions before they commit. A
 utility transaction would currently be any VACUUM, VACUUM FULL and
 CREATE INDEX CONCURRENTLY. That is safe because each of those commands
 executes in its own transaction and doesn't touch more than one table at
 a time. Once each knows there is no chance of being interfered with, it
 can continue its work and commit. This technique is already in use for
 CREATE INDEX CONCURRENTLY, so just needs to be extended to all other
 utilities - but in a way that allows them to recognise each other. This
 extends upon the thought that VACUUMs already recognise other VACUUMs
 and avoid using them as part of their Snapshot. 

Wouldn't this be deadlock-prone? What if a non-utility transaction
(which could even have been started before the VACUUM FULL) blocks on
the table being vacuumed: if the vacuum then waits for all non-utility
transactions to finish, it will deadlock.
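
The interleaving I'm worried about would be something like this (table
names made up, session B being the utility command):

  -- session A (non-utility, started first)
  BEGIN;
  SELECT count(*) FROM some_other_table;

  -- session B
  VACUUM FULL queue_table;  -- takes its lock, then waits for A to end

  -- session A again
  UPDATE queue_table SET state = 'done' WHERE id = 1;
  -- blocks on B's lock while B keeps waiting for A: a mutual wait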

 The utility transaction concept would make new VACUUM FULL MVCC-safe and
 would also make most executions of CLUSTER MVCC-safe also (the implicit
 top-level transaction cases).

Making cluster MVCC-safe will kill my back-door of clustering a hot
table while I run a full DB backup.

Cheers,
Csaba.





Re: [HACKERS] Revitalising VACUUM FULL for 8.3

2007-03-01 Thread Csaba Nagy
On Thu, 2007-03-01 at 13:56, Simon Riggs wrote:
  Wouldn't this be deadlock prone ? What if a non-utility transaction
  (which could even be started before the vacuum full) blocks on the table
  being vacuumed, then if the vacuum wants to wait until all non-utility
  transactions finish will deadlock.
 
 Exactly the same as CREATE INDEX CONCURRENTLY, which has a much more
 frequent use case than VACUUM FULL does, even after I've made the
 proposed changes.
 
 The situation, as I understand it, would be that the utility command
 waits on another transaction to complete. As soon as that other
 transaction touches the locked table it will detect a simple deadlock
 and the non-utility statement will abort.

Fair enough.

   The utility transaction concept would make new VACUUM FULL MVCC-safe and
   would also make most executions of CLUSTER MVCC-safe also (the implicit
   top-level transaction cases).
  
  Making cluster MVCC-safe will kill my back-door of clustering a hot
  table while I run a full DB backup.
 
 Wow. I'll take that as a request for a NOWAIT option on utility
 commands, rather than a suggestion that we shouldn't strive to make
 things safe in the default case.

Yes please... if the current behavior can still be triggered (by that
NOWAIT, for example), it would actually be good to have the MVCC
behavior as the default.

Thanks,
Csaba.





Re: [HACKERS] New feature request: FlashBack Query

2007-02-21 Thread Csaba Nagy
 Do 97% of transactions commit because Oracle has slow rollbacks and  
 developers are working around that performance issue, or because they  
 really commit?
 
 I have watched several developers that would prefer to issue numerous  
 selects to verify things like foreign keys in the application in  
 order to avoid a rollback.

Most of the code we have cannot afford a rollback, because the
statement can be part of a much bigger transaction which would carry a
much higher performance penalty if retried than a simple rollback
does. And you know that in postgres you can't roll back just the last
insert; you take the whole transaction down with it... and it's simply
a performance bottleneck to retry in a high-contention scenario (which
is usually the case for us).

So I would say we don't avoid rollbacks because of the cost of the
rollback, but because of the cost of the retry...

Cheers,
Csaba.





Re: [HACKERS] autovacuum next steps

2007-02-19 Thread Csaba Nagy
 One option that I've heard before is to have vacuum after a single iteration
 (ie, after it fills maintenance_work_mem and does the index cleanup and the
 second heap pass), remember where it was and pick up from that point next
 time.

From my experience this is not acceptable... I have tables for which
the index cleanup alone takes hours, so no matter how low I set
maintenance_work_mem (in fact I set it high enough that there's only
one iteration), it would take too much time and the queue tables would
get overly bloated (which isn't happening now either, only because
they now get special CLUSTER treatment).

Cheers,
Csaba.





Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-08 Thread Csaba Nagy
On Thu, 2007-02-08 at 17:47, Marc Munro wrote:
 [snip] One of the causes of deadlocks in Postgres is that its referential
 integrity triggers can take locks in inconsistent orders.  Generally a
 child record will be locked before its parent, but not in all cases.
[snip]

The problem is that eliminating the deadlock is still only part of the
cake... the interlocking remains, possibly leading to degraded
performance under high contention on very common parent rows. The real
solution would be for an update of the parent table's non-referenced
fields not to interlock at all with updates of the child rows... and I
think there were some proposals to do exactly that.

In fact one way to avoid this problem is vertical partitioning, i.e.
separating the non-key columns into a parallel table and updating them
there. However, that is only an option when you know about it
beforehand and you're not inheriting the schema from another DB...
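
A minimal sketch of what I mean (all names made up): keep only the key
in the referenced table and move the frequently updated, non-referenced
columns into a 1:1 side table, so those updates never touch the row the
RI triggers lock:

  -- child FKs keep pointing at this narrow table
  CREATE TABLE account (
      account_id integer PRIMARY KEY
  );

  -- frequently updated, non-referenced columns live here (1:1)
  CREATE TABLE account_detail (
      account_id integer PRIMARY KEY REFERENCES account(account_id),
      last_login timestamptz,
      note       text
  );

  CREATE TABLE invoice (
      invoice_id integer PRIMARY KEY,
      account_id integer NOT NULL REFERENCES account(account_id)
  );

  -- this no longer interlocks with inserts into invoice
  UPDATE account_detail SET last_login = now() WHERE account_id = 42;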

Cheers,
Csaba.






Re: [HACKERS] Referential Integrity and SHARE locks

2007-02-02 Thread Csaba Nagy
On Fri, 2007-02-02 at 10:51, Simon Riggs wrote:
[snip]
 Why do we need a SHARE lock at all, on the **referenc(ed)** table?
 
 It sounds like if we don't put a SHARE lock on the referenced table then
 we can end the transaction in an inconsistent state if the referenced
 table has concurrent UPDATEs or DELETEs. BUT those operations do impose
 locking rules back onto the referencing tables that would not be granted
 until after any changes to the referencing table complete, whereupon
 they would restrict or cascade. So an inconsistent state doesn't seem
 possible to me.
 
 What am I missing?

Well, here we do have a patch (deployed on production servers) which
does not take the shared lock on the referenced table, and it
occasionally lets rows into the referencing tables which have no parent
row in the referenced table. I'm not sure what the mechanism is, but it
does happen, I can assure you. It happens rarely enough that it is not
disturbing for us, compared to the deadlocks which happen without the
patch - but that's another matter...

In our application we never update any key ids, so only deletes and
inserts come into play, and I guess it happens when a referenced row is
deleted in the window between a newly inserted child row's check that
the parent row exists and the moment the child row is actually
inserted. Or something like that...
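
Spelled out, the window I suspect looks roughly like this (made-up
parent/child tables; with our patch nothing makes session B wait for
session A):

  -- session A: inserting a child row
  BEGIN;
  INSERT INTO child (parent_id) VALUES (1);
      -- the RI trigger only checks that parent 1 exists;
      -- in the patched build it takes no lock on it

  -- session B, inside that window:
  --   BEGIN;
  --   DELETE FROM parent WHERE id = 1;
  --   COMMIT;

  -- session A:
  COMMIT;   -- the child row now references a parent that is gone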

Cheers,
Csaba.





Re: [HACKERS] Referential Integrity and SHARE locks

2007-02-02 Thread Csaba Nagy
 You say below the cut that you're not updating keys, so presumably it's 
 other columns. Which leads me to something I've wondered for a while - 
 why do we lock the whole row? Is it just a matter of not optimised that 
 yet or is there a good reason why locking just some columns isn't 
 practical.

For the conditions that generate the deadlock, see:

http://archives.postgresql.org/pgsql-general/2006-12/msg00029.php

The reason for the occasional orphan rows is not completely clear to
me, but it must be some kind of race condition while inserting,
deleting (or maybe updating?) the parent/child tables concurrently.

Cheers,
Csaba.





Re: [HACKERS] savepoint improvements

2007-01-22 Thread Csaba Nagy
On Sat, 2007-01-20 at 18:08, Merlin Moncure wrote:
[snip]
 To be honest, I'm not a huge fan of psql tricks (error recovery being
 another example)  but this could provide a solution.  in your opnion,
 how would you use \if to query the transaction state?

Wouldn't it make sense to introduce instead something like:

\set language plpgsql
... and then redirect everything you type to plpgsql?

That would give you the possibility to execute things in your favorite
language directly from psql without creating a function.
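
Today the function detour is unavoidable for anything procedural, e.g.
(a trivial made-up example):

  CREATE FUNCTION one_off() RETURNS void AS $$
  BEGIN
      IF (SELECT count(*) FROM pg_locks) > 100 THEN
          RAISE NOTICE 'lots of locks around';
      END IF;
  END;
  $$ LANGUAGE plpgsql;

  SELECT one_off();
  DROP FUNCTION one_off();

With a switch like the one above, the body could just be typed
directly at the prompt.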

Cheers,
Csaba.





Re: [HACKERS] Frequent Update Project: Design Overview ofHOTUpdates

2006-11-13 Thread Csaba Nagy
[snip]
 IMHO *most* UPDATEs occur on non-indexed fields. [snip]
 
 If my assumption is badly wrong on that then perhaps HOT would not be
 useful after all. If we find that the majority of UPDATEs meet the HOT
 pre-conditions, then I would continue to advocate it.

Just to confirm that the scenario is valid: almost all of our
application's updates affect only non-indexed columns. There are a few
exceptions, but the vast majority are non-indexed, and that holds for
execution frequency too, not just for the count of tables/queries.

Cheers,
Csaba.





Re: [HACKERS] Deadlock with pg_dump?

2006-10-27 Thread Csaba Nagy
On Fri, 2006-10-27 at 09:23, Albe Laurenz wrote:
  [ Memo to hackers: why is it that log_min_error_statement = error
  isn't the default? ]
 
 To avoid spamming the logs with every failed SQL statement?

And it would hurt applications where query failure is treated as a
valid code path (such as inserting first and updating if the insert
fails)...
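
For example, a pattern like this (made-up table), where the failure is
fully expected and handled with a savepoint:

  BEGIN;
  SAVEPOINT try_insert;
  INSERT INTO user_setting (user_id, name, value)
      VALUES (42, 'theme', 'dark');
  -- on unique_violation the application does:
  --   ROLLBACK TO SAVEPOINT try_insert;
  --   UPDATE user_setting SET value = 'dark'
  --    WHERE user_id = 42 AND name = 'theme';
  COMMIT;

Logging every such failed INSERT at statement level would just be
noise.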

Cheers,
Csaba.



