Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Naz Gassiep
I would be *very* concerned that system time is not a guaranteed 
monotonic entity. Surely a counter or other internally managed mechanism 
would be a better solution.


Furthermore, what would be the ramifications of master and slave system 
times being out of sync?


Finally, what if system time is rolled forward a few minutes as part of a 
correction and there were transactions completed in that time? There is 
a chance, albeit small, that two transactions will have the same 
timestamp. More importantly, this will cause all kinds of issues when 
the slave sees transactions in the future. Even with regular NTP syncs, 
drift can cause a clock to be rolled forward a few milliseconds, 
possibly resulting in duplicate transaction IDs.


In summary, I don't think the use of system time has any place in 
PostgreSQL's internal consistency mechanisms; it is too unreliable an 
environment property. Why can't a counter be used for this instead?


- Naz.

Jan Wieck wrote:
For a future multimaster replication system, I will need a couple of 
features in the PostgreSQL server itself. I will submit separate 
proposals per feature so that discussions can be kept focused on one 
feature per thread.


For conflict resolution purposes in an asynchronous multimaster 
system, the "last update" definition often comes into play. For this 
to work, the system must provide a monotonically increasing timestamp 
taken at the commit of a transaction. During replication, the 
replication process must be able to provide the remote node's timestamp 
so that the replicated data will be as of the time it was written on 
the remote node, and not the current local time of the replica, which, 
asynchronous replication being what it is, is by definition later.


To provide this data, I would like to add another log directory, 
pg_tslog. The files in this directory will be similar to the clog, but 
contain arrays of timestamptz values. On commit, the current system 
time will be taken. As long as this time is lower than or equal to the 
last time taken in this PostgreSQL instance, the value will be increased 
by one microsecond. The resulting time will be added to the commit WAL 
record and written into the pg_tslog file.


If a per-database configurable tslog_priority is given, the timestamp 
will be truncated to milliseconds and the increment logic is done in 
milliseconds. The priority is added to the timestamp. This guarantees 
that no two commit timestamps will ever be exactly identical, even 
across different servers.


The COMMIT syntax will get extended to

COMMIT [TRANSACTION] [WITH TIMESTAMP timestamptz];

The extension is limited to superusers and will override the normally 
generated commit timestamp. This will be used to give the replicating 
transaction on the replica the exact same timestamp it got on the 
originating master node.


The pg_tslog segments will be purged like the clog segments, after all 
transactions belonging to them have been stamped frozen. A frozen xid 
by definition has a timestamp of epoch. To ensure a system using this 
timestamp feature has enough time to perform its work, a new GUC 
variable defining an interval will prevent vacuum from freezing xid's 
that are younger than that.


A function get_commit_timestamp(xid) returning timestamptz will return 
the commit time of a transaction as recorded by this feature.
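
Since the pg_tslog files are addressed like the clog, only with one full
timestamptz per transaction instead of two status bits, the lookup boils
down to simple array indexing. A rough sketch, with illustrative names
and an assumed page size:

    #include <stdint.h>

    #define TSLOG_BLCKSZ  8192                              /* assumed page size */
    #define TS_PER_PAGE   (TSLOG_BLCKSZ / sizeof(int64_t))  /* 8-byte timestamp slots */

    /* map an xid to the pg_tslog page and slot holding its commit timestamp */
    static void
    tslog_locate(uint32_t xid, uint32_t *pageno, uint32_t *slot)
    {
        *pageno = xid / TS_PER_PAGE;
        *slot   = xid % TS_PER_PAGE;
    }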



Comments, changes, additions?

Jan





Re: [HACKERS] tsearch in core patch, for inclusion

2007-01-26 Thread Naz Gassiep

Andrew Dunstan wrote:

I am constantly running into this:

Q. Does PostgreSQL have full text indexing?
A. Yes it is in contrib.
Q. But that isn't part of core.
A. *sigh*

Where on the website can I see what plugins are included with 
PostgreSQL?


Where on the website can I see the Official PostgreSQL Documentation for
Full Text Indexing?

With TSearch2 in core will that fix the many upgrade problems associated
with using TSearch2?


  


contrib is a horrible misnomer. Can we maybe bite the bullet and call 
it something else?
After years of PG use, I am still afraid to use contrib modules because 
it just *feels* like voodoo. I have spent much time reading this mailing 
list and talking on IRC with PG users, and I know that contrib modules 
are on the whole tested and safe, but the lack of web documentation, and 
the fact that the only indication of what they do is the notes that come 
with the source, makes them feel like a "use and cross your fingers" 
type of thing.


I don't know how hard it would be to implement, but perhaps contrib 
modules could be compiled in a similar way to Apache modules, e.g. 
./configure --with-modulename, with the onus for packaging them 
appropriately falling onto the shoulders of the module authors. I feel 
that even a basic module management system like this would greatly 
increase awareness of and confidence in the contrib modules. Oh, and

+1 on renaming contrib
+1 on the need for a comprehensive list of them
+1 on the need for more doc on the website about each of them, onus 
falling on module authors, perhaps require at least a basic doc patch as 
a requirement for /contrib inclusion.


- Naz



Re: [HACKERS] Piggybacking vacuum I/O

2007-01-26 Thread Heikki Linnakangas
I'd like to see still more evidence that it's a problem before we start 
changing that piece of code. It has served us well for years.


Bruce Momjian wrote:

Is there a TODO here?

---

Heikki Linnakangas wrote:

Pavan Deolasee wrote:

Another simpler solution for VACUUM would be to read the entire CLOG file
in local memory. Most of the transaction status queries can be satisfied
from
this local copy and the normal CLOG is consulted only when the status is
unknown (TRANSACTION_STATUS_IN_PROGRESS)

The clog is only for finished (committed/aborted/crashed) transactions.
If a transaction is in progress, the clog is never consulted. Anyway,
that'd only be reasonable for vacuums, and I'm actually more worried if
we had normal backends thrashing the clog buffers.
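
For reference, a stand-alone sketch of the lookup Pavan describes: the
2-bits-per-transaction encoding mirrors the real clog layout, while the
local copy and the fallback convention are hypothetical.

    #include <stdint.h>

    #define CLOG_BITS_PER_XACT              2
    #define CLOG_XACTS_PER_BYTE             4
    #define TRANSACTION_STATUS_IN_PROGRESS  0x00

    /* Return the status bits for "xid" from a private copy of the clog,
     * or -1 when the copy has no final status, meaning the shared clog
     * (or the proc array) must be consulted instead. */
    static int
    cached_xact_status(const unsigned char *clog_copy, uint32_t xid)
    {
        unsigned char b = clog_copy[xid / CLOG_XACTS_PER_BYTE];
        int shift  = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
        int status = (b >> shift) & 0x03;

        return (status == TRANSACTION_STATUS_IN_PROGRESS) ? -1 : status;
    }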



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] autovacuum process handling

2007-01-26 Thread Markus Schiltknecht

Hi,

Alvaro Herrera wrote:

Yeah.  For what I need, the launcher just needs to know when a worker
has finished and how many workers there are.


Oh, so it's not that much less communication. My replication manager also 
needs to know when a worker dies. You said you are using a signal from 
the manager to the postmaster to request a worker to be forked. How do 
you do the other part, where the postmaster needs to tell the launcher 
which worker terminated?


For Postgres-R, I'm currently questioning if I shouldn't merge the 
replication manager process with the postmaster. Of course, that would 
violate the "postmaster does not touch shared memory" constraint.


I suggest you don't.  Reliability from Postmaster is very important.


Yes, so? As long as I can't restart the replication manager, but 
operation of the whole DBMS relies on it, I have to take the postmaster 
down as soon as it detects a crashed replication manager.


So I still argue that reliability is getting better than status quo, if 
I'm merging these two processes (because of less code for communication 
between the two).


Of course, the other way to gain reliability would be to make the 
replication manager restartable. But restarting the replication manager 
means recovering data from other nodes in the cluster, thus a lot of 
network traffic. Needless to say, this is quite an expensive operation.


That's why I'm questioning whether that's the behavior we want. Isn't it 
better to force the administrators to look into the issue and probably 
replace a broken node, instead of having one node going amok by 
requesting recovery over and over again, possibly forcing crashes of 
other nodes, too, because of the additional load for recovery?



But it would make some things a lot easier:

 * What if the launcher/manager dies (but you potentially still have
   active workers)?

   Maybe, for autovacuum you can simply restart the launcher and that
   one detects workers from shmem.

   With replication, I certainly have to take down the postmaster as
   well, as we are certainly out of sync and can't simply restart the
   replication manager. So in that case, no postmaster can run without a
   replication manager and vice versa. Why not make it one single
   process, then?


Well, the point of the postmaster is that it can notice when one process
dies and take appropriate action.  When a backend dies, the postmaster
closes all others.  But if the postmaster crashes due to a bug in the
manager (due to both being integrated in a single process), how do you
close the backends?  There's no one to do it.


That's a point.

But again, as long as the replication manager won't be able to restart, 
you gain nothing by closing backends on a crashed node.



In my case, the launcher is not critical.  It can die and the postmaster
should just start a new one without much noise.  A worker is critical
because it's connected to tables; it's as critical as a regular backend.
So if a worker dies, the postmaster must take everyone down and cause a
restart.  This is pretty easy to do.


Yeah, that's the main difference, and I see why your approach makes 
perfect sense for the autovacuum case.


In contrast, the replication manager is critical (to one node), and a 
restart is expensive (for the whole cluster).



 * Startup races: depending on how you start workers, the launcher/
   manager may get a "database is starting up" error when requesting
   the postmaster to fork backends.
   That probably also applies to autovacuum, as those workers shouldn't
   work concurrently with a startup process. But maybe there are other
   means of ensuring that no autovacuum gets triggered during startup?


Oh, this is very easy as well.  In my case the launcher just sets a
database OID to be processed in shared memory, and then calls
SendPostmasterSignal with a particular value.  The postmaster must only
check this signal within ServerLoop, which means it won't act on it
(i.e., won't start a worker) until the startup process has finished.
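
A rough, self-contained sketch of the hand-off Alvaro describes; the
struct, flag and signal-reason names here are illustrative, not
necessarily what the actual patch uses:

    #include <stdint.h>

    typedef uint32_t Oid;

    static struct
    {
        volatile Oid startingDB;      /* database the launcher wants worked on */
    } shmem_stub;

    #define PMSIGNAL_START_WORKER 1   /* hypothetical signal reason */

    static void send_postmaster_signal_stub(int reason) { (void) reason; }

    static void
    request_worker_for(Oid dboid)
    {
        shmem_stub.startingDB = dboid;              /* stash target in shared memory */
        send_postmaster_signal_stub(PMSIGNAL_START_WORKER);
        /* the postmaster only checks this inside ServerLoop(), so no worker
         * is forked until the startup process has finished */
    }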


It seems like your launcher is perfectly fine with requesting workers 
and not getting them. The replication manager currently isn't. Maybe I 
should make it more fault tolerant in that regard...



I guess your problem is that the manager's task is quite a lot more
involved than my launcher's.  But in that case, it's even more important
to have them separate.


More involved with what? It does not touch shared memory; it mainly 
keeps track of the backends' states (by getting a notice from the 
postmaster) and does all the necessary forwarding of messages between 
the communication system and the backends. Its main loop is similar to 
the postmaster's, mainly consisting of a select().



I don't understand why the manager talks to postmaster.  If it doesn't,
well, then there's no concurrency issue gone, because the remote
backends will be talking to *somebody* anyway; be it postmaster, or
manager.


As with your launcher, I only send one message: the worker 

Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Markus Schiltknecht

Hi,

Jan Wieck wrote:
The replication system I have in mind will have another field type of 
the "balance" nature, where it will never communicate the current value 
but only deltas that get applied regardless of the two timestamps.


I'd favor a more generally usable conflict resolution function 
interface, on top of which you can implement both the "last update 
wins" and the "balance" conflict resolution types.


Passing the last common ancestor and the two conflicting heads to the 
conflict resolution function (CRF) should be enough. That would easily 
allow implementing the "balance" type (as you can calculate both 
deltas). And if you want to rely on something as arbitrary as a 
timestamp, you'd simply have to add a timestamp column to your table and 
let the CRF decide upon that.


This would allow pretty much any type of conflict resolution, for 
example higher-priority cleanup transactions, which change lots of 
tuples and had better not be aborted later on. Those could be 
implemented by adding a priority column and having the CRF respect that 
one, too.


To find the last common ancestor tuple, transaction ids and MVCC are 
enough. You wouldn't need to add timestamps. You'd only have to make 
sure VACUUM doesn't delete tuples you still need.


Regards

Markus



Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Heikki Linnakangas

Jan Wieck wrote:
But it is a datum that needs to be collected at the moment where 
basically the clog entry is made ... I don't think any external module 
can do that ever.


How atomic does it need to be? External modules can register callbacks 
that get called right after the clog update and the removal of the xid 
from the MyProc entry. That's about as close to making the clog entry as 
you can get.
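
A minimal sketch of such a module, assuming the existing
RegisterXactCallback() hook in access/xact.h is the mechanism meant
here; whether it fires close enough to the clog update is exactly the
open question.

    #include "postgres.h"
    #include "access/xact.h"
    #include "utils/timestamp.h"

    /* called by xact.c at end of transaction; XACT_EVENT_COMMIT fires
     * after the commit record/clog update and after the xid has been
     * cleared from the MyProc entry */
    static void
    commit_ts_callback(XactEvent event, void *arg)
    {
        if (event == XACT_EVENT_COMMIT)
        {
            TimestampTz ts = GetCurrentTimestamp();

            (void) ts;   /* a real module would record this for the xid */
        }
    }

    void
    module_init(void)
    {
        RegisterXactCallback(commit_ts_callback, NULL);
    }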


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Recursive Queries

2007-01-26 Thread Hubert FONGARNAND
The CONNECT BY patch from Evgen Potemkin has been ported to pg 8.2...
and it's now under the BSD license...

I will test it in our test environment

On Thursday, 25 January 2007 at 11:08, Gregory Stark wrote:

 Hm, having skimmed through the Evgen Potemkin's recursive queries patch I find
 it quite different from what I was expecting. My own thinking was headed off
 in a different direction.
 
 Basically the existing patch reimplements a new kind of join which implements
 a kind of nested loop join (with newer versions adding a kind of hash join)
 which feeds a new kind of tuplestore called a tupleconn.
 
 I was thinking of having a new node above a standard join. The new node would
 repeatedly feed the results of the previous iteration back down to the join
 and reexecute the join to get the next generation.
 
 I think my approach is more in line with the DB2/ANSI WITH style query which
 is expected to do a breadth-first search. The Oracle CONNECT BY syntax is
 expected to do a depth first search.
 
 I have two major issues with the repeated-join model though. 
 
 a) Ideally we would want to switch between nested loop, merge join, and hash
 join depending on the size of the previous generation. That means the join
 node wouldn't be the same type of join for all the iterations. This is
 important since in most applications you're traversing either up or down a
 tree and are likely starting with very few nodes but often ending up with very
 broad levels with many nodes. No single type of join will be appropriate for
 the whole plan execution.
 
 b) I do want to be able to support depth-first searching too. I'm not sure how
 to reconcile that with the repeated-join conceptual model. We could always
 resort the entire result set after generating it but that seems like an
 unsatisfactory solution.
 


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Markus Schiltknecht

Hi,

Nice proposal. I'd support that enhancement and could make use of such 
triggers in Postgres-R as well, at least to provide these triggers to 
the user.


Jan Wieck wrote:
Good question. I don't know. I'd rather err on the safe side and make 
it multiple states; for now I only have Normal and Replica mode.


Are these triggers intended to help implement async replication or are 
these for users to be able to take action on remote replay of a 
transaction (i.e. on the replica)? Does that give a further distinction?


In Postgres-R, I mostly use the terms 'local' and 'remote'. Also, 
'normal' mode can easily be confused with non-replicated mode, thus 
I'd not mix that up with 'replicated, local transaction' mode (even if 
it's mostly equal, as in this case). My naming proposal would thus be:


A   fires always (i.e. fires N times, where N = nr of nodes)
L   fires on the transaction local node (i.e. only exactly once)
R   fires on the remote nodes only (i.e. (N - 1) times)
0   fires never

'1' for 'fires on both nodes' seems confusing as well, because unlike 
single-node DB operation, one event can fire the trigger multiple times 
(on different nodes). The current, single-node PostgreSQL should thus 
use '0' or 'L'.


Regards

Markus




Re: [HACKERS] crash on 8.2 and cvshead - failed to add item to the

2007-01-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
I'm afraid the bug has been there for ages, but the 90%-fillfactor on 
rightmost page patch made it much more likely to get triggered.


But that patch has been there for ages too; the only new thing in 8.2 is
that the fillfactor is configurable, but its default is the same.  So
I'm still wondering why the bug isn't seen in 8.1.  (Joe, did you try
anything older than 8.1?)


The hardcoded fillfactor was 90% when building an index, and that's 
still the default. However, when inserting to an existing index, the 
fillfactor on the rightmost page was 2/3. It was changed to use the 
user-configurable fillfactor, which now defaults to 90%.


Hmm. Now that I think of it, we might have the same bug in nbtsort.c. 
I'll have a look...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Thu, 2007-01-25 at 22:19 -0500, Jan Wieck wrote:

 The idea is to clone an existing serializable transactions snapshot 
 visibility information from one backend to another. The semantics would 
 be like this:
 
  backend1: start transaction;
  backend1: set transaction isolation level serializable;
  backend1: select pg_backend_pid();
  backend1: select publish_snapshot(); -- will block

Great idea. It can also be used by pg_dump to publish its snapshot so
that we can make VACUUM continue to process effectively while pg_dump
is running.

Two questions:
- why does it have to block? I don't see any reason - the first process
can begin doing useful work. The second process might fail or itself be
blocked by something.

- why just serializable snapshots?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Gregory Stark
Jan Wieck [EMAIL PROTECTED] writes:

 backend1: select publish_snapshot(); -- will block

 backend2: start transaction;
 backend2: set transaction isolation level serializable;
 backend2: select clone_snapshot(pid); -- will unblock backend1

It seems simpler to have a current_snapshot() function that returns a bytea
(or a new snapshot data type) which set_current_snapshot(bytea) would take to
change your snapshot. Then you could use tables or out-of-band communication
to pass around your snapshots however you please.

set_current_snapshot() would have to sanity check that the xmin of the new
snapshot isn't older than the current globaloldestxmin. 

That could be handy for debugging purposes too. 
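
For what it's worth, the xmin sanity check could look roughly like this;
set_current_snapshot() itself is hypothetical, the snapshot is assumed
to arrive already deserialized, and the caller supplies the cluster-wide
oldest xmin:

    #include "postgres.h"
    #include "access/transam.h"
    #include "utils/tqual.h"

    static void
    check_snapshot_is_installable(Snapshot snap, TransactionId oldest_xmin)
    {
        /* a snapshot whose xmin predates the global oldest xmin may refer
         * to tuples that VACUUM has already removed */
        if (TransactionIdPrecedes(snap->xmin, oldest_xmin))
            elog(ERROR, "snapshot is too old to install");
    }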

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Theo Schlossnagle

Jan, et. al.,

On Jan 26, 2007, at 2:37 AM, Naz Gassiep wrote:
I would be *very* concerned that system time is not a guaranteed  
monotonic entity. Surely a counter or other internally managed  
mechanism would be a better solution.


As you should be concerned.  Looking through the last few issues of 
IEEE Transactions on Parallel and Distributed Systems on my desk, I 
see no time-synch work for clusters of machines that is actually 
based on time.  Almost all rely on something like a Lamport timestamp 
or some relaxation thereof.  A few are based on a tree-based pulse. 
Using actual times is fraught with problems and is typically 
inappropriate for cluster synchronization needs.


Furthermore, what would be the ramifications of master and slave  
system times being out of sync?


I'm much more concerned with the overall approach.  The algorithm for 
replication should be published in theoretic style with a thorough 
analysis of its assumptions and a proof of correctness based on those 
assumptions.  Databases, and replication of them, are definitely 
technologies that aren't off-the-cuff, and they require rigorous 
academic discussion and acceptance before they will get adopted. 
People generally will not adopt technologies to store mission-critical 
data until they are confident that they will both work as designed and 
work as implemented -- the second is far less important, as the 
weaknesses there are simply bugs.


I'm not implying that this rigorous dissection of replication design  
hasn't happened, but I didn't see it referenced anywhere in this  
thread.  Can you point me to it?  I've reviewed many of these papers  
and would like to better understand what you are aiming at.


Best regards,

Theo Schlossnagle





Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Jan Wieck

On 1/26/2007 2:37 AM, Naz Gassiep wrote:
I would be *very* concerned that system time is not a guaranteed 
monotonic entity. Surely a counter or other internally managed mechanism 
would be a better solution.


Such a counter has only local relevance. How do you plan to compare 
the two separate counters on different machines to tell which 
transaction happened last?


Even if the system clock isn't monotonically increasing, the described 
increment system guarantees that the timestamp used appears so. Granted, 
this system will not work too well on a platform that doesn't allow 
slewing the system clock.




Furthermore, what would be the ramifications of master and slave system 
times being out of sync?


The origin of a transaction must scan all tuples it updates and make 
sure that the timestamp it uses for commit appears in the future with 
respect to them.




Finally, what if system time is rolled forward a few minutes as part of a 
correction and there were transactions completed in that time? There is 
a chance, albeit small, that two transactions will have the same 
timestamp. More importantly, this will cause all kinds of issues when 
the slave sees transactions in the future. Even with regular NTP syncs, 
drift can cause a clock to be rolled forward a few milliseconds, 
possibly resulting in duplicate transaction IDs.


In summary, I don't think the use of system time has any place in 
PostgreSQL's internal consistency mechanisms; it is too unreliable an 
environment property. Why can't a counter be used for this instead?


This is not used for PostgreSQL's own consistency at all. It is a 
vehicle intended to synchronize the "last update wins" decision process 
of an asynchronous multimaster system. If not with a timestamp, 
how would you make sure that the replication processes of two different 
nodes will come to the same conclusion as to which update was last, 
especially considering that the replication might take place hours after 
the original transaction happened?



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Simon Riggs
On Thu, 2007-01-25 at 18:16 -0500, Jan Wieck wrote:

 To provide this data, I would like to add another log directory, 
 pg_tslog. The files in this directory will be similar to the clog, but 
 contain arrays of timestamptz values. On commit, the current system time 
 will be taken. As long as this time is lower or equal to the last taken 
 time in this PostgreSQL instance, the value will be increased by one 
 microsecond. The resulting time will be added to the commit WAL record 
 and written into the pg_tslog file.

A transaction time table/log has other uses as well, so it's fairly
interesting to have this.

  COMMIT [TRANSACTION] [WITH TIMESTAMP timestamptz];
 
 The extension is limited to superusers and will override the normally 
 generated commit timestamp. 

I don't think it's acceptable to override the normal timestamp. That
could lead to non-monotonic time values, which could screw up PITR. My
view is that you still need PITR even when you are using replication,
because the former provides recoverability and the latter provides
availability.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Jan Wieck

On 1/26/2007 8:06 AM, Gregory Stark wrote:

Jan Wieck [EMAIL PROTECTED] writes:


backend1: select publish_snapshot(); -- will block

backend2: start transaction;
backend2: set transaction isolation level serializable;
backend2: select clone_snapshot(pid); -- will unblock backend1


It seems simpler to have a current_snapshot() function that returns an bytea
or a new snapshot data type which set_current_snapshot(bytea) took to change
your snapshot. Then you could use tables or out-of-band communication to pass
around your snapshots however you please. 


set_current_snapshot() would have to sanity check that the xmin of the new
snapshot isn't older than the current globaloldestxmin. 


That would solve the backend to backend IPC problem nicely.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Jan Wieck

On 1/26/2007 8:26 AM, Simon Riggs wrote:

On Thu, 2007-01-25 at 18:16 -0500, Jan Wieck wrote:

To provide this data, I would like to add another log directory, 
pg_tslog. The files in this directory will be similar to the clog, but 
contain arrays of timestamptz values. On commit, the current system time 
will be taken. As long as this time is lower or equal to the last taken 
time in this PostgreSQL instance, the value will be increased by one 
microsecond. The resulting time will be added to the commit WAL record 
and written into the pg_tslog file.


A transaction time table/log has other uses as well, so it's fairly
interesting to have this.


 COMMIT [TRANSACTION] [WITH TIMESTAMP timestamptz];

The extension is limited to superusers and will override the normally 
generated commit timestamp. 


I don't think it's acceptable to override the normal timestamp. That
could lead to non-monotonic time values, which could screw up PITR. My
view is that you still need PITR even when you are using replication,
because the former provides recoverability and the latter provides
availability.


Without that it is rendered useless for conflict resolution purposes.

The timestamp used does not necessarily have much to do with the real 
time at commit, although I'd like it to be as close as possible. This 
timestamp marks the age of the new datum in an update. Since the 
replication is asynchronous, the update on the remote systems will 
happen later, but the timestamp recorded with that datum must be the 
timestamp of the original transaction, not the current time when it is 
replicated remotely. All we have to determine that with is the xmin in 
the row's tuple header, so that xmin must resolve to the original 
transaction's timestamp.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



[HACKERS] HAVING push-down

2007-01-26 Thread Simon Riggs
I've just read a paper that says PostgreSQL doesn't do this. My reading
of the code is that we *do*  evaluate the HAVING clause prior to
calculating the aggregates for it. I thought I'd check to resolve the
confusion.

- - -

If not, it seems fairly straightforward to push down some or all of a
HAVING clause so that the qual clause is tested prior to aggregation,
not after aggregation. This could, for certain queries, significantly
reduce the amount of effort that the final Agg node performs.

We might think about deeper push-down within the query, but since the
Agg node already has the havingQual, it seems a straightforward act to
decide whether to apply it before or after the aggregation.

We already do find_unaggregated_cols(), so little additional analysis
seems required.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Hannu Krosing
On Fri, 2007-01-26 at 12:25, Simon Riggs wrote:
 On Thu, 2007-01-25 at 22:19 -0500, Jan Wieck wrote:
 
  The idea is to clone an existing serializable transactions snapshot 
  visibility information from one backend to another. The semantics would 
  be like this:
  
   backend1: start transaction;
   backend1: set transaction isolation level serializable;
   backend1: select pg_backend_pid();
   backend1: select publish_snapshot(); -- will block
 
 Great idea. It can also be used by pg_dump to publish its snapshot so
 that we can make VACUUM continue to process effectively while pg_dump
 is running.

Do you mean that vacuum would clean up tuples still visible to
pg_dump?

 Two questions:
 - why does it have to block? I don't see any reason - the first process
 can begin doing useful work. The second process might fail or itself be
 blocked by something.

As I see it, it has to block so that its transaction would not end, so
that the system knows that it can't yet remove tuples in that snapshot.

And it should block until all its consumers have ended their use of the
published snapshot.

 - why just serializable snapshots?

There's probably no point in acquiring it into a read-committed
transaction when the next command will revert to its own snapshot anyway.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] Piggybacking vacuum I/O

2007-01-26 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 I'd like to see still more evidence that it's a problem before we start 
 changing that piece of code. It has served us well for years.

So the TODO could be "investigate whether caching pg_clog and/or
pg_subtrans in local memory can be useful for vacuum performance".

 Bruce Momjian wrote:
 Is there a TODO here?
 
 ---
 
 Heikki Linnakangas wrote:
 Pavan Deolasee wrote:
 Another simpler solution for VACUUM would be to read the entire CLOG file
 in local memory. Most of the transaction status queries can be satisfied
 from
 this local copy and the normal CLOG is consulted only when the status is
 unknown (TRANSACTION_STATUS_IN_PROGRESS)
 The clog is only for finished (committed/aborted/crashed) transactions.
 If a transaction is in progress, the clog is never consulted. Anyway,
 that'd only be reasonable for vacuums, and I'm actually more worried if
 we had normal backends thrashing the clog buffers.


-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] [pgsql-patches] pg_dump pretty_print

2007-01-26 Thread Greg Sabino Mullane



Peter Eisentraut replied:

 The harm here is that under undefined circumstances a dump file
 will not be a proper and robust representation of the original
 database, which would add significant confusion and potential for error.

What undefined circumstances are we talking about here? If there is a chance
that pg_get_viewdef and company do not output a version that can be
read again by the database because we simply changed the whitespace, that
sounds like a serious bug to be fixed, not a reason to reject this
optional flag.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200701251003
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 16:09 +0200, Hannu Krosing wrote:
 On Fri, 2007-01-26 at 12:25, Simon Riggs wrote:

  Two questions:
  - why does it have to block? I don't see any reason - the first process
  can begin doing useful work. The second process might fail or itself be
  blocked by something.
 
 As I see it, it has to block so that its transaction would not end, so
 that the system knows that it can't yet remove tuples in that snapshot.
 
 And it should block until all its consumers have ended their use of the
 published snapshot.

Agreed that the Snapshot must be visible to all, but that's no reason why
the original call has to block, just that we must do something to
prevent the Snapshot from disappearing from view.

  - why just serializable snapshots?
 
 There's probably no point in acquiring it into a read-committed
 transaction when the next command will revert to its own snapshot anyway.

But the stated use case was to share snapshots, which seems valid
whatever the type of Snapshot. One of the stated cases was parallel
query...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] autovacuum process handling

2007-01-26 Thread Alvaro Herrera
Markus Schiltknecht wrote:
 Hi,
 
 Alvaro Herrera wrote:
 Yeah.  For what I need, the launcher just needs to know when a worker
 has finished and how many workers there are.
 
 Oh, so it's not all that less communication. My replication manager also 
 needs to know when a worker dies. You said you are using a signal from 
 manager to postmaster to request a worker to be forked. How do you do 
 the other part, where the postmaster needs to tell the launcher which 
 worker terminated?

I haven't done that yet, since the current incarnation does not need it.
But I have considered using some signal like SIGUSR1 to mean "something
changed in your processes, look into your shared memory".  The
autovacuum shared memory area would contain the PIDs (or maybe PGPROC
pointers?) of workers; so when the launcher goes to check, it notices
that one worker is no longer there, meaning that it must have
terminated its job.

 For Postgres-R, I'm currently questioning if I shouldn't merge the 
 replication manager process with the postmaster. Of course, that would 
 violate the "postmaster does not touch shared memory" constraint.
 
 I suggest you don't.  Reliability from Postmaster is very important.
 
 Yes, so? As long as I can't restart the replication manager, but 
 operation of the whole DBMS relies on it, I have to take the postmaster 
 down as soon as it detects a crashed replication manager.

Sure.  But you also need to take down all regular backends, and bgwriter
as well.  If the postmaster just dies, this won't work cleanly.

 That's why I'm questioning, if that's the behavior we want. Isn't it 
 better to force the administrators to look into the issue and probably 
 replace a broken node instead of having one node going amok by 
 requesting recovery over and over again, possibly forcing crashes of 
 other nodes, too, because of the additional load for recovery?

Maybe what you want, then, is that when the replication manager dies,
the postmaster should close all processes and then shut itself
down.  This also can be arranged easily.

But just crashing the postmaster because the manager sees something
wrong is certainly not a good idea.

 Well, the point of the postmaster is that it can notice when one process
 dies and take appropriate action.  When a backend dies, the postmaster
 closes all others.  But if the postmaster crashes due to a bug in the
 manager (due to both being integrated in a single process), how do you
 close the backends?  There's no one to do it.

 That's a point.
 
 But again, as long as the replication manager won't be able to restart, 
 you gain nothing by closing backends on a crashed node.

Sure you do -- they won't corrupt anything :-)  Plus, what use are
running backends in a multimaster environment, if they can't communicate
with the outside?  Much better would be, AFAICS, to shut everyone down
so that the users can connect to a working node.

 I guess your problem is that the manager's task is quite a lot more
 involved than my launcher's.  But in that case, it's even more important
 to have them separate.
 
 More involved with what? It does not touch shared memory; it mainly 
 keeps track of the backends' states (by getting a notice from the 
 postmaster) and does all the necessary forwarding of messages between 
 the communication system and the backends. Its main loop is similar to 
 the postmaster's, mainly consisting of a select().

I meant "more complicated".  And if it has to listen on a socket and
forward messages to remote backends, it certainly is a lot more
complicated than the current autovac launcher.

 I don't understand why the manager talks to postmaster.  If it doesn't,
 well, then there's no concurrency issue gone, because the remote
 backends will be talking to *somebody* anyway; be it postmaster, or
 manager.
 
 As with your launcher, I only send one message: the worker request. But 
 the other way around, from the postmaster to the replication manager, 
 there are also some messages: a "database is ready" message and 
 "worker terminated" messages. Thinking about handling the restarting 
 cycle, I would need to add a "database is restarting" message, which 
 has to be followed by another "database is ready" message.
 
 For sure, the replication manager needs to keep running during a 
 restarting cycle. And it needs to know the database's state, so as to be 
 able to decide if it can request workers or not.

I think this would be pretty easy to do if you made the remote backends
keep state in shared memory.  The manager just needs to get a signal to
know that it should check the shared memory.  This can be arranged
easily: just have the remote backends signal the postmaster, and have
the postmaster signal the manager.  Alternatively, have the manager PID
stored in shared memory and have the remote backends signal (SIGUSR1 or
some such) the manager.  (bgwriter does this: it announces its PID in
shared memory, and the backends signal it when they want a CHECKPOINT).
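
A stand-alone sketch of that "PID in shared memory" pattern; the struct
and function names are illustrative, not the bgwriter's actual ones:

    #include <signal.h>
    #include <sys/types.h>

    struct manager_shmem
    {
        volatile pid_t manager_pid;  /* 0 until the manager announces itself */
    };

    /* ask the manager to look at shared memory, bgwriter-checkpoint style */
    static int
    wake_manager(struct manager_shmem *shm)
    {
        if (shm->manager_pid == 0)
            return -1;               /* manager not running yet */
        return kill(shm->manager_pid, SIGUSR1);
    }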

 I 

Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 16:09 +0200, Hannu Krosing wrote:
 On Fri, 2007-01-26 at 12:25, Simon Riggs wrote:

  Great idea. It can also be used by pg_dump to publish its snapshot so
  that we can make VACUUM continue to process effectively while pg_dump
  is running.
 
 Do you mean that vacuum would clean up tuples still visible to
 pg_dump?

No, that would break MVCC. But we may have done lots of updates/deletes
that are *not* visible to any Snapshot, yet are not yet removable
because they are higher than OldestXmin but we don't know that because
previously the Snapshot details were not available. ISTM that this
proposal is a way of making the Snapshot limits publicly available so
that they can be used by VACUUM. Sure it isn't every backend, but the
details may be useful. So this is an additional benefit to this
proposal. (There's a hole in the above idea, so don't jump on my back to
explain it - I see it and am trying to work out a way around it...)

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Stephen Frost
* Jan Wieck ([EMAIL PROTECTED]) wrote:
 On 1/26/2007 2:37 AM, Naz Gassiep wrote:
 I would be *very* concerned that system time is not a guaranteed 
 monotonic entity. Surely a counter or other internally managed mechanism 
 would be a better solution.
 
 Such a counter has only local relevance. How do you plan to compare 
 the two separate counters on different machines to tell which 
 transaction happened last?

I'd also suggest you look into Lamport timestamps...  Trusting the
system clock just isn't practical, even with NTP.  I've developed
(albeit relatively small) systems using Lamport timestamps and would be
happy to talk about it offlist.  I've probably got some code I could
share as well.

Thanks,

Stephen




Re: [HACKERS] crash on 8.2 and cvshead - failed to add item to the

2007-01-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'm still wondering why the bug isn't seen in 8.1.

 The hardcoded fillfactor was 90% when building an index, and that's 
 still the default. However, when inserting to an existing index, the 
 fillfactor on the rightmost page was 2/3. It was changed to use the 
 user-configurable fillfactor, which now defaults to 90%.

Ah.  I thought I remembered that those had been two separate changes,
but you're right, 8.1 and before always split 1:1 or 2:1.  So it'd take
a really nasty corner case to expose the bug there.

regards, tom lane



Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Andrew Dunstan

Stephen Frost wrote:

I'd also suggest you look into Lamport timestamps...  Trusting the
system clock just isn't practical, even with NTP.  I've developed
(albeit relatively small) systems using Lamport timestamps and would be
happy to talk about it offlist.  I've probably got some code I could
share as well.
  


that looks like what Oracle RAC uses: 
http://www.lc.leidenuniv.nl/awcourse/oracle/rac.920/a96597/coord.htm


cheers

andrew





Re: [HACKERS] BUG #2917: spi_prepare doesn't accept typename aliases

2007-01-26 Thread Andrew Dunstan

I wrote:

Tom Lane wrote:


I think parseTypeString() may be the thing to use.  It's what plpgsql
uses...

  


OK, I'll see what I can do.



see attached patch.  If this is OK I will apply it and also fix pltcl 
and plpython similarly, mutatis mutandis.


cheers

andrew
Index: src/pl/plperl/plperl.c
===
RCS file: /cvsroot/pgsql/src/pl/plperl/plperl.c,v
retrieving revision 1.123
diff -c -r1.123 plperl.c
*** src/pl/plperl/plperl.c	21 Nov 2006 16:59:02 -	1.123
--- src/pl/plperl/plperl.c	26 Jan 2007 15:13:05 -
***
*** 2128,2145 
  	PG_TRY();
  	{
  		/************************************************************
! 		 * Lookup the argument types by name in the system cache
! 		 * and remember the required information for input conversion
  		 ************************************************************/
  		for (i = 0; i < argc; i++)
  		{
! 			List	   *names;
  			HeapTuple	typeTup;
  
! 			/* Parse possibly-qualified type name and look it up in pg_type */
! 			names = stringToQualifiedNameList(SvPV(argv[i], PL_na),
! 											  "plperl_spi_prepare");
! 			typeTup = typenameType(NULL, makeTypeNameFromNameList(names));
  			qdesc->argtypes[i] = HeapTupleGetOid(typeTup);
  			perm_fmgr_info(((Form_pg_type) GETSTRUCT(typeTup))->typinput,
  						   &(qdesc->arginfuncs[i]));
--- 2128,2152 
  	PG_TRY();
  	{
  		/************************************************************
! 		 * Resolve argument type names and then look them up by oid
! 		 * in the system cache, and remember the required information
! 		 * for input conversion.
  		 ************************************************************/
  		for (i = 0; i < argc; i++)
  		{
! 			Oid typeId;
! int32   typmod;
  			HeapTuple	typeTup;
  
! 			parseTypeString(SvPV(argv[i], PL_na), &typeId, &typmod);
! 
! 			typeTup = SearchSysCache(TYPEOID,
! 									 ObjectIdGetDatum(typeId),
! 									 0, 0, 0);
! 			if (!HeapTupleIsValid(typeTup))
! 				elog(ERROR, "cache lookup failed for type %u", typeId);
! 
! 
  			qdesc->argtypes[i] = HeapTupleGetOid(typeTup);
  			perm_fmgr_info(((Form_pg_type) GETSTRUCT(typeTup))->typinput,
  						   &(qdesc->arginfuncs[i]));
Index: src/pl/plperl/expected/plperl.out
===
RCS file: /cvsroot/pgsql/src/pl/plperl/expected/plperl.out,v
retrieving revision 1.9
diff -c -r1.9 plperl.out
*** src/pl/plperl/expected/plperl.out	13 Aug 2006 17:31:10 -	1.9
--- src/pl/plperl/expected/plperl.out	26 Jan 2007 15:13:05 -
***
*** 438,444 
  -- Test spi_prepare/spi_exec_prepared/spi_freeplan
  --
  CREATE OR REPLACE FUNCTION perl_spi_prepared(INTEGER) RETURNS INTEGER AS $$
!my $x = spi_prepare('select $1 AS a', 'INT4');
 my $q = spi_exec_prepared( $x, $_[0] + 1);
 spi_freeplan($x);
  return $q->{rows}->[0]->{a};
--- 438,444 
  -- Test spi_prepare/spi_exec_prepared/spi_freeplan
  --
  CREATE OR REPLACE FUNCTION perl_spi_prepared(INTEGER) RETURNS INTEGER AS $$
!my $x = spi_prepare('select $1 AS a', 'INTEGER');
 my $q = spi_exec_prepared( $x, $_[0] + 1);
 spi_freeplan($x);
  return $q->{rows}->[0]->{a};
***
*** 468,470 
--- 468,504 
   4
  (2 rows)
  
+ --
+ -- Test prepare with a type with spaces
+ --
+ CREATE OR REPLACE FUNCTION perl_spi_prepared_double(double precision) RETURNS double precision AS $$
+   my $x = spi_prepare('SELECT 10.0 * $1 AS a', 'DOUBLE PRECISION');
+   my $q = spi_query_prepared($x,$_[0]);
+   my $result;
+   while (defined (my $y = spi_fetchrow($q))) {
+   $result = $y->{a};
+   }
+   spi_freeplan($x);
+   return $result;
+ $$ LANGUAGE plperl;
+ SELECT perl_spi_prepared_double(4.35) as "double precision";
+  double precision 
+ --
+  43.5
+ (1 row)
+ 
+ --
+ -- Test with a bad type
+ --
+ CREATE OR REPLACE FUNCTION perl_spi_prepared_bad(double precision) RETURNS double precision AS $$
+   my $x = spi_prepare('SELECT 10.0 * $1 AS a', 'does_not_exist');
+   my $q = spi_query_prepared($x,$_[0]);
+   my $result;
+   while (defined (my $y = spi_fetchrow($q))) {
+   $result = $y->{a};
+   }
+   spi_freeplan($x);
+   return $result;
+ $$ LANGUAGE plperl;
+ SELECT perl_spi_prepared_bad(4.35) as "double precision";
+ ERROR:  error from Perl function: type "does_not_exist" does not exist at line 2.
Index: src/pl/plperl/sql/plperl.sql
===
RCS file: /cvsroot/pgsql/src/pl/plperl/sql/plperl.sql,v
retrieving revision 1.11
diff -c -r1.11 plperl.sql
*** src/pl/plperl/sql/plperl.sql	13 Aug 2006 17:31:10 -	1.11
--- src/pl/plperl/sql/plperl.sql	26 Jan 2007 15:13:05 -
***
*** 316,322 
  -- Test spi_prepare/spi_exec_prepared/spi_freeplan
  --
  CREATE OR REPLACE FUNCTION perl_spi_prepared(INTEGER) RETURNS INTEGER AS $$
!my $x = spi_prepare('select $1 AS a', 'INT4');
 

Re: [HACKERS] HAVING push-down

2007-01-26 Thread Gregory Stark

Simon Riggs [EMAIL PROTECTED] writes:

 I've just read a paper that says PostgreSQL doesn't do this. My reading
 of the code is that we *do*  evaluate the HAVING clause prior to
 calculating the aggregates for it. I thought I'd check to resolve the
 confusion.

 - - -

 If not, it seems fairly straightforward to push down some or all of a
 HAVING clause so that the qual clause is tested prior to aggregation,
 not after aggregation. This could, for certain queries, significantly
 reduce the amount of effort that the final Agg node performs.

You mean in cases like this?

postgres=# explain select  count(*) from customer group by c_w_id,c_d_id,c_id 
having c_w_id = 1 and c_d_id=1 and c_id=1;
 QUERY PLAN 


 GroupAggregate  (cost=0.00..13.61 rows=1 width=12)
   -  Index Scan using pk_customer on customer  (cost=0.00..13.56 rows=4 
width=12)
 Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
(3 rows)

I think we push having clauses into WHERE clauses whenever there are no
aggregates in them.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] BUG #2917: spi_prepare doesn't accept typename aliases

2007-01-26 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 see attached patch.  If this is OK I will apply it and also fix pltcl 
 and plpython similarly, mutatis mutandis.

Looks alright as far as it goes, but I'd suggest making one additional
cleanup while you're in there: get rid of the direct syscache access
altogether, instead using getTypeInputInfo().  The loop body should just
consist of three function calls: parseTypeString, getTypeInputInfo,
perm_fmgr_info.
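
In plperl_spi_prepare()'s terms that would be roughly the following (a
fragment sketched against the structures in the patch above, still
discarding typmod):

    for (i = 0; i < argc; i++)
    {
        Oid     typId, typInput, typIOParam;
        int32   typmod;

        parseTypeString(SvPV(argv[i], PL_na), &typId, &typmod);
        getTypeInputInfo(typId, &typInput, &typIOParam);
        perm_fmgr_info(typInput, &(qdesc->arginfuncs[i]));

        qdesc->argtypes[i] = typId;
        qdesc->argtypioparams[i] = typIOParam;
    }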

If you wanted to be a bit more ambitious maybe you could change the fact
that this code is throwing away typmod, which means that declarations
like varchar(32) would fail to work as expected.  Perhaps it should be
fixed to save the typmods alongside the typioparams and then pass them
to InputFunctionCall instead of passing -1.  On the other hand, we don't
currently enforce typmod for any function input or result arguments, so
maybe it's consistent that spi_prepare arguments ignore typmods too.
Thoughts?

regards, tom lane



Re: [HACKERS] HAVING push-down

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 15:22 +, Gregory Stark wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
 
  I've just read a paper that says PostgreSQL doesn't do this. My reading
  of the code is that we *do*  evaluate the HAVING clause prior to
  calculating the aggregates for it. I thought I'd check to resolve the
  confusion.
 

 You mean in cases like this?
 
 postgres=# explain select  count(*) from customer group by c_w_id,c_d_id,c_id 
 having c_w_id = 1 and c_d_id=1 and c_id=1;
  QUERY PLAN   
   
 
  GroupAggregate  (cost=0.00..13.61 rows=1 width=12)
-  Index Scan using pk_customer on customer  (cost=0.00..13.56 rows=4 
 width=12)
  Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
 (3 rows)

OK, thanks. I'll feedback to the author of the paper I was reviewing.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] crash on 8.2 and cvshead - failed to add item to the

2007-01-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
To see what's going on, I added some logs to the split code to print out 
the free space on both halves as calculated by findsplitloc, and the 
actual free space on the pages after split. I'm seeing a discrepancy of 
4 bytes on the right half; actual space free on right page after split 
is 4 bytes less than anticipated.


Hm, mis-counting the positions of itempointers maybe?


Found it:

/* Count up total space in data items without actually scanning 'em */
dataitemtotal = rightspace - (int) PageGetFreeSpace(page);

This is 4 bytes off, because PageGetFreeSpace subtracts 
sizeof(ItemIdData) from the actual free space on page. We could do


	dataitemtotal = rightspace - ((int) PageGetFreeSpace(page) + sizeof(ItemIdData));


but that again would be 4 bytes off in the other direction if there's 0 
bytes left on the page :(.


IMHO the right fix is to modify PageGetFreeSpace not to do the 
subtraction (it's a hack anyway), but that means we have to go through 
and fix every caller of it. Or we can add a new PageGetReallyFreeSpace 
function and keep the old one for compatibility. What do we want?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Implied Functional index use (redux)

2007-01-26 Thread Simon Riggs
On Thu, 2007-01-25 at 16:20 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  A simpler, alternate proposal is to allow the user to specify whether a
  functional index is transformable or not using CREATE or ALTER INDEX,
  with a default of not transformable. That then leaves the responsibility
  for specifying this with the user, who, as we have seen, is really the
  only person capable of judging the whole case on its merits.
 
  e.g. CREATE INDEX fooidx ON foo (foofunc(foocol1)) 
  [TABLESPACE ...] [ENABLE|DISABLE TRANSFORM] [WHERE ...];
 
 This is a foot-gun and nothing else.  I hardly think the average DBA
 will realize such subtleties as "numeric equality doesn't guarantee that
 such-and-such works."  If it's not specified by the datatype author
 it's not going to be safe.

OK, no problem.

The most beneficial use case is for string handling: name lookups, case
insensitive indexing and index size reduction generally. If, for some
reason, bpchar were to be excluded then it would take away a great chunk
of benefit.

Two questions:

- Will bpchar be transformable?

- Do you see a clear way forward for specifying the information required
to allow the transform? We need to specify the operator, which might be
taken to include the datatype. (Peter suggested placing this on the
function itself, though I think current precedent is to place on the
operator.) If you can say where you want the info to live, I can work
out the details and repost.

If there's clear benefit and a clear way forward, then we might just be
OK for 8.3. If not, I'll put this back on the shelf again in favour of
other ideas.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] HAVING push-down

2007-01-26 Thread Joshua D. Drake
Simon Riggs wrote:
 On Fri, 2007-01-26 at 15:22 +, Gregory Stark wrote:
 Simon Riggs [EMAIL PROTECTED] writes:

 I've just read a paper that says PostgreSQL doesn't do this. My reading
 of the code is that we *do*  evaluate the HAVING clause prior to
 calculating the aggregates for it. I thought I'd check to resolve the
 confusion.

 
 You mean in cases like this?

 postgres=# explain select  count(*) from customer group by 
 c_w_id,c_d_id,c_id having c_w_id = 1 and c_d_id=1 and c_id=1;
  QUERY PLAN  

 
  GroupAggregate  (cost=0.00..13.61 rows=1 width=12)
-  Index Scan using pk_customer on customer  (cost=0.00..13.56 rows=4 
 width=12)
  Index Cond: ((c_w_id = 1) AND (c_d_id = 1) AND (c_id = 1))
 (3 rows)
 
 OK, thanks. I'll feedback to the author of the paper I was reviewing.
 

Care to share the paper in general? It might be beneficial for all of us.

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] crash on 8.2 and cvshead - failed to add item to the

2007-01-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 IMHO the right fix is to modify PageGetFreeSpace not to do the 
 subtraction, it's a hack anyway, but that means we have to go through 
 and fix every caller of it. Or we can add a new PageGetReallyFreeSpace 
 function and keep the old one for compatibility. What do we want?

It'd probably be a good idea to take a look at each caller and see
whether it has a problem with that.  I believe PageGetFreeSpace's
behavior is actually the right thing for many of 'em.  The idea is that
subtracting the 4 bytes is often necessary and always safe/conservative
(but is that true in this case?  We're overestimating dataitemtotal,
can that hurt us?).  Is it worth changing each caller to try to account
exactly for those 4 bytes?

In short, I'm inclined to leave the function alone unless changing it
can be shown to be a win for most callers.  Add a new function
(perhaps PageGetExactFreeSpace would be a better name).

Keep in mind also that we need a minimal-change version for
back-patching.  If this is cleanup rather than bug fix, please
submit it separately.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposal: Commit timestamp

2007-01-26 Thread Jan Wieck

On 1/26/2007 9:38 AM, Stephen Frost wrote:

* Jan Wieck ([EMAIL PROTECTED]) wrote:

On 1/26/2007 2:37 AM, Naz Gassiep wrote:
I would be *very* concerned that system time is not a guaranteed 
monotonic entity. Surely a counter or other internally managed mechanism 
would be a better solution.


Such a counter has only local relevance. How do you plan to compare 
the two separate counters on different machines to tell which 
transaction happened last?


I'd also suggest you look into Lamport timestamps...  Trusting the
system clock just isn't practical, even with NTP.  I've developed
(albeit relatively small) systems using Lamport timestamps and would be
happy to talk about it offlist.  I've probably got some code I could
share as well.


I think the system I described is a slightly modified Lamport generator. 
The maximum timestamp of any row updated in this transaction can be 
considered the counter value received from other nodes. Then I make sure 
that the next counter (timestamp) is higher than anything I know so far, 
and I add a cluster-wide unique tie breaker to that.


Looking closer, I don't even have to check the timestamps of the rows 
updated. Since a replicated remote transaction will bump the local 
Lamport clock on commit, a local transaction modifying such a row will 
have a timestamp in the future of that remote transaction, even if my 
local clock is limping behind.
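
To make the "never go backwards" rule concrete, a minimal SQL-level sketch
(the single-row table last_commit_ts is invented purely for illustration;
the actual proposal records this in C at commit time and, as described
above, adds a per-node tie breaker on top):

    -- new commit timestamp = max(now(), last + 1 microsecond)
    UPDATE last_commit_ts
       SET ts = GREATEST(now(), ts + interval '1 microsecond')
    RETURNING ts;   -- the value used as this transaction's commit timestamp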



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Implied Functional index use (redux)

2007-01-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 If there's clear benefit and a clear way forward, then we might just be
 OK for 8.3. If not, I'll put this back on the shelf again in favour of
 other ideas.

I think this is still a long way off, and there are probably more useful
things to work on for 8.3.

Part of my antagonism stems from the fact that by means of the operator
family rewrite I've been getting rid of some longstanding but really
quite unacceptable assumptions about this operator does that.  I don't
want to see us start putting unsupported semantic assumptions back into
the optimizer; rather its assumptions about operator behavior need to be
clearly specified.  As an example, without some careful preliminary
thinking I'd have probably folded all the numeric types into one big
opfamily and thereby broken transitivity :-(, leading to bugs that would
be devilish to figure out.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 I've just read a paper that says PostgreSQL doesn't do this.

What does he mean by that exactly, and which PG version is he looking
at?  As Greg notes, we do know how to push down non-aggregated
conditions, but I'm not sure that's what he's thinking of.  There have
been some relevant bug fixes, eg

2004-07-10 14:39  tgl

* src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test
HAVING condition before computing targetlist of an Aggregate node. 
This is required by SQL spec to avoid failures in cases like  
SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) > 0;
AFAICT we have gotten this wrong since day one.  Kudos to Holger
Jakobs for being the first to notice.

Also, it's still true that we run all the aggregate transition functions
in parallel, so if you were hoping to use HAVING on an aggregate
condition to prevent an overflow or something in the state accumulation
function for a targetlist aggregate, you'd lose.  But I don't see any
way to avoid that without scanning the data twice, which we're surely
not gonna do.
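
For reference, the behavior the quoted fix guarantees, shown on an invented
games table:

    -- per the 2004 fix quoted above, the HAVING condition is checked before
    -- the targetlist is evaluated, so groups with sum(lose) = 0 never reach
    -- the division and no divide-by-zero error is raised
    SELECT team, sum(win) / sum(lose) AS ratio
    FROM   games
    GROUP  BY team
    HAVING sum(lose) > 0;

As noted above, this protects only the targetlist evaluation; the aggregate
transition functions themselves still see every row of every group.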

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 11:16 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  I've just read a paper that says PostgreSQL doesn't do this.
 
 What does he mean by that exactly, and which PG version is he looking
 at?  As Greg notes, we do know how to push down non-aggregated
 conditions, but I'm not sure that's what he's thinking of.  

Yes, it was specifically non-aggregated conditions.

 There have
 been some relevant bug fixes, eg
 
 2004-07-10 14:39  tgl
 
   * src/backend/executor/: nodeAgg.c (REL7_4_STABLE), nodeAgg.c: Test
   HAVING condition before computing targetlist of an Aggregate node. 
   This is required by SQL spec to avoid failures in cases like  
   SELECT sum(win)/sum(lose) FROM ... GROUP BY ... HAVING sum(lose) > 0;
   AFAICT we have gotten this wrong since day one.  Kudos to Holger
   Jakobs for being the first to notice.
 
 Also, it's still true that we run all the aggregate transition functions
 in parallel, so if you were hoping to use HAVING on an aggregate
 condition to prevent an overflow or something in the state accumulation
 function for a targetlist aggregate, you'd lose.  But I don't see any
 way to avoid that without scanning the data twice, which we're surely
 not gonna do.

I'll send you the paper off-line, there's some more interesting stuff
also. p.12

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-26 Thread Pavan Deolasee

On 1/26/07, Alvaro Herrera [EMAIL PROTECTED] wrote:


Heikki Linnakangas wrote:
 I'd like to see still more evidence that it's a problem before we start
 changing that piece of code. It has served us well for years.

So the TODO could be investigate whether caching pg_clog and/or
pg_subtrans in local memory can be useful for vacuum performance.



As Heikki suggested, we should investigate the same for normal
backends as well.

It would also be interesting to investigate whether early setting of hint
bits
can reduce subsequent writes of blocks. A typical case would be a large
table
being updated heavily for a while, followed by SELECT queries. The SELECT
queries would set hint bits for the previously UPDATEd  tuples (old and new
versions) and thus cause subsequent writes of those blocks for what could
have been read-only queries.
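
A minimal way to reproduce the pattern (table name invented): a bulk UPDATE
followed by a first SELECT; the SELECT dirties the pages it reads when it
sets the commit hint bits, so an otherwise read-only query triggers writes.

    UPDATE big_table SET val = val + 1;   -- leaves many tuples without hint bits
    -- ... transaction commits, pages get written out ...
    SELECT count(*) FROM big_table;       -- first reader sets the hint bits and
                                          -- dirties the pages again, even though
                                          -- it is a read-only query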

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Chris Browne
[EMAIL PROTECTED] (Markus Schiltknecht) writes:
 Nice proposal. I'd support that enhancement and could make use of such
 triggers in Postgres-R as well, at least to provide these triggers to
 the user.

 Jan Wieck wrote:
 Good question. I don't know. I'd rather err on the safe side and
 make it multiple states; for now I only have Normal and Replica mode.

 Are these triggers intended to help implement async replication or are
 these for users to be able to take action on remote replay of a
 transaction (i.e. on the replica)? Does that give a further
 distinction?

Well, there's specific intent, and then there's general intent...  

If I understand correctly (and I think I do), the various threads that
Jan has been starting do have *specific* intent in that he's got an
implementation in mind that would specifically use the features he's
asking about.

But there is also the general intent that the features be usable
more widely than that.  If some generalization makes this particular
feature useful for Postgres-R as well as Jan's work, that's better
still.

 In Postgres-R, I mostly use the terms 'local' and 'remote'. Also,
 normal mode can easily be confused with non-replicated mode, thus
 I'd not mix that with replicated, local transaction mode (even if it's
 mostly equal, as in this case). My naming proposal would thus be:

 A   fires always (i.e. fires N times, where N = nr of nodes)
 L   fires on the transaction local node (i.e. only exactly once)
 R   fires on the remote nodes only (i.e. (N - 1) times)
 0   fires never

 '1' for fires on both nodes seems confusing as well, because it's
 not like in single node DB operation, in that one event can fire the
 trigger multiple times (on different nodes). The current, single node
 PostgreSQL should thus use '0' or 'L'.

I rather like your L for local and R for remote.

An alternative to A for always would be B, standing for runs
[B]oth on local and remote nodes.

Of course, this is picking at nits; the important question is not what
to call the names of the states, but rather whether the set of states
is both desirable and complete...
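
Purely as a strawman for how those states might later be spelled at the SQL
level (hypothetical syntax, invented names; nothing like this exists yet):

    ALTER TABLE orders ENABLE ALWAYS TRIGGER orders_audit;   -- 'A': fires on all nodes
    ALTER TABLE orders ENABLE LOCAL TRIGGER orders_audit;    -- 'L': fires only on the
                                                              --      originating node
    ALTER TABLE orders ENABLE REPLICA TRIGGER orders_audit;  -- 'R': fires only during
                                                              --      replay on remote nodes
    ALTER TABLE orders DISABLE TRIGGER orders_audit;         -- '0': never fires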
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/x.html
Rules  of  the Evil  Overlord  #97.  My  dungeon  cells  will not  be
furnished with  objects that  contain reflective surfaces  or anything
that can be unravelled. http://www.eviloverlord.com/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Chris Browne
[EMAIL PROTECTED] (Gregory Stark) writes:
 Jan Wieck [EMAIL PROTECTED] writes:

 backend1: select publish_snapshot(); -- will block

 backend2: start transaction;
 backend2: set transaction isolation level serializable;
 backend2: select clone_snapshot(pid); -- will unblock backend1

 It seems simpler to have a current_snapshot() function that returns a bytea
 or a new snapshot data type which set_current_snapshot(bytea) would take to change
 your snapshot. Then you could use tables or out-of-band communication to pass
 around your snapshots however you please. 

 set_current_snapshot() would have to sanity check that the xmin of the new
 snapshot isn't older than the current globaloldestxmin. 

 That could be handy for debugging purposes too. 

Here's a wild thought...  

Would there be any sense in setting up the ability to declare
expressly a transaction's visibility parameters?

Consider that the Slony-I sl_event table records:
  ev_minxid, ev_maxxid, ev_xip

Grabbing a sample from an instance...
 [ ev_minxid| ev_maxxid| ev_xip ] =  [1377591608 | 1377591612 | 
'1377591608','1377591610']

Would it be plausible to, in effect, assert these things?

To say:
start transaction;
set transaction isolation level serializable;
select set_transaction_visibility(1377591608, 1377591612, [1377591608, 
1377591610]);

And thus assert the visibility that was recorded at that point in
time?

I may very well have the parameters characterized in a wrong way;
please assume an appropriate way instead as needed :-).

This would permit, if I am seeing this right, a way that you could, in
effect, get a form of time travel via this where you'd be able to
arbitrarily point at different forms of data visibility.  The wild
part being that you could assert data visibility declarations that a
normal connection couldn't naturally obtain...
-- 
let name=cbbrowne and tld=linuxdatabases.info in name ^ @ ^ tld;;
http://linuxfinances.info/info/multiplexor.html
Sturgeon's Law: 90% of *EVERYTHING* is crud.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-26 Thread Alvaro Herrera
Pavan Deolasee wrote:
 On 1/26/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 
 Heikki Linnakangas wrote:
  I'd like to see still more evidence that it's a problem before we start
  changing that piece of code. It has served us well for years.
 
 So the TODO could be investigate whether caching pg_clog and/or
 pg_subtrans in local memory can be useful for vacuum performance.
 
 As  Heikki suggested, we should also investigate the same for normal
 backends as well.

Maybe.  An idea that comes to mind is to never cache the latest page,
since it'll most likely result in extra reads anyway because there'll be
a lot of IN_PROGRESS transactions.

Problem to solve: how much memory to dedicate to this?  Could we mmap()
portions of the pg_clog segment, so that the page could be shared across
backends instead of allocating them for each?

 It would also be interesting to investigate whether early setting of
 hint bits can reduce subsequent writes of blocks. A typical case would
 be a large table being updated heavily for a while, followed by SELECT
 queries. The SELECT queries would set hint bits for the previously
 UPDATEd  tuples (old and new versions) and thus cause subsequent
 writes of those blocks for what could have been read-only queries.

This has been suggested before, but I don't see how this could work.
How does the UPDATE transaction go back to the pages it wrote to update
the hint bits, _after_ it committed?

Maybe have the bgwriter update hint bits as it evicts pages out of the
cache?  It could result in pg_clog read traffic for each page that needs
eviction; not such a hot idea.

I don't see how this is related to the above proposal though.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 No, that would break MVCC. But we may have done lots of updates/deletes
 that are *not* visible to any Snapshot, yet are not yet removable
 because they are higher than OldestXmin but we don't know that because
 previously the Snapshot details were not available. ISTM that this
 proposal is a way of making the Snapshot limits publicly available so
 that they can be used by VACUUM.

Certainly not, unless you intend that *every* snapshot *must* be
published, which is an overhead up with which we will not put.

One pretty serious problem with the proposal as written is the part
about the sender blocking until the receiver takes the snap; that means
it's not really a publish in the sense that you can make it available
without worrying about exactly how many readers there might or might not
be.  That alone is sufficient to kill any thought of VACUUM making use
of the info.  I'd feel happier with an implementation more like prepared
transactions: you stuff the information into shared memory and it sits
there, readable by anyone, until such time as you take it down again.
Like prepared xacts, GlobalXmin calculations would need to include these
snapshots (and hence they'd limit vacuums).

A shared-memory area would have to be fixed size, but perhaps backing
files, like those used by prepared xacts, could handle the overflow for
very large xip lists.  Presumably crash safety is not an issue so this
wouldn't require any complicated mechanism.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Implied Functional index use (redux)

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 10:58 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  If there's clear benefit and a clear way forward, then we might just be
  OK for 8.3. If not, I'll put this back on the shelf again in favour of
  other ideas.
 
 I think this is still a long way off, and there are probably more useful
 things to work on for 8.3.
 
 Part of my antagonism stems from the fact that by means of the operator
 family rewrite I've been getting rid of some longstanding but really
 quite unacceptable assumptions about this operator does that.  I don't
 want to see us start putting unsupported semantic assumptions back into
 the optimizer; rather its assumptions about operator behavior need to be
 clearly specified.  As an example, without some careful preliminary
 thinking I'd have probably folded all the numeric types into one big
 opfamily and thereby broken transitivity :-(, leading to bugs that would
 be devilish to figure out.

OK, no problems. All of the above says time, which is becoming rare as
we approach 8.3 anyways. 

I'll pick it up again in 8.4. 

Some notes-to-self for the future:

- ideally want to be able to decide transformability at CREATE INDEX
time; this will reduce planning time for functional index usage when
there is no possible transforms.

- may want to do this by having a special catalog table that holds the
cases that *will* work, to make it both safer and faster to look up.
Sort of like pg_autovacuum - absence means No.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 Ühel kenal päeval, N, 2007-01-25 kell 22:19, kirjutas Jan Wieck:
 The cloning process needs to make sure that the clone_snapshot() call is 
 made from the same DB user in the same database as corresponding 
 publish_snapshot() call was done. 

 Why ? Snapshot is universal and same for whole db instance, so why limit
 it to same user/database ?

Yeah.  Use-case: pg_dumpall could guarantee that it produces consistent
snapshots across multiple databases.  (Not sure I actually want that,
but it's at least arguably useful to someone.)

I think you would want to mark a snapshot with an owner, but that would
be for the purpose of restricting who could take it down, not who could
copy it.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-26 Thread Pavan Deolasee

On 1/26/07, Alvaro Herrera [EMAIL PROTECTED] wrote:



Maybe have the bgwriter update hint bits as it evicts pages out of the
cache?  It could result in pg_clog read traffic for each page that needs
eviction; not such a hot idea.



I thought that once we enhance the clog so that there are no clog reads,
the bgwriter would be able to update hint bits without getting into any
deadlock with pg_clog reads.

Maybe we can have this as a separate TODO.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] HAVING push-down

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 07:46 -0800, Joshua D. Drake wrote:

 Care to share the paper in general? It might be beneficial for all of us.

I'll ask the author, but don't expect an immediate response.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Gregory Stark

Hm, I had hoped that the DB2/ANSI syntax would only require making WITH a
fully reserved word, and not the other tokens it uses. Certainly for
non-recursive queries that's the case as the only other token it uses is AS
which is already a fully reserved word.

However to fully support the DB2/ANSI syntax we would definitely have an
ambiguity and I think we would have to make CYCLE a fully reserved word
which seems like a much bigger concession than WITH. Observe the following
case:

  WITH RECURSIVE foo (x,y) AS (select 1,2) SEARCH DEPTH FIRST BY x CYCLE x,y 
SET ...

The parser can't search arbitrarily far checking for a SET to see if the CYCLE
is a keyword or a binary operator. Even if it could things like this would be
entirely ambiguous:

  WITH RECURSIVE foo (x,y) AS (select 1,2) SEARCH DEPTH FIRST BY x CYCLE x, y 
CYCLE y SET ...

I'm nowhere near actually implementing this functionality yet so there's no
pressing need for action. In fact I think the search clause is actually an
ANSIism that isn't supported by DB2 itself yet either.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 11:36 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  No, that would break MVCC. But we may have done lots of updates/deletes
  that are *not* visible to any Snapshot, yet are not yet removable
  because they are higher than OldestXmin but we don't know that because
  previously the Snapshot details were not available. ISTM that this
  proposal is a way of making the Snapshot limits publicly available so
  that they can be used by VACUUM.
 
 Certainly not, unless you intend that *every* snapshot *must* be
 published, which is an overhead up with which we will not put.

Agreed, but that's the general case problem.

What I was hoping was that this would provide a mechanism for long
running transactions (LRTs) to publish their min/max Xids. Then if all
backends publish the minimum Xid of any Snapshot they have generated in
the proc array, we'd be able to decide if there are any large holes in
the global set of Snapshots. As a general case that's hard to evaluate,
but in the common case of a lone LRT and all the rest short duration
transactions you can end up with a gap of 250,000+ transactions opening
up between the two. It would be fairly easy to have VACUUM check for
large visibility gaps between groups of transactions and then use that
to improve its effectiveness in the presence of LRTs.

Theoretically we have to keep the chain of intermediate updates around
so it can be traversed by the old transaction, but in practical terms
traversing a long chain of updates isn't sensible. Serializable LRTs
will never traverse the chain anyway (that's a serializability error),
but there are some special cases to consider, hence my mentioning an
unresolved problem previously.

We'd need to be much more careful about the way Snapshots are managed,
so we can be certain that we take them all into account.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 set_current_snapshot() would have to sanity check that the xmin of the new
 snapshot isn't older than the current globaloldestxmin. 

 That would solve the backend to backend IPC problem nicely.

 But it fails on the count of making sure that globaloldestxmin doesn't
 advance past the snap you want to use.  And exactly how will you pass
 a snap through a table?  It won't become visible until you commit ...
 whereupon your own xmin isn't blocking the advance of globaloldestxmin.

Hm, good point. You could always do it in a separate connection, but that
starts to get annoying. I was more envisioning passing it around out-of-band
though, something like:


$db->execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
$snap = $db->execute("select current_snapshot()");

  for each db {
      if (fork()) {
          $slave[i] = $db->connect();
          $slave[i]->execute("select set_snapshot($snap)");
          $slave[i]->execute("copy table[i] to file[i]");
      }
  }


I'm also wondering about something like:

  $db->execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
  $snap = $db->execute("select current_snapshot()");

  if (fork()) {
      $slave = $db->connect();
      $slave->execute("select set_snapshot($snap)");
      $slave->execute("copy tab from hugefile");
      signal parent
  } else {
      while (no signal yet) {
          $rows_loaded_so_far = $db->execute("select count(*) from tab");
          display_progress($rows_loaded_so_far);
          sleep(60);
      }
  }


Sorry for the vaguely perlish pseudocode but it's the clearest way I can think
to write it. I don't think it would make much sense to try to do anything like
this in plpgsql; I think you really do want to be doing it in a language
outside the database where it's easier to open multiple connections and handle
IPC.

I realize the second idea might take more hackery than just setting the
snapshot... In particular as written above it wouldn't work because the slave
would be writing with a new xid that isn't actually in the snapshot.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-26 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Pavan Deolasee wrote:
 It would also be interesting to investigate whether early setting of
 hint bits can reduce subsequent writes of blocks. A typical case would
 be a large table being updated heavily for a while, followed by SELECT
 queries. The SELECT queries would set hint bits for the previously
 UPDATEd  tuples (old and new versions) and thus cause subsequent
 writes of those blocks for what could have been read-only queries.

 This has been suggested before, but I don't see how this could work.
 How does the UPDATE transaction go back to the pages it wrote to update
 the hint bits, _after_ it committed?

I think what he's suggesting is deliberately not updating the hint bits
during a SELECT ... but that's surely misguided.  If you don't set the
hint bit after discovering the transaction commit state, then the next
visitor of the tuple will have to repeat the clog lookup, meaning that
any such policy greatly increases clog read traffic and contention.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Jan Wieck

On 1/26/2007 12:22 PM, Simon Riggs wrote:

On Fri, 2007-01-26 at 11:36 -0500, Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:
 No, that would break MVCC. But we may have done lots of updates/deletes
 that are *not* visible to any Snapshot, yet are not yet removable
 because they are higher than OldestXmin but we don't know that because
 previously the Snapshot details were not available. ISTM that this
 proposal is a way of making the Snapshot limits publicly available so
 that they can be used by VACUUM.

Certainly not, unless you intend that *every* snapshot *must* be
published, which is an overhead up with which we will not put.


Agreed, but that's the general case problem.

What I was hoping was that this would provide a mechanism for long
running transactions (LRTs) to publish their min/max Xids. Then if all
backends publish the minimum Xid of any Snapshot they have generated in
the proc array, we'd be able to decide if there are any large holes in
the global set of Snapshots. As a general case that's hard to evaluate,
but in the common case of a lone LRT and all the rest short duration
transactions you can end up with a gap of 250,000+ transactions opening
up between the two. It would be fairly easy to have VACUUM check for
large visibility gaps between groups of transactions and then use that
to improve its effectiveness in the presence of LRTs.


There is a flaw in that theory. If you have a single LRT, then each 
subsequent transaction's xmin will be exactly that one, no?



Jan



Theoretically we have to keep the chain of intermediate updates around
so it can be traversed by the old transaction, but in practical terms
traversing a long chain of updates isn't sensible. Serializable LRTs
will never traverse the chain anyway (that's a serializability error),
but there are some special cases to consider, hence my mentioning an
unresolved problem previously.

We'd need to be much more careful about the way Snapshots are managed,
so we can be certain that we take them all into account.




--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Gregory Stark

Woah, I just realized it's much worse than that. I think the syntax in the
ANSI is not parsable in LALR(1) at all. Note the following:

WITH RECURSIVE foo (a,b) AS (subq) SEARCH BREADTH FIRST BY a,b,c(x,z),d(y,z) AS 
(subq) SELECT ...

To determine whether c is the name of a new with list element it has to
scan as far ahead as the , before the d. Note that d here is in fact not
part of the search clause at all, it's the name of a second with list
element.

bleagh.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Jan Wieck

On 1/26/2007 11:58 AM, Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:

On 1/26/2007 8:06 AM, Gregory Stark wrote:

It seems simpler to have a current_snapshot() function that returns a bytea
or a new snapshot data type which set_current_snapshot(bytea) would take to change
your snapshot. Then you could use tables or out-of-band communication to pass
around your snapshots however you please. 


set_current_snapshot() would have to sanity check that the xmin of the new
snapshot isn't older than the current globaloldestxmin. 



That would solve the backend to backend IPC problem nicely.


But it fails on the count of making sure that globaloldestxmin doesn't
advance past the snap you want to use.  And exactly how will you pass
a snap through a table?  It won't become visible until you commit ...
whereupon your own xmin isn't blocking the advance of globaloldestxmin.


The client receives the snapshot information as a result from the 
function call to current_snapshot(). The call to 
set_current_snapshot(snap) errors out if snap's xmin is older than 
globaloldestxmin. It is the client app that has to make sure that the 
transaction that created snap is still in progress.


I didn't say passing anything through a table.

Take a modified pg_dump as an example. It could write multiple files. A 
pre-load sql with the first part of the schema. Then a post-load sql 
with the finalization of same (creating indexes, adding constraints). It 
then builds a list of all relations to COPY, starts n threads each 
writing a different file. Each thread connects to the DB and adjusts the 
snapshot to the one of the main transaction (which is still open). Then 
each thread grabs the next table to dump from the list and writes the 
COPY data to its output file. The threads exit when the list of tables 
is empty. The main thread waits until the last thread has joined and 
commits the main transaction.


Wouldn't be too hard to write a script that restores that split dump in 
parallel as well.
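
At the SQL level, each of those worker connections would do something along
these lines (current_snapshot()/set_current_snapshot() are the functions
proposed upthread and do not exist today; :snap stands for the value the
client carries from the main connection to the workers):

    -- main connection, kept open for the whole dump
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT current_snapshot();            -- client remembers the result as :snap

    -- each worker connection
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT set_current_snapshot(:snap);   -- adopt the main transaction's snapshot
    COPY some_table TO STDOUT;            -- dump one table under that snapshot
    COMMIT;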



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [pgsql-patches] pg_dump pretty_print

2007-01-26 Thread Tom Lane
Greg Sabino Mullane [EMAIL PROTECTED] writes:
 Peter Eisentraut replied:
 The harm here is that under undefined circumstances a dump file
 will not be a proper and robust representation of the original
 database, which would add significant confusion and potential for error.

 What undefined circumstances are we talking here? If there is a chance
 that pg_get_viewdef and company do not output a version that can be
 read again by the database because we simply changed the whitespace, that
 sounds like a serious bug to be fixed, not a reason to reject this
 optional flag.

The original definition of the prettyprint flag was that it'd produce a
version that was nice to look at but not guaranteed to parse back
exactly the same; in particular it might omit parentheses that perhaps
were really needed to ensure the same parsing.  (I think there might be
some other issues too ... but whitespace is NOT one of them.)  It's
possible that the current prettyprint code is smart enough to never make
such an error --- and then again it's possible that it isn't.  Like
Peter, I've not got much confidence in that code, and don't want to
trust pg_dump's correctness to it.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Andrew Dunstan

Gregory Stark wrote:

Woah, I just realized it's much worse than that. I think the syntax in the
ANSI is not parsable in LALR(1) at all. Note the following:

WITH RECURSIVE foo (a,b) AS (subq) SEARCH BREADTH FIRST BY a,b,c(x,z),d(y,z) AS 
(subq) SELECT ...

To determine whether c is the name of a new with list element it has to
scan as far ahead as the , before the d. Note that d here is in fact not
part of the search clause at all, it's the name of a second with list
element.

bleagh.

  


Can you post the rules you have so far that you're playing around with? 
(Also maybe the rules from the standard - I don't have a copy handy).


cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-26 Thread Pavan Deolasee

On 1/26/07, Tom Lane [EMAIL PROTECTED] wrote:



I think what he's suggesting is deliberately not updating the hint bits
during a SELECT ...



No, I was suggesting doing it in the bgwriter so that we may not need to do
that during a SELECT. Of course, we need to investigate more and have numbers
to prove the need. Also, you have already expressed concerns that doing so in
the bgwriter is deadlock prone. So there is certainly more work needed for any
such scheme to work.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


[HACKERS] NULL value in subselect in UNION causes error

2007-01-26 Thread Jan Wieck

Checked it against HEAD and 8.2:

postgres=# select 1, 1, 1 union select * from (select 2, null, 2) two;
ERROR:  failed to find conversion function from unknown to integer


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-26 Thread Kenneth Marshall
On Wed, Jan 24, 2007 at 07:30:05PM -0500, Tom Lane wrote:
 Kenneth Marshall [EMAIL PROTECTED] writes:
  Not that I am aware of. Even extending the relation by one additional
  block can make a big difference in performance
 
 Do you have any evidence to back up that assertion?
 
 It seems a bit nontrivial to me --- not the extension part exactly, but
 making sure that the space will get used promptly.  With the current
 code the backend extending a relation will do subsequent inserts into
 the block it just got, which is fine, but there's no mechanism for
 remembering that any other newly-added blocks are available --- unless
 you wanted to push them into the FSM, which could work but the current
 FSM code doesn't support piecemeal addition of space, and in any case
 there's some question in my mind about the concurrency cost of increasing
 FSM traffic even more.
 
 In short, it's hardly an unquestionable improvement, so we need some
 evidence.
 
   regards, tom lane
 

My comment was purely based on the reduction in fragmentation of the
file behind the relation, a result that I have seen repeatedly in
file-related data processing. It does sound much more complicated to make
the additional space available to other backends. If one backend were doing
many inserts, it might still be of value even for just that backend. As
you mention, testing is needed to see if there is enough value in this
process.

Ken


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] New feature proposal

2007-01-26 Thread Sorin Schwimmer
Dear Developers,

Thanks for your answers. I didn't know about
generate_series, but it looks to be exactly what
I was suggesting.
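
For anyone else finding this thread, a quick example of the built-in
set-returning function mentioned above:

    SELECT * FROM generate_series(1, 5);       -- rows 1, 2, 3, 4, 5
    SELECT * FROM generate_series(0, 20, 5);   -- rows 0, 5, 10, 15, 20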

Regards,
Sorin Schwimmer


 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Gregory Stark

Andrew Dunstan [EMAIL PROTECTED] writes:

 Can you post the rules you have so far that you're playing around with? (Also
 maybe the rules from the standard - I don't have a copy handy).

This is the best compromise I've come up with so far. It makes CYCLE a
reserved word and requires a CYCLE clause if there's a SEARCH clause.

--- gram.y  09 Jan 2007 02:14:14 +  2.573
+++ gram.y  26 Jan 2007 20:02:21 +  
@@ -350,6 +350,10 @@
 %type node   xml_root_version opt_xml_root_standalone
 %type booleandocument_or_content xml_whitespace_option
 
+%type node   common_table_expression
+%type list   with_cte_list cte_list
+%type boolean recursive_is_depth_first
+
 
 /*
  * If you make any token changes, update the keyword table in
@@ -364,7 +368,7 @@
ASSERTION ASSIGNMENT ASYMMETRIC AT AUTHORIZATION
 
BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
-   BOOLEAN_P BOTH BY
+   BOOLEAN_P BOTH BREADTH BY
 
CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
@@ -376,7 +380,7 @@
 
DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS
-   DESC DISABLE_P DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
+   DEPTH DESC DISABLE_P DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
 
EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ESCAPE EXCEPT EXCLUDING
EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT
@@ -416,11 +420,11 @@
 
QUOTE
 
-   READ REAL REASSIGN RECHECK REFERENCES REINDEX RELATIVE_P RELEASE RENAME
+   READ REAL REASSIGN RECHECK RECURSIVE REFERENCES REINDEX RELATIVE_P 
RELEASE RENAME
REPEATABLE REPLACE RESET RESTART RESTRICT RETURNING RETURNS REVOKE RIGHT
ROLE ROLLBACK ROW ROWS RULE
 
-   SAVEPOINT SCHEMA SCROLL SECOND_P SECURITY SELECT SEQUENCE
+   SAVEPOINT SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE
SERIALIZABLE SESSION SESSION_USER SET SETOF SHARE
SHOW SIMILAR SIMPLE SMALLINT SOME STABLE STANDALONE_P START STATEMENT
STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING SUPERUSER_P
@@ -5681,6 +5685,25 @@

list_nth($3, 0), list_nth($3, 1));
$$ = $1;
}
+   | with_cte_list simple_select   
{ $$ = $2; }
+   | with_cte_list select_clause sort_clause
+   {
+   insertSelectOptions((SelectStmt *) $2, 
$3, NIL,
+   
NULL, NULL);
+   $$ = $2;
+   }
+   | with_cte_list select_clause opt_sort_clause 
for_locking_clause opt_select_limit
+   {
+   insertSelectOptions((SelectStmt *) $2, 
$3, $4,
+   
list_nth($5, 0), list_nth($5, 1));
+   $$ = $2;
+   }
+   | with_cte_list select_clause opt_sort_clause 
select_limit opt_for_locking_clause
+   {
+   insertSelectOptions((SelectStmt *) $2, 
$3, $5,
+   
list_nth($4, 0), list_nth($4, 1));
+   $$ = $2;
+   }
;
 
 select_clause:
@@ -5742,6 +5765,72 @@
}
;
 
+/*
+ * ANSI standard WITH clause looks like:
+ * WITH [ RECURSIVE ] query name [ (column,...) ] AS (query) [SEARCH or 
CYCLE clause] 
+ * 
+ * It seems with_cte_list has to be a separate token or else there's a s/r
+ * conflict between RECURSIVE and the cte name. This means we'll need a silly
+ * node just to hold the list and the recursive flag :( it doesn't seem like
+ * making RECURSIVE a fully reserved word would be very nice as it's probably a
+ * common column name.
+ *
+ * For now we don't support recursive so just ignore it and pass up the list
+ *
+ */
+with_cte_list:
+ WITH cte_list
+   { 
+   $$ = $2; 
+   }
+  | WITH RECURSIVE cte_list 
+   { 
+   elog(WARNING, "WITH RECURSIVE not supported yet");
+   $$ = $3; 
+   }
+ ;
+
+cte_list:
+ common_table_expression   
{ $$ = list_make1($1); }
+   | 

[HACKERS] VC2005 build and pthreads

2007-01-26 Thread Gevik Babakhani
Folks,

I would like to build pg on VC2005. How do I use the pthreads library that is
mentioned in the README file? Do I need the DLL? Sources? LIB?
Where do I install or copy them?

Regards,
Gevik



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] NULL value in subselect in UNION causes error

2007-01-26 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 Checked it against HEAD and 8.2:
 postgres=# select 1, 1, 1 union select * from (select 2, null, 2) two;
 ERROR:  failed to find conversion function from unknown to integer

It's always done that.   The SQL spec would tell you that you have to
cast the null to some specific type.  We allow untyped nulls in contexts
where we can derive a type from reasonably nearby context, but an
integer two levels up and over in another union arm isn't very nearby
IMHO.  So I'm not particularly concerned about making this work ...
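
In other words, putting an explicit cast on the null makes it work:

    select 1, 1, 1 union select * from (select 2, null::integer, 2) two;
    -- or the spec-style spelling:
    select 1, 1, 1 union select * from (select 2, cast(null as integer), 2) two;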

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Martijn van Oosterhout
On Fri, Jan 26, 2007 at 05:10:01PM +, Gregory Stark wrote:
 However to fully support the DB2/ANSI syntax we would definitely have an
 ambiguity and I think we would have to make CYCLE a fully reserved word
 which seems like a much bigger concession than WITH. Observe the following
 case:
 
   WITH RECURSIVE foo (x,y) AS (select 1,2) SEARCH DEPTH FIRST BY x CYCLE x,y 
 SET ...
 
 The parser can't search arbitrarily far checking for a SET to see if the CYCLE
 is a keyword or a binary operator.

Er, CYCLE isn't a binary operator, and users can't make binary
operators that are words, so I'm not sure of the problem here.
I think the parser can tell that the expression ends at the word
cycle.

Or am I missing obvious?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] NULL value in subselect in UNION causes error

2007-01-26 Thread Jan Wieck

On 1/26/2007 3:41 PM, Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:

Checked it against HEAD and 8.2:
postgres=# select 1, 1, 1 union select * from (select 2, null, 2) two;
ERROR:  failed to find conversion function from unknown to integer


It's always done that.   The SQL spec would tell you that you have to
cast the null to some specific type.  We allow untyped nulls in contexts
where we can derive a type from reasonably nearby context, but an
integer two levels up and over in another union arm isn't very nearby
IMHO.  So I'm not particularly concerned about making this work ...


That explains. Thanks.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] VC2005 build and pthreads

2007-01-26 Thread Martijn van Oosterhout
On Fri, Jan 26, 2007 at 09:34:10PM +0100, Gevik Babakhani wrote:
 Folks,
 
 I would like to build pg on VC2005. How do I use pthreads that is
 mentioned in the README file. Do I need the DLL? Sources? LIB?
 Where do I install or copy them..

Err, pthreads is a threads library for Unix, I don't think Windows has
that, nor can I think of a situation where you'd need to worry about
threads anyway?

Have a nice day.
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




[HACKERS] pg_restore exclude schema from being droped option

2007-01-26 Thread Kostis Mentzelos

Hi list,

I am using pg_dump and pg_restore to back up and restore a database, but
there is something that I believe is missing from the restore process:
an option in pg_restore to exclude a schema from being dropped when the -c
option is given.


And here is why:

Suppose that I have a database with about 12 tables of customer data 
(address, notes, configuration ... no more than 10.000 rows each) and 50 
tables of history data (history files with about 1.000.000 rows each).
Now, to back up the database I chose to create 2 scripts: BackupData.sh 
to back up all the small tables and BackupHist.sh to back up the history 
tables. When I call pg_restore -c to restore the data tables, pg_restore 
reports a failure because it is trying to drop a schema that is not empty. 
So it would be very helpful to have an option to exclude the schema (for 
example: public) from being dropped.


I know that there are some alternatives, for example: pg_restore -l, 
comment out the DROP SCHEMA line, and pg_restore -L; or put data tables 
and history tables into separate schemas. But an option to pg_restore would 
be a lot easier, I guess.


What do you think?

regards,
Kostis Mentzelos

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] PostgreSQL Data Loss

2007-01-26 Thread BluDes

Hi everyone,
 I have a problem with one of my customers.
I made a program that uses a PostgreSQL (win32) database to save its data.
My customer claims that he lost lots of data regarding his own clients 
and that those data had surely been saved in the database.
My first guess is that he is the one who deleted the data but wants to 
blame someone else; obviously I can't prove it.


Could it be possible for PostgreSQL to lose its data? Maybe through file 
corruption? Could it be possible to restore this data?


My program does not modify or delete data, since it's more like a log that 
only adds information. It is obviously possible to delete these logs, but 
it requires answering yes to 2 different warnings, so the data can't 
be deleted accidentally.


I have other customers with even 10 times the amount of data of the one 
who claimed the loss, but no problems with them.
He obviously made no backups (and claims we never told him to do them, 
so we are responsible even for this) though the program has a dedicated 
backup section.


Any suggestion?

Daniele

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] VC2005 build and pthreads

2007-01-26 Thread Gevik Babakhani
pthreads is needed to build PG in VC++ 2005;
please read pgsql/src/tools/msvc/README.

Have a nice day.

On Fri, 2007-01-26 at 21:47 +0100, Martijn van Oosterhout wrote:
 On Fri, Jan 26, 2007 at 09:34:10PM +0100, Gevik Babakhani wrote:
  Folks,
  
  I would like to build pg on VC2005. How do I use pthreads that is
  mentioned in the README file. Do I need the DLL? Sources? LIB?
  Where do I install or copy them..
 
 Err, pthreads is a threads library for Unix, I don't think Windows has
 that, nor can I think of a situation where you'd need to worry about
 threads anyway?
 
 Have a nice day.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Martijn van Oosterhout
Ok, looking at your example:

WITH RECURSIVE foo (a,b) AS (subq) SEARCH BREADTH FIRST BY a,b , c(x,z),d(y,z) 
AS (subq) SELECT ...

What you're trying to say is that the c is a with list element,
not a cycle column. But the parser will see that as soon as it hits
the open parenthesis, since a cycle column is always just a column
name.

Also, the AS in the with list element doesn't appear to be optional;
I assume you left that out after the c(x,z) for clarity.

I think bison should be able to handle this as long as the name in
common_table_expression matches exactly the same things as whatever
columnList uses. It can then merge the two parse paths, allowing it to
see further.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] [PERFORM] how to plan for vacuum?

2007-01-26 Thread Jim Nasby

On Jan 25, 2007, at 10:33 AM, Ray Stell wrote:

On Thu, Jan 25, 2007 at 08:04:49AM -0800, Joshua D. Drake wrote:


It really depends on the system. Most of our systems run anywhere  
from

10-25ms. I find that any more than that, Vacuum takes too long.



How do you measure the impact of setting it to 12 as opposed to 15?


If you've got a tool that will report disk utilization as a  
percentage it's very easy; I'll decrease the setting until I'm at  
about 90% utilization with the system's normal workload (leaving some  
room for spikes, etc). Sometimes I'll also tune the costs if reads  
vs. writes are a concern.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] VC2005 build and pthreads

2007-01-26 Thread Magnus Hagander
Martijn van Oosterhout wrote:
 On Fri, Jan 26, 2007 at 09:34:10PM +0100, Gevik Babakhani wrote:
 Folks,

 I would like to build pg on VC2005. How do I use pthreads that is
 mentioned in the README file. Do I need the DLL? Sources? LIB?
 Where do I install or copy them..
 
 Err, pthreads is a threads library for Unix, I don't think Windows has
 that, nor can I think of a situation where you'd need to worry about
 threads anyway?

There is a pthreads for win32 as well.
However, you don't need it to build, unless you build ecpg. I forgot to
update the README when I put that patch in .-)

If you want it, it's on ftp://sources.redhat.com/pub/pthreads-win32.
IIRC, that's actually mentioned in the README file. You need the lib and
headers.

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] No ~ operator for box, point

2007-01-26 Thread Jim Nasby

On Jan 25, 2007, at 6:26 PM, Tom Lane wrote:

Martijn van Oosterhout kleptog@svana.org writes:

On Thu, Jan 25, 2007 at 01:59:33PM -0500, Merlin Moncure wrote:

On 1/25/07, Jim C. Nasby [EMAIL PROTECTED] wrote:

decibel=# select box '((0,0),(2,2))' ~ point '(1,1)';
ERROR:  operator does not exist: box ~ point


I don't see a reason, although you can do it with polygon and not  
box.


Seems like an old oversight.


Ok. If I ever get some time I'll submit a patch to bring everything  
in-line (there's other missing operators as well).



Also, I can't find the ~ operator defined for polygon in the
documentation, am I missing something?


~ is deprecated, contains is preferentially spelled @ now.


Ok, I'll keep that in mind.
--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 Er, CYCLE isn't a binary operator, and users can't make binary
 operators that are words, so I'm not sure of the problem here.

Well, the problem typically is not being able to tell whether an
operator is supposed to be infix or postfix; hence keywords that can
terminate arbitrary expressions usually have to be reserved words.
However, now that I look at the syntax I think Greg may be misreading
it.  I see

<search or cycle clause> ::=
    <search clause>
  | <cycle clause>
  | <search clause> <cycle clause>

<search clause> ::=
    SEARCH <recursive search order> SET <sequence column>

<recursive search order> ::=
    DEPTH FIRST BY <sort specification list>
  | BREADTH FIRST BY <sort specification list>

<sequence column> ::= <column name>

<cycle clause> ::=
    CYCLE <cycle column list> SET <cycle mark column> TO <cycle mark value>
    DEFAULT <non-cycle mark value> USING <path column>

<cycle column list> ::= <cycle column> [ { <comma> <cycle column> }... ]

<cycle column> ::= <column name>

<cycle mark column> ::= <column name>

<path column> ::= <column name>

<cycle mark value> ::= <value expression>

<non-cycle mark value> ::= <value expression>

and so CYCLE would come *after* SET <sequence column>, not before it.
It looks to me like we'd have to promote SET to fully reserved status,
but that probably isn't going to surprise anyone.  DEFAULT and USING
already are fully reserved.  I don't see anything else here that looks
like it should need to be reserved.
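
Putting the quoted productions together, a purely illustrative query using
both clauses would look roughly like this (identifiers invented; this is not
accepted by PostgreSQL today):

    WITH RECURSIVE t (id, parent) AS (subquery)
        SEARCH DEPTH FIRST BY id SET seqcol
        CYCLE id SET cyclemark TO 'Y' DEFAULT 'N' USING pathcol
    SELECT * FROM t;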

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_restore exclude schema from being droped option

2007-01-26 Thread Tom Lane
Kostis Mentzelos [EMAIL PROTECTED] writes:
 Now, to backup the database I choose to create 2 scripts, BackupData.sh 
 to backup all small tables and BackupHist.sh to backup history tables. 
 When I call pg_restore -c to restore the data tables, pg_restore reports a 
 failure because it is trying to drop a schema that is not empty.

Why (or how) is the schema part of the backup at all?  If you used
pg_dump's -t switch to select the tables to back up, there should not be
a schema entry in the dump.  So there's something you're not telling us
about how you are using the tools.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 12:43 -0500, Jan Wieck wrote:

 There is a flaw in that theory. If you have a single LRT, then each 
 subsequent transaction's xmin will be exactly that one, no?

You got me. My description was too loose, but you also got the rough
picture. We'll save the detail for another day, but we all know its a
bridge we will have to cross one day, soon. I wasn't meaning to raise
this specific discussion now, just to say that publishing snapshots for
known LRTs is one way by which we can solve the LRT/VACUUMing issue.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] BUG #2917: spi_prepare doesn't accept typename aliases

2007-01-26 Thread Jim Nasby

On Jan 26, 2007, at 9:31 AM, Tom Lane wrote:
If you wanted to be a bit more ambitious maybe you could change the  
fact

that this code is throwing away typmod, which means that declarations
like varchar(32) would fail to work as expected.  Perhaps it  
should be

fixed to save the typmods alongside the typioparams and then pass them
to InputFunctionCall instead of passing -1.  On the other hand, we  
don't
currently enforce typmod for any function input or result  
arguments, so

maybe it's consistent that spi_prepare arguments ignore typmods too.
Thoughts?


I'd like to see us move towards supporting that; both for function  
parameters/results as well as inside functions. It'd be nice if both  
cases got fixed at once, but IMHO fixing only one now would be better  
than fixing none.
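
For illustration, the current behavior being referred to (function name
invented): a typmod written on a parameter is accepted but not enforced.

    CREATE FUNCTION f(x varchar(3)) RETURNS text
        AS 'select $1::text' LANGUAGE sql;
    SELECT f('longer than three');   -- succeeds; the (3) is silently ignored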

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Jim Nasby

On Jan 26, 2007, at 5:13 AM, Markus Schiltknecht wrote:

In Postgres-R, I mostly use the terms 'local' and 'remote'.


Note that those terms only make sense if you limit yourself to  
thinking the master is pushing data out to the slave...


I think it'd make the most sense if the name reflected whether the  
trigger should be fired by a replication process or not; that way it  
doesn't really matter if it's a master or a slave... if the data in  
the table is being modified by a replication process then you don't  
fire the trigger/rule, according to the setting. But maybe there is  
some need to discern between origin and target...


Also, if enums will be in 8.3, perhaps they can be used instead of  
char?

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding pg_rewrite.ev_enabled

2007-01-26 Thread Jim Nasby

On Jan 25, 2007, at 5:33 PM, Jan Wieck wrote:
A new per session GUC variable, restricted to superusers, will  
define if the session is in origin or replica mode.


It would be nice if we had a separate role for replication services  
so that we weren't exposing superuser so much. IIRC Oracle even uses  
2 roles; one for administration of replication and one that the  
replication code actually runs under.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] PostgreSQL Data Loss

2007-01-26 Thread Heikki Linnakangas

BluDes wrote:

I made a program that uses a PostgreSQL (win32) database to save its data.


What version of PostgreSQL is this?

My customer claims that he lost lots of data regarding his own clients 
and that those data had surely been saved on the database.
My first guess is that he is the one who deleted the data but wants to 
blame someone else; obviously I can't prove it.


Did he lose all data in one table, or just some rows? Or is there some 
other pattern?


Could it be possible for PostgreSQL to lose its data? 


Not when properly installed.

Maybe with a file corruption? 


I doubt it. You'd almost certainly get warnings or errors if there's 
corruption.



Could it be possible to restore these data?


The first thing to do is to take a filesystem-level physical copy of the 
data directory to prevent further damage. Copy the data directory to 
another system for forensics.


You might be able to get a picture of what happened by looking at the 
WAL logs using the xlogviewer tool in pgfoundry.


You can also modify the PostgreSQL source code so that it also shows row 
versions marked as deleted, and recover the deleted data. I can't 
remember exactly how to do it; maybe others who have done it can fill 
in. A row stays physically in the file until the table is vacuumed; 
hopefully it hasn't been.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Jan Wieck

On 1/26/2007 4:39 PM, Jim Nasby wrote:

On Jan 26, 2007, at 5:13 AM, Markus Schiltknecht wrote:

In Postgres-R, I mostly use the terms 'local' and 'remote'.


Note that those terms only make sense if you limit yourself to  
thinking the master is pushing data out to the slave...


I think it'd make the most sense if the name reflected whether the  
trigger should be fired by a replication process or not; that way it  
doesn't really matter if it's a master or a slave... if the data in  
the table is being modified by a replication process then you don't  
fire the trigger/rule, according to the setting. But maybe there is  
some need to discern between origin and target...


That's why I prefer origin and replica. I want to use the same terms 
for the session's mode GUC, and there 'local' could be misinterpreted as 
'doesn't replicate at all'.




Also, if enums will be in 8.3, perhaps they can be used instead of  
char?


I don't like this one. It makes it impossible to provide patches 
enabling this replication system on older Postgres releases. And you 
know that your customers will want them.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 You got me. My description was too loose, but you also got the rough
 picture. We'll save the details for another day, but we all know it's a
 bridge we will have to cross one day, soon. I wasn't meaning to raise
 this specific discussion now, just to say that publishing snapshots for
 known LRTs is one way by which we can solve the LRT/VACUUMing issue.

I don't actually see that it buys you a darn thing ... you still won't
be able to delete dead updated tuples because of the possibility of the
LRT deciding to chase ctid chains up from the tuples it can see.   You
also seem to be assuming that a transaction can have only one snapshot,
which is not something we can enforce in enough cases to make it a very
useful restriction.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Jan Wieck

On 1/26/2007 4:40 PM, Jim Nasby wrote:

On Jan 25, 2007, at 5:33 PM, Jan Wieck wrote:
A new per session GUC variable, restricted to superusers, will  
define if the session is in origin or replica mode.


It would be nice if we had a separate role for replication services  
so that we weren't exposing superuser so much. IIRC Oracle even uses  
2 roles; one for administration of replication and one that the  
replication code actually runs under.


So you're thinking about another flag in pg_shadow? Would work for me.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Jan Wieck

On 1/26/2007 4:47 PM, Jan Wieck wrote:

On 1/26/2007 4:39 PM, Jim Nasby wrote:

On Jan 26, 2007, at 5:13 AM, Markus Schiltknecht wrote:

In Postgres-R, I mostly use the terms 'local' and 'remote'.


Note that those terms only make sense if you limit yourself to  
thinking the master is pushing data out to the slave...


I think it'd make the most sense if the name reflected whether the  
trigger should be fired by a replication process or not; that way it  
doesn't really matter if it's a master or a slave... if the data in  
the table is being modified by a replication process then you don't  
fire the trigger/rule, according to the setting. But maybe there is  
some need to discern between origin and target...


That's why I prefer origin and replica. I want to use the same terms 
for the session's mode GUC, and there 'local' could be misinterpreted as 
'doesn't replicate at all'.


I will need that local mode anyway for some conflict resolutions. 
Think of a duplicate key (yeah, yeah, what comes now sounds bad ...) 
conflict, where you need to delete one of the entries without causing 
that delete to replicate.


Before people panic, the final system is supposed to have something 
smarter than deleting a dupkey in its repertoire. But I'd rather go 
with this cheap shot first and add a group-communication-based advisory 
locking system later, you know?
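
To make that concrete, a session doing such a resolution might look roughly
like this (a hypothetical sketch only; the GUC name and values are invented
for illustration and are not an existing or agreed-upon interface):

    -- hypothetical per-session setting, superuser only
    SET session_replication_mode = 'local';      -- changes made here are not queued for replication
    DELETE FROM customer WHERE customer_id = 42; -- drop one half of the duplicate key
    SET session_replication_mode = 'replica';    -- resume applying remote changes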



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] PostgreSQL Data Loss

2007-01-26 Thread Zdenek Kotala
If data are deleted, they are still stored in the database until VACUUM 
cleans them up. You can look with a hex viewer to see whether some known 
text data is still there. I think there is also a tool which dumps the 
tuple list from pages.


You can also see deleted data if you change the current transaction ID, but 
I am not sure whether that is easily possible.


Before experimenting, do not forget to back up the database files.

Zdenek

BluDes wrote:

Hi everyone,
 I have a problem with one of my customers.
I made a program that uses a PostgreSQL (win32) database to save its data.
My customer claims that he lost lots of data regarding his own clients 
and that those data had surely been saved on the database.
My first guess is that he is the one who deleted the data but wants to 
blame someone else; obviously I can't prove it.


Could it be possible for PostgreSQL to lose its data? Maybe with a file 
corruption? Could it be possible to restore these data?


My program does not modify or delete data since it's more like a log that 
only adds information. It is obviously possible to delete these logs but 
it requires answering yes to 2 different warnings, so the data can't 
be deleted accidentally.


I have other customers with even 10 times the amount of data of the one 
who claimed the loss but no problems with them.
He obviously made no backups (and claims we never told him to do them, 
so we are responsible even for this) though the program has a dedicated 
Backup-section.


Any suggestion?

Daniele

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 On 1/26/2007 4:39 PM, Jim Nasby wrote:
 Also, if enums will be in 8.3, perhaps they can be used instead of  
 char?

 I don't like this one. It makes it impossible to provide patches 
 enabling this replication system on older Postgres releases. And you 
 know that your customers will want them.

Also, at the level of C code enums will not be terribly easy to work
with.  We use the char-as-poor-mans-enum trick in all the other system
catalogs, so I feel no desire to do it differently here.
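
For illustration, pg_class already uses that convention in relkind (an
existing catalog column, shown here just as an example of the style):

    -- relkind is a single "char": 'r' = ordinary table, 'i' = index,
    -- 'S' = sequence, 'v' = view, and so on.
    SELECT relname, relkind FROM pg_class ORDER BY relname LIMIT 5;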

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 On 1/26/2007 4:40 PM, Jim Nasby wrote:
 It would be nice if we had a separate role for replication services  
 so that we weren't exposing superuser so much.

 So you're thinking about another flag in pg_shadow? Would work for me.

How exactly would such a role differ from a regular superuser?  It
would still need an awful lot of privilege bypassing ability.  I'm
pretty dubious that you could lock it down enough to make it worth the
trouble of supporting an additional concept.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-26 Thread Rick Gigger

I thought that the following todo item just barely missed 8.2:

Allow a warm standby system to also allow read-only statements [pitr]
This is useful for checking PITR recovery.

I assume it's not on this list either because it is already complete and 
slated for 8.3, because it is going to take too long to make it into 8.3, 
because it has been rejected as a good idea entirely, or because it's just 
not a big enough priority for anyone to push for it to get into 8.3.


It is the one feature that would make the most difference to me as it 
would allow me to very easily set up a server for reporting purposes 
that could always be within minutes of the live data.  I know there are 
other solutions for this, but if this feature is just around the corner, 
it would be my first choice.


Does anyone know the status of this feature?

Thanks,

Rick Gigger




Joshua D. Drake wrote:

Or so... :)

Thought I would do a poll of what is happening in the world for 8.3. I have:

Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
Gavin Sherry: Bitmap Indexes (on disk), possible basic Window functions
Jonah Harris: WITH/Recursive Queries?
Andrei Kovalesvki: Some Win32 work with Magnus
Magnus Hagander: VC++ support (thank goodness)
Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
Oleg Bartunov: Tsearch2 in core
Neil Conway: Patch Review (including enums), pg_fcache

Vertical projects:

Pavel Stehule: PLpsm
Alexey Klyukin: PLphp
Andrei Kovalesvki: ODBCng

I am sure there are more; the ones with question marks are unknowns but
heard of in the ether somewhere. Any additions or confirmations?

Sincerely,

Joshua D. Drake






---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-26 Thread Tom Lane
Rick Gigger [EMAIL PROTECTED] writes:
 I thought that the following todo item just barely missed 8.2:
 Allow a warm standby system to also allow read-only statements [pitr]

No, it's a someday-wishlist item; the work involved is not small.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-01-26 Thread Jan Wieck

On 1/26/2007 5:09 PM, Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:

On 1/26/2007 4:40 PM, Jim Nasby wrote:
It would be nice if we had a separate role for replication services  
so that we weren't exposing superuser so much.



So you're thinking about another flag in pg_shadow? Would work for me.


How exactly would such a role differ from a regular superuser?  It
would still need an awful lot of privilege bypassing ability.  I'm
pretty dubious that you could lock it down enough to make it worth the
trouble of supporting an additional concept.


As already said in the other mail, conflict resolution means that at 
some point you will be in the situation where you need a third role: that 
of the replication admin, who can do things that don't replicate. 
Polluting the system catalogs with flags for one specific external 
system isn't my thing. The different trigger modes, as well as the 
snapshot cloning and the commit timestamp, are all features that are not 
exclusively useful for the one replication system I have in mind. They 
would have made my life developing Slony-I a lot easier to begin with. I 
would never have needed the stupid xxid or the poking around in the 
system catalog.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] PostgreSQL Data Loss

2007-01-26 Thread J. Andrew Rogers


On Jan 26, 2007, at 2:22 AM, BluDes wrote:

 I have a problem with one of my customers.
I made a program that uses a PostgreSQL (win32) database to save  
its data.
My customer claims that he lost lots of data regarding his own  
clients and that those data had surely been saved on the database.
My first guess is that he is the one who deleted the data but wants  
to blame someone else; obviously I can't prove it.


Could it be possible for PostgreSQL to lose its data? Maybe with a  
file corruption? Could it be possible to restore these data?


My program does not modify or delete data since it's more like a log  
that only adds information. It is obviously possible to delete  
these logs but it requires answering yes to 2 different warnings,  
so the data can't be deleted accidentally.


I have other customers with even 10 times the amount of data of the  
one who claimed the loss but no problems with them.
He obviously made no backups (and claims we never told him to do  
them, so we are responsible even for this) though the program has a  
dedicated Backup-section.



I have seen this data loss pattern many, many times, and on Oracle  
too.  The most frequent culprits in my experience:


1.)  The customer screwed up big time and does not want to admit that  
they made a mistake, hoping you can somehow pull their butt out of  
the fire for free.


2.)  Someone else sabotaged or messed up the customer's database, and  
the customer is not aware of it.


3.)  The customer deleted their own data and is oblivious to the fact  
that they are responsible.


4.)  There is some rare edge case in your application that generates  
SQL that deletes all the data.



There is always the possibility that there is in fact some data loss  
due to a failure of the database, but it is a rare kind of corruption  
that deletes a person's data but leaves everything else intact with  
no error messages, warnings, or other indications that something is  
not right.  Given the description of the problem, I find an internal  
failure of the database to be a low probability reason for the data  
loss.



Having run many database systems that had various levels of pervasive  
internal change auditing/versioning, often unbeknownst to the casual  
user, virtually all of the several data loss cases I've seen with a  
description like the above clearly fit in cases #1-3 above when  
we went into the audit logs, i.e. someone explicitly did the  
deleting.  I cannot tell you how many times people have tried to  
pretend that the database lost or messed up their data and then  
been embarrassed when they discover that I can step through every  
single action they took to destroy their own data.  I've never seen a  
single case like the one described above that was due to an internal  
database failure; when there is an internal database failure, it is  
usually ugly and obvious.


Cheers,

J. Andrew Rogers
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-26 Thread Henry B. Hotz
Henry B. Hotz:  GSSAPI authentication method for C (FE/BE) and Java (FE).
Magnus Hagander:  SSPI (GSSAPI compatible) authentication method for  
C (FE) on Windows.


(That fair, Magnus? Or do you want to volunteer for BE support as well?)

GSSAPI isn't much more than a functional replacement for Kerberos 5,  
but it's supported on lots more platforms.  In particular Java and  
Windows have native support (as well as Solaris 9).


If anyone is interested I currently have working-but-incomplete  
patches to support SASL in C.  I've decided not to finish and submit  
them because the glue code to make configuration reasonable, and to  
allow use of existing Postgres password databases with the  
password-based mechanisms, is still significant.


On Jan 22, 2007, at 2:16 PM, Joshua D. Drake wrote:


Or so... :)

Thought I would do a poll of what is happening in the world for  
8.3. I have:


Alvaro Herrera: Autovacuum improvements (maintenance window etc..)
Gavin Sherry: Bitmap Indexes (on disk), possible basic Window  
functions

Jonah Harris: WITH/Recursive Queries?
Andrei Kovalesvki: Some Win32 work with Magnus
Magnus Hagander: VC++ support (thank goodness)
Heikki Linnakangas: Working on Vacuum for Bitmap Indexes?
Oleg Bartunov: Tsearch2 in core
Neil Conway: Patch Review (including enums), pg_fcache

Vertical projects:

Pavel Stehule: PLpsm
Alexey Klyukin: PLphp
Andrei Kovalesvki: ODBCng

I am sure there are more; the ones with question marks are unknowns  
but heard of in the ether somewhere. Any additions or confirmations?

Sincerely,

Joshua D. Drake



--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---

TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] BUG #2917: spi_prepare doesn't accept typename aliases

2007-01-26 Thread Andrew Dunstan

Jim Nasby wrote:

On Jan 26, 2007, at 9:31 AM, Tom Lane wrote:

If you wanted to be a bit more ambitious maybe you could change the fact
that this code is throwing away typmod, which means that declarations
like varchar(32) would fail to work as expected.  Perhaps it should be
fixed to save the typmods alongside the typioparams and then pass them
to InputFunctionCall instead of passing -1.  On the other hand, we don't
currently enforce typmod for any function input or result arguments, so
maybe it's consistent that spi_prepare arguments ignore typmods too.
Thoughts?


I'd like to see us move towards supporting that; both for function 
parameters/results as well as inside functions. It'd be nice if both 
cases got fixed at once, but IMHO fixing only one now would be better 
than fixing none.




I'm not going to do either in fixing this bug - I think they should be 
fixed but are a separate issue. These probably belong on the TODO list.


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] How does EXEC_BACKEND process signals?

2007-01-26 Thread Alvaro Herrera
In testing the new autovac facility, I noticed this log in the
EXEC_BACKEND (on Linux) scenario (I pressed Ctrl-C only once):

DEBUG:  postmaster received signal 2
LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
DEBUG:  drop cascades to table fktable
DEBUG:  drop auto-cascades to type fktable
DEBUG:  drop cascades to table pktable
DEBUG:  drop auto-cascades to type pktable
DEBUG:  drop auto-cascades to constraint pktable_pkey on table pktable
DEBUG:  drop auto-cascades to index pktable_pkey
DEBUG:  server process (PID 21893) exited with exit code 1
DEBUG:  server process (PID 21895) exited with exit code 1
DEBUG:  server process (PID 21899) exited with exit code 1
DEBUG:  server process (PID 21900) exited with exit code 1
DEBUG:  server process (PID 21902) exited with exit code 1
DEBUG:  server process (PID 21904) exited with exit code 1
DEBUG:  server process (PID 21906) exited with exit code 1
LOG:  shutting down
LOG:  autovacuum launcher shutting down
DEBUG:  forked new backend, pid=21907 socket=6
LOG:  database system is shut down
LOG:  background writer process (PID 21220) exited with exit code 0
LOG:  terminating any other active server processes
DEBUG:  sending SIGQUIT to process 21907
DEBUG:  server process (PID 21907) exited with exit code 1
LOG:  all server processes terminated; reinitializing
LOG:  database system was shut down at 2007-01-26 20:21:10 CLST
LOG:  checkpoint record is at 0/4293338
LOG:  redo record is at 0/4293338; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/17395; next OID: 60723
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
DEBUG:  transaction ID wrap limit is 2147484176, limited by database postgres
LOG:  shutting down
DEBUG:  recycled transaction log file 00010002
DEBUG:  recycled transaction log file 00010003
LOG:  database system is shut down

The strange thing is that we're seeing a forked new backend line
_after_ the shutdown signal was received.  I don't think this is related
to my local changes, because I've been careful with that, but one never
knows.

I wonder if this could cause more than just a curiosity.  The backend
was evidently shut down promptly.

I'll post the autovac patch right away.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] PostgreSQL Data Loss

2007-01-26 Thread Andrew Dunstan

BluDes wrote:

Hi everyone,
 I have a problem with one of my customers.
I made a program that uses a PostgreSQL (win32) database to save its 
data.
My customer claims that he lost lots of data regarding his own 
clients and that those data had surely been saved on the database.
My first guess is that he is the one who deleted the data but wants to 
blame someone else; obviously I can't prove it.


Could it be possible for PostgreSQL to lose its data? Maybe with a 
file corruption? Could it be possible to restore these data?


My program does not modify or delete data since it's more like a log 
that only adds information. It is obviously possible to delete these 
logs but it requires answering yes to 2 different warnings, so the 
data can't be deleted accidentally.


I have other customers with even 10 times the amount of data of the 
one who claimed the loss but no problems with them.
He obviously made no backups (and claims we never told him to do them, 
so we are responsible even for this) though the program has a 
dedicated Backup-section.


Any suggestion?




This isn't any sort of report that can be responded to. We need to know 
what has happened to the machine, what is in the server logs, what are 
the symptoms of data loss. The most likely explanations are pilot error 
and hardware error.


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] 10 weeks to feature freeze (Pending Work)

2007-01-26 Thread Gregory Stark
Rick Gigger [EMAIL PROTECTED] writes:

 I thought that the following todo item just barely missed 8.2:

 Allow a warm standby system to also allow read-only statements [pitr]
 This is useful for checking PITR recovery.

No, nobody worked on it prior to 8.2. Afaik there's still nobody working on
it. It's not trivial. Consider for example that your read-only query would
still need to come up with a snapshot and there's nowhere currently to find
out what transactions were in-progress at that point in the log replay.

There's also the problem that currently WAL replay doesn't allow for
any locking, so there's no way for read-only queries to protect themselves
against the WAL replay thrashing the buffer pages they're looking at.

It does seem to be doable and I agree it would be a great feature, but as far
as I know nobody's working on it for 8.3.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] PostgreSQL Data Loss

2007-01-26 Thread Gregory Stark
BluDes [EMAIL PROTECTED] writes:

 My customer claims that he lost lots of data regarding his own clients and
 that those data had surely been saved on the database.

Has this Postgres database been running for a long time? There is a regular
job called VACUUM that has to be run on every table periodically to recover
free space. 

If this isn't run for a very long time (how long depends on how busy the
database is, but even on extremely large databases it's usually a matter of
months; on more normal databases it would be years) then very old records seem
to suddenly disappear. There is a way to recover data that this has happened
to, though, as long as you don't run vacuum after the data has disappeared.

To repeat: If you think this may have happened DO NOT run vacuum now. 

Do you think this may have happened? How long ago was this database created?
Does your system periodically run VACUUM? Is the missing data in every table
or just a particular table?

Incidentally, recent versions of Postgres don't allow this to occur; they stop
running with a message insisting you run vacuum before continuing. 
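
As a rough way to check whether a database is getting near that point
(a sketch; assumes a release new enough to track datfrozenxid per database):

    -- Approximate age, in transactions, of the oldest unfrozen XID in each
    -- database; values approaching two billion mean wraparound trouble.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;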

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 <search clause> ::=
   SEARCH <recursive search order> SET <sequence column>

 and so CYCLE would come *after* SET <sequence column>, not before it.

Ah, thanks, I had glossed right over the SET sequence column bit. The SET
that I had was the SET cycle column which remains after the CYCLE keyword.

 It looks to me like we'd have to promote SET to fully reserved status,
 but that probably isn't going to surprise anyone.  DEFAULT and USING
 already are fully reserved.  I don't see anything else here that looks
 like it should need to be reserved.

Having fixed that, everything works fine with SET and WITH being reserved
keywords. You didn't mean to say I should be able to leave WITH unreserved, did
you?

Of course that was the easy part...

Implementing non-recursive common table expressions should be fairly
mechanical, though I think I'll have lots of questions about how to get all the
variable references fixed up.

Non-recursive common table expressions are always non-correlated. They can
refer to previous common table expressions but only to select from them either
in the FROM clause or in subqueries. So as far as I can see they can just go
in an InitPlan (or One-Time-Plan? I'm not sure what the distinction is) and be
referred to in the same way.
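
For example, such a non-correlated chain (invented table names) would be:

    -- Each WITH item may reference earlier items, but only by selecting
    -- from them in its FROM clause or in subqueries.
    WITH big_orders AS (
        SELECT customer_id, sum(amount) AS total
        FROM orders
        GROUP BY customer_id
    ),
    top_customers AS (
        SELECT customer_id FROM big_orders ORDER BY total DESC LIMIT 10
    )
    SELECT * FROM top_customers;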

Recursive queries are of course a whole lot trickier. I've been slowly
wrapping my head around them. So far I have a pretty good idea how to churn
out a typical recursive query analogous to a CONNECT BY query. 
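
By typical I mean the simple linear case, something like this sketch
(invented table; again, none of this is implemented yet):

    -- CONNECT BY-style traversal of an employee hierarchy from the root down.
    WITH RECURSIVE subordinates AS (
        SELECT employee_id, manager_id, 1 AS depth
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.employee_id, e.manager_id, s.depth + 1
        FROM employees e JOIN subordinates s ON e.manager_id = s.employee_id
    )
    SELECT * FROM subordinates;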

But the spec is a lot more ambitious than that. I haven't quite wrapped my
head around the idea of mutually recursive or non-linearly-recursive queries
yet.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Piggybacking vacuum I/O

2007-01-26 Thread Bruce Momjian
Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  I'd like to see still more evidence that it's a problem before we start 
  changing that piece of code. It has served us well for years.
 
 What I see here is mostly evidence suggesting that we should consider
 raising NUM_CLOG_BUFFERS, rather than anything more invasive.

Added to TODO:

* Consider increasing NUM_CLOG_BUFFERS

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

