Re: [HACKERS] Patch to update libpqxx's homepage in README

2008-03-20 Thread Jeroen T. Vermeulen
On Sun, February 10, 2008 18:35, Gurjeet Singh wrote:
> libpqxx seems to have moved around quite a bit. The attached patch
> corrects
> libpqxx's homepage.

Thanks for that.  However, just http://pqxx.org/ would be best.  I'm just
setting up new hosting, and I may not get everything completely
link-compatible.


Jeroen





Re: [HACKERS] PAM authentication fails for local UNIX users

2007-08-20 Thread Jeroen T. Vermeulen
On Mon, August 20, 2007 19:52, Andrew Dunstan wrote:

> I'd rather see an HBA fallback mechanism, which I suspect might overcome
> most of the  problems being encountered here.

I implemented a form of that once, so that on local connections you could do
ident mapping with fallback to PAM or some other password authentication. 
That seemed useful, e.g. for granting non-interactive access to a program
running under a dedicated user and requiring a password from everyone
else.  The implementation also allowed for a bit more flexibility in the
auth mechanism.

The main objections I recall were:

1. The protocol doesn't allow for multiple authentication prompts.  My own
proposal didn't have that problem since it only introduced an "optional
ident" authentication that continued looking for a matching rule if the
current user name was not in the given map, but it's a problem for more
general approaches.

2. For real, fully generalized fallback, you'd also need to overhaul the
HBA config file format completely.

3. It was considered unsafe to add even the most limited of fallback
options, because the HBA config is designed to pick just one auth
mechanism for any connection attempt, based on only the first three
columns of the config file.  An admin who didn't understand the new auth
mechanism could use it to write an unsafe HBA configuration, though only if
that configuration also broke the existing "go from specific-permissive to
general-restrictive" design guideline.

Personally I think it'd take some careful aim to shoot yourself in the
foot like that, but IIRC it was enough for an "I don't like it" vote.


Jeroen





Re: [HACKERS] postgresql compile problem

2007-07-17 Thread Jeroen T. Vermeulen
On Wed, July 18, 2007 11:07, [EMAIL PROTECTED] wrote:

> Recently, I downloaded postgresql-8.1.9.tar.gz from the official
> website and installed it on my Linux system, whose gcc version is
> 2.9.6.  Although the installation succeeds, the version I see afterwards
> is 7.2.1.  How can this happen?  Can you tell me the reason?

You are on the wrong mailing list.  The pgsql-hackers list is for
discussion related to the development of postgres.  Try pgsql-general.

For the record, gcc 2.9.6 does not exist.  If it did, it would be very old.


Jeroen





Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
On Mon, July 2, 2007 22:17, Gregory Stark wrote:

> The way you described it there were records being inserted and later
> deleted.
> Why wouldn't you need vacuums?
>
> Or are all the records eventually deleted and then the table truncated or
> dropped before the next batch of inserts?

In a nutshell, yes.  The problem is I can't delete them all at once; it
happens in batches, and that means that stats degrade in the meantime.


Jeroen





Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
On Mon, July 2, 2007 18:15, Gregory Stark wrote:

>> So I suppose the planner has a good reason to ignore the index at that
>> point.  I'm assuming that this is something to do with the correlation
>> between the index and the column's statistics degrading in some way.
>
> Best to post "explain analyze " for when the performance is good
> and
> bad. Perhaps also an explain analyze for the query with enable_seqscan off
> when it's bad.

Can't easily do that anymore...  AFAIR the plans were all identical
anyway, except in the "enable_seqscan bad" case which used a sequential
scan instead of using the index.  The queries are very simple, along the
lines of "select * from foo where id >= x and id < y".


> Also, which version of Postgres is this?

It was an 8.2 version.


> It's possible you just need vacuum to run more frequently on this table
> and
> autovacuum isn't doing it often enough. In which case you might have a
> cron
> job run vacuum (or vacuum analyze) on this table more frequently.

Actually, come to think of it, I don't think I'd want any vacuums at all
on this particular table.  Just the analyze on the primary key, no
vacuums, no statistics on anything else.  Unfortunately it's not just one
table, but a set of tables that can be created dynamically.  I could
change that, but in this particular case I don't think I should.


Jeroen





[HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
Hi all,

I've run into a case where I get bad performance that doesn't sound too
hard to solve.  Question is: is it worth solving?

The situation is this: I have a table that can grow to a large number of
rows, then shrink to zero over a large number of quick, consecutive
transactions.  The primary key index for the table is getting a lot of use
in the process.

But whenever roughly one-third of the rows have been deleted, the
planner stops using that index and resorts to sequential scans.  I tried
suppressing that by toggling enable_seqscan: works as advertised, but
performance is still terrible until (as far as I can make out) the next
analyze run has completed!

So I suppose the planner has a good reason to ignore the index at that
point.  I'm assuming that this is something to do with the correlation
between the index and the column's statistics degrading in some way.

I also tried doing my own analyze runs on just the primary key index. 
That will complete very quickly, and performance is restored for a while. 
But as far as I can tell, a regular automatic analyze run will block my
own, more limited one on the same table.  So performance is still bad, and
now it's irregular to boot.

This makes me wonder: when the planner finds that an index is no longer
worth using because its corresponding statistics are out of date, and it's
cheap to update those same stats, maybe it should do so?  Even if there's
also going to be a full analyze on the table, it could be worthwhile to do
this quick limited run first.  (Though not if one is already underway, of
course).

All this is based largely on guesswork, so if I've got it all wrong,
please enlighten me!


Jeroen





Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 19:42, Rob Butler wrote:
> Do you guys need something PG specific or built into PG?
>
> ActiveMQ is very nice, speaks multiple languages, protocols and supports a
> ton of features.  Could you simply use that?
>
> http://activemq.apache.org/

Looks very nice indeed!


Jeroen





Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 18:18, Heikki Linnakangas wrote:
> Marko Kreen wrote:
>> As I understand, JMS does not have a concept
>> of transactions, probably also other solutions mentioned before,
>> so to use PgQ as backend for them should be much simpler...
>
> JMS certainly does have the concept of transactions. Both distributed
> ones through XA and two-phase commit, and local involving just one JMS
> provider. I don't know about others, but would be surprised if they
> didn't.

Wait...  I thought XA did two-phase commit, and then there was XA+ for
*distributed* two-phase commit, which is much harder?


Jeroen





Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 04:45, Chris Browne wrote:
> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> 
>
> The "granddaddy" of message queuing systems is IBM's MQ-Series, and I
> don't see particular value in replicating its functionality.

I'm quite interested in this.  Maybe I'm thinking of something too
complex, but I do think there are some "oh it'll need to do that too"
pitfalls that are best considered up front.

The big thing about MQ is that it participates as a resource manager in
two-phase commits (and optionally as a transaction manager as well).  That
means that you get atomic processing steps: the application takes a message
off a queue, processes it, commits its changes to the database, and replies
to the message.  The queue manager then does a second-phase commit for all
of those steps, and that's when the reply really goes out.  If the
application fails, none of this will have happened, so you get ACID over
the complete cycle.  That's something we should have free software for.

Perhaps the time is right for something new.  A lot of the complexity
inside MQ comes from data representation issues like encodings and
fixed-length strings, as I recall, and things have changed since MQ was
designed.  I agree it could be useful (and probably not hard either) to
have a transactional messaging system inside the database.  It saves you
from having to do two-phase commits.

But it does tie everything to postgres to some extent, and you lose the
interesting features--atomicity and assured, single delivery--as soon as
anything in the chain does anything persistent that does not participate
in the postgres transaction.  Perhaps what we really need is more mature
components, with a unified control layer on top.  That's how a lot of
successful free software grows.  See below.


> On the other side, the "big names" these days are:
>
> a) The Java Messaging Service, which seems to implement *way* more
>options than I'm even vaguely interested in having (notably, lots
>that involve data stores or lack thereof that I do not care to use);

Far as I know, JMS is an API, not a product.  You'd still slot some
messaging middleware underneath, such as MQ.  That is why MQSeries was
renamed: it fits into the WebSphere suite as the implementing engine
underneath the JMS API.  From what I understand MQ is one of the
"best-of-breed" products that JMS was designed around.  (Sun's term, bit
hypey for my taste).

In one way, Java is easy: the last thing you want to get into is yet
another marshaling standard.  There are plenty of "standards" to choose
from already, each married to one particular communications mechanism:
RPC, EDI, CORBA, D-Bus, XMLRPC, what have you.  Even postgres has its own.
 I'd say the most successful mechanism is TCP itself, because it isolates
itself from content representation so effectively.

It's hard not to get into marshaling: someone has to do it, and it's often
a drag to do it in the application, but the way things stand now *any*
choice limits the usefulness of what you're building.  That's something
I'd like to see change.

Personally I'd love to see marshaling or low-level data representation
isolated into a mature component that speaks multiple programming
languages on the one hand and multiple data representation formats on the
other.  Something the implementers of some of these messaging standards
would want to use to compose their messages, isolating their format
definitions into plugins.  Something that would make application writers
stop composing messages in finicky ad-hoc code that fails with unexpected
locales or has trouble with different line breaks.

If we had a component like that, combining it with existing transactional
variants of TCP and [S]HTTP might even be enough to build a usable
messaging system.  I haven't looked at them enough to know.  Of course
we'd need implementations of those protocols; see
http://ttcplinux.sourceforge.net/ and http://www.csn.ul.ie/~heathclf/fyp/
for example.

Another box of important tools, and I have no idea where we stand with
this one, is transaction management.  We have 2-phase commit in postgres
now.  But do we have interoperability with existing transaction managers? 
Is there a decent free, portable, everything-agnostic transaction manager?
 With those, the sphere of reliability of a database-driven messaging
package could extend much further.

A free XA-capable filesystem would be great too, but I guess I'm daydreaming.


> There tend to be varying semantics out there:
>
> - Some queues may represent "subscriptions" where a whole bunch of
>   listeners want to get all the messages;

The two simplest models that offer something more than TCP/UDP are 1:n
reliable publish-subscribe without persistence, and 1:1 request-reply with
persistent storage.  D-Bus does them both; IIRC MQ does 1:1 and has
add-ons on top for publish

Re: [HACKERS] Using the GPU

2007-06-09 Thread Jeroen T. Vermeulen
On Sat, June 9, 2007 07:36, Gregory Stark wrote:
> "Billings, John" <[EMAIL PROTECTED]> writes:
>
>> Does anyone think that PostgreSQL could benefit from using the video
>> card as a parallel computing device?  I'm working on a project using
>> Nvidia's CUDA with an 8800 series video card to handle non-graphical
>> algorithms.  I'm curious if anyone thinks that this technology could be
>> used to speed up a database?  If so which part of the database, and what
>> kind of parallel algorithms would be used?
>
> There has been some interesting research on sorting using the GPU which
> could be very interesting for databases.

> Perhaps this can be done using OpenGL already but I kind of doubt it.

GPUs have been used to great effect for spatial joins.  And yes, using
OpenGL so that it was portable.  I saw a paper about that as an Oracle
plugin a few years back.

It works something like this, IIRC: a spatial join looks for objects that
overlap with the query area.  Normally you go through an R-tree index to
identify objects that are in the same general area (space-filling curves
help there).  Then you filter the objects you get, to see which ones
actually overlap your query area.

The GL trick inserted an intermediate filter: it set up the objects found
in the R-tree index, and the query area, as 3D objects, then used GL's
collision detection to find apparent matches. 
It has to be slightly conservative because GL doesn't make the sort of
guarantees you'd want for this trick, so there's a final software pass
that only needs to look at cases where there's any doubt.


Jeroen





Re: [HACKERS] What is the maximum encoding-conversion growth rate, anyway?

2007-05-29 Thread Jeroen T. Vermeulen
On Tue, May 29, 2007 20:51, Tatsuo Ishii wrote:

> Thinking more, it struck me that users can define an arbitrary growth
> rate by using CREATE CONVERSION.  So it seems we need functionality to
> define the growth rate anyway.

Would it make sense to define just the longest and shortest character
lengths for an encoding?  Then for any conversion you'd have a safe
estimate of

  ceil(target_encoding.max_char_len / source_encoding.min_char_len)

...without going through every possible conversion.
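
In toy Python terms (the byte lengths below are made-up examples, not taken
from any real encoding pair):

  import math

  def worst_case_growth(target_max_char_len, source_min_char_len):
      # conservative bytes-per-character growth for converting any string
      # from the source encoding to the target encoding
      return math.ceil(target_max_char_len / source_min_char_len)

  # a 1-byte-minimum source converted to a 4-byte-maximum target can at
  # most quadruple in size
  print(worst_case_growth(4, 1))   # -> 4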


Jeroen





Re: [HACKERS] Oracle indemnifies PostgreSQL on its patents

2007-04-01 Thread Jeroen T. Vermeulen
On Sun, April 1, 2007 01:32, Tom Lane wrote:

> The idea of OIN is to have a large patent pool that can be
> counter-asserted against anyone who doesn't want to play nice.
> Mutual assured destruction in the patent sphere, if you will.

And from the participants' point of view, I suppose the big attraction
must be that they do away with a threat to their patents.  If you have a
patent that matches what some open project (not worth suing) has been
doing for the past few years, then anyone else you might want to sue about
the patent could point to that project and say "if you have a valid
patent, why didn't you say something when they infringed it?"


Jeroen





Re: [HACKERS] Time-correlated columns in large tables

2007-03-05 Thread Jeroen T. Vermeulen
On Tue, March 6, 2007 03:17, Heikki Linnakangas wrote:

> I think you've just described a range-encoded bitmap index. The idea is
> to divide the range of valid values into some smallish number of
> subranges, and for each of these boundary values you store a bitmap
> where you set the bit representing every tuple with value < boundary
> value.

That's pretty cool!  From the looks of it, what you describe would solve
my problem--but using more storage in return for more flexibility.  My
scheme really required a correlation between a value and storage order,
which can be pretty fragile.  These range-encoded bitmap indexes wouldn't
have that problem.

I guess if you did simple run-length compression on these bitmaps you'd
end up more or less where I came in.  But you wouldn't want to flip a bit
somewhere in the middle of a compressed data stream, of course. :-)
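
For anyone following along, here is a toy Python sketch of the idea as I
understand it (the boundaries and values are made up):

  def build_range_bitmaps(values, boundaries):
      # one bitmap per boundary: bit i is set when row i's value is below it
      return {b: [v < b for v in values] for b in boundaries}

  def rows_in_range(bitmaps, lo, hi):
      # rows with lo <= value < hi, where lo and hi are boundary values:
      # below hi, but not below lo
      return [i for i, (below_hi, below_lo)
              in enumerate(zip(bitmaps[hi], bitmaps[lo]))
              if below_hi and not below_lo]

  values = [3, 17, 42, 8, 25]              # one value per row
  bitmaps = build_range_bitmaps(values, [10, 20, 30, 50])
  print(rows_in_range(bitmaps, 10, 30))    # -> [1, 4] (the rows holding 17 and 25)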


Jeroen





[HACKERS] Time-correlated columns in large tables

2007-03-05 Thread Jeroen T. Vermeulen
I'm a bit embarrassed to bring this up here because I don't know much
about storage layout and indexing.  It's probably a silly notion, but if
so, could someone please tell me how and why?

First I'll describe the situation that leads me to write this.  I'm seeing
some performance problems in an application that keeps a large table of
logged events.  Rows aren't normally ever updated, but new ones are
inserted all the time.  There are several fields containing various kinds
of timestamps, some exact and some sloppy.  These and perhaps other
columns are loosely correlated with the order in which the rows are
inserted.

[Q: Does this happen to a lot of users?  Is it worth bothering with?]

Naturally the table is frequently queried for ranges of timestamps of one
kind or another.  It's probably not atypical.  Some of the timestamp
columns are indexed, but I'm concerned about both the size of the indexes
and the cost they add to inserts.  Both queries using the indexes and
insertions can become unacceptably slow sometimes.

[Q: Are the access and insertion costs of an index really non-negligible
compared to those of the table itself?]

It sounded to me like this might possibly be a job for spatial indexes
(degraded down to a single dimension), but I couldn't find enough
documentation to figure out whether they generalize to this usage.  From
what I found, it didn't sound likely.

[Q: Do spatial indexes work on simple scalar values and operators?  If
not, any reason why not?  Otherwise, would they help?]

Bruce suggested partitioning the table, but that won't work well for
multiple independent criteria and comes with a host of scalability and
management issues.  I'd also want something that was transparent to the
application.  Perhaps I'm asking for too much but AFAIK it's exactly that
ability to separate functionality from optimization that made SQL a
success in the first place.

[Q: Is there some other transparent optimization for values that correlate
with insertion/update order?]

So I was wondering whether it would make sense to have a more compact kind
of index.  One that partitions the value range of a given column into
sub-ranges, and for each of those, merely keeps loose track of where those
value ranges might be found.  "Dates from 2006-07-15 to 2006-08-04: try
blocks 99-126 and 175."  Just a fixed maximum of two or three contiguous
block ranges per value range would probably do the trick.  The risk of
having too few, of course, is that one oddly positioned block could make
the index look as if a particular value range was spread throughout
most of the table.
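
Something like this, in illustrative Python (all names and numbers invented
for the example):

  # each value sub-range maps to at most a few contiguous block ranges where
  # matching rows might live; the index is allowed to be pessimistic
  loose_index = {
      ("2006-07-15", "2006-08-04"): [(99, 126), (175, 175)],
      ("2006-08-04", "2006-08-25"): [(127, 174)],
  }

  def candidate_blocks(index, value):
      # every block range that might contain rows matching `value`
      return [block_range
              for (lo, hi), block_ranges in index.items()
              for block_range in block_ranges
              if lo <= value < hi]

  print(candidate_blocks(loose_index, "2006-07-20"))   # -> [(99, 126), (175, 175)]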

[Q: Am I re-inventing the wheel?  If not, is there really a robust, linear
correlation between a row's time of insertion or last update and its
storage block number?]

The index would not need to be exact, just a bit pessimistic.  Any
re-partitioning or re-balancing could be done offline in "vacuum analyze"
style.  AFAICS that would allow O(1) insertion cost at runtime, or more
aggressive tradeoffs to reduce run-time degradation of indexing quality. 
Most maintenance could be done incrementally to smooth out its costs over
time.

With an "index" like this, an index scan would be very cheap indeed but
you'd then also scan a small subset of the table itself for exact matches.
 I would hope that would be a win if the indexed value accounts for more
than an infinitesimal portion of table size, and is reasonably correlated
with insertion/update time.  Also I guess bitmap scans on the indexed
values could in principle use compressed bitmaps, excluding areas that
according to the index contain no matching rows.  There might be some
marginal benefit there.

[Q: Would this fit in at all?  Any particular reason why it doesn't make
sense?]


Ready and grateful for any criticism,

Jeroen





Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-27 Thread Jeroen T. Vermeulen
On Wed, February 28, 2007 06:59, Jim C. Nasby wrote:
> On Tue, Feb 27, 2007 at 05:18:28PM -0500, Bruce Momjian wrote:

>> I think we will remove fsync in favor of the new delay, and allow -1 to
>> be the same behavior as fsync off.
>
> Well, presumably we'd still allow fsync for some number of versions...

I'd hate to lose the ability to disable fsync.  I run tons of tests that
don't require any protection against server crashes or hardware failures,
but their speed does matter.  I know it's not the most important
requirement in the world, but speeding up those tests means I can run more
of them, on more hardware, more often.  Test time also affects my
development cycle.

My main worry is where the option is set, though.  For my situation,
selecting a "fast and sloppy" mode when starting the server is clearly the
best choice.  It'd be possible, though awkward, to change my code to use
COMMIT NOWAIT.  But then am I really sure I'm still testing the same
thing?  Plus it introduces a risk of binaries (distributed by others)
accidentally doing COMMIT NOWAIT in production code, as they would for testing.


Jeroen





Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-26 Thread Jeroen T. Vermeulen
On Tue, February 27, 2007 06:06, Joshua D. Drake wrote:
>
>> Why do we want this?? Because some apps have *lots* of data and many
>> really don't care whether they lose a few records. Honestly, I've met
>> people that want this, even after 2 hours of discussion and
>> understanding. Plus probably lots of MySQLers also.
>
> Most users will take speed over data loss any day. Whether we want to
> admit it or not.

In that case, wouldn't it make just as much sense to have an equivalent
for this special transaction mode on individual statements, without
transaction context?  I'm guessing that those who don't really know or want
transactions would never start one, instead running lots of loose statements
that just get committed individually.


Jeroen





[HACKERS] Cosmetic note: hit rates in logs

2007-02-13 Thread Jeroen T. Vermeulen
Just noticed a small cosmetic point in the logs when logging statement
performance data: if a statement accesses 0 blocks, the "hit rate" is
given as 0%.

I can see that that makes sense mathematically--insofar as 0/0 makes
mathematical sense at all--but wouldn't it be more helpful to represent
this as a 100% hit rate?

I guess computing hit rate as the limit of 0/x is as valid as computing
the limit of x/x (with x being the number of accesses that approaches
zero).  But when I look at the logs I find myself going "low hit rate
here--oh wait, that's for zero accesses" all the time.  Or would the
change make other people go "good hit rate here--oh wait, that's for zero
accesses"?


Jeroen





Re: [HACKERS] Missing directory when building 8.2.3-base

2007-02-12 Thread Jeroen T. Vermeulen
On Tue, February 13, 2007 01:45, Peter Eisentraut wrote:

> Most of the core team is convinced that the postgresql-foo tarballs are
> useless, but Marc insists on keeping them.  But since they are nearly
> useless, no one tests them, so it is not surprising that they don't
> work.

Well, hurray for Marc!  I'm writing from a country where "broadband" is
still measured in kilobits per second, and the government censors (and
causes the various companies and government monopolies along the way to
censor) Internet traffic, keeping the ICT infrastructure slow and
unreliable.  International bandwidth comes at premium prices for those who
can afford serious connections.  Much hardware on sale here is either
counterfeit or export products that failed quality-control tests or
otherwise "fell of the boat."  Downloads are sometimes quietly corrupted,
without any errors at the TCP level.  Long-lived connections often time
out.

Not having to download half again the size of a "-base" tarball can make a
difference in those situations, as can not having to download it all in
one single large file.


Jeroen





[HACKERS] Missing directory when building 8.2.3-base

2007-02-12 Thread Jeroen T. Vermeulen
I built the "-base" version of 8.2.3 today, for installation at a company
I'm helping out.  The build (and later, the installation) gave me an error
about a missing directory "test/regress".  IIRC I downloaded
ftp://ftp.us.postgresql.org/pub/mirrors/postgresql/source/v8.2.3/postgresql-base-8.2.3.tar.bz2

I worked around the problem by creating a directory src/test/regress
containing a Makefile with inert "all" and "install" targets.  That was
enough to get a working installation.

Is this a known problem?  Is there any test procedure that builds the
"base" distribution before release?


Jeroen





Re: [HACKERS] PQencryptPassword() and encoding

2006-12-19 Thread Jeroen T. Vermeulen
On Wed, December 20, 2006 11:08, Tom Lane wrote:
> "Jeroen T. Vermeulen" <[EMAIL PROTECTED]> writes:
>> Probably a silly question, but better safe than sorry:
>> AFAICS there's no way for PQencryptPassword() to see what encoding
>> applies.  Are we quite sure that that is not a problem?
>
> Right offhand it seems that the worst possible consequence is
> authentication failure: you originally entered your password
> as foobar in encoding X, and then when you enter foobar in
> encoding Y, you get the raspberry.  Do you see something else?

That's definitely the first thing that springs to mind.  I don't suppose
the problems we had with escaping could happen here, and there probably
aren't any security implications.

Getting different password hashes depending on your client encoding would
probably not hit a lot of people, but it would be annoying and hard to
debug where it did happen.  If it can happen in the first place, that is,
which is what I'm asking.


Jeroen





[HACKERS] PQencryptPassword() and encoding

2006-12-19 Thread Jeroen T. Vermeulen
Probably a silly question, but better safe than sorry:

AFAICS there's no way for PQencryptPassword() to see what encoding
applies.  Are we quite sure that that is not a problem?  Or are passwords
ASCII-only?


Jeroen





Re: [HACKERS] Optimizing prepared statements

2006-09-04 Thread Jeroen T. Vermeulen
On Mon, September 4, 2006 23:03, Tom Lane wrote:

> Ah.  I think you're confusing the spectators by using "predict" when you
> should say "match".  You're looking for previously generated plans that
> have assumed parameter values matching the current query --- saying that
> the plan "predicts" a parameter value is just a really poor choice of
> wording.

I guess so...  It's been over a decade since I last busied myself with
database optimization, so I don't know any of the jargon.  Had to use
processor architecture jargon instead.

So with that out of the way, can anyone think of some good real-life
examples of prepared statement usage that I can test against?  Suggestions
I've had include TPC, DBT2 (based on TPC-C), and pgbench, but what I'm
really looking for is traces of invocations by real applications.

If we don't have a body of application traces available, can anyone think
of a convenient, nonintrusive way I could extract them out of applications
I'm running?  If there is, I could write a filter to eliminate irrelevant
information.  The filter would ignore everything other than prepared
statement invocations.  It could randomize statement names, strip out
their definitions, hide the ordering between different statements, and
replace parameter values with meaningless numbers; so it would be
relatively safe for others to volunteer traces of their own applications. 
None of those transformations would affect my simulation results.


Jeroen





Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Mon, September 4, 2006 03:56, Gregory Stark wrote:

> Thanks, that cleared things up enormously. I'm trying to figure how it
> would
> react to some of the queries I've written in the past. In particular I'm
> thinking of queries like
>
> WHERE (? OR category = ?)
>   AND (? OR cost < ?)
>   AND (? OR description like ?||'%')
>
> Where I then pass flags in from the application to indicate which search
> constraints to apply. If it notices that most searches are for a
> particular
> set of constraints it would be able to cache plans with the unused
> constraints
> removed.

Right.  That's pretty much the problem as Peter originally described it, I
think.


> It would not however be able to notice that the last parameter never
> contains a % and therefore can use an index scan.

If I understand you correctly, then no.  If the algorithm sees highly
variable values for that last parameter, it will never decide to assume
that that parameter will never contain '%'--and I'm not sure how that
could be done safely.

I do see two glimmers of hope, however:

1. If that last parameter is usually some specific value, then you'll
probably end up using specialized plans with that specific value in the
parameter's place.  If that value is a string without wildcards, you can
use your index on description (assuming you have one).  If it's '%' or
null, the optimizer can decide to ignore the "like" clause.  It's only
when the scheme finds that it cannot predict what the parameter's value is
going to be that you get the generic, poorly-performing code.

2. Once we have a predictor, and assuming it works, it could be tied in
with the planner a bit more.  As I believe Tom said, the planner can't
afford to chase down lots of scenarios just in case they ever happen.  But
when a parameter is used only for simple matches or inserts on non-indexed
columns, for example, the planner might find in the course of its normal
activities that there's nothing useful it can do with that parameter and
deliver this information with its plan, so that the predictor can ignore
the parameter.


> I'm also wondering how this interacts with plan stability. Obviously the
> direct effect is to throw out any chance of it. But in the long run they
> may be two complementary sides of the same thing.

Well, it'll cause some plans to be re-generated, surely.  But the
impression I've gotten from the discussion so far is that some plans were
getting too old anyway.


Jeroen





Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Sun, September 3, 2006 23:52, Tom Lane wrote:

> What exactly do you mean by "optimize away a parameter"?  The way you
> described the mechanism, there are no parameters that are "optimized
> away", you've merely adjusted selectivity predictions using some assumed
> values.

I'm using "optimized away" as shorthand for "replaced with a literal
constant in the statement definition used to generate the plan."  So if a
parameter $n is found to be, say, always equal to the string 'foo', then
we might want to generate a specialized plan as if the statement's
definition contained the literal string 'foo' wherever it really says $n. 
I've been calling that "optimized for $n=='foo'" or "$n is optimized away
in this plan."


>  Actually converting a parameter to a constant is a whole
> 'nother matter --- it allows constant-folding for example.  But then you
> *cannot* use the plan unless there's an exact match to the assumed
> value.  These two approaches provide very different tradeoffs of plan
> quality vs plan specificity, so it makes me uncomfortable that you're
> failing to clarify what you mean.

Right.  When I said "optimized for" a certain parameter value, I meant
actual substitution the whole time.  I'm sorry if I didn't make that
clear; it seemed so basic that I must have forgotten to mention it.  I
guess the principle would also work otherwise, but it's intended to allow
constant folding.

So for any given statement, there would be a cache of frequently-needed
plans for different sets of constant substitutions.  As you point out, a
call could only use a plan if the plan's substitutions were consistent
with the call's parameter values (but within that constraint, the more
substitutions the merrier).  That's why I talked so much about comparing
and matching: that part is for correctness, not optimization.

As I've said before, all this falls down if there is a significant cost to
keeping one or two extra plans per prepared statement.  You mentioned
something about "tracking" plans.  I don't know what that means, but it
sounded like it might impose a runtime cost on keeping plans around. 
Could you elaborate?


Jeroen





Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Sun, September 3, 2006 23:28, Jeroen T. Vermeulen wrote:
> On Sun, September 3, 2006 21:52, Gregory Stark wrote:

>> I read that but apparently I misunderstood it since it would not have
>> been
>> doable the way I understood it. I thought you wanted the predictor bits
>> to
>> correspond to particular plans. If a plan was "wrong" then you marked it
>> as a
>> bad guess. I don't think that can be made to work though for the reasons
>> I
>> stated then.

Come to think of it, I still haven't answered your question very clearly...

I keep one predictor for any given statement.  The predictor keeps two
pieces of state *per statement parameter*:

1. The predictor value (usually the last-seen value for that parameter,
though it does take a few mismatches to make it drop a value that was
repeated a lot).

2. A saturating counter measuring confidence in the current predictor
value.  As long as this counter is below the confidence threshold, the
predictor value has no other meaning than "let's see if this value comes
back."

No information flows between these pieces of state.  All logic in the
predictor itself is entirely local to individual parameters.  But any new
plan is generated based on a "snapshot" of all predictor values that (i)
match their respective parameter values in the ongoing call, and (ii) have
their counters above the confidence threshold.  It is that combination of
parameters, for that combination of values, that is taken as a set of
pseudo-constants.

The cache can hold any number of plans optimized for the same combinations
of parameters but with different values; or for different combinations of
parameters but the same values; subsets and supersets of parameters with
either the same or different values; completely disjoint sets; anything. 
The cache neither knows nor cares.  The only thing that *won't* happen is
two plans in the same cache covering the same set of parameters with the
same set of values--because there is never any need to generate that
second plan while the first is still in cache.
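
In sketch form (Python, much like the simulator but simplified, with names
invented for this message), the per-parameter state and the snapshot step
come down to something like:

  THRESHOLD = 2                      # counters run 0..3; >= 2 means "confident"

  class ParamPredictor:
      def __init__(self):
          self.value = None          # predictor value for one parameter
          self.counter = 0           # saturating confidence counter

      def observe(self, actual):
          if actual == self.value:
              self.counter = min(self.counter + 1, 3)
          else:
              self.counter = max(self.counter - 1, 0)
              if self.counter < THRESHOLD:
                  self.value = actual        # lost confidence: adopt new value

      def confident(self):
          return self.counter >= THRESHOLD

  def snapshot(predictors, args):
      # the pseudo-constants a new specialized plan would be built around:
      # parameters that are confidently predicted *and* match this call
      return {i: arg for i, (pred, arg) in enumerate(zip(predictors, args))
              if pred.confident() and pred.value == arg}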


Jeroen





Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Sun, September 3, 2006 21:52, Gregory Stark wrote:

> I read that but apparently I misunderstood it since it would not have been
> doable the way I understood it. I thought you wanted the predictor bits to
> correspond to particular plans. If a plan was "wrong" then you marked it
> as a
> bad guess. I don't think that can be made to work though for the reasons I
> stated then.

Oh, sorry--I guess I haven't been too systematic about it.  In the
algorithm's current incarnation, the confidence counters don't mark
anything as bad ideas, as such.  Instead, whenever a new invocation has a
set of parameters that are (1) correctly and (2) confidently predicted by
the counters, the predictor decides that it's probably a good idea to plan
for calls with that particular set of parameters built in as constants.

Assuming we didn't already have a plan for the particular combination of
values at hand, the algorithm generates a new one.  The new plan is
optimized on the assumption that those predicted parameters are constants.

We keep a small cache of recently-used plans, possibly including the
original plan where all parameters are truly variable.  Every plan also
remembers a list of the predicted parameter values, so on any next call,
we can check whether a particular cached plan actually applies to the
call.  If it doesn't (because of a mismatch between the incoming
parameters and the plan's assumed pseudo-constants), we just pick another
plan.

If multiple cached plans can be applied to a given call, we prefer the one
that optimizes away the most parameters.  Age is used as a tiebreaker, on
the assumption that more recent planning information is likely to be more
accurate.

The tricky part is deciding when to generate a new, more specialized plan
when we already have a matching one that may not be optimal.  Without this
step we'd never get beyond that first, completely generic plan--it applies
to every call.  The way I've approached it is this: when the predictor's
current state correctly and confidently predicts more of the invocation's
parameter values than any of the cached plans did, then it's time to
generate a new plan.

So let's say your statement has a parameter x that's always the same
value, say x==0, and another parameter y that's a randomly alternating
Boolean value, and another one z that varies randomly between lots of
values.  What order they're in doesn't matter, and they needn't be the
only parameters.  You're probably going to see four plans generated for
this example:

1. Either on the first call or during definition, you get the generic
plan.  This is the same plan you'd get in the existing backend, with
placeholders for variables x, y, and z.

2. Pretty soon, the algorithm is going to detect that x is always zero. 
It will generate a new plan, substituting the constant value 0 for x, and
hopefully getting better optimization because of it.

3. Sooner or later (probably fairly soon) you'll see a run of consecutive
calls where y happens to be "true."  A new plan is generated with the
assumption that y==true.  The new plan will also still assume that x==0.

4. The same is going to happen for y==false.  Yet another specialized plan
is generated.  If we keep up to 3 plans per statement, say, then this new
plan overflows the cache.  The least recently used plan is flushed to make
room for the new one--in this case, the generic one because we haven't
seen any cases where x!=0 recently.

More complex scenarios will also happen, of course, such as "if y==true
then x will usually be 0, but otherwise x will be highly variable" or "if
y==true then x is pseudo-constant and z is highly variable, but if
y==false then it's the other way around" or "if y==false then z is usually
the empty string," and so on.  The predictor as I've simulated it is "too
dumb to be intimidated" by the complexity.  It should work reasonably well
for all those scenarios, assuming of course that its cache is large enough
to remember the most frequent patterns.

Right now I'm using the plans' time since last use as the only eviction
criterion when the cache overflows.  There may be a better policy; the one
Achilles heel of LRU is the "big loop" where every cache entry is used
once, then evicted shortly before it is needed again.  (If the loop is so
big that entries are flushed long before they're needed again, well, then
it's just a big job and you stop blaming LRU :-)
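
To make that concrete, here's a stripped-down sketch of the cache logic in
Python (illustrative only; the simulator differs in detail, and the age
tiebreaker is left out for brevity):

  from collections import OrderedDict

  class PlanCache:
      def __init__(self, capacity=3):
          self.capacity = capacity
          # frozenset of (parameter index, assumed value) pairs -> plan
          self.plans = OrderedDict()

      def lookup(self, args):
          # a plan applies only if every value it assumed still holds;
          # among applicable plans, prefer the most specialized one
          applicable = [key for key in self.plans
                        if all(args[i] == v for i, v in key)]
          if not applicable:
              return None
          best = max(applicable, key=len)
          self.plans.move_to_end(best)           # refresh its LRU position
          return self.plans[best]

      def remember(self, assumed_constants, plan):
          self.plans[frozenset(assumed_constants.items())] = plan
          if len(self.plans) > self.capacity:
              self.plans.popitem(last=False)     # evict least recently used

  cache = PlanCache()
  cache.remember({}, "generic plan")             # applies to every call
  cache.remember({0: 0}, "plan specialized for x == 0")
  print(cache.lookup([0, True, 42]))             # -> "plan specialized for x == 0"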


> But if you have something working clearly that's not what you're doing. So
> what are you doing? Storing up a list of arguments seen for each parameter
> when executed and use the predictor bits to determine if any of those
> arguments are constants? Storing up a list of selectivity estimates?

The former.  I'm keeping a single predictor with a single "more or less
last-seen value" per parameter; plus a checklist of pseudoconstants for
every cached plan.  It's pretty simple, really, with no cost functions or
spanning trees or other intelligent logic--and certainly 

Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Sun, September 3, 2006 18:41, Gregory Stark wrote:

> I'm confused, what exactly are you trying to predict? Whether each
> parameter
> will be some cached value? Or whether the cached plan was correct?

That's described in more detail in a separate thread ("prepared statements
considered harmful").  In a nutshell, the algorithm detects pseudoconstant
parameters to prepared statements, and keeps a small set of different
plans optimized for recurring combinations of constants.


>> So once again, does anyone know of any realistic logs that I can use for
>> more useful simulations?
>
> You might look at the DBT test suite, iirc the TPCC spec it implements
> intentionally mixes random queries with predictable queries.

I considered the TPC benchmarks, but they're still benchmarks.  When seen
from one angle they try to look like real applications, but I'm worried
that in testing this algorithm, I may be looking at them from a very
different angle.  I'd still need actual application logs to validate them!


Jeroen





[HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
I've rigged up a simple simulator for the scheme I described for detecting
pseudo-constant parameters to prepared statements.  It withstands simple
tests, and it neatly picks up cases where some parameters are
pseudo-constants and others aren't--even if some of them are more "pseudo"
while others are more "constant."

What I need now is some realistic test data.  Does anyone have realistic
SQL logs that use prepared statements?

For now, I'll summarize some results I got from randomized input data.  I
used very simple traces, with 11 prepared statements, each taking a
different number of parameters (0 through 10, inclusive).  All calls were
uniformly randomized.  I used LRU replacement of cached plans, with up to
4 retained plans per statement.  Confidence counters ran from 0 to 3
inclusive, with the confidence threshold lying between 1 and 2.

Per simulation run, 20,000 statement invocations were processed.  Runs of
20,000 took about 3.5 seconds of wall-clock time each, or 0.175
milliseconds per statement, on a lightly-loaded 1.8 GHz 32-bit Athlon XP. 
That's for simulation in Python 2.4, with no effort to optimize and no
precompilation, and several lines of information composed and printed to
/dev/null for every invocation.  So I think the overhead of the matching
and comparing that the algorithm does is not a performance worry.

In my first test, parameters were uniformly-distributed integers in the
range [0, 999].  Over this test, 104 plans were generated for the 11
statements, for an average of 192 calls per generated plan.  Only 133 calls out of
20,000 used optimized plans, in this case optimizing out only one
pseudo-constant each.

When parameters were made to follow the normal distribution with mean 500
and standard deviation 100 (rounded to integers), the number of generated
plans went up to 305 as more patterns were recognized, and of course the
number of calls per generated plan dropped to 65.  Of the 20,000
invocations here, 770 used plans with one parameter optimized away, and 2
used plans with two.

These results don't look very good, but bear in mind this is for
randomized data.  Can't expect to exploit many patterns in random inputs! 
Real-life use is probably going to be much more favourable.  If we want to
guard against fluke patterns in highly-variable parameters, we can always
increase the range of the confidence counters.  That would make the
predictor more skeptical when it comes to accepting repetitions as
patterns.  Just how we tune the counters would be a tradeoff between the
cost of additional planning and the benefits of optimizing out more
parameters.

So once again, does anyone know of any realistic logs that I can use for
more useful simulations?


Jeroen





Re: [HACKERS] Prepared statements considered harmful

2006-09-01 Thread Jeroen T. Vermeulen
On Fri, September 1, 2006 22:14, Gregory Stark wrote:

> I think the slow part is trying to figure out whether to count the current
> call as a hit or a miss. How do you determine whether the plan you're
> running
> is the best plan without replanning the query?

The question of knowing which plan is best _based on what's in the actual
tables_ would be unsolved just as it always was.  The scheme addresses
only the opportunity to optimize for pseudo-constant parameters.  It
treats the existing planner as a black box.  If you find a solution to the
problem of inaccurate statistics, it'll probably be more or less
orthogonal to what I'm describing: you could have one or the other, but
combining them shouldn't be much harder.

I don't think telling hits from misses would be all that hard.  Let's say
you're having a prepared statement called, and you're evaluating a
candidate plan.  Each parameter is in one of two sets: those "predicted"
by the plan to have certain values (let's call them P), and those "not
predicted" by the plan because their confidence counters were below the
threshold (I'm tempted to call this set NP, but let's make it Q instead). 
Whether a parameter is in P or in Q can be derived from its confidence
counter.  In my previous example, you just take its most-significant bit.

 * For any parameter in P, if the actual value does not match the plan's
prediction, you have a miss.  Can't use this plan.  Use another if you
have one that applies (such as your regular old non-specialized
plan--that always applies), or if not, write a new one!

If you get through this without finding a mismatch, congratulations: you
have a hit.  The plan you're looking at is applicable to your call.  But
now we see if we can do better:

 * For any parameter in Q, if its value would have been predicted
correctly but its counter was below the confidence threshold, you
increment the counter.  If that lifts the counter above the threshold,
you have room for improving on this plan.  It means there's a good chance
you can re-plan for the case that this parameter is also a
pseudo-constant, without the effort being wasted.  Of course you could
also set a minimum number of invocations between re-plannings to get a
more long-term view (e.g. different parameters being recognized as
pseudo-constants in subsequent calls--you may not want to re-plan for
each of those calls).

So which plan do you execute if you have more than one applicable
candidate?  We can see what works well.  As a starter I would definitely
pick the one with the larger P (smaller Q), breaking ties in favour of the
most recently generated plan.  I'm assuming we only want one plan for a
given P.

We'd probably want to limit the number of candidate plans per statement to
some very small, fixed number--somewhere between one and four, I'd say; or
maybe one generalized plan plus up to two specialized ones.  With numbers
like that, none of this should be very expensive.  A simple linear match
against 1-4 candidates may be more effective than any attempt to be
clever.

I must admit I haven't thought through all of the consequences of caching
more than one specialized plan per statement.  For example, we could give
every cached plan its own set of confidence counters, and match an
incoming invocation against each of those; or we could keep just one "most
likely" plan with its associated predictor state, and only consider
previously generated plans if we either miss or find room for improvement
in the predictor.


Jeroen





Re: [HACKERS] Prepared statements considered harmful

2006-09-01 Thread Jeroen T. Vermeulen
On Fri, September 1, 2006 21:30, Tom Lane wrote:

> Yeah.  One of the reasons the planner is acceptably fast is that it is
> aggressive about discarding candidate plans as soon as they are clearly
> inferior to other plans.  Tracking multiple plans that might be optimal
> under varying assumptions about the query parameters would make things
> exponentially slower.

AFAICS the planner shouldn't be affected at all--it'd just be invoked more
often as and when the need for new plans became apparent.  Not
"exponentially" (that's an overused word anyway) but "proportionally" to
that.

I've been assuming that once you have a plan, storing it is not very
expensive.  If, say, doubling the number of plans stored with a session's
prepared statements incurs some serious cost (apart from the planning
itself, of course) then that changes things.


Jeroen





Re: [HACKERS] Prepared statements considered harmful

2006-09-01 Thread Jeroen T. Vermeulen
On Fri, September 1, 2006 16:53, Martijn van Oosterhout wrote:

> Interesting thought. It might be worth trying. But my big question: is
> all this testing and counting actually going to be faster than just
> replanning? Postgresql's planner is not that slow.

In the best case (which of course would have to be very frequent for any
of this to matter in the first place) it's mainly just a short loop
comparing the call's parameter values to their counterparts stored with
the plan and updating those two-bit confidence counters.  You wouldn't
*believe* how simple you have to keep these things in processor
architecture.  :-)


> The thing is that number of possible plans is going to be proportional
> to factorial(number of tables). Once you have 3 tables you're going to
> have at least a dozen possible plans, probably more. What the best plan
> is depends strongly on what the parameters are.

Of course.  That's the whole point: to end up with a small but effective
subset of all those possible plans.  I'd guess that you could cover even
most of the nasty cases with a maximum of three plans or so per prepared
statement, including the original fully-generalized one.  The plans could
be replaced on an LRU basis, which isn't very costly for three or so
entries.


> Anyway, your plan assumes that you have information to work with. The
> current system plans prepared queries with no information at all about
> parameters and people are advocating to keep it that way. I think a
> good first step would be the plan on first execution, like Oracle does.

Yes, delaying things a bit can help a lot sometimes.  That's also what JIT
compilers in JVMs do, for instance.  FWIW, libpqxx doesn't prepare
statements until they're first called anyway.

But if this choice to discard parameter information is exactly what causes
a lot of the bad plans in the first place, as Peter says, what's wrong
with putting it to use instead?  For those cases, you're pretty much
screwed by definition as long as you fail to do so.  And it's not like
what I'm suggesting is very difficult!

The real question is whether it's worthwhile.  To find that out, we'd need
to estimate four factors: coverage (how often you'd get a useful
prediction), accuracy (how often that prediction would be accurate), cost
of misprediction (near-zero compared to current situation, assuming we
keep the generalized plans handy), and savings for correct prediction (in
our case, benefit of planning for a constant instead of a variable minus
the cost of re-planning which you say isn't very expensive).

Based on what Peter and you tell me about cost, the main worries here are
coverage and accuracy.  Coverage and accuracy can be extracted (and
tweaked!) relatively easily if we have logs of prepared-statement
executions in a wide variety of real-life applications.  Listings of
consecutive prepared-statement invocations (statement name plus parameter
values) are all that's needed.

Do we have any logs like that?  If we do, I'll be more than happy to run
some simulations and see if the idea shows any promise.  Like I said,
there's every chance that it doesn't.  It was just an off-the-cuff
suggestion and if it's no good I'll have no problems saying so.  But
there's not much point sitting around arguing over theoretical merits if
they're that easy to quantify!


Jeroen





Re: [HACKERS] Prepared statements considered harmful

2006-09-01 Thread Jeroen T. Vermeulen
On Thu, August 31, 2006 21:41, Phil Frost wrote:

>> Is there any kind of pattern at all to this problem?  Anything
>> recognizable?  A few typical pitfalls?
>
> Frequently I have found preplanning will result in a horrible plan
> because it is assumed parameters may be volatile while in practice they
> are literals. Here is a function from my database:

That's a very common thing in processor design as well, and there's a
standard trick for it: the saturating two-bit counter.  It tends to work
pretty well for branch prediction, value prediction etc.  Usually it's the
first thing you reach for, so of course somebody may already have tried it
here and found it didn't work.

In this particular case it might be applied something like this: for each
parameter in a prepared statement you cache a predictor value, plus a
"confidence counter" saying (more or less--see below) how many times in
succession that value has repeated.  Let's say each of the counters count
from 0 to 3 inclusive, with its confidence threshold right in the middle,
between 1 and 2.

On every invocation, you check each parameter value against the
corresponding predictor value.  If it's identical, you increment its
counter (provided it can be incremented any further).  If it isn't, you
decrement its counter, and if the counter ends up below its confidence
threshold, you replace the predictor value with the new parameter value.

Then, whenever any new planning needs to be done (I'll get to that in a
moment), you see which counters are above their confidence thresholds.  In
your new planning you assume that all parameters with confident
predictions will remain pseudo-constant for the next few invocations.
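
For concreteness, here's a minimal sketch of one such predictor in C++ (the names and the exact threshold are mine, not anything that exists in the backend):

#include <string>

// Saturating two-bit confidence counter for a single statement parameter.
// The counter runs from 0 to 3; values 2 and 3 mean we're confident the
// parameter will repeat its last observed value.
struct ParamPredictor
{
  std::string value;  // predicted parameter value
  int counter;        // 0..3

  ParamPredictor() : counter(0) {}

  // Feed one observed parameter value; returns true if it matched.
  bool observe(const std::string &observed)
  {
    if (observed == value)
    {
      if (counter < 3) ++counter;       // saturate at the top
      return true;
    }
    if (counter > 0) --counter;
    if (counter < 2) value = observed;  // below threshold: adopt new value
    return false;
  }

  bool confident() const { return counter >= 2; }
};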

Of course there's a problem when parameters do not match predicted values.
 That's where having one or two backup plans could come in handy.  You
could keep your original, fully-generalized plan around.  If plans are
cheap enough to store, you could try to keep a cache of old plans for the
same query.  The great thing about keeping some backup plans around is
that a pseudo-constant parameter can have a different value once in a
while, then flick back to its old habits without invalidating all your
efforts.  Your usually-unused search fields are a good example.  You may
also have two stable parameter patterns with different sets of
pseudo-constants competing for your attention.

It's not perfect, and it clearly has its pathological cases--but if it
works well enough overall, the bad cases could be caught and handled as
exceptions.  Confidence counters can be tweaked to lose confidence more
easily than they gain it, or vice versa.  Some meta-confidence scheme may
catch the worst offenders.  I won't go into that now--first I'll shut up
and wait for others to point out what I've missed.  :)


Jeroen



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Prepared statements considered harmful

2006-08-31 Thread Jeroen T. Vermeulen
On Thu, August 31, 2006 18:56, Peter Eisentraut wrote:

> With time, it becomes ever clearer to me that prepared SQL statements are
> just
> a really bad idea.  On some days, it seems like half the performance
> problems
> in PostgreSQL-using systems are because a bad plan was cached somewhere.

Is there any kind of pattern at all to this problem?  Anything
recognizable?  A few typical pitfalls?

Without knowing much of the internals, I could imagine [waves hands in
vague gestures] other options--something like recognizing major changes
that upset the cost functions that went into generating a plan, and
invalidating the plan based on those; or noting bad estimates somehow as
they become apparent during execution, and annotating the plan with a
"this assumption was a bad idea" marker so you'll do better next time.

I guess you can't go far wrong if you re-define "prepared" to mean merely
"pre-parsed," but it sounds like such a waste of opportunity...


Jeroen



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [OT] MySQL is bad, but THIS bad?

2006-05-19 Thread Jeroen T. Vermeulen
On Fri, May 19, 2006 13:25, Thomas Hallgren wrote:

> If that's really true, then let's create a bidirectional compatibility
> layer as a joint
> venture with people from the MySQL camp. Should be a win-win situation. I
> somehow doubt that
> is the case. Important yes. But "just as important"? No way.

I'm not too hopeful, for two reasons.  First: MySQL is very, very
different.  I heard they just introduced a "create user" command like
everybody else, but that's a drop in an ocean.  I'm sure it's
unintentional, but publishing a "quaint" SQL dialect amounts to a
vendor-lock-in scheme--this time with the barn being locked before the
cash cows have walked in.

Second: management changes at MySQL seem to have favoured conventional
business thinking over following the techs where no man has gone before. 
A year or two back we discussed porting libpqxx to MySQL so we'd have at
least a strong, common C++ layer.  Some of the technical people loved it,
a provisional team was sketched out, and the idea was pitched to
management.  The argument: the more stable interfaces we share, the more
confident corporate customers will feel adopting free databases.

It didn't go anywhere.  Reports I heard later amounted to "they don't see
why they should spend the money."


Jeroen



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Return results for PQexec vs PQexecP*

2006-05-17 Thread Jeroen T. Vermeulen
On Wed, May 17, 2006 19:53, Martijn van Oosterhout wrote:

> The main problem with PQexec and co is that they don't really do very
> well if a single query produces multiple result sets. I'm not sure if
> it's defined whether you get the first or the last. In any case, if you
> want all the result sets, you need to be looking at PQsendquery and co.

AFAIK it's well-defined if you send multiple queries in a single string,
separated by semicolons: PQexec() returns the result of the last query.
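
For instance (a sketch from memory, not compiled against libpq, so double-check before relying on it):

#include <cstdio>
#include <libpq-fe.h>

int main(void)
{
  PGconn *conn = PQconnectdb("dbname=template1");
  if (PQstatus(conn) != CONNECTION_OK) return 1;

  // Two statements in one call: only the last one's result comes back.
  PGresult *r = PQexec(conn, "SELECT 1 AS a; SELECT 2 AS b");
  if (PQresultStatus(r) == PGRES_TUPLES_OK)
    std::printf("%s = %s\n", PQfname(r, 0), PQgetvalue(r, 0, 0));  // "b = 2"
  PQclear(r);
  PQfinish(conn);
  return 0;
}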


Jeroen



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Pregunta sobre limitar el

2006-03-15 Thread Jeroen T. Vermeulen
On Tue, March 14, 2006 22:14, Paulina Quezada wrote:
> Hello, I need to know which SQL statements I can use to control, from a
> trigger, the number of rows to be updated or deleted, so that mass
> deletes or updates cannot be done from the console.

Aqui se habla ingles.  (English is spoken here.)


Jeroen



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Pgfoundry and gborg: shut one down

2006-02-19 Thread Jeroen T. Vermeulen
On Mon, February 20, 2006 11:00, Marc G. Fournier wrote:

>> Speaking for libpqxx, my only concern with that is the mailing list.
>> Would those have to move to different addresses--or conversely, would a
>> forced migration make it much easier to move *all* GBorg mailing lists
>> to
>> pgFoundry and maintain their old addresses?
>
> All addresses would have to be changed to the pgfoundry.org one ...

Ouch!  Moving my project off GBorg wasn't so hard, but forcing all mailing
list subscribers to move to a different address does hurt.

If the same goes for many other projects on there, wouldn't it be possible
to move all mail handling for gborg.postgresql.org over to pgFoundry at
once, but preserve the domain name and list names?  It may help people
make the jump if mailing list migration could be decoupled from the other
changes.


Jeroen



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Pgfoundry and gborg: shut one down

2006-02-19 Thread Jeroen T. Vermeulen
On Sun, February 19, 2006 05:10, Bruce Momjian wrote:

> I don't care what direction we go, just kill one.

Speaking for libpqxx, my only concern with that is the mailing list. 
Would those have to move to different addresses--or conversely, would a
forced migration make it much easier to move *all* GBorg mailing lists to
pgFoundry and maintain their old addresses?


Jeroen



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] ROLLBACK triggers?

2006-01-23 Thread Jeroen T. Vermeulen
On Mon, January 23, 2006 16:35, Daisuke Maki wrote:
>
> I'm currently trying to embed Senna full text search engine
> (http://qwik.jp/senna/) into postgres. I'm trying to achieve this by
> using triggers (implemented in C) to cause an update to senna's index at
> various points.
>
> This seemed to work fine until I realized that while postgres' SQL
> commands could be rolled back, Senna's index remained already-changed.
> There are other potential issues with regards to transaction safety, but
> currently this seems to be a problem that I cannot fix by simply
> patching Senna. So I thought that if there was a rollback trigger, I
> could call whatever necessary to undo the changes that were made to the
> index.

I may just be being stupid here (haven't had my coffee yet) but are you
sure that:

I. The triggers really do arrive even when the modifications are aborted? 
AFAIK triggers that were, er, triggered during a transaction only really
get notified once the transaction commits.  In psql:

=> LISTEN x;
LISTEN
=> BEGIN;
BEGIN
=> NOTIFY x;
NOTIFY
=> ABORT;
ROLLBACK
=> BEGIN;
BEGIN
=> NOTIFY x;
NOTIFY
=> COMMIT;
COMMIT
Asynchronous notification "x" received from server process with PID 42.

As you can see, the ABORT also rolled back the NOTIFY, so it never
arrived.  This may be exactly what you want.

Well, actually it raises another question: is it alright for the ongoing
transaction not to see any changes it makes reflected in your index?

II. Is there any chance of wrapping your work in a function, so you can
then create an index on the result of that function?  I've never tried
this but presumably the server would then do all the work to keep your
index updated, without any need for triggers and such.

This is no different from what you'd do if you wanted, say, an index on an
upper-cased version of a text field to speed up case-insensitive searches.
 You create an index on "TOUPPER(name)" or whatever it is, and then when
you select on "WHERE TOUPPER(name)=TOUPPER(searchstring)" you get full use
of the index, which you wouldn't get from a regular index on "name".


Jeroen



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Old interfaces directory in CVS tree?

2005-11-07 Thread Jeroen T. Vermeulen
On Tue, November 8, 2005 00:02, Marc G. Fournier wrote:
> On Mon, 7 Nov 2005, Andrew Dunstan wrote:

>> Oh, the top level interfaces directory.  I misunderstood. Why is anybody
>> checking that out at all? Are we keeping it for historical purposes?

I think the person in question was doing a regular checkout of the
mainline source tree, and found this in there.


> Yes, since past releases did include it, so if we check out a previous
> release, it needs to be able to pull those files as well ...

But 8.1 is hardly a previous release.  Shouldn't this directory be in the
attic or something?


Jeroen



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Old interfaces directory in CVS tree?

2005-11-07 Thread Jeroen T. Vermeulen
On Sat, Nov 05, 2005 at 09:46:15AM -0500, Andrew Dunstan wrote:
 
> >A libpqxx user just informed me that the anonymous CVS repository at
> >anoncvs.postgresql.org still contained a 2002 version of libpqxx in the
> >interfaces directory.  I checked it out and otherwise it seems to be the
> >current source tree--at least I found an 8.1 version number somewhere.
 
> your informant probably needs to use the -P option for cvs checkout.

Doesn't make a difference.  Besides, the outdated source is there even on
a fresh checkout.


Jeroen


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] additional GCC warnings

2004-10-18 Thread Jeroen T. Vermeulen
On Sun, Oct 17, 2004 at 01:50:46PM -0400, Tom Lane wrote:
 
> > -Wdeclaration-after-statement (Recent versions of GCC allow declarations 
> > and statements to be intermixed in C; enabling this flag would enforce 
> > the current convention of avoiding this style.)
> 
> Ick.  If the default is to allow that, then yes we need a warning.
> I'd be fairly annoyed if it's not an error, in fact.
 
IIRC it's a new feature in C99.  If that is the case, you may want to
tell gcc simply to compile an older C dialect and get errors for all
such newfangled code.


Jeroen


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] profile-guided opt. w/ GCC

2004-09-30 Thread Jeroen T. Vermeulen
On Thu, Sep 30, 2004 at 07:07:27PM +1000, Neil Conway wrote:
 
> I think it would be cool to add support for PGO to PostgreSQL's build
> system (for 8.1). There are a lot of situations where PostgreSQL is
> compiled once, and then used for weeks or months (compilations for
> inclusion in a distro being the extreme example). In that kind of
> situation, trading some additional compile-time for even a small
> improvement in run-time performance is worthwhile, IMHO.

It's some time ago now, but a group at the Universitat Politecnica de
Catalunya (including my thesis advisor Alex Ramirez and our databases 
specialist Josep Larriba-Pey) has done a case study on something similar,
using PostgreSQL as their case study.

What they researched was a code reordering algorithm that minimized both
taken branches and I-cache clashes.  The scheme was quite aggressive, even
going so far as to co-allocate code in some functions with code in their
most frequent callers.  The study also includes a characterization of the
I-miss and execution intensities of the backend, in a neat matrix with
the major functions on one axis and the stage from which they're invoked 
on the other.

The paper may be enlightening.  Just a moment while I google for it...

...Got it.  Here's the paper:

http://research.ac.upc.es/CAP/hpc/Papers/1999/aramirez1999aC.pdf

And here's the Citeseer entry:

http://citeseer.ist.psu.edu/context/163268/0


Jeroen


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] transaction idle timeout in 7.4.5 and 8.0.0beta2

2004-09-18 Thread Jeroen T. Vermeulen
On Sat, Sep 18, 2004 at 04:05:26PM -0400, Tom Lane wrote:
 
> Well, I think it would time out quickly --- anyway on the order of
> minutes not hours.  By hypothesis, the situation you're worried about is
> where the backend was unable to send you a COMMIT acknowledgement
> message.  The kernel is going to realize that it didn't get an ACK back,
> and is going to retry a few times, and is then going to declare the
> connection lost.  The case where you may have a very long delay before
> detection of connection loss is where the backend is sitting idle with
> nothing to send.

That's one load off my mind, thanks.


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] transaction idle timeout in 7.4.5 and 8.0.0beta2

2004-09-18 Thread Jeroen T. Vermeulen
On Sat, Sep 18, 2004 at 03:41:24PM -0400, Tom Lane wrote:
> >> No, stats_command_string need not be set, only stats_start_collector.
> 
> > BTW, I've got this set (I'm even running as "postgres") but still I get
> > the "" message instead of current_query.  :(
> 
> It has to be set in the backend you want the info from ...

I set it in /etc/postgresql/postgresql.conf; then I restarted the postmaster
just to be sure, and checked from psql.  Same message, even about the same
session:

[EMAIL PROTECTED]:~$ sudo su postgres -c 'postgresql-8.0/bin/psql template1'
Password:
Welcome to psql 8.0.0beta1, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit

template1=# select * from pg_stat_activity;
 datid |  datname  | procpid | usesysid | usename  | current_query | query_start 
-------+-----------+---------+----------+----------+---------------+-------------
     1 | template1 |   11559 |        1 | postgres |               | 
(1 row)

template1=# show stats_start_collector;
 stats_start_collector 
-----------------------
 on
(1 row)



Jeroen


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] transaction idle timeout in 7.4.5 and 8.0.0beta2

2004-09-18 Thread Jeroen T. Vermeulen
On Sat, Sep 18, 2004 at 02:32:32PM -0400, Tom Lane wrote:
 
> No, stats_command_string need not be set, only stats_start_collector.

BTW, I've got this set (I'm even running as "postgres") but still I get
the "" message instead of current_query.  :(


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] transaction idle timeout in 7.4.5 and 8.0.0beta2

2004-09-18 Thread Jeroen T. Vermeulen
On Sat, Sep 18, 2004 at 02:32:32PM -0400, Tom Lane wrote:
 
> No, because of the reporting delay.  I would recommend waiting for the
> backend's row in pg_stat_activity to disappear entirely.  Under normal
> circumstances that should occur quickly.  If there's a communications
> problem, it might take awhile to detect connection loss ... but if
> there's a communications problem, you probably aren't going to be able
> to open a new connection, either.

Unfortunately, a communications problem is exactly the kind of scenario
we were talking about in the first place!  Might be a misguided firewall,
for instance.  (In which case we'd want the TCP connection to time out
quickly on the server as well to avoid piling up dead backends, but that's
another matter).

BTW is there really no other way to see if a query (for the same user who's
looking) is still executing?  It'd be really helpful if the check could be
done from the same client process that lost the connection in the first
place.


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] transaction idle timeout in 7.4.5 and 8.0.0beta2

2004-09-18 Thread Jeroen T. Vermeulen
On Sat, Sep 18, 2004 at 12:43:05PM -0400, Tom Lane wrote:
 
> I don't see any reason for guesswork.  Remember the PID of the backend
> you were connected to.  On reconnect, look in pg_stat_activity to see if
> that backend is still alive; if so, sleep till it's not.  Then check to
> see if your transaction committed or not.  No need for anything so
> dangerous as a timeout.

Looks like that only works if stats_command_string is set (and backend
version is at least 7.2), correct?  I couldn't find this table in the
online documentation, but can I assume that the query will have finished
executing (whether for good or for bad) when its current_query is either
empty or null?
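
For what it's worth, the check I have in mind would go something like this (rough libpq sketch, assuming the old backend's PID was remembered before the connection broke):

#include <cstdio>
#include <unistd.h>
#include <libpq-fe.h>

// After losing the connection mid-COMMIT: reconnect, then wait until the
// old backend's row disappears from pg_stat_activity before checking
// whether the transaction's effects made it in.
bool wait_for_old_backend(PGconn *conn, int old_pid)
{
  char query[128];
  std::sprintf(query,
    "SELECT 1 FROM pg_stat_activity WHERE procpid = %d", old_pid);

  for (;;)
  {
    PGresult *r = PQexec(conn, query);
    if (PQresultStatus(r) != PGRES_TUPLES_OK) { PQclear(r); return false; }
    bool gone = (PQntuples(r) == 0);
    PQclear(r);
    if (gone) return true;  // backend finished; safe to check for the commit
    sleep(1);               // still alive: poll again in a little while
  }
}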


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] transaction idle timeout in 7.4.5 and 8.0.0beta2

2004-09-18 Thread Jeroen T. Vermeulen
On Sat, Sep 18, 2004 at 12:43:05PM -0400, Tom Lane wrote:
 
> I don't see any reason for guesswork.  Remember the PID of the backend
> you were connected to.  On reconnect, look in pg_stat_activity to see if
> that backend is still alive; if so, sleep till it's not.  Then check to
> see if your transaction committed or not.  No need for anything so
> dangerous as a timeout.

I didn't know you could do that!  I'll look into it.  Thanks.


Jeroen


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] transaction idle timeout in 7.4.5 and 8.0.0beta2

2004-09-18 Thread Jeroen T. Vermeulen
On Fri, Sep 17, 2004 at 08:47:02AM +0200, Szima Gábor wrote:
 
> I was implement the "transaction idle timeout" function in PostgreSQL
> (version 7.4.5 and 8.0.0beta2)

It sounds interesting to me (for use in libpqxx, the C++ API), but perhaps
for a slightly unusual reason.  When a connection to the backend is lost
just as you're waiting for the result of a COMMIT, you can't be sure if the
transaction was rolled back or not.

If I could know (not influence, just "know") when a transaction times out,
then I could wait for this amount of time, reconnect to the backend, and
check for some record left in a special table by the transaction.  If it's
not there, I'll know "for sure" (insofar as anything can ever be sure) that
the transaction was not committed.  This is still guesswork in the current
situation.
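
To make that concrete, the check might look like this (a sketch only; the commit_marker table and its id column are hypothetical, not an existing feature, and the marker id is assumed to be a plain token that needs no quoting):

#include <string>
#include <libpq-fe.h>

// Before COMMIT, the transaction inserts a row into a commit_marker table
// under a unique id.  If the connection dies while waiting for the COMMIT
// reply, reconnect later and look for that row: present means the
// transaction committed, absent means it never did.
bool committed(PGconn *conn, const std::string &marker_id)
{
  std::string query =
    "SELECT 1 FROM commit_marker WHERE id = '" + marker_id + "'";
  PGresult *r = PQexec(conn, query.c_str());
  bool found = (PQresultStatus(r) == PGRES_TUPLES_OK && PQntuples(r) > 0);
  PQclear(r);
  return found;
}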

There's one assumption, however, and that is that the transaction will time
out even if (part of) the timeout time was spent processing rather than
waiting for a command.  I'm not sure how many people would be willing to
pay that price.  Perhaps a connection timeout would be more appropriate for
this sort of thing.


Jeroen


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] WIN1250 as server encoding

2004-09-15 Thread Jeroen T. Vermeulen
On Wed, Sep 15, 2004 at 05:02:44PM +0200, Peter Eisentraut wrote:
> Some people have requested to add WIN1250 as an allowed server encoding.  
> So far, the order of the encoding numbers determined which ones were 
> client-only, so in order not to renumber the encodings, I could only 
> come up with the attached ugly solution.  If no one thinks of a better 
> one, we'll go with that.
 
Probably a silly suggestion, but...  A second encoding number which happens
to assign the same character codes as WIN1250, with a 1-to-1 translation
table, and renaming the old number to make room for the new one in newly
compiled code?


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] devx article

2004-08-20 Thread Jeroen T. Vermeulen
On Sat, Aug 21, 2004 at 01:15:29AM +0200, Gaetano Mendola wrote:
 
> http://www.devx.com/dbzone/Article/20743
> 
> I notice on page 2:
> 
> "New versions of PostgreSQL support standard row-level
> locking as an option, but MVCC is the preferred method"
> 
> What this does mean ?

Or this one:

"MySQL does, however, support the advanced feature of data partitioning
within a database. PostgreSQL does not."

Isn't that what tablespaces are all about?


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] COPY with column headings

2004-08-16 Thread Jeroen T. Vermeulen
On Mon, Aug 16, 2004 at 11:30:49AM -0400, Tom Lane wrote:

> The bigger question is whether this would do anything to satisfy the
> requestor.  He probably wants the headings to appear in the file
> resulting from COPY TO file (or the psql equivalent), which this would
> not do.  Providing the info in the COPY PGresult would change nothing
> except the behavior of client applications specially rewritten to use it.

True, and from a compatibility standpoint that would probably be good...

Output post-processing (translation to CSV or XML output, XSLT transforms,
what have you) IMHO should probably go on top of COPY anyway, rather than
into it.  Are people really having problems with

select * from table where 1=0 ; copy table to stdout

and first printing the column names found in the first PGresult, then
dumping the COPY lines to the same fd?  I think that should do the trick for
most uses, although obviously I'm no expert.
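
For the record, here's roughly how I picture that on the client side (libpq sketch from memory; "mytable" is just an example name):

#include <cstdio>
#include <libpq-fe.h>

// Send both statements at once; the first result carries the column names,
// the second one switches the connection into COPY OUT mode.
void dump_with_headings(PGconn *conn)
{
  PQsendQuery(conn, "SELECT * FROM mytable WHERE 1=0; COPY mytable TO STDOUT");

  PGresult *r = PQgetResult(conn);              // empty SELECT: headings only
  if (PQresultStatus(r) == PGRES_TUPLES_OK)
  {
    for (int c = 0; c < PQnfields(r); ++c)
      std::printf("%s%s", c ? "\t" : "", PQfname(r, c));
    std::printf("\n");
  }
  PQclear(r);

  r = PQgetResult(conn);                        // the COPY itself
  if (PQresultStatus(r) == PGRES_COPY_OUT)
  {
    char *line;
    int len;
    while ((len = PQgetCopyData(conn, &line, 0)) > 0)
    {
      std::fwrite(line, 1, len, stdout);        // COPY line, newline included
      PQfreemem(line);
    }
  }
  PQclear(r);

  while ((r = PQgetResult(conn)) != 0) PQclear(r);  // drain remaining results
}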


Jeroen


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] COPY with column headings

2004-08-16 Thread Jeroen T. Vermeulen
On Mon, Aug 16, 2004 at 10:53:36AM -0400, Bruce Momjian wrote:
> Someone just asked about a COPY capability to supply the column headings
> as the first line of the copy statement.  Do we want to support
> something like that?  Does anyone else want such functionality?

Wouldn't it be more logical, and more compatible, to keep this information
as regular column information in the PGresult returned by the COPY?  Or
would that clash with the difference between "command" and "query" result
handling?


Jeroen


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] try/catch macros for Postgres backend

2004-07-29 Thread Jeroen T. Vermeulen
On Thu, Jul 29, 2004 at 09:58:54AM -0400, Tom Lane wrote:
 
> Right.  The last bit (FINALLY executes whether or not a CATCH block
> re-throws) seemed too messy to handle in my little macros, so I'm
> planning on leaving it out.  But I'm open to the idea if anyone has
> a clever implementation thought.

There's also the alternative of going to C++, of course, which would
give you full native exception handling.  Most of this "finally" stuff
will go away when you have destructors, IMHO, and resource cleanups are
a whole lot easier.  The main drawback is that stricter rules apply to
gotos and longjumps--but most of those will be "a poor man's exception
handling" anyway.


Jeroen


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] try/catch macros for Postgres backend

2004-07-29 Thread Jeroen T. Vermeulen
On Thu, Jul 29, 2004 at 12:10:12AM -0400, Alvaro Herrera Munoz wrote:
 
> (The "finally" block, AFAIU, is executed whether an exception was raised
> or not, so it serves as cleanup for try and catch blocks.  Somebody more
> knowledgeable in this OO matters may enlighten us better?)

...Or I could try.  Yes, the "finally" block is executed after executing
the "catch" block if an exception was caught, or when leaving the "try"
block if there wasn't.  That includes both normal completion and uncaught
exceptions.

This is useful for cleanup stuff, as you say--mostly because Java doesn't
have C++'s destructors to take the cleanup out of your hands.


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Binary Cursors, and the COPY command

2004-07-27 Thread Jeroen T. Vermeulen

On Mon, Jul 26, 2004 at 10:06:28PM -0400, [EMAIL PROTECTED] wrote:
 
> So what you are saying is that you should inconvenience 90% of your users
> to make sure they do something "right?"

I would say that was pretty solid reasoning.  Exposing 10% of users to a
high data corruption risk just to get slightly better CPU performance on
I/O bound operations (think Amdahl's Law!) does not sound all that sensible
to me.  Remember what happened to NT on the Alpha?  Device driver writers
failed to use the portability macros for byte or 16-bit memory-mapped
hardware register accesses, exactly because it also worked without them on x86.

This was one of the main reasons why they had to add 8-bit and 16-bit
loads and stores to the architecture!  Similarly, lots of programs harbor
annoying, hard-to-find endianness bugs because little-endian byte order
happens to hide some typical pointer bugs.


> When you *really* think about it, by forcing a big endian byte order over
> machine dependent byte order, you are inconveniencing 90% of the users,
> but only helping the very small amount of people who run in mixed
> environments where the server is intel and the client is big endian.

I don't see this holding in situations where the user, the programmer, and
the systems manager are different (groups of) people.  Users may want to
connect to different servers.  Systems managers may want to replace servers.
In today's world even a non-programming user might recompile your
application on a big-endian machine.  Applications may be running under
emulation, and people will prefer to run the database server natively.

All those scenarios may break the client-side application.  The chance that
the mixed-endian scenario had never been tested would be close to those 90%.


> Few deployments will *ever* really have different endian-ness amongst
> their servers. 99% of all deployments will be the same or compatible
> hardware, and probably intel at that.
 
I'd like to add 2 points here:

1. Server architectures and client architectures are two very different
things.  There are lots of SPARC, PA-RISC, MIPS, POWER, zSeries etc.
servers out there.

2. Even if both platforms are AMD or AMD-clones (we'll have to get used to
saying that as 64-bit becomes more popular), the number of situations where
this is _guaranteed_ as a part of the project definition will be much lower.

Remember, the programmer should generally support both the little-endian
and the big-endian scenario anyway.  We might as well make sure that the
hardest is also the most widely tested.
 

Jeroen


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-12 Thread Jeroen T. Vermeulen
On Sun, Jul 11, 2004 at 07:23:13PM -0400, Bruce Momjian wrote:
> 
> Were are we on deciding how PREPARE in aborted transactions should behave?

Haven't gotten much further than agreeing that current behaviour is
quirky.  It does not follow that we agree it's bad.  I would say most
of us agree that it may have been a little rash to unify prepared
statements on the fe/be protocol level and those on the SQL level into
the same namespace.

One piece of progress we did make is a layercake model to facilitate
discussion of the fe/be protocol and other interface issues:

SQL session - tables & queries and stuff
Interchange - data representation, encodings &c.
Protocol - bind/prepare/execute, portals &c.
Connection - sockets and such

Some observations we can make based on this:

 - Transactionality plays at the upper three levels.  Should the upper level
   be exclusively transactional?  In other words, should we have an ACID SQL
   implementation?  This issue can in principle be separated from any choices
   at the lower levels, but currently prepared statements and cursors cross
   the divide.

 - Session variables see significant use at the Interchange level, yet are
   manipulated at the SQL level.  This means they're transactional although
   some would like to see them work nontransactionally.

 - To make things easier we've lumped everything between the client socket
   and client-side business logic under the term "middleware."

 - Some middleware such as language drivers hide only the lowest levels but
   leave SQL alone (JDBC), or hide only the Protocol level, expose the
   Connection level, and leave everything else to further client software
   (libpq), and some hides the lower 2 levels, gets involved in the upper
   level, but doesn't touch the Interchange level (libpqxx).  This may
   influence how easy it is for the middleware to support transactionality
   for various features, and where we would like to have it and where we
   wouldn't.  Having nontransactional behaviour is convenient for middleware
   that uses prepared statements and is not transaction-aware.

 - There is also a layercake at the client side.  Middleware's naming
   choices for e.g. prepared statements must not clash with those of other
   layers, and this currently requires specific documentation.  I would
   describe that as another namespace problem.

Basically the whole problem probably wouldn't be with us if prepared
statements on the SQL level were different from the identically-named
concept in the fe/be protocol.

Jeroen


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: subtransactions and FETCH behaviour (was Re: [HACKERS] PREPARE and transactions)

2004-07-05 Thread Jeroen T. Vermeulen
On Tue, Jul 06, 2004 at 08:45:52AM +1200, Oliver Jowett wrote:
 
> This is a non-starter for JDBC: it has no control over when an 
> application decides to access a ResultSet in a way that results in a 
> FETCH of new data.
 
From what you're telling me, I'm not sure I like JDBC!  Why did they come
up with such a low-level design?  Sounds like little more than a thin ODBC
wrapper plus JVM marshalling...


> Buffering *all* the ResultSet data client-side isn't an option -- 
> cursors are used specifically to handle resultsets that don't fit into 
> heap on the client side. And implementing a disk cache or similar a) 
> doesn't work if you don't have disk access, b) is bandwidth-intensive 
> and c) is really silly -- that's work that belongs on the server side, 
> or why bother with implementing cursors at all?!

But does this type of ResultSet use scrollable cursors?  Because in that case, it
should be easy to reset the cursor's position at rollback!  Not fast
perhaps, but easy.  Screw fast when you're rolling back, because you'll
have other things to worry about.

Okay, I know, you might not _want_ to reset on rollback.  But it does give
the middleware a lot more freedom to play with connections etc. like we
discussed before.  So personally, if it meant that I had to support
rollbacks, I would think it was a small price to pay for full ACID
guarantees.


> Having ResultSets spontaneously change position on transaction 
> boundaries would cause even more portability problems -- and it goes 
> completely against how that API is designed (it's meant to *insulate* 
> the application from details like cursors that may be used behind the 
> scenes).
 
Are you saying this is not something you'd be able to hide in the driver?


> Reporting the new cursor positions at the protocol level when rollback 
> happens might help but it's still fairly ugly and would need a protocol 
> version change.
 
It would be nice IMHO to have a "tell" function for cursors, giving the
enumerated current position of the cursor.  I can fake that by counting
rows, in fact I've already done that, but it's not pretty and it easily
gets confused with the lower isolation levels (which fortunately Postgres
doesn't have).


> Also consider that the V3 protocol Execute message is essentially a 
> FETCH (you can only do FETCH FORWARD count, but it's otherwise 
> equivalent). This is another case of overlap between the SQL level and 
> the protocol level and has much of the same problems as we have with 
> PREPARE vs. Bind/Execute. The protocol-level portal state shouldn't 
> suddenly change on a transaction boundary. I can understand closing 
> nonholdable portals when the creating transaction closes (the data 
> source just disappeared) but having the portal change *position* would 
> be very weird.

You're beginning to convince me that maybe ACID for transactions in
postgres is unsalvageable and we should be thinking about some alternative,
such as ways of finding out whether ACID still applies to the current
transaction, and/or whether the current statement will change that...


Jeroen


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-05 Thread Jeroen T. Vermeulen
On Tue, Jul 06, 2004 at 12:17:50AM +1200, Oliver Jowett wrote:
 
> 7.4/7.5's behaviour leaves the cursor state unchanged by the rollback:
> 
> DECLARE foo CURSOR WITH HOLD FOR SELECT * FROM sometable
> 
> BEGIN
>FETCH FORWARD 10 FROM foo   -- returns rows 1..10
> ROLLBACK
> 
> BEGIN
>FETCH FORWARD 10 FROM foo   -- returns rows 11..20
> ROLLBACK
 
That makes me wonder why people want to maintain transactionality w.r.t.
nested transactions but not with "outer" ones.  Odd!

I can see the technical problem, of course, although I think it should
respect rollbacks if at all possible without sacrificing significant
performance *in the commit case.*  Verify for failure, but optimize for
success.  Even if the cursor cannot go backwards I'd rather see those rows
buffered and "spliced back" into the cursor's stream on rollback.  Perhaps
the reasoning is that you process the batches returned by the cursor
inside a transaction, and have separate error handling for failed batches.

But then the FETCH should still occur before the transaction as far as I'm
concerned.  You fetch a batch (if it fails, you terminate) and *try* to
process it.


> >There's just been a discussion here about how
> >nested transactions should not be allowed to FETCH from cursors defined in
> >a wider scope for precisely this reason: to ensure neat transactional
> >behaviour.
> 
> This breaks things like JDBC that want to use cursors to batch access to 

This is a restriction on nested transactions, which aren't even in a real
release yet.  I thought you said you can't break compatibility without
changing the code?  ;)


> I don't like rollback of FETCH for much the same reasons as I don't like 
> rollback of PREPARE -- lots more work on the client side. See my mail on 
> the other thread. Avoiding changing the behaviour of FETCH in the above 
> case is also an argument against it.

In the case of FETCH, where does that extra work come from?


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-05 Thread Jeroen T. Vermeulen
On Mon, Jul 05, 2004 at 11:44:08PM +1200, Oliver Jowett wrote:

> > SQL session - temp tables, session variables, database contents
> > Interchange - encoding & representation
> > Protocol - COPY, bind/execute &c.
> > Connection - socket stuff

> >Transactions come into play at the Protocol level, and the way things are
> >implemented, go all the way up to SQL level.  Only the Connection level is
> >entirely nontransactional, and the SQL layer to my intuition ought to be
> >exclusively transactional.  The only current exception to that that springs
> >to mind is the way PREPARE is implemented.
> 
> Other exceptions I can think of are FETCH and DEALLOCATE. DEALLOCATE is 

Well, I'd say DEALLOCATE is implicitly lumped in with PREPARE.  But as for
FETCH, are you referring to cursors that live outside transactions (but get
manipulated inside transactions)?  Are those implemented yet and if so, how
does FETCH work there?  There's just been a discussion here about how
nested transactions should not be allowed to FETCH from cursors defined in
a wider scope for precisely this reason: to ensure neat transactional
behaviour.


> particularly fun -- don't most of the arguments for making PREPARE 
> transactional also apply to DEALLOCATE? Is it actually feasible to roll 
> back a DEALLOCATE?

I can see how it gets a bit more complicated when you DEALLOCATE, then
PREPARE a new statement with the same name in the same transaction.  But
nothing that a transaction-local mapping (plus "being deleted" bit) can't
fix, AFAICS.


Jeroen


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-05 Thread Jeroen T. Vermeulen
On Mon, Jul 05, 2004 at 10:21:26AM +1200, Oliver Jowett wrote:
 
> It certainly affects the JDBC driver -- the native String representation 
> in Java is UTF-16, so the driver transcodes between that and 
> client_encoding for parameterized queries and query results involving 
> strings.
 
Oops, yeah, I forgot.  So perhaps we should be distinguishing several
layers in a session's state, along the lines of:

SQL session - temp tables, session variables, database contents
Interchange - encoding & representation
Protocol - COPY, bind/execute &c.
Connection - socket stuff


> So at least from that point of view, client_encoding is very much a 
> protocol-level thing. Much as I see PREPARE :)

The Interchange layer is the ugly stepchild here; it's controlled at the
SQL level but should be handled either by the application or in middleware,
together with the Protocol layer.  The model really seems to assume that it
belongs in the application, which in your case of course is not an option.
If they were placed at the driver level (together with Protocol) then I'd
see how they might as well be nontransactional.  Are there even significant
uses of session variables other than controlling the Interchange layer?

Transactions come into play at the Protocol level, and the way things are
implemented, go all the way up to SQL level.  Only the Connection level is
entirely nontransactional, and the SQL layer to my intuition ought to be
exclusively transactional.  The only current exception to that that springs
to mind is the way PREPARE is implemented.


Jeroen


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-04 Thread Jeroen T. Vermeulen
On Sun, Jul 04, 2004 at 06:39:46PM -, Greg Sabino Mullane wrote:
  
> > There's no actual extra parsing involved, as far as I can see, just
> > pattern matching and the extraction of the variables.
>  
> That sounds like "parsing" to me. :)
  
Depends on your definition, I guess.  I would say very limited lexical
analysis, yes, but nothing involving actual structure beyond individual
lexical tokens.


> It's already been done in DBD::Pg. Naming starts at dbdpg_1 and goes to
> dbdpg_2, dbdpg_3, etc. The only requirement we ask of the application
> using it is that you don't prepare statements yourself named "dbdpg_x".
> In most cases, the application does not worry about the naming anyway,
> but simply issues an anonymous prepare request through DBIs paradigm of
> one statement handle bound to a single SQL statement. DBD::Pg also does
> the deallocating itself, and keeps track of the transaction status as well.
> Deallocation is merely a courtesy anyway, as we don't reuse the names.
>  
> If there are flaws in the above design, I'd like to know about them,
> as all of this prepare/execute stuff is rather new and undertested.

Can't think of any, as long as you don't try to manage the connection.


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] LinuxTag wrapup

2004-07-04 Thread Jeroen T. Vermeulen
On Sun, Jul 04, 2004 at 12:10:52AM -0400, Alvaro Herrera wrote:
 
> You made me remember that some time ago a non-tech fellow presented me
> as giving a talk about "Postgresol" ... the audience had quite a laugh.
> It seems nobody thought about instructing him on how to pronounce the
> thing ... it was rather embarrassing anyway.

I once ran into a case of "Postgresequel."  Asked the presenter about it
afterwards and he apologized, citing prolonged stay in the US where, he
said, everybody pronounced SQL as Sequel all the time.


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-04 Thread Jeroen T. Vermeulen
On Sun, Jul 04, 2004 at 02:33:53PM +1200, Oliver Jowett wrote:
 
> Consider SET client_encoding then..

Does that really affect most middleware?  In my situation for instance,
what goes through the connection either way is "just bytes" to the
middleware.  Its interpretation is a client matter.  So to me this is a
client application thing (though of course lower-level than normal SQL)
and it's also fully transactional.


Jeroen


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-03 Thread Jeroen T. Vermeulen
On Sat, Jul 03, 2004 at 02:59:58PM +1200, Oliver Jowett wrote:
> 
> I think you mean "between 7.2 and 7.3".
 
Ah, OK.  I thought PREPARE had been added in 7.4.  My apologies.


> Yes. I see PREPARE/EXECUTE as a SQL-statement-level, connection-local 
> way of getting control over reuse of plans that doesn't otherwise affect 
> the semantics of the query. With the V3 protocol you can also do it at 
> the protocol level rather than the statement level, but it's still the 
> same underlying operation; why should it behave differently?
 
The real basis of this pattern-matching idea I'm proposing is that the
naming issue (which is semantic) and the reuse of plans (which is an
optimization) could be handled separately.  The next question I think is
whether the two should really share a namespace, and whether the two types
of names should have the same behaviour.  The underlying mechanism would 
remain the same, but once these issues have been decoupled I think SQL
names and protocol-level names are easy to see as different things.


> I'm not too worried, to tell the truth -- the JDBC driver has already 
> moved to using the protocol-level approach, and so long as that doesn't 
> change I'm happy. It just seems to me that the changes you're advocating 
> are going to break more clients than they help (e.g. it would have 
> required nontrivial work on the JDBC driver as of a month ago to deal 
> with the change), and for no convincing reason.

Maybe.  OTOH I find it simply disturbing (as a matter of aesthetics, you
could say) that the application can be messing around with the protocol
underneath the middleware it's supposed to live on top of--the middleware
that should expect to be in control of the backend below the SQL level.


Jeroen


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] LinuxTag wrapup

2004-07-03 Thread Jeroen T. Vermeulen
On Sat, Jul 03, 2004 at 05:59:17PM +0200, Andreas Pflug wrote:
 
> classifying the questions we got those three days in the PostgreSQL 
> booth on LinuxTag, we had three ever repeating topics, two of them 
> non-surprising:
> - what's the difference to MyS***
> - what about win32 native
> - what about Oracle portability.

That about covers the important stuff.  Some more for the "other" bucket
(although they all came repeatedly):

 - so how do I pronounce "Postgre"?
 - will it support my performance requirements?
 - are you a company?  Can you tell me someone who is?
 - have a job for me?
 - do you have drivers for Kylix?
 - why don't you support ?
 - what client GUI programming environment do you offer?

On the "Postgre" point, I remarked to some friendly people (who are
developing a content management system based on postgres, by the way)
that we ought to have something like "just call me Postgres" posters in
our booth.  It turned out they had the gear to cut stickers in letter
shapes, so a little while later we actually had those words plastered
over our booth walls.  I think we got most interested passers-by before
they had a chance to read it, though.

On the last points I eventually learned to stop answering and shoot back
the question instead: "what, doesn't yours support ODBC?"

In particular, X.org's Leon Shiman felt that we Postgres people should be
especially interested in their work on X.  I didn't even see what he was
getting at until he mentioned GUI builders.  Again, I told him that my
personal conviction is that those should be database-agnostic and the very
idea that these should be bundled with database servers is a by-product of
the need to sell proprietary database licenses, and that any good free GUI
builder should build on GUI toolkits rather than on raw X, etc.

But like I said, that's just my personal conviction.  I definitely think
people in our community ought to be willing to work together with the
MySQL people, the FireBird people and anybody else in the free world to
have world-class GUI development tools; it should be a rising tide that
raises all boats.  If anyone feels differently, I did make it perfectly
clear that I wasn't speaking for anyone.

Of course one area where we should care about X, but I completely forgot
to mention this to Leon, is that modern graphics hardware can be used to
speed up database engines.  Hardware detection of collisions or overlaps,
for instance, has been shown to be a viciously effective filter for
spatial joins in GIS databases.  But that's another story!


Jeroen


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] PREPARE and transactions

2004-07-03 Thread Jeroen T. Vermeulen
On Sat, Jul 03, 2004 at 08:20:17AM +0530, Abhijit Menon-Sen wrote:
 
> But for what it's worth, I strongly dislike the later proposal of making
> prepared statements anonymous, and pattern matching the statement text,
> especially if they reintroduce the need to quote query parameters.

Only in cases where you aren't sure your prepared statement name will be
visible.  And even in those cases, wouldn't PQexecParams() do the job?
That'd be like PQexecPrepared() except the query text becomes a kind of
replacement for the query name.
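
Roughly like this, I mean (an untested sketch; the employee table and its columns are made up for the example):

#include <cstdio>
#include <libpq-fe.h>

// Parameterized one-shot query: no statement name to manage and no quoting
// of the value on the client side.
void show_salary(PGconn *conn, const char *name)
{
  const char *params[1] = { name };
  PGresult *r = PQexecParams(conn,
    "SELECT salary FROM employee WHERE name = $1",
    1,      // one parameter
    0,      // let the server infer the parameter type
    params,
    0, 0,   // text-format parameters: no lengths or formats needed
    0);     // text-format results
  if (PQresultStatus(r) == PGRES_TUPLES_OK && PQntuples(r) > 0)
    std::printf("%s earns %s\n", name, PQgetvalue(r, 0, 0));
  PQclear(r);
}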


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-02 Thread Jeroen T. Vermeulen
On Sat, Jul 03, 2004 at 12:16:50PM +1200, Oliver Jowett wrote:
 
> I stand by my original statement: making no change does not break 
> compatibility. Please provide an example of PREPARE/EXECUTE use that 
> works under 7.3/7.4 but does not work with current 7.5.

Whether the transition from "7.3/7.4" to "7.5" (as it stands) breaks
compatibility was never at issue here.  There would be no point: this break
is _between_ 7.3 and 7.4.  Middleware can no longer assume that rolling
back will get it out of the changes it was making at the SQL level, unless
it is in exclusive control of prepared statements.


> Parse/Bind/Execute interact with PREPARE/EXECUTE -- they share a 
> namespace. Quirky as the current behaviour is, it'd be even quirkier if 
> PREPARE/EXECUTE had substantially different semantics to Parse/Bind/Execute.
> 
> Please do read the V3 protocol spec: 
> http://developer.postgresql.org/docs/postgres/protocol.html

Ah, now I see.  This is why some have proposed to change the SQL behaviour
(which is what I'm concerned with) but not the protocol.  That seemed
strange to me before, but it makes more sense now.

I guess the question then becomes: should we see the frontend-backend
protocol as a transport layer underneath, and conceptually separate from,
the SQL session?  Or should the protocol be allowed to shine through in
the way SQL itself is supported, and vice versa, so the two can share code
and concepts?

My point of view in this is the layered one, i.e. I'd like to be able to
"speak SQL" to the server, as I'm doing now through libpq, and ignore how
it gets there--at least as far as the SQL itself is concerned.  Call it a
2.0 vantage point.  From that angle there are no problems with giving the
protocol's bound statements and SQL's prepared statements both different
namespaces and different behaviour.

You seem to be taking the other view, where now that we have the extended
query protocol, it makes no sense to have one behaviour for the protocol
and another for SQL--and since the protocol implementation is not subject
to transactions (except that portals live in transactions?) there is no
point in going with transactional behaviour for a mechanism shared between
both.

Does that more or less describe the underlying controversy?


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-02 Thread Jeroen T. Vermeulen
On Sat, Jul 03, 2004 at 11:17:18AM +1200, Oliver Jowett wrote:

> >So many things are true.  But _not_ doing so also breaks compatibility,
> >so like I said, there are counterexamples.
> 
> This is nonsense. Not changing the current behaviour cannot break 
> compatibility, almost by definition.
 
Almost.  But prepared statements can break compatibility with code not
aware of their existence yet--and in some cases, this does not happen if
they behave transactionally.  It may not be a big deal, but I'm not
convinced that the effort of supporting rollbacks in middleware is such
a big waste of time either.
 

> Please do take a look. The V3 protocol treats the set of named 
> statements as part of the connection state, not as anything at the SQL 
> statement level. There are also named portals to deal with if your issue 
> is that things shouldn't be named.
 
But neither of these pose as SQL statements.  It's the SQL session that
I'm really worried about.


> The client has query-lifetime and query-reuse information that the 
> server does not have and can't obtain via simple query matching. Also, 
> clients need per-query control over use of PREPARE: prepared queries can 
> run slower as they must use a more general query plan. I don't see how 
> you overcome either of these if the server hides the mechanics of which 
> query plans are preserved.
 
The converse is also true: a dynamic scheme may do better than a static
one.  This often happens.  We may even want to detect reusability on the
fly; that could be based on the same mechanism.  And there's that idea of
sharing plans between backends that also comes into play.


> You could implement the pattern-matching logic as a passthrough layer in 
> front of the server -- perhaps in something like pgpool? -- and 
> translate to PREPARE based on patterns. Then your application can remain 
> unaware of the translation to PREPARE for the most part, the only issue 
> being name collision which in practice is simple to work around. But I 
> don't see why you want this in the main server at all -- it's really a 
> bandaid for applications that don't want to precisely control the 
> prepared-statement behaviour themselves.

Don't want to, or perhaps can't.  It may be hard for the application to
deallocate a statement, for instance, because the transaction failed
before it got to the DEALLOCATE and the middleware doesn't make it easy to
go back and fix that.


Jeroen


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-02 Thread Jeroen T. Vermeulen
On Fri, Jul 02, 2004 at 09:51:38PM -, Greg Sabino Mullane wrote:
  
> Specifically, your proposal (if I read it correctly) would require me
> to send in the full SQL statement every time, thus negating a major

Well, like I said, it wouldn't actually _require_ this; it would just
allow transactional semantics for the current, explicit syntax without
the actual optimization being rolled back.

A logical consequence of the implementation, I think, would be that
you could re-PREPARE a statement when in doubt, without paying the
parsing and planning cost again.


> benefit of prepared statements in that I do not have to quote the
> values myself, but can let the server handle it. It would also be a lot
> of work for the server to parse the statement and find a "match".
> Maybe enough work to make the "solution" worse than the cure.

The algorithm as I've got it worked out so far doesn't look very costly,
unless you have lots and lots of highly similar prepared statements, or
queries so long that their sheer text size becomes the problem.  There's
no actual extra parsing involved, as far as I can see, just pattern
matching and the extraction of the variables.

As you say, however, quoting of parameters remains...
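
To give an idea of the kind of matching I mean, here's a toy version (very much a sketch: it ignores dollar quoting, escaped quotes, comments, casts and so on, all of which a real implementation would have to handle):

#include <cctype>
#include <string>
#include <vector>

// Strip string and numeric literals out of a statement, leaving a
// "skeleton" that can be matched against previously planned statements;
// the stripped-out literals become the would-be parameters.
void extract_skeleton(const std::string &sql,
                      std::string &skeleton,
                      std::vector<std::string> &values)
{
  skeleton.clear();
  values.clear();
  for (std::string::size_type i = 0; i < sql.size(); )
  {
    if (sql[i] == '\'')                                   // string literal
    {
      std::string::size_type j = i + 1;
      while (j < sql.size() && sql[j] != '\'') ++j;
      values.push_back(sql.substr(i + 1, j - i - 1));
      skeleton += '?';
      i = (j < sql.size()) ? j + 1 : j;
    }
    else if (std::isdigit((unsigned char) sql[i]) &&
             (i == 0 || !(std::isalnum((unsigned char) sql[i-1]) ||
                          sql[i-1] == '_')))              // numeric literal
    {
      std::string::size_type j = i;
      while (j < sql.size() &&
             (std::isdigit((unsigned char) sql[j]) || sql[j] == '.')) ++j;
      values.push_back(sql.substr(i, j - i));
      skeleton += '?';
      i = j;
    }
    else skeleton += sql[i++];
  }
}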


> Finally, I am still not convinced there is a problem at all. While the
> current behavior is a bit quirky, I think it does work out overall.
> A prepared statement will throw an error on two conditions:
>  
> 1) The name is already in use.

Presumably you're talking about a prepare statement here, not a prepared
statement.

> This is trivial to solve by the client, so it should never arise.
> (with the caveat that middleware should clearly document its naming
> scheme, and request and/or enforce that the client not creating
> conflicting names). So we really should remove this from the debate.

In other words, it's only trivial to solve by the client if there is no
possible conflict over where and how the statement got prepared.  You
say it's trivial, but you're also assuming that the work to ensure this
has already been done.  Most problems become trivial that way!


> 2) The statement is invalid.
> This one is very common, and it /should/ throw an error. With the fact
> that statements cannot be shared across connections, and the handy
> PQtransactionStatus function, the client should easily be able to
> handle any situation that arises smoothly.

Of course you shouldn't forget this one:

3) The PREPARE is executed inside a transaction that has already run
into an error.


> Having the statement exist even on rollback is a little harder to
> accept, but I have yet to see a better alternative. Prepare statements
> already "break the rules" anyway, by the fact that a commit() after a
> prepare does not make them available to any other people viewing
> the database.

But that's no different with session variables.  They're transactional,
and they affect session state only.  There is a place reserved for session
state as well as transaction state and database state.  No news there.  What
bugs me is that PREPARE introduces another kind of session state into the
mix, one that doesn't allow you to get out of a fix by aborting the
transaction and continuing the session.


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Nested Transactions, Abort All

2004-07-02 Thread Jeroen T. Vermeulen
On Fri, Jul 02, 2004 at 05:30:49PM -0400, Alvaro Herrera wrote:
 
> You can't have subtransactions inside an implicit transaction block, so

Haven't been following this thread closely, but just my 2 cents...

If you collate queries using the semicolon, AFAIK the whole thing is
executed as a single implicit transaction (if not in an explicit one
already, of course).  So is there anyone stopping a user from executing

BEGIN ; UPDATE ... ; COMMIT

?


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-02 Thread Jeroen T. Vermeulen
On Fri, Jul 02, 2004 at 11:18:44AM -0400, Merlin Moncure wrote:
 
> Yes, it would.  It would actually make my life a lot easier
> (notwithstanding my minor gripe with ExecParams) because I would no
> longer have to deal with all the complexities surrounding prepared
> statements.  This is A Good Thing, because the optimization is generic
> and thus will benefit a lot more people.  
 
Okay, off I go into the source code then.  :-)

(Which is not to say "I'll have it ready next week" but I can get an idea
of how hard it would be)


> It is also more a more complex optimization model; and I think you would
> have to justifiably prove that it is in the same performance league as
> the current prepared statement model.  Also the current prepared

True, there is a cost.  OTOH I think the added advantages could pay off
as well, e.g. sharing plans between backends (if there's not too much
locking).


> statement behavior should be retained for 7.5 and perhaps deprecated if
> your viewpoint wins out (and it should).

I guess so.  My timing isn't exactly impeccable...


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-02 Thread Jeroen T. Vermeulen
On Fri, Jul 02, 2004 at 08:51:05AM -0400, Merlin Moncure wrote:
 
> Right now, I'm transitioning to ExecPrepared to skip the string escaping
> step on the client side.  I would hate to lose that ability.  ExecParams
> is a little more work to set up (doable, though).
 
OTOH, if you're taking client code queries and replacing them with
EXECUTEs (which I sort of gathered some people were doing), all it should
really do is move the matching activity to the server...
 

> > Am I making things too simple here?
> 
> Unfortunately, I think the answer is yes.  Being able to roll back
> prepared statements would exclude them from a narrow but important class
> of use.  

Wait, wait, wait!  I'm talking about the pattern-matching proposal there.
The proposal means that prepared statement introduction would _not_ have
to be rolled back; only the _names_ would obey different rules.  In effect
it would separate the names from the plans, and make the plans do what you
want and more.  The names would be effectively obsolete.


> With the introduction of nested x into the code base, your original
> problem can be guarded against (albeit somewhat clumsily) by careful use
> of wrapping PREPARE/DEALLOCATE.  If you switch things up, every EXECUTE
> has to be wrapped to guarantee safety in a transaction agnostic
> scenario.

But why should anyone insist on a "transaction agnostic scenario" yet 
play with prepared statements?  That seems backwards to me.  Transactions
are a correctness issue, which you think about first.  Prepared statements
are optimization, which though important is undoubtedly a lesser concern.
If you're deep enough into the SQL stream to analyze the client's demands
and prepare statements based on it, just knowing what happens to bracketing
(unlike prepared statements, you don't even need to control that) should be
relatively easy.

In any case, AFAICS my proposal does away with the need to wrap anything,
or keep track of transactions and/or prepared statements.  Would it work
for you?


Jeroen


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-02 Thread Jeroen T. Vermeulen
On Fri, Jul 02, 2004 at 12:30:07PM +1200, Oliver Jowett wrote:
 
> >If it's that important, come up with a generic "session-not-transaction"
> >syntax to temporarily escape bracketing.
> 
> Do you have a proposal for this? It seems to me that if your argument is 
> that "if you want the old behaviour, you could add this extension" means 
> that you need to provide the extension before changing the behaviour. 
> Otherwise you're just exchanging one set of disgruntled users for another.
 
I was thinking of some "escape" keyword to say "I don't want this to be
part of the current transaction; execute it as if directly in the
surrounding session context."  That would at least make it obvious what's
going on.  Wouldn't take any extra semantic work, since one could just
restrict the statements that can go with this syntax--to just PREPARE and
DEALLOCATE for now.  I guess it should also escape auto-abort of
transactions, which has the advantage that you can still DEALLOCATE inside
a transaction.
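
Something along these lines, say--where the keyword is purely invented for
the sake of the example and the statement text is made up:

BEGIN;
  -- ... arbitrary transaction work, possibly already failed ...
  SESSION PREPARE foo (integer) AS SELECT * FROM accounts WHERE id = $1;
ROLLBACK;
EXECUTE foo(17);  -- foo still exists: the PREPARE escaped the bracketing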


> >Just don't tell me that making
> >PREPARE respect rollbacks would break compatibility
> 
> Why not? It's true.
 
So many things are true.  But _not_ doing so also breaks compatibility,
so like I said, there are counterexamples.


> >I should add here that session variables used to escape transaction
> >bracketing as well--but that was fixed some time ago.  Why are session
> >variables so different from prepared statements?
> 
> Prepared queries do not change the semantics of queries. Some session 
> variables do (e.g. DateStyle).
 
Prepared queries _do_ change semantics of subsequent EXECUTE and PREPARE 
statements--cause them to fail where they succeeded before; cause them
to succeed where they failed before; or make an EXECUTE do something not
necessarily related to what it did before.  It just doesn't affect you
if you're totally aware of prepared statements throughout your path to
the backend.

I'm coming from the other side, where transactions are a primary semantic
concern but not everybody in the stack is aware of prepared statements,
or at least not in a coordinated way.  In any case, I think the
pattern-matching approach I proposed earlier solves the problem for all
of us, and gives us more besides.  Any comments on that?


> It's actually quite annoying -- speaking from the client side -- for the 
> state of some of the session variables to be affected by transaction 
> boundaries, namely those that are really to do with the protocol-level 
> stream's state, e.g. client_encoding. You really don't want those 
> changing back on you, and you want the ability to change them at 
> arbitrary points in the connection -- even if you're in a failed 
> transaction.

I'm not entirely sure.  What if setting a session variable triggers a
failure?  I'm thinking along the lines of "oops, this data isn't valid
utf-8 after all; never mind that change in encoding."  If you don't want
to have transactional behaviour, back we go to an escape syntax like I
suggested.  But I think transactionality ought to be the default like it
is for everything else; ACID and correctness are a lot more important than
a modest amount of convenience in middleware code.  And last but not least,
it's much easier to reason about a cleanly transactional system than about
one with holes and exceptions in it.  That last point is much more
important to me than programming convenience.  Remember that I too have had
to program around rollbacks to support session variables properly; I'll
gladly pay the price for a pure transactional model.


> Also: what about the V3 protocol's support for named statements?

Haven't looked at them (I work on top of libpq), but you make it sound
like the whole PREPARE mess is just a special case of a problem with named
statements.  Like I said before, the prepared-statements problem wouldn't
be here if only prepared statements were anonymous, or at least their names
weren't used in invocation.  It's not the prepared plan that breaks
transactionality; it's the name.

So again, I'd like to hear any comments about my pattern-matching
proposal.  Is there any problem here that it would not solve?  Does anyone
see a problem in implementing them?


Jeroen


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Survey: "Motivation of Free/Open Source Software (F/OSS) Developers"

2004-07-01 Thread Jeroen T. Vermeulen
On Mon, Jun 28, 2004 at 10:50:53PM +0200, Marc Röttig wrote:
> Survey: "Motivation of Free/Open Source Software (F/OSS) Developers"

Some remarks:

 - Although the MIME header doesn't say it, the document is encoded in a
   Windows-specific encoding.  This is screwing up the apostrophes (')!

 - Q3 ("should all software be Free?") doesn't distinguish between
   software that's distributed and software for internal use.  Thus the
   question can be taken to mean "should all software be made available
   to the public, and under a free or open-source license?" but also, at a
   stretch, "are free licenses the only ethical way to distribute
   software?"  The two are very different, yet the difference does not
   come up anywhere in the answers... Q6 does phrase it very carefully,
   but also doesn't provide differentiation in the answers.  Neither does
   Q15: is internal-use software also counted as "should be F/OSS," i.e.
   does this refer to all software a company produces or only to the
   software it sells/publishes?

 - Maybe Q5 should allow multiple options.  I'd be willing to pay for
   software, but it depends on the price _and_ on the license (or as you
   put it, "only if I get the source.")

 - The phrasing in Q10 is a bit awkward.  For instance, "I only feel joy
   when coding" can mean either (a) "I feel nothing but joy while coding"
   or (b) "I get my only joy in life from coding."

 - Also in Q10, I think "pensum" should be "quota" in English.

 - Q18 ("if you build your own company on your software, how would you set
   your prices?") seems to assume that the only way to make money from
   software is to sell the software itself.  Perhaps you should make clear
   whether the pricing question involves only the software or also any
   services etc. related to the software.  The answers you get may be very
   different, e.g. because it's pointless to charge high prices for freely
   available software.  Yet e.g. developing the software or providing
   consultancy about it is a different matter.

 - Q19 unfortunately is a bit vague when it comes to how software may be
   "sold."  Take SCO for an example: what exactly did they buy when they
   "acquired Novell's Unix business"?  In the case of a BSD-licensed project
   like PostgreSQL of course, a company could already package, modify and
   sell the product without buying anything.  I would interpret "sell all
   the software" differently for a BSD-licensed project than I would for a
   GPL'ed project.  And it might be different again for an Apache-licensed 
   project.  In any of those cases, "selling" the software wouldn't mean
   that you'd lose your own rights to use, maintain, extend, or distribute
   the code.


Jeroen


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-01 Thread Jeroen T. Vermeulen
On Thu, Jul 01, 2004 at 04:06:06PM -0400, Merlin Moncure wrote:
 
> The big picture here is that with the current behavior, it is possible
> to keep track of existence of prepared statements without wrapping or
> even being aware of transaction activity.  This is tremendously useful
> for handling them in a generic way because transactions and prepared
> statements are handled on different levels of my (and others')
> technology stack.  If you change that behavior, that is no longer
> possible, period, and I will be forced to stop using them.
 
But then how can you be sure that other layers in your stack won't try to
re-establish a broken connection, multiplex sessions, pool connections,
parallellize transactions or prepare their own statements?


> On the conceptual side of things, I submit that the creation and
> deletion of prepared statements are much closer to session variables
> than to normal imperative SQL (that normally has a transactional
> lifetime).  They are an optimization hack (albeit a very handy one) much
> the same as turning off sequential scans.  I tend to think of them more
> as macros that I can create and destroy (and in implementation, that is
> really close to what they are).

Here lies the real problem: if only they were _designed_ as an optimization
hack, things would be fine.  Just make them anonymous and let the server
pattern-match them; no need to change your code to keep it working.

In fact...

What if we allowed the backend to match queries to the patterns of
prepared statements without actually naming them?  You'd then have two
ways of executing a prepared query:

1. EXECUTE, as we do now, which involves naming the statement and that's
where the trouble begins.  So probably best to avoid this approach, unless
you PREPARE and EXECUTE in the same transaction and so know exactly what's
going on.

2. Issue your query as normal; the backend will find that it matches a
prepared query (regardless of what it's called) and use the plan stored
there.  Optimizing a program to use prepared queries is a mere matter of
adding the PREPARE, and no other program statements need to be changed. 
The worst that can happen is that you accidentally lose an opportunity
to apply the prepared plan somewhere.
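
To illustrate, with a made-up table and purely hypothetical backend behaviour:

PREPARE getname (integer) AS SELECT name FROM users WHERE id = $1;

-- client code keeps issuing its query verbatim; no EXECUTE, no statement name:
SELECT name FROM users WHERE id = 42;
-- under the proposal the backend would notice that this text matches the
-- prepared pattern, extract 42 as the parameter, and reuse the stored plan.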

The great benefit of 2 would be that the server can cache & reuse prepared
statements across transactions, _without transferring any semantic state_
in these strange and exotic ways.  In other words, no matter what happens to
the prepared statement, your code will still run.  And probably still
benefit from the optimization; you get the freedom to tweak that further as
you like, but it needn't affect existing code.  Middleware writers won't
even need to parse SQL statements to replace them with EXECUTEs anymore.

Other benefits we could eventually get out of this would be dynamic
garbage collection; reuse across sessions; dynamic, server-side choice
between multiple overlapping ("specialized") prepared statements; quietly
accepted redefinitions that make syntactic sense.

Of course this would take some work, both in developer time and execution
time, but it would allow us to have sensible transaction semantics on the
one hand, and optimization (and simplified at that!) on the other.

Am I making things too simple here?


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-07-01 Thread Jeroen T. Vermeulen
Sorry for the delay, life tends to get complicated if you leave it
alone for a few days...

I see how making PREPARE obey rollbacks would be inconvenient for some
existing code, but frankly I'm getting a bit worried about the "why should
I care whether what I do is committed or not?" argument.  I guess one could
say that about lots of statements: "I don't really want this to be subject
to the transaction but it happens to be convenient for me to write it inside the
transaction, and then I have this problem that it's affected by rollbacks."
If it's that important, come up with a generic "session-not-transaction"
syntax to temporarily escape bracketing.

I'll wave my hands a bit here and use the term "middleware" for drivers,
front-end libraries, proxies, in-application abstraction layers, anything
that sits between the business logic (is that term still fashionable?)
firing queries and the backend handling them.

So far, as I believe Tom has pointed out, I've assumed regular code that
knows whether it's in a transaction, or where transactions begin and end, 
or what else the program is doing with the same session.  The arguments for
the nontransactional behaviour have been about code where this isn't the
case, but that does have exclusive control of prepared statements.

Frankly I think that's a strange point of view, since transactions are one
of the pillars of database management and elementary to statement semantics,
whereas prepared statements are a recently added optimization feature.

In this message I'll give examples of how the current behaviour may affect
other middleware that doesn't use prepared statements, but may have to deal
with applications that do.

So let's assume for a change that the middleware has at least some
knowledge or perhaps even control over transactionality, but doesn't know
about prepared statements (perhaps because it predates 7.4):

(*) The middleware may deal with transient errors, such as some deadlocks,
by rerunning the transaction.  Now if a PREPARE (and no matching DEALLOCATE)
preceded the point of failure, the retry would break under the
nontransactional semantics--just because the PREPARE escaped the rollback.

You can work around this by silently accepting redefinitions of prepared
statements, but a redefinition may also be a true honest bug.  Only
accepting identical redefinitions will not work, because a redefinition
during retry may have a slightly different body than the original definition
during the first attempt.  Another workaround as we've seen is to re-PREPARE
in a nested transaction, which besides offsetting any performance gain may
mean that you're blithely executing the wrong version of the statement!

(*) Transaction code may start out with a PREPARE statement, use the
prepared statement a few times, and then DEALLOCATE it at the very end.
This seems a clean way to work, and an attempt not to affect session state.
But now, paradoxically, there will be unintended residue of the transaction
only if the transaction _fails_.  This may break a future instance of the
transaction, which may be why the DEALLOCATE was there in the first place.

(*) Middleware may support some form of connection pooling, or other
reuse of connections.  The only difference between a reused connection
and a fresh one to the same database that I can think of right now are (i)
session variables and (ii) prepared statements.  So based on the 7.3 state
of things, the middleware might assume that a connection was reusable if (a)
no statements affecting session variables were issued; (b) any changes in
session variables are OK; or (c) no transaction affecting session variables
was committed.

Prepared statements can break each of these options--most notably, they
would break the even more diligent assumption that a session is clean and
unspoiled as long as any transactions (whether implicit or explicit) entered
inside it have been rolled back.

(*) Middleware may want to restore broken connections.  It would have to
restore any prepared statements from the old connection (quite possibly
unnecessarily) in addition to session variables.

Worse, the semantics for the two kinds of session state are different!
Middleware that tries to maintain session state but doesn't keep track of
rollbacks is really already broken when it comes to session variables.
The fix for that and a change to transactional PREPARE require the exact
same mechanism.


Now, two more issues for middleware that does prepare statements:

(*) What if preparing a statement fails?  Could be that you've just broken
the transaction at a point where the application didn't expect it, or in a
way it didn't expect.

(*) What if your prepared statement interferes with one prepared by the
application?

Sure, all of these could be worked around; there's a lot of "don't do that
then" in there--which IMHO cuts both ways.  And of course some of these
are simply real-life examples that I need to try and deal with elegantly if
behaviour is to stay as it is.

Re: [HACKERS] Default libpq service

2004-06-29 Thread Jeroen T. Vermeulen
On Tue, Jun 29, 2004 at 09:46:34PM +0200, Peter Eisentraut wrote:
> A while ago it was speculated that it might be nice to have a default 
> service in libpq's pg_service.conf file that would supply missing 
> connection parameters if none are specified elsewhere, so users could, 
> say, set the default server host in a configuration file instead of 
> environment variables.
> 
> The precendence would be like this: explicit specification, service (if 
> explicitly specified), environment, (new: ) default service, 
> compiled-in default (e.g., for port).  (Or maybe the default service 
> before the environment?)
> 
> Comments?  Better ideas?

Only problem that springs to mind is that you probably wouldn't want a
default password in a world-readable /etc/ file, but you can't stop a
determined fool anyway.

I know it would make my life a little easier, so yeah, go for it.  This
was proposed for libpqxx a long time ago, but I felt it belonged more at
the libpq level.


Jeroen

PS - Nice dedication, thanks  :)


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] PREPARE and transactions

2004-06-25 Thread Jeroen T. Vermeulen
On Fri, Jun 25, 2004 at 10:02:29AM -0400, Tom Lane wrote:
 
> It occurs to me that a lot of the problem would go away if we allowed
> DEALLOCATE of a nonexistent statement to not be an error (seems like
> a NOTICE would be be plenty).  Then you could just unconditionally
> DEALLOCATE anything you were about to PREPARE, if you weren't entirely
> sure whether it already existed.

That would be an improvement anyway, I think--especially if the backend
could keep deallocated plans around a little longer in case they got
re-prepared soon after.  That way the client can ensure not only that the
statement doesn't exist, but also that it _does_ exist, without incurring
prohibitive cost.  And without going through an "if" construct too.
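
Under that relaxed rule a client could always run the pair unconditionally--a
sketch of the suggested behaviour; today the DEALLOCATE is an error if foo is
unknown:

DEALLOCATE foo;  -- would be merely a NOTICE if foo does not exist
PREPARE foo (integer) AS SELECT * FROM t WHERE id = $1;
EXECUTE foo(1);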

OTOH the problem then remains that we've got semantically significant work
escaping from transactions, but in all other ways being presented as a
regular bracketed operation.  To me it's a bit like a C function returning
a pointer to one of its own stack variables!


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] PREPARE and transactions

2004-06-25 Thread Jeroen T. Vermeulen
On Thu, Jun 24, 2004 at 04:19:36PM -0400, Merlin Moncure wrote:
 
> I am using PostgreSQL as a backend for legacy COBOL applications and
> have written a driver which maps the COBOL I/O statements to SQL
> statements.  To save a little bit on parsing time and for various other
> reasons these SQL statements are handled as prepared queries.  Each
> COBOL file has a corresponding SQL table in the database and each table
> can have up to 7 prepared statements that the application creates when
> it needs them.  Unless I am misunderstanding things, if you change the
> prepared statement's lifetime, I am forced to prepare a bunch of
> statements all at once instead of when they are needed.  I am prepared
> to do this, however (pun intended).
 
Sounds like a challenging (is that word still fashionable?) job.  Now if
only you weren't dealing with legacy applications, this would be a natural
for stored procedures I guess.  Well okay, maybe you could define stored
procedures on demand but then who would clear them up afterwards...

What if prepared statement semantics were modeled after those of session
variables?  You can change a session variable from within a transaction
and keep the change afterwards, but it does go by transaction rules.

What that leaves us is something that works _almost_ the way things work
now, so your code would work unchanged in the normal case.  The difference
would be that PREPARE would roll back like any other statement.

Taking this one step further, if compatibility with current semantics is
important, I could imagine adding a hack--I admit it's not pretty--that
keeps the statement allocated despite a rollback, but sets a "soft" bit.  
The backend could silently accept identical redefinitions of prepared
statements that have the "soft" bit set.

I think that would be the most compatible way, and probably the easiest 
as well, to turn PREPARE into a regular statement:

1. Add a "soft" bit to prepared-statement plans
2. Add rollback bookkeeping for prepared statements, which sets the bit
3. Accept identical re-preparations of "soft" statements, clearing the bit

Deallocation, lookup, execution etc. would not need to change.  There would
still be the risk of "leaking" prepared statements, but that is a problem
of the current semantics anyway.
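
Seen from the client, the intended net effect would be roughly this (none of
it implemented, and the statement text made up):

BEGIN;
  PREPARE foo (integer) AS SELECT * FROM t WHERE id = $1;
ROLLBACK;         -- plan kept, but marked "soft"
PREPARE foo (integer) AS SELECT * FROM t WHERE id = $1;
                  -- identical text: accepted silently, "soft" bit cleared
EXECUTE foo(42);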


Jeroen


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [Re] Re: [HACKERS] PREPARE and transactions

2004-06-25 Thread Jeroen T. Vermeulen
On Fri, Jun 25, 2004 at 02:00:12AM -, Greg Sabino Mullane wrote:
  
> I was originally unhappy with the current situation, but now I think
> it is the best. Any changes will also cause a huge further headache
> for driver/application writers, as we already have a released version
> (and probably at least one more) with the current behavior. I'd be

Granted, that's probably going to force the issue.  I do wonder though:
one of the arguments in favour of the current semantics is that the
problems can be worked around using nested transactions.  Then what were
people doing before nested transactions, in Tom's scenario where the
programmer doesn't know where transactions begin?

There was also the middleware argument--some intermediate software layer
may be in control of bracketing.  But in such cases, can you even rely
on two independent transactions executing in the same session?  You'd
need to assume that to make the current semantics work in that situation.
What if the middleware does connection pooling, or restores a broken
connection between two transactions?  The latter might happen because of
a timed-out firewall, for instance, when there is a long pause between
two unrelated transactions.

Besides, given that the current semantics are completely "out-of-band"
relative to bracketing, I guess it really ought to be any middleware's
responsibility to manage prepared statements.  If the application isn't in
control of transactionality, it seems a little iffy to have it fire off
statements that don't affect database state but can make or break future
transactions.

As for the case where statements are prepared on demand when they are
first executed, wouldn't that be better done in the backend?  It would
save the application this trouble of keeping track of which statements
have been prepared.

Perhaps the real problem is in the SQL syntax...  Imagine a syntax that
doesn't assign a name to a prepared statement, just defines an anonymous
pattern to plan for.  The backend would match against the pattern on the
fly, so introducing prepared statements in a program would involve no
changes apart from the PREPAREs.  Implementations could ignore them if
they cached plans dynamically anyway; they could implement dynamic and
more effective replacement policies for prepared statements, and share
plans between connections.


Jeroen


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] PREPARE and transactions

2004-06-24 Thread Jeroen T. Vermeulen
On Thu, Jun 24, 2004 at 08:51:32AM -0400, Merlin Moncure wrote:

> > When I say "within a transaction" as opposed to outside a transaction, I
> > mean of course an explicit transaction.  If you want a prepared statement
> > to last throughout the session, I'd say it stands to reason that you
> > create it outside a transaction--in unfettered session context, so to
> > speak.  I can't see how that would be either less intuitive or harder to
> > program in the client.
> 
> I disagree.  Lots of people use prepared statements for all kinds of
> different reasons.  A large percentage of them do not need or make use
> of explicit transactions.  Having to continually rebuild the statement
> would be a hassle.  The caching mechanism also seems like extra work for

I think we're talking at cross purposes here...  If the client doesn't use
explicit transactions, as you say is common, then you're obviously not
defining prepared statements inside explicit transactions either.  And so
you're certainly not going to be bothered by what happens at the end of a
transaction!  In that case, what I'm saying doesn't affect you at all, in
any way.

But let's look at the case where you do use explicit transactions, which
is what we're talking about.  I think there should be a difference between

(1) BEGIN
PREPARE foo AS ...
...
COMMIT/ABORT

(2) PREPARE foo AS ...
BEGIN
...
COMMIT/ABORT

There currently is no difference.  Even if you abort the transaction, you
will still have that prepared statement (which may require manual cleaning
up), unless you aborted because of an error which occurred inside the
transaction and before or during the PREPARE, in which case trying to
clean up the statement would be an error.  You can try to clean up the
prepared statement inside the transaction, but it would not work if
there were an error or abort between the PREPARE and the DEALLOCATE.

That sounds messy to me.

What I propose is simply that these two behave as follows:

(1) PREPARE foo AS ...
BEGIN
...
COMMIT/ABORT

In this case, foo is defined for the duration of the session *just like
current behaviour.*  The presence of the transaction isn't relevant here
at all; it's only there for comparison.  Commit or abort of the
transaction doesn't affect foo, because foo has been defined outside
the transaction in "unfettered session context," for want of a better
term.

Presumably you're going to use foo in several transactions, and/or in
several statements that are not in any explicit transaction.  Unless you
deallocate explicitly, foo will be there as long as you stay connected,
just like you're used to.


(2) BEGIN
PREPARE foo AS ...
...
COMMIT/ABORT

Here, the PREPARE is inside the transaction so at the very least, you'd
expect its effect to be undone if the transaction aborts.  I would go
further and say "if you wanted foo to persist, you would have prepared
it before going into the transaction" but that's a design choice.  
Deallocating at commit/abort would have the advantage that you always
know whether foo exists regardless of the transaction's outcome: if
defined inside the transaction, it lives and dies with the transaction.
If defined merely in the session (i.e. not in any transaction), it lives
and dies with the session.

So you use this second form when you don't intend to reuse this statement
after the transaction.  If you do, OTOH, you use the first form.  It
also means that you don't "leak" prepared statement plans if you forget
to deallocate them--remember that the prepared statement may be generated
on-the-fly based on client-side program variables.


> little result (to be fair, I like the idea of multiple backends being
> able to make use of the same plan).  Generic routines can just always
> wrap the prepare statement in a subtransaction, which now allows safety
> until such time that a create or replace version becomes available,

The nested-transaction version allows you to add code to deal with the
uncertainty that I'm proposing to remove.  In the current situation, it's
annoyingly hard to figure out whether the prepared statement exists so you
redefine it "just in case," going through a needless subtransaction abort
or commit.  That's the nested-transaction solution you appear to favour;
but AFAICS _this_ is the approach where you have to "continually rebuild
the statement."  With my version, you don't need to go through all that
because you're allowed to _know_ whether the statement exists or not.

I don't even think the nested-transaction approach helps with anything:
if you want to re-prepare foo for continued use in the rest of the session
just in case it wasn't around anymore (and ignore the likely error for
the redefinition), you might as well do so before you go into your
transaction in the first place.  No nested transactions needed.


Jeroen


---(end of broadcast)---

Re: [HACKERS] PREPARE and transactions

2004-06-24 Thread Jeroen T. Vermeulen
On Wed, Jun 23, 2004 at 03:26:49PM -0400, Tom Lane wrote:

> > Even if the spec doesn't help, I think a statement prepared within a
> > transaction should definitely be deallocated at the end of the transaction.
> 
> Uh, you do realize that Postgres does *everything* within a transaction?

Well, except prepared statements apparently; I'm not sure why they are an
exception.

When I say "within a transaction" as opposed to outside a transaction, I
mean of course an explicit transaction.  If you want a prepared statement
to last throughout the session, I'd say it stands to reason that you
create it outside a transaction--in unfettered session context, so to
speak.  I can't see how that would be either less intuitive or harder to
program in the client.

Maybe it would help to think of some precedents.  Are there any actions
where it makes sense to disobey rollbacks?  Counters spring to mind, but
I guess that's a technical necessity rather than an interface choice.
Session variables may be another one, but IIRC they become strictly
bracketed (when set inside a transaction, naturally) around 7.3.  What
else?


> The above proposal would render prepared statements practically useless.

Could you elaborate?  Wouldn't it be possible to cache the plans across
transactions like I suggested, reusing the old plan if the statement is
re-prepared with the same definition?  Or are you saying it's possible,
but wouldn't be helpful?


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[HACKERS] PREPARE and transactions

2004-06-23 Thread Jeroen T. Vermeulen
We were discussing prepared statement support for libpqxx just now (Bruce,
Peter Eisentraut & myself are manning the postgres booth at LinuxTag 2004
in Karlsruhe, Germany), when we ran into a problem that came up two months
ago.  That discussion follows:

Post by Alvaro Herrera:
> Hackers,
> 
> Is this expected?  If so, why?  I'd expect the prepared stmt to be
> deallocated.
> 
> alvherre=# begin;
> BEGIN
> alvherre=# prepare tres as select 3;
> PREPARE
> alvherre=# rollback;
> ROLLBACK
> alvherre=# execute tres;
> ?column? 
> --
> 3
> (1 fila)

Followup by Tom Lane:
> prepare.c probably should have provisions for rolling back its state to
> the start of a failed transaction ... but it doesn't.
> 
> Before jumping into doing that, though, I'd want to have some
> discussions about the implications for the V3 protocol's notion of
> prepared statements.  The protocol spec does not say anything that
> would suggest that prepared statements are lost on transaction rollback,
> and offhand it seems like they shouldn't be because the protocol is
> lower-level than transactions.

Now, here's a scenario that has us worried:

BEGIN
  PREPARE foo AS ...
  ...                     [error]
  DEALLOCATE foo          [fails: already aborted by previous error]
ABORT
BEGIN
  PREPARE foo AS ...      [fails: foo is already defined!]
  EXECUTE foo             [fails: already aborted by previous error]
COMMIT                    [fails: already aborted by previous error]
 
You could say that the DEALLOCATE in the first transaction should have
been outside the transaction, i.e. after the ABORT.  But that would mean
that the client is expected to roll back, manually, individual changes
made in an aborted transaction.  If that's what we expect from the client,
what's the point in having transactions in the first place?

Lots of variations of the scenario spring to mind.  Imagine the second
transaction were not a transaction at all: the second PREPARE would fail,
and the EXECUTE may go execute the wrong statement.

A partial fix would be to allow identical redefinitions of a prepared
statement, optionally with reference counting to determine when it should
be deallocated.  But instances of the same transaction may want to include
a pseudo-constant in the fixed part of the query text that changes between
instances of the transaction.

Even if the spec doesn't help, I think a statement prepared within a
transaction should definitely be deallocated at the end of the transaction.
If it turns out that this wastes a lot of opportunities for reuse, the
prepared plans can always be cached across definitions.


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] New COPY commands in libpq

2004-05-01 Thread Jeroen T. Vermeulen
On Sat, May 01, 2004 at 02:25:01AM -0700, Tony Reina wrote:
> > You might try porting your code to libpqxx, which is C++-native and should
> > make large swathes of this sort of code unnecessary.
 
> I've seriously considered it (along with the npgsql library), but am
> not really sure as to what the advantage(s) would be considering that
> (with the corrections suggested) my code works now with libpq.
 
It depends.  It takes a lot of debugging out of your hands because the
boiler-plate stuff like error checking goes into the library, rather than
being rewritten all over the place in the applications.  Of course if
your current code works just fine, there's no reason to change anything
so drastic.


> Would there be any improvement in speed with libpqxx? Does libpqxx
> make use of the SSL encryption? How hard is it to link to the SSL
> connections for Postgres commands?

Don't expect any speed improvements per se; libpqxx is a layer on top of
libpq.  OTOH some performance features like COPY and limited non-blocking
access (no support for select()ing on multiple file descriptors yet) become
much more easily accessible.

Frankly I don't recall ATM just how libpq deals with SSL.  Don't have the
opportunity to look it up just now.


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] New COPY commands in libpq

2004-04-30 Thread Jeroen T. Vermeulen
On Fri, Apr 30, 2004 at 06:12:35AM -0700, Tony Reina wrote:

> CString cmd, msg;
 
> cmd.Format("1\t\2\t{3,4,5}\n");
> * PQputCopyData(conn, cmd, sizeof(cmd));
> cmd.Format("\\.\n");
> *   PQputCopyData(conn, cmd, sizeof(cmd));
> *   PQputCopyEnd(conn, msg);
 
> Old C++ code works, new stuff doesn't. Only line that have changed are
> *'d.
 
I'm not surprised.  CString is clearly a class, and cmd is an object of
that class.  Apparently that class has an implicit conversion operator to
char * (which is a striking example of a Bad Idea--kids, don't try this
at home!) but nonetheless, sizeof() should still give you the size of the
object, *not* the size of the string it represents!

You might try porting your code to libpqxx, which is C++-native and should
make large swathes of this sort of code unnecessary.


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] 7.5 beta version

2004-04-14 Thread Jeroen T. Vermeulen
On Wed, Apr 14, 2004 at 12:22:18AM +0200, Kurt Roeckx wrote:

> > But in the case of x86 (among others) that's the in-register
> > representation, no?  IIRC they are stored to memory as 64-bit doubles at
> > best.
> 
> You also have "long double"s on some compilers which could be 80 bit.
 
Actually, they're a supported feature in both C99 and C++ IIRC.  But I
still suspect they're not *actually* 80 bits, at least not in memory.


> We gave up trying to make C++ dlls on windows because of ABI/name
> mangling problems, never tried it again though.
> 
> The compilers from Microsoft and Borland atleast aren't
> compatible.

But that shows up as link errors, not at runtime, right?


Jeroen


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] 7.5 beta version

2004-04-12 Thread Jeroen T. Vermeulen
On Mon, Apr 12, 2004 at 12:35:15PM -0700, Dann Corbit wrote:
 
> I do know of important differences in compilers in this regard.  You can
> (for instance) have 80 bit floating point on one compiler using double
> but it is only 64 bits on another.
 
But in the case of x86 (among others) that's the in-register
representation, no?  IIRC they are stored to memory as 64-bit doubles at
best.


> The point being that there is no such thing as a binary interface for
> alignments or data types that is defined by the C or C++ ANSI/ISO
> language standards.  If there is another standard, I would like to hear
> about it.
 
That depends on the platform vendor.  Which depending on the platform may
actually be whoever specified the CPU architecture and/or whoever supplied
the OS.  As you say, compilers may deviate from it although in many cases
it would render them useless.

In this particular case, it would most likely be Microsoft as the dominant
{OS,compiler,everything else} vendor.  I *think* (but I'm not sure) that
Microsoft set an ABI even at the C++ level (as Intel also did with the
Itanium, BTW), although it's more common to specify the C level only.

In C++, ABI compatibility is normally protected through a side effect of
name mangling.  By maintaining different name mangling schemes for
different ABI conventions, compiler vendors ensure that object files will
refuse to link to other object files that adhere to different ABIs.

 
> Here is my puzzlement...
> If I compile a PostgreSQL database on some 64 bit machine, I should be
> able to access it from a 32 bit machine.  For instance, I can access
> DB/2 on our 3090 or Rdb on our Alpha from a 32 bit workstation and I
> have no problems of this nature.  Surely it is an issue with PostgreSQL
> that has been recognized before.
 
I would say yes, definitely!  That part is not in question here, only the
linking-across-compilers part.  But see below.


> If I change compilers or if I even completely change architectures it
> should not matter.  The interface to the database should be architecture
> independent.  Said another way:
> I should have no concerns about what sort of architecture the server is
> on or what compiler was used.
 
Unless you use the binary mode of data transfer perhaps; I think that's been
rationalized in 7.4 and is now portable.  No idea what happens if you
convert tables written in an older version (say, 7.3) to 7.5 and then
read them from a wildly different platform than you wrote them from, but
that may be a bit far-fetched.
 

Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] 7.5 beta version

2004-04-12 Thread Jeroen T. Vermeulen
On Mon, Apr 12, 2004 at 11:55:45AM -0700, Dann Corbit wrote:
 
> 1.
> The C language does not define alignment of structs.
 
Platform ABI standards do, though (hence the "as long as it adheres to..."
clause in my previous post).  Whether it's in the C language or in the
platform's ABI standards is important when writing portable code; it should
not matter when compiling an existing product--provided that the platform
defines the ABI unambiguously.  The latter is what's in question here.


> The C language does not specify that an integer shall be the same size
> on two different platforms.

That is not a problem when linking across compilers, only when linking
across ABIs.  I would expect that doing so would fail at link time.
Besides, as a practical matter, even the 64-bit platforms current today
normally specify that int is a 32-bit type so this should not come into
play in this case, even when linking e.g. Itanium code to x86 code.


> The C language does not specify that IEEE arithmetic must be used or
> even that a double shall be able to represent a value larger than a
> float.
 
Again, I doubt this is relevant in this case where the problem is linking
across compilers on the same platform, not across platforms.


> Mingw GCC is both a C and a C++ compiler.  Fortunately, the C compiler
> is used, because there is no way that the code base would compile as C
> code.  A trivial example to show why not is the frequent use of the C++
> keyword 'new' in the code base.  Here is an example:

With gcc at least, the matter is even simpler than that.  It selects an
appropriate front-end (C, C++, ...) based on the names of the source files
being compiled.  This may break down at link time however; g++ links to
the C++ standard library, which gcc by default doesn't.


Jeroen


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] 7.5 beta version

2004-04-12 Thread Jeroen T. Vermeulen
On Sun, Apr 11, 2004 at 10:21:30PM -0400, Bruce Momjian wrote:
 
> I was not sure if Win32 had standard alignment for C.

Good point.  There's standards, and then there's Windows.  It's possible
that separate "tight-packing" and "regular" pragmas are used there, just
for structs that are expected to be used across linkage boundaries.


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 7.5 beta version

2004-04-11 Thread Jeroen T. Vermeulen
On Mon, Apr 05, 2004 at 09:38:13PM -0400, Bruce Momjian wrote:
> 
> I don't think you can mix libs/binaries from different compilers.

As long as it's plain old C, and the compilers adhere to the platform's
ABI standards, why not?  Even if you compile the C code using a C++
compiler, as in this case, any C structs will be PODs and so should be
compiled according to the C layout rules.


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [pgsql-www] [HACKERS] The Name Game: postgresql.net vs.

2004-03-12 Thread Jeroen T. Vermeulen
On Fri, Mar 12, 2004 at 01:02:00PM -0600, Frank Wiles wrote:
> 
>   As for the "length" of the URL, I think any developer or user 
>   of PostgreSQL is knowledgeable enough to take advantage of browser
>   bookmarks. :) 
 
I've heard this said several times now, but that doesn't make me feel
any better.  I frequently find myself in situations where I *must* get
to my project site from a memorized URL, and clicking through to it is
a luxury I can ill afford.  I travel.  Sometimes I'm dependent on slow
lines and/or other people's machines.  For instance, whether I will be
able to respond promptly to new support requests and bug reports over the
entire month of May this year will depend partly on that ability.

Apart from that, we could do with some public attention and that's where
catchiness matters.


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [pgsql-www] [HACKERS] The Name Game: postgresql.net vs.

2004-03-12 Thread Jeroen T. Vermeulen
On Fri, Mar 12, 2004 at 10:43:34AM -0600, Thomas Swan wrote:
> 
> foundry.postgresql.org?

Been through that one...  Too long when you have to add project name as
well.


Jeroen


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] The Name Game: postgresql.net vs. pgfoundry.org

2004-03-12 Thread Jeroen T. Vermeulen
On Fri, Mar 12, 2004 at 10:37:58AM -0500, Tom Lane wrote:
> 
> Well, if you want to think along those lines, I believe that we (PGDG)
> currently hold these domain names:

[...]

>   postgres.org

This is the one I was silently rooting for, but figured was too good to
be true.


> You could make a case that postgres.org for the projects would be the
> perfect complement to postgresql.org for the core.
 
Still _slightly_ confusing, but I think the plain and simple idea of a
prominent banner was mentioned.  We can have them both ways to avoid all
confusion.  I say go for it!


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] The Name Game: postgresql.net vs. pgfoundry.org

2004-03-12 Thread Jeroen T. Vermeulen
On Fri, Mar 12, 2004 at 02:42:47PM -, Dave Page wrote:
> 
> We need some distinction between the core project sites and other
> project sites - istm that a different domain is the only way to do that.

Okay, then how about postgres-extra.net, or forpostgres.net?

Saying Postgres instead of PostgreSQL takes out a bit of that extra length
and it's lots easier to pronounce.  We've been through this whole what-
shall-we-call-it thing months ago and IIRC the upshot was that the short
version of the name is perfectly acceptable and much catchier.  Here's a
chance to use it!

Even shorter and catchier would be "pgprojects.net" IMHO, but that again
stretches the connection with PostgreSQL.


Jeroen


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] The Name Game: postgresql.net vs. pgfoundry.org

2004-03-11 Thread Jeroen T. Vermeulen
On Thu, Mar 11, 2004 at 07:01:47PM -0500, Tom Lane wrote:
> 
> Actually, proposal (A) does provide such a separation: notice that the
> projects would go under *.postgresql.net, with the core database remaining
> at *.postgresql.org.  I am not sure if that will provoke confusion or
> not, but I think I like it better than pgfoundry because it is clear
> that the domains are related.  pgfoundry seems a bit, um, random.

Agree with the last bit, but I really feel that the difference between
postgresql.org and postgresql.net is too subtle--at least for people who
don't work with either very often.

Here's another idea: couldn't we have a subdomain for the projects, as in
".forge.postgresql.org"?  Or would that be too long?


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] The Name Game: postgresql.net vs. pgfoundry.org

2004-03-11 Thread Jeroen T. Vermeulen
On Thu, Mar 11, 2004 at 03:14:10PM -0800, Josh Berkus wrote:
> 
> So far, only 4 people, total, have expressed opinons on the matter.  I'm 
> throwing this on Hackers so that members of projects we will be hosting can 
> indicate whether they:
> 
> A) Favor www.postgresql.net
> B) Favor www.pgfoundry.org
> C) Don't care as long as the porting is relatively painless.

I'm not crazy about the name pgfoundry, but otherwise I think it's the
better choice.  The "www." problem could be circumvented by renaming the
project, perhaps, but I think it's best to keep a distinction between
"Postres, the database" and "related projects."


Jeroen


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [pgsql-www] [HACKERS] Collaboration Tool Proposal

2004-02-26 Thread Jeroen T. Vermeulen
On Thu, Feb 26, 2004 at 10:49:46AM -0800, Josh Berkus wrote:
> 
> Not sure.   Depends on what the leads of the associated projects think.   
> Obviously, if everyone's dead set against it, we won't do it.

I for one am willing to try this in the near term.  I've got an external
domain (pqxx.tk) pointing to the libpqxx page on GBorg, and moving it over
to a new URL is child's play.  My main worry is transition management:

 - How will mailing list subscribers be affected?
 - How will CVS users be affected?
 - Can the mailing list archives be moved over?
 - Where will my old bug reports and corresponding discussions go?
 - Can FAQ entries be copied over automatically?
 - Is there a way of migrating these services one by one?

If it takes some scripting and/or programming to do some of this, I'm
willing to help insofar as I have time.


Jeroen


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] [pgsql-www] Collaboration Tool Proposal

2004-02-26 Thread Jeroen T. Vermeulen
On Thu, Feb 26, 2004 at 09:16:38PM +0100, Peter Eisentraut wrote:
> 
> In terms of improving the hosting infrastructure, this would surely be a 
> step forward, but the problem with "collaboration" is not that the 
> tools are missing, it's that people are unwilling to use any tools for 
> issue tracking, etc.  This is in fact a near-universal problem.  If you 
> look at sourceforge, very few projects actually use any of the 
> "collaboration" tools.  If you want to get the project to do something, 
> you still have to use email and CVS.  And with those projects (not 
> necessarily on sourceforge) that have a sophisticated bug tracking 
> structure, the sheer number of filed bugs is so large and irregular in 
> quality that the bugs are in fact meaningless.  (Oddly enough, the 
> projects I have in mind here do *not* use a full-service collaboration 
> tool, just a bug tracker.  Make of that what you will.)  So yes, I 
> think this is a reasonable plan, just don't expect "collaboration" to 
> suddenly appear out of nowhere.

One thing that helps a lot in my experience is the ability to manage bug
reports.  On gborg, for instance, I'm stuck with several dozen duplicates
from a time there were technical problems with the site; lots of "semantic
garbage" in the form of people making silly assumptions, not reading
earlier bug reports, or asking generic C++ questions; requests for features
that are already there; support requests and other irrelevant issues; and
multiple reports covering the same underlying problem.

If I could merge, delete, categorize & group these requests the list would
be a lot easier to manage.


Jeroen


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

