Re: [GENERAL] Which Python library - psycopg2 or pygresql?

2008-04-20 Thread Paul Boddie
On 18 Apr, 14:12, [EMAIL PROTECTED] (Karsten Hilbert) wrote:
>
> If one wants to operate on one/a range of row(s) but the
> code fetches "all" rows (for various values of all) then I'd
> suspect there's something missing in the SQL statement, say,
> a LIMIT or appropriate WHERE conditions - regardless of
> whether a cursor is used or not.

But if you want to process all of the rows, and you don't want the
client to suck them all down at once, then you need to use the
database system's cursor support.

> If you refer to whether server-side cursors are used one
> must explicitly request them from psycopg2 by using the
> "name" argument to the connection.Cursor() call. Combine
> that with a Python generator and one should end up with
> truly on-demand single-row fetching.

As I noted, the problem is arguably shared between the database system
(because cursors don't work with certain statements that you might
use, and there's no way of finding out without trying) and the
database adapter (because it doesn't try to support the behaviour
implied by the DB-API). Inventing names for cursors, although tedious,
is the easy part in all this.

> Unfortunately, I am not entirely sure how and when psycopg2
> uses (database) cursors when no name argument is supplied.

It doesn't.

> IMO the cursor concept of the DB-API is broken anyhow -
> everything is forced to go through a (DB-API) cursor no
> matter whether a database-side cursor would be wanted or not
> and there's no provision for controlling the latter via the
> API itself.

Well, the DB-API doesn't seem to be moving in any real direction these
days, anyway. I've wanted and even proposed code for a single
parameter standard, and the progress on that matter has been glacial:
it's too controversial to do what ODBC and JDBC have been doing for
years, apparently. Still, I don't really see that doing the equivalent
of a cursor.fetchall for something like cursor.fetchone is appropriate
when "all" might be millions of rows, but that's just my view.

Paul

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Which Python library - psycopg2 or pygresql?

2008-04-18 Thread Paul Boddie
On 15 Apr, 17:53, [EMAIL PROTECTED] (Erik Jones) wrote:
> On Apr 15, 2008, at 10:27 AM, Dawid Kuroczko wrote:
>
> > By the looks of descriptions I am slightly inclined towards
> > psycopg2, but I would feel better if I talked with people
> > who actually used these libraries.
>
> Most definitely psycopg2, it's pretty much the standard dbapi
> compliant Postgres driver library for Python.

One caveat: psycopg2 doesn't (or didn't) use cursors in a transparent
fashion like pyPgSQL does. If you're traversing potentially large data
sets, this will mean that psycopg2 will download all the result data
into the client process unless you start introducing explicit DECLARE
CURSOR statements in all the right places. Although this might not be
an issue if you're determined to only support PostgreSQL and psycopg2,
it's worth noting that the behaviour is somewhat counter-intuitive
from the perspective of people with experience of other database
systems: attempting to fetch a single row (or a limited number of
rows) may cause you to discover that the client has acquired all of
them and has taken over the job of feeding them to your code, instead
of leaving that to the database system.

Admittedly, the cause of the lack of such support in psycopg2 is the
uncertainty regarding cursor-capable statements in PostgreSQL: pyPgSQL
uses potentially awkward and fairly simplistic techniques to guess
whether the issued statement can be used with cursors, and I can
understand that the psycopg2 developers want to steer away from such
practices.

Paul

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] MySQL to Postgres question

2008-03-21 Thread Paul Boddie
On 21 Mar, 17:15, [EMAIL PROTECTED] ("Edward Blake") wrote:
>
> When I try and rewrite it as a Postgres statement (below), it fails at line
> 9.
>  0 SET CONSTRAINTS ALL DEFERRED;
>  1 CREATE TABLE products (
>  2  product_id serial[11] not null,
>  3  product_name varchar[255] not null,
>  4  product_descrition varchar[255] not null,
>  5  class_id integer[11] not null,
>  6  subclass_id integer[11] not null,
>  7  department_id integer[11] not null
>  8  PRIMARY KEY (product_id),
>  9  KEY class_id (class_id),

Isn't KEY a MySQL shorthand for creating an index within the table
declaration. Why not create the index afterwards using CREATE INDEX
instead?

> 10  KEY subclass_id (subclass_id),
> 11  KEY department_id (department_id)
> 12 );
>
> Any ideas?

Yes, just decouple the index declarations from the table declaration.
There are benefits to doing this, too, such as being able to populate
tables more rapidly before the indexes are added - a technique which
appears to be useful for certain kinds of applications.

Paul

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgre vs MySQL

2008-03-14 Thread Paul Boddie
On 14 Mar, 09:26, [EMAIL PROTECTED] ("jose javier parra sanchez")
wrote:
> >  itself open source, you have to pay to get a license.  Pay for GPL 
> > software?
>
> You cannot be serious, GPL has no relation with monetary value. The
> GPL is a 'Usage License'.  If i write GPL software to my clients,
> should i give it free of charge ?. That's absurd.

Yes, it's nice to see the standard licensing rumours spread around
completely unconstrained by inconvenient things like the facts. Of
course you can charge people for GPL-licensed software, but you have
to promise to let them have the source code at no additional cost. And
the mere existence of your GPL-licensed software doesn't mean that you
are obliged to give random inquirers the source code: it's only if
you've already distributed the software to people that they have the
right to the source.

http://www.fsf.org/licensing/licenses/gpl-faq.html#DoesTheGPLAllowMoney

As for things like contributor agreements, that has nothing to do with
the licence and whether a product is Free Software or not: it's a
copyright thing; various permissively licensed projects have
contributor agreements, too. Naturally, the MySQL corporate entity
want people to assign copyright to them so that they can then offer
the code under a proprietary licence, but there would be nothing to
stop you from just forking MySQL and offering it as a purely GPL-
licensed product.

And with respect to the MySQL corporate policy on using their product
in proprietary software, I believe that the reason why the client
libraries are GPL-licensed is precisely because nobody bought their
case for insisting that merely using the database system from a
program creates a GPL-licensed derived work consisting of MySQL and
the program. By linking to the client libraries, however, you are
creating a GPL-licensed derived work in a situation that the FSF would
actually go along with. The recent tendency of differentiation between
the "commercial" and "open source" editions would also indicate that
people aren't really believing the MySQL corporate spin, either.
Here's an example of the smoke and mirrors:

http://forums.mysql.com/read.php?4,31,888#msg-888

In some businesses with a dual-licensing model, I think it can be the
case that some people in sales/marketing/licensing like to make claims
that wouldn't stand up to thorough scrutiny, but where customers
probably aren't going to risk making a fuss if the licensing costs are
relatively low. Really, the MySQL people would have more credibility
if they just charged for support and bug-fixing and/or used something
like the Affero GPLv3 instead of the vanilla GPL, rather than trying
to ride two quite different horses.

Paul

P.S. It's not that I use MySQL, being happy with PostgreSQL, but
people should at least try and understand the licensing issues
involved.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Auto incrementing primary keys

2008-02-18 Thread Paul Boddie
On 18 Feb, 13:36, django_user <[EMAIL PROTECTED]> wrote:
>
> How can stop postgresql from incrementing the primary key value, so
> that even after many failed insert statements it get the next id val.

"Auto-incrementing" columns, typically implemented using the serial
data type [1], employ sequences.

>From the manual:

"To avoid blocking of concurrent transactions that obtain numbers from
the same sequence, a nextval operation is never rolled back; that is,
once a value has been fetched it is considered used, even if the
transaction that did the nextval later aborts. This means that aborted
transactions may leave unused "holes" in the sequence of assigned
values. setval operations are never rolled back, either."

http://www.postgresql.org/docs/8.1/interactive/functions-sequence.html

In other words, to permit a decent level of concurrency, PostgreSQL
doesn't wait to see if a transaction succeeds with a value from a
sequence before updating the sequence. If you want to reset a sequence
so that it always uses the next unused value as determined by looking
at the table, I suppose you could do something like this:

select setval('mytable_id_seq', x) from (select max(id) as x from
mytable) as y;

But I doubt that you would want to do this too often in any system
with any reasonable level of concurrent access to the table or the
sequence concerned.

Paul

[1] http://www.postgresql.org/docs/8.1/interactive/datatype.html#DATATYPE-SERIAL

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Deadlock when updating table partitions (and presumed solution)

2007-12-06 Thread Paul Boddie
On 5 Des, 05:00, [EMAIL PROTECTED] (Tom Lane) wrote:
>
> Yeah, this is a problem.  The SELECT will acquire AccessShareLock
> on R and P, and subsequently try to acquire AccessShareLock on all
> the inheritance children of P (and I don't think the order in which
> these locks are acquired is very clear).  Meanwhile the ALTER acquires
> AccessExclusiveLock on Pm and R --- probably in that order, though
> I'd not really want to promise that ordering either.  So the potential
> for deadlock is obvious.

Indeed.

> You seem to be hoping that the SELECT would avoid acquiring lock
> on child tables Pn that it didn't need to access, but this cannot be:
> it has to get at least AccessShareLock on those tables before it can
> even examine their constraints to find out that they don't need to be
> scanned.  And even if it could magically not take those locks, the
> deadlock condition still exists with regard to the child table that
> it *does* need to access.

Understood. I was really wondering whether the SELECT would be able to
acquire locks on child tables at the same time as it acquired the lock
on the parent table, but I suppose this isn't an atomic operation: it
first has to acquire a lock to be able to see the constraints; then it
finds referenced tables and attempts to acquire locks on them.

> I guess I'm wondering why you need to be adding foreign key constraints
> during live operations.

This was just some impatience on my part while updating my database: I
was merely inspecting some data which I knew resided in some
partitions whilst some other partitions were being altered. Obviously,
the database system cannot know that some data of interest isn't going
to be found in some partition without checking the properties of that
partition. Consequently, it made sense for me to exclude such
partitions from consideration by the SELECT in order to help it reach
the requested data whilst keeping it out of the way of the alteration
activities.

I suppose the lingering question is this: what constraints should I
drop in order to avoid such problems? Dropping the insert rule from
the parent table for each child table being altered *seems* to
diminish the possibility of deadlock, in that my tests produced no
deadlock situations when I adopted this approach (whereas such
situations were unavoidable before adopting this approach), but
shouldn't I actually be removing the check constraints from the child
tables instead? The manual for 8.1 says that "constraint exclusion is
driven only by CHECK constraints", but my intuition tells me that the
SELECT should initially be driven by the mere existence of tables
inheriting from the parent table and that the insert rules should have
little or nothing to do with it.

Paul

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Deadlock when updating table partitions (and presumed solution)

2007-12-04 Thread Paul Boddie
I recently encountered an interesting situation with regard to
partitioned tables, concurrent updates and deadlocks which probably
has an obvious explanation, although I can't seem to find one in the
manual. Below, I explain the situation and provide some of my own
naive reasoning about what seems to be happening. Since I think I now
know how to avoid such matters, this message is mostly for the
purposes of sharing my recent experiences with those who may one day
encounter similar problems. I'd be grateful if anyone can explain what
must really be occurring and correct any erroneous conclusions,
however.

I have one process querying a table P with partitions P0, P1, P2, ...
Pn joined with table R as follows:

select * from R inner join P on R.id = P.id and P.section = 5

...where the column "section" determines which partition shall be
searched utilising the constraint exclusion support in PostgreSQL.
Here, I use the specific value of 5 to indicate that the involvement
of a specific partition is envisaged.

Now, each partition of P is created inheriting from P, and I also
include a rule which "redirects" inserts from P to the specific
partition of P depending on the value of "section". This latter detail
is, I believe, the principal contributing factor to the problems
subsequently experienced.

I have another process performing updates to individual partitions of
P - specifically "alter table" operations adding foreign key
constraints referencing R as follows:

alter table Pm add constraint Pm_fk_id foreign key(id) references
R(id)

...where "m" is the specific partition number, starting at 0,
increasing by 1, ending at n.

What seems to happen, by looking at pg_lock (and pg_class) is that the
following sequence of events occurs:

 1. The query process acquires an AccessShareLock on R and P.
 2. The update process acquires an AccessExclusiveLock on Pm and seeks
an AccessExclusiveLock on R.
 3. The query process seeks an AccessShareLock on P0 ... Pn.
 4. Deadlock is declared.

Since the query should only involve a single partition of P, one might
expect that the query process might immediately obtain an
AccessShareLock on P5, but what seems to occur is a race condition:
the update process is sometimes able to acquire a lock on P5 before
the query process is able to realise the involvement of P5 in the
query operation. Moreover, a deadlock occurs even when the update
process is adding the foreign key constraint to tables other than P5,
suggesting as I note above that all child tables are involved in the
query operation.

My initial conclusions were as follows:

 1. A query on a partitioned table only initially causes lock
acquisition on the parent table.
 2. Subsequent attempts to acquire locks on child tables conflict with
the locking done by the "alter table" operation.
 3. The classic solution (ensure consistent lock acquisition order)
may not be readily applicable.

Intuitively, I understood that PostgreSQL may only resolve the child
tables involved in a query by using a mechanism specific to the
partitioning infrastructure. I then considered the role of the rules
(collectively redirecting inserts from P to P0 ... Pn), even though
they are concerned with insert statements. By dropping the rule
associated with a given child table before attempting the "alter
table" operation on that table, then recreating the rule, it would
appear that the issues with lock acquisition disappear.

It makes sense that, if operating on a specific child table, the links
to the parent should be broken temporarily in order to isolate it from
the parent and any operations which may involve all children (or even
the checking of the involvement of all children), and to not realise
this may have been an oversight on my part. Can anyone help me to
refine my thinking further on this matter?

Paul

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] PostgresSQL vs Ingress

2007-12-02 Thread Paul Boddie
On 30 Nov, 16:12, [EMAIL PROTECTED] (Tom Lane) wrote:
>

[Quoting a re-telling of the myth of products living happily ever
after under the control of big companies]

> Anyone who thinks that's a reason to feel good is living on some other
> planet than I do.  Consider that if the company *does* decide to abandon
> the product ... which happens all the time, particularly for products
> that aren't market leaders ... you are up the proverbial creek with no
> paddle.  You've never seen the code and never will.

Indeed. I used to work with a database system which had already
changed ownership at least once, and through a succession of
acquisitions not dissimilar to fish being eaten by successively bigger
fish, with each owner slotting the product alongside some very similar
existing products in their portfolio, the product eventually ended up
being owned by a very large company with a lot of other products on
their shelf (or, if you prefer, a very big fish with a lot of smaller
fish in its diet).

Now, fortunately, I haven't had anything to do with the product
concerned for many years, and although the current owner has a
reputation for supporting stuff over long periods of time, one has to
wonder what kind of support you're actually going to get, whether
there's going to be much new development, or whether the cumulative
effect of the rationalisation process (which saw the little fish all
eaten up) is to milk the existing customers for as long as they can
bear sticking with the product and not migrating to anything else. I
think I'd rather have the source code and a Free Software licence than
an account manager and a corporate roadmap.

Paul

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] select count() out of memory

2007-10-29 Thread Paul Boddie
On 25 Okt, 17:36, [EMAIL PROTECTED] wrote:
>
> The design is based on access patterns, i.e. one partition represents a
> group of data along a discrete axis, so the partitions are the perfect for
> modeling that. Only the last partition will be used on normal cases. The
> previous partitions only need to exists until the operator deletes them,
> which will be sometime between 1-6 weeks.

This has been interesting reading because I'm working on a system
which involves a more batch-oriented approach in loading the data,
where I've found partitions to be useful both from a performance
perspective (it looks like my indexes would be inconveniently big
otherwise for the total volume of data) and from an administrative
perspective (it's convenient to control the constraints for discrete
subsets of my data). However, if all but the most recent data remains
relatively stable, why not maintain your own statistics for each
partition or, as someone else suggested, use the pg_class statistics?

I'd just be interested to hear what the best practices are when tables
get big and where the access patterns favour the most recently loaded
data and/or reliably identifiable subsets of the data, as they seem to
in this case and in my own case. The various tuning guides out there
have been very useful, but isn't there a point at which partitioning
is inevitable?

Paul


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Database reverse engineering

2007-09-18 Thread Paul Boddie
On 13 Sep, 06:12, [EMAIL PROTECTED] (Ow Mun Heng) wrote:
> On Mon, 2007-09-10 at 13:00 -0600, RC Gobeille wrote:
> > Or this one:
> >http://schemaspy.sourceforge.net/
>
> Can't seem to get it to connect to PG using the example.
>
> java -jar schemaSpy_3.1.1.jar -t pgsql -u operator -p operator -o
> test_db  -host localhost -db test_db

As the end of the error message suggests, you might want to try
something like this:

java -jar schemaSpy_3.1.1.jar -cp postgresql-xxx.jar -t pgsql -u
operator -p operator -o test_db  -host localhost -db test_db

The argument for -cp needs to be the full path to the .jar file, and
must obviously use the real filename of that file itself. You may then
get complaints about not finding the schema: I had to specify "-s
public", I think.

Paul


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match