Re: [SQL] [pgsql-sql] Daily digest v1.3050 (5 messages)

2009-06-26 Thread Science

pgsql-sql-ow...@postgresql.org wrote:

Date: Thu, 25 Jun 2009 17:13:42 +0100
From: "Oliveiros Cristina" 
To: "Rob Sargent" ,

Subject: Re: Bucketing Row Data in columns
Message-ID: <02e701c9f5af$e8cb4f90$ec5a3...@marktestcr.marktest.pt>

I admit there must be a more elegant and faster solution with PL/pgSQL (or 
whatever other language)


As I don't know anything about PL/pgSQL, I tried pure SQL (if you don't 
have a hunting dog, hunt with a cat)


But obviously this solution doesn't scale well if you have a giant table 
with lots of columns


- Original Message - 
From: "Rob Sargent" 

To: 
Sent: Thursday, June 25, 2009 4:57 PM
Subject: Re: [SQL] Bucketing Row Data in columns


>I would be suspicious of this sort of solution of turning rows into columns 
>by means of a series of correlated sub-selects.  Once the data set gets 
>large and the number of columns goes over 2 or 3, this will in all 
>likelihood not perform well. I had the pleasure of re-writing a "report" 
>which was based on count() (similar to sum()) per user_id, with the counts 
>going into various columns per user.  18000 users, a dozen columns from a 
>table of 2 million rows; the report took >1,000,000 seconds (yes, almost 12 
>days) to complete.  The re-write runs in 5-10 minutes (now at 10M rows) by 
>getting the counts as rows (user, item, count) into a temp table and making 
>the columns from the temp table (PL/pgSQL).  Getting the counts takes half 
>the time, making the flattened report takes the other half.
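The two-phase approach described above can be sketched in miniature (toy data and names, with Python standing in for the temp table and the PL/pgSQL flattening loop):

```python
from collections import defaultdict

# Phase 1: the cheap aggregate -- narrow (user, item, count) rows,
# as a GROUP BY user_id, item would produce into the temp table.
rows = [
    ("alice", "login", 3),
    ("alice", "purchase", 1),
    ("bob", "login", 5),
]

# Phase 2: flatten the narrow rows into one wide row per user,
# one column per distinct item (what the PL/pgSQL loop does).
items = sorted({item for _, item, _ in rows})
report = defaultdict(lambda: dict.fromkeys(items, 0))
for user, item, count in rows:
    report[user][item] = count

for user in sorted(report):
    print(user, [report[user][i] for i in items])
```

The point is that each phase is a single cheap pass, instead of one correlated sub-select per output column.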


Is it possible that using the "tablefunc" contrib module would help? 
What I mean is, couldn't this be written as a performant query that 
returns a set of rows, and then use the crosstab capability to simply 
rewrite those rows as columns?


As another poster pointed out, you can do the same with a set of CASE 
expressions, but I wanted to throw this idea out there as well. I'm not 
too familiar with the tablefunc / crosstab stuff, but it seems like 
this is generally what you're trying to accomplish?
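For what it's worth, the CASE-based pivot amounts to conditional aggregation in a single pass; a rough Python analogue (toy data, column names made up) of `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` per group:

```python
# Rough analogue of a single-pass SQL pivot:
#   SELECT user_id,
#          SUM(CASE WHEN item = 'login'    THEN 1 ELSE 0 END) AS logins,
#          SUM(CASE WHEN item = 'purchase' THEN 1 ELSE 0 END) AS purchases
#   FROM events GROUP BY user_id;
events = [("alice", "login"), ("alice", "login"), ("alice", "purchase"),
          ("bob", "login")]

totals = {}
for user, item in events:
    row = totals.setdefault(user, {"logins": 0, "purchases": 0})
    if item == "login":          # CASE WHEN item = 'login'
        row["logins"] += 1
    elif item == "purchase":     # CASE WHEN item = 'purchase'
        row["purchases"] += 1

print(totals)
```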


I googled this fwiw: 
http://www.tek-tips.com/viewthread.cfm?qid=1444284&page=1


Best,

Steve

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Bit by "commands ignored until end of transaction block" again

2009-07-26 Thread Science

Date: Thu, 23 Jul 2009 09:06:50 +0100
From: Richard Huxton 
To: Glenn Maynard 
Cc: pgsql-sql@postgresql.org
Subject: Re: Bit by "commands ignored until end of transaction block"
again
Message-ID: <4a681a1a.1090...@archonet.com>

Glenn Maynard wrote:

On Thu, Jul 23, 2009 at 2:41 AM, Richard Huxton wrote:

Ah [cue light-bulb effect], I think I understand. Your function isn't in the
database, is it? Surely your application knows if it's issuing BEGIN..COMMIT?

I'm writing a Python library call.  It has no idea whether the caller
happens to be inside a transaction already, and I don't want to
specify something like "always run this inside a transaction".
(Callers are equally likely to want to do either, and it's bad API to
force them to start a transaction--the fact that I'm using the
database at all should be transparent.)


That last bit is never going to work. There always needs to be some 
basic level of understanding between systems, and transactions really 
have to be part of that when talking to an RDBMS. There will have to be a 
piece of code responsible for managing transactions somewhere in the 
middleware/application layers.



You'll have people with torches and pitchforks after you if you change
RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork.

RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint
that it's releasing started it.  Every currently-valid case requires
that a transaction is already started, so no existing code would be
affected by this.

SAVEPOINT a; -- implicitly issues BEGIN because one wasn't started
RELEASE SAVEPOINT a; -- implicitly issues COMMIT because savepoint "a"
issued the BEGIN, not the user
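To pin down the proposed rule, here's a toy sketch (a hypothetical `Session` class, not any real driver API): a SAVEPOINT issued outside a transaction implies BEGIN, and releasing that particular savepoint implies COMMIT; releasing any other savepoint does not.

```python
class Session:
    """Tracks whether a transaction is open and which savepoint
    (if any) implicitly opened it."""
    def __init__(self):
        self.in_tx = False
        self.implicit_opener = None  # savepoint that issued the BEGIN
        self.log = []                # SQL we would send to the server

    def savepoint(self, name):
        if not self.in_tx:
            self.log.append("BEGIN")           # implicit BEGIN
            self.in_tx = True
            self.implicit_opener = name
        self.log.append(f"SAVEPOINT {name}")

    def release(self, name):
        self.log.append(f"RELEASE SAVEPOINT {name}")
        if name == self.implicit_opener:
            self.log.append("COMMIT")          # implicit COMMIT
            self.in_tx = False
            self.implicit_opener = None

s = Session()
s.savepoint("a")
s.release("a")
print(s.log)  # ['BEGIN', 'SAVEPOINT a', 'RELEASE SAVEPOINT a', 'COMMIT']
```

Any savepoint opened while a transaction is already running behaves exactly as today, which is why existing code would be unaffected.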

[snip]

Of course, there are other details--it probably shouldn't allow
ROLLBACK or COMMIT on an implicit transaction block, for example.


All you're doing here is moving the point of confusion around, surely? 
At some point you still need to know whether you can issue 
BEGIN/ROLLBACK/COMMIT etc.



Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines
of plpgsql will neatly encapsulate the problem. That plpgsql can be
automatically generated easily enough too.

I don't think so, at least not without digging into internals.  Django
is built around knowing all data types, so it'd need to be given types
explicitly--for example, to know whether a timestamp should be
formatted as a timestamp, date or time.  (I do have a couple other
columns here--timestamps for cache expiration, etc.)  I'll have to ask
Django-side if there's a public API to do this, but I don't think
there is.


Well, the types would be exactly the same as for your existing insert. 
All it's really doing is changing the template those values get 
substituted into. It presumably does mean patching the ORM (or 
subclassing from it anyway).



Ah, the joys of badly designed ORMs. The nice thing is that there seem to be
plenty of bad ones to choose from too. If your ORM doesn't handle
transactions well, the more you use it the more difficult your life will
become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM
as cleanly as you can. That's assuming they're not interested in patches.

The ORM on the whole is decent, but there are isolated areas where it's
very braindamaged--this is one of them.  They have a stable-release
API-compatibility policy, which I think just gets them stuck with some
really bad decisions for a long time.


Presumably they targeted MySQL first, where there's a lot less use for 
multi-statement transactions, given the differing behaviour of its 
various storage engines.




FWIW, the way the Rails ORM ActiveRecord (another fairly damaged ORM) 
handles this is by allowing you to open any number of transaction 
blocks, but only the outer transaction block commits (in Pg):


Property.transaction { # SQL => 'BEGIN'
  User.transaction {
Foo.transaction {
  Foo.connection.execute('--some sql code') # SQL => '--some sql code'
}
  }
} # SQL => 'COMMIT'

This is pretty kludgy, but it lets me start any arbitrary transaction 
without worrying whether there's already another one running "on top" of me 
(which I gather is your worry in this case). Dunno if the approach could 
work using a class wrapper and the Django ORM, but I would have thought 
that the implementation shouldn't be too hard.
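In case it helps, the depth-counting behaviour can be mimicked in a few lines of Python (a hypothetical wrapper, not the actual ActiveRecord or Django API): track nesting depth and only issue BEGIN/COMMIT at the outermost level.

```python
class Connection:
    """Hypothetical connection wrapper: nested transaction() calls are
    no-ops except for the outermost one, mirroring ActiveRecord."""
    def __init__(self):
        self.depth = 0
        self.log = []

    def execute(self, sql):
        self.log.append(sql)

    def transaction(self, body):
        self.depth += 1
        if self.depth == 1:
            self.execute("BEGIN")       # only the outer block begins
        try:
            body()
        finally:
            self.depth -= 1
            if self.depth == 0:
                self.execute("COMMIT")  # ...and only it commits

conn = Connection()
conn.transaction(lambda: conn.transaction(
    lambda: conn.execute("--some sql code")))
print(conn.log)  # ['BEGIN', '--some sql code', 'COMMIT']
```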


And apologies to the list for going a little off-track from Pg.

Steve



Re: [SQL] Bit by "commands ignored until end of transaction block" again

2009-07-27 Thread Science

Craig Ringer wrote:

On Sun, 2009-07-26 at 19:15 -0400, Science wrote:

FWIW, the way the Rails ORM ActiveRecord (another fairly damaged ORM) 
handles this is by allowing you to open any number of transaction 
blocks, but only the outer transaction block commits (in Pg):


Property.transaction { # SQL => 'BEGIN'
  User.transaction {
    Foo.transaction {
      Foo.connection.execute('--some sql code') # SQL => '--some sql code'
    }
  }
} # SQL => 'COMMIT'


What happens if Foo.transaction does something that causes an error,
though, or issues a rollback? It's not creating savepoints, so if
Foo.transaction rolls back, it throws out the work of User.transaction
and Property.transaction too.

Ugh.

That design would be quite good _IF_ it used savepoints:


Property.transaction { # SQL => 'BEGIN'
  User.transaction {   # SQL => SAVEPOINT User
    Foo.transaction {  # SQL => SAVEPOINT Foo
      Foo.connection.execute('--some sql code') # SQL => '--some sql code'
    } # SQL => RELEASE SAVEPOINT Foo
  }   # SQL => RELEASE SAVEPOINT User
}     # SQL => 'COMMIT'

... so that inner transactions could ROLLBACK TO SAVEPOINT on error,
and so that asking for a rollback would give you a ROLLBACK TO SAVEPOINT
if the transaction is a subtransaction.
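That savepoint-based scheme is easy to sketch as well (again a hypothetical wrapper, with error handling simplified: the inner block swallows the exception after rolling back to its savepoint):

```python
class Connection:
    """Sketch of the savepoint variant: the outer block uses
    BEGIN/COMMIT, inner blocks use SAVEPOINT, and an error in an
    inner block rolls back only to its own savepoint."""
    def __init__(self):
        self.depth = 0
        self.log = []

    def execute(self, sql):
        self.log.append(sql)

    def transaction(self, name, body):
        self.depth += 1
        outermost = self.depth == 1
        self.execute("BEGIN" if outermost else f"SAVEPOINT {name}")
        try:
            body()
        except Exception:
            # Undo only this subtransaction, not the whole stack.
            self.execute("ROLLBACK" if outermost
                         else f"ROLLBACK TO SAVEPOINT {name}")
        else:
            self.execute("COMMIT" if outermost
                         else f"RELEASE SAVEPOINT {name}")
        finally:
            self.depth -= 1

def failing():
    raise RuntimeError("inner error")

conn = Connection()
conn.transaction("Property", lambda: conn.transaction("User", failing))
print(conn.log)
# ['BEGIN', 'SAVEPOINT User', 'ROLLBACK TO SAVEPOINT User', 'COMMIT']
```

Note how the failed inner block costs only its own work; the outer transaction still commits whatever it did before and after.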



For all I know that's how it works these days. I haven't looked at the 
code underneath this in a couple of years. It should be trivial to 
implement in the way you describe based on how the Ruby codebase is set 
up -- hopefully all this will help the OP with the Django/Python version of 
the same problem.


Steve
