Re: [GENERAL] Using Postgres to store genetic data

2009-07-10 Thread Peter Hunsberger
On Fri, Jul 10, 2009 at 4:02 PM, Steve Crawford
 wrote:
>
> Peter Hunsberger wrote:
>>
>> We're looking at potentially using Postgres to store a variety of molecular 
>> and genetic data.  At this point I have a bunch of general questions...
>
> I don't know enough about your area of expertise to know if this is useful, 
> but I'd look at the Unison project to see what they are doing. There are also 
> videos/slides up from a recent San Francisco PUG meeting at UCSF that might 
> be of interest:
>
> http://www.vimeo.com/3732938
> http://www.slideshare.net/linuxpoet/unison-ucsf-sfpug
>

Looks like it may be somewhat applicable to my first question. Thanks.

My area of expertise is application architecture and design, focusing on
clinical systems for medical research for the last 7 years or so, and on
medical systems in general for the last 15.  Adding the genomic and
molecular world to the mix is a new initiative. At the moment I've got
enough domain knowledge to be dangerous...

--
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL]

2009-07-10 Thread Clark Slater
Hello-

I am working on an e-commerce system that has different lists of products
which contain many of the same products, at different prices.  When a user
searches for a certain set of part numbers, I would like the resulting
products (and prices) to come from one of the lists, according to the
list's priority.  Each user can have a different set of lists and
priorities.

Table: product_lists
 id  | name                       | priority | user_id
-----+----------------------------+----------+---------
   5 | General List of Products   |        2 |      23
   3 | Different List of Products |        3 |      23
 150 | Customer-Specific Products |        1 |      23

Table: products
 product_list_id | part_number | price
-----------------+-------------+--------
               3 | 92298A      | 123.38
               5 | 92298A      | 111.04
               3 | C39207      |  78.38
             150 | C39207      |  67.93

Below is a simplified example of the structure of the query I am working
with.  I realize that in this case, I could re-factor all of this into one
statement, but each sub-query in the real case has a more complex set of
joins that determines the price.  The pricing joins from one sub-query to
the next vary, so a collection of sub-queries seemed to be a logical
solution.  Some part numbers are found in only one of the lists, while
other part numbers are repeated across lists at different prices.

This is what I would *like* to say:

SELECT DISTINCT ON (part_number) * FROM (

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id=product_lists.id
AND product_list_id=150
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=5
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=3
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

) AS filter_duplicates ORDER BY priority,part_number

I need to ORDER BY priority so that, in the case of duplicates, the
product from the desired list is returned first.  Then the purpose of
DISTINCT ON is to filter out any duplicate part numbers that have a lesser
priority.  But, the above statement fails because the DISTINCT ON
expression must match the leftmost ORDER BY expression.  However,
inserting the priority into the DISTINCT ON expression means that all of
the resulting tuples are unique, even though the part_number is the same.

If anyone could suggest a solution or alternative approach, I would
greatly appreciate it.

Thank you,
Clark
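[A workaround that seems to fit, sketched here as an untested suggestion: make part_number the leftmost ORDER BY expression and sort by priority second, so DISTINCT ON keeps the row from the highest-priority (lowest priority number) list. The user_id filter and the IN lists are assumptions taken from the example tables above.]

```sql
SELECT DISTINCT ON (part_number)
       product_list_id, part_number, price, priority
FROM (
    -- in the real case, this is the same UNION ALL of per-list
    -- sub-queries as above; collapsed here for brevity
    SELECT product_list_id, part_number, price, priority
    FROM products, product_lists
    WHERE product_list_id = product_lists.id
      AND product_lists.user_id = 23
      AND part_number IN ('92298A', 'C39207')
) AS filter_duplicates
ORDER BY part_number, priority;
-- part_number first satisfies the DISTINCT ON rule;
-- priority second makes DISTINCT ON keep the preferred list's row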





Re: [GENERAL] Idle in transaction help

2009-07-10 Thread Scot Kreienkamp
We have a java web page that will give us the stack trace of all the
running JDBC connections inside our system.  The problem is that we
currently have no way of relating those stack traces back to a PID so
the programmers can get the stack trace of the hung database connection.
We use JDBC connection pooling, so there's no way to be sure which
stack trace goes with which PID.  I gave the developers the postgres call
to get that backend PID through the JDBC connection a few days ago, but
they don't have the resources to get the additional call built into
their programs for up to 1-2 months.  
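[For reference, the call mentioned above is presumably this one; pg_backend_pid() is a long-standing built-in:]

```sql
SELECT pg_backend_pid();  -- returns the server PID of the current connection
```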

I'm working on the business side to get priorities changed, but it
hasn't happened yet.  Mostly because I've got Xymon watching for those
conditions so I can correct them before we get calls into the helpdesk.
Sorry, I'm rambling.

Anyway, I'm trying to attack it from the database side out since I am
not a programmer and can't help with that part.  I can do simple CGIs
with bash, but I don't know Java or C or even Perl yet for that matter.
Since you guys are the experts, I'm looking for any way to attack this
problem from the database side.  The tips I've gotten about the JDBC
driver and commits are helpful in that they give our programmers things
to watch out for that we didn't realize, and I'm open to any suggestions
from the list about how I can help attack this.  

Since I'm ultimately responsible for database performance and I don't
like being reduced to sitting on the sidelines, I'm trying to see what,
if anything, my skills can contribute.

As for patting you on the head, I was being sincere.  And trying not to
come off sounding like a cocky SOB.  :-)

Thanks,

 

Scot Kreienkamp
 


-Original Message-
From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
Sent: Friday, July 10, 2009 7:02 PM
To: Scot Kreienkamp
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Idle in transaction help

On Fri, Jul 10, 2009 at 4:40 PM, Scot Kreienkamp
wrote:
> Thanks scott, but I wrote a cgi to combine all of the process info and allow
> me to kill errant queries. So I know how to track down the pid. Thanks for
> trying to help though. :-)

So, what are you looking for, a stack trace dump from java to look
through maybe? (the one that kill -1 or whatever generates? It's been
a few years.)  That'll usually give you the context to find out which
thread is where.  P.s. no need to pat me on the head like the doggie.
:)



Re: [GENERAL] singletons per row in table AND locking response

2009-07-10 Thread Sam Mason
On Tue, Jul 07, 2009 at 06:45:36PM -0700, Dennis Gearon wrote:
> When locking is involved, does a transaction wait for access to a row
> or table, or does it just fail back to the calling code? Would it be
> up to my PHP code to keep hammering for access to a row/table, or could
> a user defined function do that?

It defaults to waiting for the lock to be released, but will fail if you
tell it to.  It sounds as though you want to be using row-level locks,
which allow more concurrency where possible, but table-level locks
are also possible.  PG also makes a distinction between "share" locks
(multiple transactions can hold a share lock on the same table or row) and
"update" locks (which lock out share and other update locks).
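[A rough sketch of the row-level variants Sam describes; the table and
column names are made up:]

```sql
BEGIN;
-- "share" lock: other transactions may also take a share lock on this row
SELECT * FROM items WHERE id = 1 FOR SHARE;

-- "update" lock: blocks other FOR SHARE / FOR UPDATE on the row;
-- NOWAIT makes the statement error out immediately instead of waiting
SELECT * FROM items WHERE id = 1 FOR UPDATE NOWAIT;
COMMIT;
```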

-- 
  Sam  http://samason.me.uk/



Re: [Fwd: Re: [GENERAL] How to trace client sql requests?]

2009-07-10 Thread Erik Jones


On Jul 10, 2009, at 1:31 PM, James B. Byrne wrote:




On Fri, July 10, 2009 16:10, hubert depesz lubaczewski wrote:



truncate. but first simple question - did you commit the inserts?



But if it were done with truncate then I would see truncate in the
log file, yes?

Second, I am working with PG through an ORM called ActiveRecord,
part of the Rails framework.  I do not see a COMMIT anywhere in the
log. Should I?  This is a test run using the Rails test, actually
cucumber, environment.  I know that they do some things differently
with DB connections in this environment but I believe that this is
limited to transactions and rollbacks.

The thing is that this problem only arises when testing the script
inside the test harness. In production it runs just fine.  Further,
if I list the contents of the table from a call in the test harness
immediately prior to executing the script under test then the data
is there.



I'm not sure about cucumber, but I'm fairly certain that the Rails test
runs don't commit their transactions.  That means that if the test
that is not seeing the data is being run with a different db
connection from the one that inserted the data, then it won't see it.


Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k








Re: [GENERAL] change location of postmaster.pid file?

2009-07-10 Thread Janet Jacobsen
Hi.  Thanks for the quick and definitive answers to my questions.
The information you provided will save me from wasting time and
energy trying to see how far I could get otherwise.  Thanks very much.

Janet


Tom Lane wrote:
> Janet Jacobsen  writes:
>   
>> Is it possible to create a database cluster on a machine that
>> has write access to the shared file system, shut down the
>> Postgres server on that machine, and then start up the
>> Postgres server on the machine that cannot write to the
>> shared file system, and thereafter, *only query* the database.
>> 
>
> No.  The pid file is only the first and smallest problem you'd run into
> with a read-only database filesystem.
>
>   regards, tom lane
>

Greg Stark wrote:
> On Fri, Jul 10, 2009 at 11:17 PM, Janet Jacobsen wrote:
>> > Is it possible to create a database cluster on a machine that
>> > has write access to the shared file system, shut down the
>> > Postgres server on that machine, and then start up the
>> > Postgres server on the machine that cannot write to the
>> > shared file system, and thereafter, *only query* the database.
>> 
>
> Postgres isn't really designed to work this way. It expects to have
> write access and will occasionally still write stuff to disk even for
> read-only queries.
>
> It won't work even a little bit before 8.3. For 8.3 or later you could
> maybe make it work using vacuum freeze but there's no facility to
> verify that it's really frozen everything and you'll still be taken by
> surprise by queries which try to use temporary space for large sorts
> or commands which start transactions that you didn't realize were
> necessary.
>
> -- greg http://mit.edu/~gsstark/resume.pdf




Re: [GENERAL] Idle in transaction help

2009-07-10 Thread Erik Jones


On Jul 10, 2009, at 3:34 PM, Scott Marlowe wrote:


Assuming that tracking down the process that's connected might help,
you can use pg_stat_activity to find the port that the client is
connecting from, then on the client machine, use lsof to hunt down the
process that is connecting via that port.

For instance, I connect from my laptop with two connections.  One I do
a begin; in and in the other I look it up like so:

select * from pg_stat_activity where current_query ilike
'%idle%trans%' and current_query not ilike 'select%';
 datid | datname  | procpid | usesysid | usename  |     current_query     | waiting |          xact_start           |          query_start          |         backend_start         | client_addr  | client_port
-------+----------+---------+----------+----------+-----------------------+---------+-------------------------------+-------------------------------+-------------------------------+--------------+-------------
 11511 | postgres |   24893 |    16413 | smarlowe | <IDLE> in transaction | f       | 2009-07-10 16:20:15.056385-06 | 2009-07-10 16:20:15.056385-06 | 2009-07-10 15:27:48.944738-06 | 192.168.0.74 |       48727

The client port is 48727.  Now, on my laptop I can do:

sudo lsof |grep 48727 and I have this line in there:

psql  27964  smarlowe  3u  IPv4  1114765  TCP steamboat:48727->192.168.0.247:postgresql (ESTABLISHED)


Just a little tidbit for that: you can have lsof tell you what's got  
that port open directly, no need for grep:


lsof -i tcp:48727

that way you keep the column headers in the output.

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k








Re: [GENERAL] Idle in transaction help

2009-07-10 Thread Scott Marlowe
On Fri, Jul 10, 2009 at 4:40 PM, Scot Kreienkamp wrote:
> Thanks scott, but I wrote a cgi to combine all of the process info and allow
> me to kill errant queries. So I know how to track down the pid. Thanks for
> trying to help though. :-)

So, what are you looking for, a stack trace dump from java to look
through maybe? (the one that kill -1 or whatever generates? It's been
a few years.)  That'll usually give you the context to find out which
thread is where.  P.s. no need to pat me on the head like the doggie.
:)



Re: [GENERAL] BR/

2009-07-10 Thread James B. Byrne

On Fri, July 10, 2009 18:13, Alvaro Herrera wrote:

>
> Is it using a different PG connection than the one doing the
> insert?  In that case, it won't see the new row until the
> inserting transaction commits.

That is almost certainly the exact problem.  I will check and
determine if this is so, but it seems to me unavoidable that
launching a new shell for the script under test will cause another,
different, connection to be used.  If this proves to be the case then I
will report back.  If not, then no doubt you will hear from me as
well.

Thank you for illuminating this for me.

>
> BTW it seems necessary to clarify that LOCATION lines correspond
> to the LOG/NOTICE/WARNING/ERROR line immediately _above_ it, not
> the one below.
>

So noted, with thanks.

Regards,

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne  mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3




Re: [GENERAL] BR/

2009-07-10 Thread Scott Marlowe
On Fri, Jul 10, 2009 at 4:53 PM, James B. Byrne wrote:
>
> On Fri, July 10, 2009 18:48, Scott Marlowe wrote:
>> On Fri, Jul 10, 2009 at 2:13 PM, James B.
>> Byrne wrote:
>>>
>>> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
>>> hll_theheart_db_admin : LOCATION:  exec_simple_query,
>>> postgres.c:1105
>>> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
>>> hll_theheart_db_admin : LOG:  0: duration: 0.782 ms
>>>  statement:
>>> SELECT * FROM "currencies"
>>>
>>> The client program that receives this result reports that there are
>>> no rows returned. So where did they go?"
>>
>> Maybe there were no rows to return??
>>
>
> Clearly there are no rows.  That is the problem. The question
> begging an answer is: where are the rows added in the immediately
> previous INSERTS?

Trigger or rule put them somewhere else or ignored them?  Hard to say without

\d currencies
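[Besides \d, the catalogs can be queried directly; a sketch, assuming
the table is named currencies:]

```sql
-- any rewrite rules redirecting the inserts?
SELECT rulename, definition
FROM pg_rules
WHERE tablename = 'currencies';

-- any triggers on the table?
SELECT tgname
FROM pg_trigger
WHERE tgrelid = 'currencies'::regclass;
```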



Re: [GENERAL] BR/

2009-07-10 Thread James B. Byrne

On Fri, July 10, 2009 18:48, Scott Marlowe wrote:
> On Fri, Jul 10, 2009 at 2:13 PM, James B.
> Byrne wrote:
>>
>> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
>> hll_theheart_db_admin : LOCATION:  exec_simple_query,
>> postgres.c:1105
>> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
>> hll_theheart_db_admin : LOG:  0: duration: 0.782 ms
>>  statement:
>> SELECT * FROM "currencies"
>>
>> The client program that receives this result reports that there are
>> no rows returned. So where did they go?"
>
> Maybe there were no rows to return??
>

Clearly there are no rows.  That is the problem. The question
begging an answer is: where are the rows added in the immediately
previous INSERTS?

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne  mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3




Re: [GENERAL] change location of postmaster.pid file?

2009-07-10 Thread John R Pierce

Greg Stark wrote:

It won't work even a little bit before 8.3. For 8.3 or later you could
maybe make it work using vacuum freeze but there's no facility to
verify that it's really frozen everything and you'll still be taken by
surprise by queries which try to use temporary space for large sorts
or commands which start transactions that you didn't realize were
necessary.
  


or pg_stats or ..




Re: [GENERAL] BR/

2009-07-10 Thread Scott Marlowe
On Fri, Jul 10, 2009 at 2:13 PM, James B. Byrne wrote:
>
> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOCATION:  exec_simple_query,
> postgres.c:1105
> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOG:  0: duration: 0.782 ms  statement:
> SELECT * FROM "currencies"
>
> The client program that receives this result reports that there are
> no rows returned. So where did they go?"

Maybe there were no rows to return??



Re: [GENERAL] change location of postmaster.pid file?

2009-07-10 Thread Greg Stark
On Fri, Jul 10, 2009 at 11:17 PM, Janet Jacobsen wrote:
> Is it possible to create a database cluster on a machine that
> has write access to the shared file system, shut down the
> Postgres server on that machine, and then start up the
> Postgres server on the machine that cannot write to the
> shared file system, and thereafter, *only query* the database.

Postgres isn't really designed to work this way. It expects to have
write access and will occasionally still write stuff to disk even for
read-only queries.

It won't work even a little bit before 8.3. For 8.3 or later you could
maybe make it work using vacuum freeze but there's no facility to
verify that it's really frozen everything and you'll still be taken by
surprise by queries which try to use temporary space for large sorts
or commands which start transactions that you didn't realize were
necessary.
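[The freeze step Greg mentions would look something like this, run on
the machine that still has write access, before switching over — a
sketch only, with all the caveats he gives:]

```sql
-- run in every database in the cluster, then shut the server down cleanly
VACUUM FREEZE;
```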

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [GENERAL] change location of postmaster.pid file?

2009-07-10 Thread Tom Lane
Janet Jacobsen  writes:
> Is it possible to create a database cluster on a machine that
> has write access to the shared file system, shut down the
> Postgres server on that machine, and then start up the
> Postgres server on the machine that cannot write to the
> shared file system, and thereafter, *only query* the database.

No.  The pid file is only the first and smallest problem you'd run into
with a read-only database filesystem.

regards, tom lane



Re: [GENERAL] Idle in transaction help

2009-07-10 Thread Scot Kreienkamp
Thanks scott, but I wrote a cgi to combine all of the process info and allow me 
to kill errant queries. So I know how to track down the pid. Thanks for trying 
to help though. :-)

- Original Message -
From: Scott Marlowe 
To: Scot Kreienkamp
Cc: pgsql-general@postgresql.org 
Sent: Fri Jul 10 18:34:14 2009
Subject: Re: [GENERAL] Idle in transaction help

On Fri, Jul 10, 2009 at 2:05 PM, Scot Kreienkamp wrote:
> Hi everyone,
>
> I need some help with tracking down idle in transaction problems.  We have a
> custom application that is leaving queries in idle in transaction status for
> unknown reasons.  The developers are working on ways to track it down, but
> right now the options on their end are limited and it will be at least 1-2
> months until they have something working.  I am trying to track down the
> problem from the PG end in the meantime.  Is there any way to tell what
> query is hanging in idle in transaction status?  Or what the current or
> previous query was/is, since idle in transaction doesn’t tell me anything?
> I’m kind of at a loss on what if anything I can do from the database end to
> help (read push) the programmers to find and fix this problem.
>
>
>
> My first priority is helping them find and fix the problem if I can.  My
> second priority is finding an automated way to deal with the idle in
> transactions as they are locking tables and rows, causing other transactions
> to hang also.  None of the timeouts appear to affect idle in transactions.

Assuming that tracking down the process that's connected might help,
you can use pg_stat_activity to find the port that the client is
connecting from, then on the client machine, use lsof to hunt down the
process that is connecting via that port.

For instance, I connect from my laptop with two connections.  One I do
a begin; in and in the other I look it up like so:

select * from pg_stat_activity where current_query ilike
'%idle%trans%' and current_query not ilike 'select%';
 datid | datname  | procpid | usesysid | usename  |     current_query     | waiting |          xact_start           |          query_start          |         backend_start         | client_addr  | client_port
-------+----------+---------+----------+----------+-----------------------+---------+-------------------------------+-------------------------------+-------------------------------+--------------+-------------
 11511 | postgres |   24893 |    16413 | smarlowe | <IDLE> in transaction | f       | 2009-07-10 16:20:15.056385-06 | 2009-07-10 16:20:15.056385-06 | 2009-07-10 15:27:48.944738-06 | 192.168.0.74 |       48727

The client port is 48727.  Now, on my laptop I can do:

sudo lsof |grep 48727 and I have this line in there:

psql  27964  smarlowe  3u  IPv4  1114765  TCP steamboat:48727->192.168.0.247:postgresql (ESTABLISHED)

Note that 27964 is the pid of the psql command that's connected to the
server.  Hope that helps a little.


Re: [GENERAL] Idle in transaction help

2009-07-10 Thread Scott Marlowe
On Fri, Jul 10, 2009 at 2:05 PM, Scot Kreienkamp wrote:
> Hi everyone,
>
> I need some help with tracking down idle in transaction problems.  We have a
> custom application that is leaving queries in idle in transaction status for
> unknown reasons.  The developers are working on ways to track it down, but
> right now the options on their end are limited and it will be at least 1-2
> months until they have something working.  I am trying to track down the
> problem from the PG end in the meantime.  Is there any way to tell what
> query is hanging in idle in transaction status?  Or what the current or
> previous query was/is, since idle in transaction doesn’t tell me anything?
> I’m kind of at a loss on what if anything I can do from the database end to
> help (read push) the programmers to find and fix this problem.
>
>
>
> My first priority is helping them find and fix the problem if I can.  My
> second priority is finding an automated way to deal with the idle in
> transactions as they are locking tables and rows, causing other transactions
> to hang also.  None of the timeouts appear to affect idle in transactions.

Assuming that tracking down the process that's connected might help,
you can use pg_stat_activity to find the port that the client is
connecting from, then on the client machine, use lsof to hunt down the
process that is connecting via that port.

For instance, I connect from my laptop with two connections.  One I do
a begin; in and in the other I look it up like so:

select * from pg_stat_activity where current_query ilike
'%idle%trans%' and current_query not ilike 'select%';
 datid | datname  | procpid | usesysid | usename  |     current_query     | waiting |          xact_start           |          query_start          |         backend_start         | client_addr  | client_port
-------+----------+---------+----------+----------+-----------------------+---------+-------------------------------+-------------------------------+-------------------------------+--------------+-------------
 11511 | postgres |   24893 |    16413 | smarlowe | <IDLE> in transaction | f       | 2009-07-10 16:20:15.056385-06 | 2009-07-10 16:20:15.056385-06 | 2009-07-10 15:27:48.944738-06 | 192.168.0.74 |       48727

The client port is 48727.  Now, on my laptop I can do:

sudo lsof |grep 48727 and I have this line in there:

psql  27964  smarlowe  3u  IPv4  1114765  TCP steamboat:48727->192.168.0.247:postgresql (ESTABLISHED)

Note that 27964 is the pid of the psql command that's connected to the
server.  Hope that helps a little.



[GENERAL] change location of postmaster.pid file?

2009-07-10 Thread Janet Jacobsen
Hi.  We are looking into the possibility of running a Postgres
server on an underutilized machine.  This machine has very
little local disk space, so we would have to create the data
directory on a shared file system.

The underutilized machine was set up so that it can *only
read* from the shared file system, i.e., cannot write to the
shared file system.

Is it possible to create a database cluster on a machine that
has write access to the shared file system, shut down the
Postgres server on that machine, and then start up the
Postgres server on the machine that cannot write to the
shared file system, and thereafter, *only query* the database.

Since Postgres writes the postmaster.pid file to the data
directory (which would be on the shared file system), the
answer would appear to be no, since the 'underutilized'
machine cannot write any files to the shared file system.

Would it be possible to write the postmaster.pid file to the
local file system on the 'underutilized' machine even though
the data directory is on the shared file system? 

I realize that this seems like a bad idea - given that the
purpose of the postmaster.pid file as I understand it is to
prevent more than one postmaster running in a data
directory - but I wanted to ask whether this is a possibility.

Even if it were possible to write the postmaster.pid to the
local file system on the 'underutilized' machine, does Postgres
write other temporary files even if only SELECT statements
are being executed against the database?  And where does it
write those files - in subdirectories of the data directory?

Thank you,
Janet



Re: [GENERAL] BR/

2009-07-10 Thread Alvaro Herrera
James B. Byrne wrote:

> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOG:  0: duration: 0.782 ms  statement:
> SELECT * FROM "currencies"
> 
> The client program that receives this result reports that there are
> no rows returned. So where did they go?"

Is it using a different PG connection than the one doing the insert?  In
that case, it won't see the new row until the inserting transaction
commits.
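[A minimal illustration of the visibility rule Alvaro describes; the
column name and value are placeholders:]

```sql
-- connection 1:
BEGIN;
INSERT INTO currencies (code) VALUES ('CAD');
-- no COMMIT yet

-- connection 2 (a different backend):
SELECT * FROM currencies;   -- does not see the new row

-- connection 1:
COMMIT;

-- connection 2:
SELECT * FROM currencies;   -- now sees it
```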

BTW it seems necessary to clarify that LOCATION lines correspond to the
LOG/NOTICE/WARNING/ERROR line immediately _above_ it, not the one below.
So if you see this:

LOG: foo bar
LOCATION: somewhere line N
ERROR: baz qux
LOCATION: another line

you know what to make of it, and it's not this:

LOCATION: somewhere line N
ERROR: baz qux

-- 
Alvaro Herrera  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [GENERAL] psql language

2009-07-10 Thread Stuart McGraw
Hiroshi Saito wrote:
> Is the state where you wish this?
>
>  example 
> C:\Program Files\PostgreSQL\8.4\bin>psql -p 5433 postgres postgres
> psql (8.4.0)
> "help" でヘルプを表示します.
> 
> C:\Program Files\PostgreSQL\8.4\bin>set LANG=C
> 
> C:\Program Files\PostgreSQL\8.4\bin>psql -p 5433 postgres postgres
> psql (8.4.0)
> Type "help" for help.
> 
> Have I missed something?

Saito-san,

No, you missed nothing :-)  That is exactly what I wanted.
I thought I tried setting LANG before I posted but I must 
have done something wrong because it works fine now.
ありがとうございます! (Thank you very much!)



Re: [GENERAL] Idle in transaction help

2009-07-10 Thread John R Pierce

Scot Kreienkamp wrote:

It is Java.  I asked our programmers to check on the JDBC version as I
had seen that on the list previously.  It is using postgresql-8.2-504.
Is that one of the problem versions? I had thought it was new enough
that it would not be subject to that problem.
  


well, the current JDBC for 8.2 is build 510.  see 
http://jdbc.postgresql.org/download.html


It does appear the idle-in-transaction bug I'm remembering is pretty
old... it was fixed in version 8.0-dev302 (2004-06-15), heh.  Your
version is from 2006-12-01.  I would still consider upgrading; there
have been a lot of fixes and enhancements between 504 and 510.  See
http://jdbc.postgresql.org/changes.html for a revision history.



Remember, if you have autocommit OFF, then even read-only (select-only) 
connections need commits, or they stay in transaction.
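[In other words, with autocommit off the driver opens a transaction
implicitly, so even this sequence leaves the backend "<IDLE> in
transaction" until a commit — a sketch; "orders" is a placeholder:]

```sql
BEGIN;                         -- issued implicitly by the driver
SELECT count(*) FROM orders;   -- read-only work
-- without the following, pg_stat_activity keeps showing
-- "<IDLE> in transaction" for this backend:
COMMIT;
```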





Re: [GENERAL] is autovacuum recommended?

2009-07-10 Thread Scott Marlowe
On Fri, Jul 10, 2009 at 2:47 PM, Willy-Bas Loos wrote:
> Hi,
>
> Thanks for your answers!
> I'm using 8.1 and 8.2 on Windows 2003 servers, and it's true that I could
> probably configure them much better.

Note that support for 8.1 on windows is gone, as it is no longer
considered supportable due to design / engineering issues.  Upgrading
them all to 8.3 or 8.4 is probably a good idea since they are better
at autovacuum and such than 8.2 and before.

> We've recently moved to brand new dedicated database servers with pg8.3 on
> Debian in 2 projects and it has been much easier to configure these
> correctly. There I don't encounter the problems that I described.

Smart move.  The amount of effort needed to learn debian or any other
linux distro is usually less than the amount of ongoing effort to keep
a production pg server happy on windows.  Also, 64 bit pgsql on 64 bit
unix/linux is capable of better scaling and handling more memory.

> The thing is that the whole concept of autovacuum is not feeling right.
> Per design, the vacuum is likely to kick off when I am doing something big.

That assumes that autovacuum always runs in some mode that must
interfere with db operation.  If you set the
autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit,
autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold parameters
properly, as well as the free space map settings large enough to hold
all your dead tuples, then autovacuum should not cause a lot of
issues, unless your machine is already IO bound.  And if it's already
IO bound and dragging butt, then the problem isn't autovacuum, but a
machine without enough IO bandwidth to do its job well.
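
As a rough sketch, the knobs mentioned above live in postgresql.conf; the
values below are illustrative, not recommendations:

```
autovacuum_vacuum_cost_delay = 20     # ms to sleep between I/O bursts
autovacuum_vacuum_cost_limit = 200    # cost units accumulated before sleeping
autovacuum_vacuum_scale_factor = 0.2  # vacuum when ~20% of a table has changed
autovacuum_vacuum_threshold = 500     # ...plus this many rows minimum
max_fsm_pages = 2000000               # FSM sized to track dead tuples (pre-8.4)
```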

> And when i am doing something big, a vacuum is the last thing i'd wish for.

I don't even notice when it kicks in on my servers.

> I'd wish for a vacuum when the database is doing nothing at all, but the
> autovacuum will NEVER kick off in such a moment.

Again, if the delay and such are set right, then autovac will use so
little IO as to be unnoticeable.

> That's why i feel better scheduling the vacuum at times at which i know
> things will be generally quiet.

For some very large tables on slow IO machines, it makes sense to
remove them from the purview of autovac, I agree. Take a look at the
pg_autovacuum table. It's pretty easy to see how it works.
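
For example, something along these lines disables autovacuum for a single
large table on the 8.1-8.3 series (the pg_autovacuum catalog was replaced
by per-table reloptions in 8.4; the exact column list varies slightly
between versions, and the table name is hypothetical):

```sql
-- -1 in a threshold/cost column means "use the global setting".
INSERT INTO pg_autovacuum
       (vacrelid, enabled,
        vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor,
        vac_cost_delay, vac_cost_limit)
SELECT oid, false, -1, -1, -1, -1, -1, -1
FROM pg_class
WHERE relname = 'big_history_table';
```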

> To be honest, i am a bit surprised that all 3 reactions recommend using
> autovacuum, even if it means i have to buy a new server for this purpose.

Well, autovacuum is more primitive in the versions you're running, and
more likely to get in the way.  8.3, and to a greater extent 8.4,
remove a lot of these issues.

> I was thinking that autovacuum was just a mechanism to ensure that postgres
> works well out of the box, but that it would be recommended to schedule your
> own vacuum tailored to your specific needs.
> I agree though, that it is a tough tailoring job and that the autovacuum
> must be doing a better job than i am. It just fires at the wrong time.

And that's kind of the point, that it's better to have a db that runs
a little slow than one that explodes in sheets of flame.

> Just a thought (to think positively..): wouldn't it be possible to let the
> autovacuum wait until the load goes down, or until the end of the
> transaction that triggered the autovacuum?

You could set up a cron job that updated the pg_autovacuum table at
certain times to accomplish this.  I'm not sure how easy it would be
to program autovac to do the same thing.  You could certainly set the
cost delay higher than normal (like 20 or 40 ms) for some tables so
that autovac didn't get in the way, but then you run the risk of it
never keeping up, and on 8.1 with only one thread to autovac, that
could be bad.  Definitely consider upgrading pg versions on your
windows machines.



Re: [GENERAL] Using Postgres to store genetic data

2009-07-10 Thread Steve Crawford

Peter Hunsberger wrote:
We're looking at potentially using Postgres to store a variety of 
molecular and genetic data.  At this point I have a bunch of general 
questions...


I don't know enough about your area of expertise to know if this is 
useful, but I'd look at the Unison project to see what they are doing. 
There are also videos/slides up from a recent San Francisco PUG meeting 
at UCSF that might be of interest:


http://www.vimeo.com/3732938
http://www.slideshare.net/linuxpoet/unison-ucsf-sfpug




Re: [GENERAL] is autovacuum recommended?

2009-07-10 Thread Willy-Bas Loos
Hi,

Thanks for your answers!
I'm using 8.1 and 8.2 on windows2003 servers, and it's true that i could
probably configure them much better.
We've recently moved to brand new dedicated database servers with pg8.3 on
debian in 2 projects and it has been much easier to configure these
correctly. There I don't encounter the problems that i described.

The thing is that the whole concept of autovacuum is not feeling right.
Per design, the vacuum is likely to kick off when i am doing something big.
And when i am doing something big, a vacuum is the last thing i'd wish for.
I'd wish for a vacuum when the database is doing nothing at all, but the
autovacuum will NEVER kick off in such a moment.

That's why i feel better scheduling the vacuum at times at which i know
things will be generally quiet.

To be honest, i am a bit surprised that all 3 reactions recommend using
autovacuum, even if it means i have to buy a new server for this purpose.
I was thinking that autovacuum was just a mechanism to ensure that postgres
works well out of the box, but that it would be recommended to schedule your
own vacuum tailored to your specific needs.
I agree though, that it is a tough tailoring job and that the autovacuum
must be doing a better job than i am. It just fires at the wrong time.

Just a thought (to think positively..): wouldn't it be possible to let the
autovacuum wait until the load goes down, or until the end of the
transaction that triggered the autovacuum?

Cheers,

WBL


Re: [GENERAL] Inserted data is disappearing

2009-07-10 Thread James B. Byrne

On Fri, July 10, 2009 16:20, Bill Moran wrote:
>
>
> Also, look for a BEGIN statement that is never COMMITed.  If
> the client starts a transaction, INSERTs a bunch of stuff, then
> disconnects without issuing a COMMIT, Postgres will rollback
> the transaction, thus it will be as if the data was never
> inserted.
>

There is one ROLLBACK statement, but it occurs after all of the
problems have evidenced themselves and not before. I believe this to
be the Rails test harness unrolling the transaction that it wraps
all test runs in.  There is one BEGIN. This is located close to the
very top of the run log, which seems congruent with the one ROLLBACK
just before the very end.

Evidently, all this test processing takes place within a single,
never completed, transaction.
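
What the test harness appears to be doing, sketched in plain SQL
(statement details are illustrative, not taken from the log):

```sql
BEGIN;                               -- opened once at the start of the run
SAVEPOINT active_record_1;
INSERT INTO currencies (currency_code) VALUES ('USD');
RELEASE SAVEPOINT active_record_1;
SELECT * FROM currencies;            -- rows visible on THIS connection only
ROLLBACK;                            -- end of run: every insert vanishes
```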

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne        mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3




Re: [GENERAL] Idle in transaction help

2009-07-10 Thread Scot Kreienkamp
Hi John,

It is Java.  I asked our programmers to check on the JDBC version as I
had seen that on the list previously.  It is using postgresql-8.2-504.
Is that one of the problem versions? I had thought it was new enough
that it would not be subject to that problem.

The unexplained part is why locks are acquired, sometimes at the row
level, before the connection goes to idle-in-transaction status.
That makes me think it's not the JDBC driver.

Thanks,
 
Scot Kreienkamp

-Original Message-
From: John R Pierce [mailto:pie...@hogranch.com] 
Sent: Friday, July 10, 2009 4:21 PM
To: Scot Kreienkamp
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Idle in transaction help

Scot Kreienkamp wrote:
>
> Hi everyone,
>
> I need some help with tracking down idle in transaction problems. We 
> have a custom application that is leaving queries in idle in 
> transaction status for unknown reasons. The developers are working on 
> ways to track it down, but right now the options on their end are 
> limited and it will be at least 1-2 months until they have something 
> working. I am trying to track down the problem from the PG end in the 
> meantime. Is there any way to tell what query is hanging in idle in 
> transaction status? Or what the current or previous query was/is, 
> since idle in transaction doesn't tell me anything? I'm kind of at a 
> loss on what if anything I can do from the database end to help (read 
> push) the programmers to find and fix this problem.
>


There is no active query, that's why it's idle. They did a "BEGIN" to 
start a transaction, then left the connection idle.

Is this software, by any chance, Java based? Older versions of the 
Postgres JDBC module had a nasty habit of doing this, as JDBC 
autogenerates the BEGIN if it's not in autocommit mode. The older 
version would generate the BEGIN immediately after a COMMIT or 
ROLLBACK to prepare for the next transaction, and if the app simply 
stopped using the connection, it was left IDLE IN TRANSACTION. The 
updated version postpones the BEGIN until you issue your first query.

If you enable statement logging and set up a log prefix to show the 
process ID (and I usually prefix with a timestamp, database name and 
other useful stuff), then you can grep the logs for the PID of the IDLE 
IN TRANSACTION process. Note logging all statements is pretty CPU and 
disk intensive, so it will likely impact your system performance and 
should only be done for debug purposes.








[Fwd: Re: [GENERAL] How to trace client sql requests?]

2009-07-10 Thread James B. Byrne


On Fri, July 10, 2009 16:10, hubert depesz lubaczewski wrote:

>
> truncate. but first simple question - did you commit the inserts?
>

But if it were done with truncate then I would see truncate in the
log file, yes?

Second, I am working with PG through an ORM called ActiveRecord,
part of the Rails framework.  I do not see a COMMIT anywhere in the
log. Should I?  This is a test run using the Rails test, actually
cucumber, environment.  I know that they do some things differently
with DB connections in this environment but I believe that this is
limited to transactions and rollbacks.

The thing is that this problem only arises when testing the script
inside the test harness. In production it runs just fine.  Further,
if I list the contents of the table from a call in the test harness
immediately prior to executing the script under test then the data
is there.

I do not know what is going on.

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne        mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3






Re: [GENERAL] Idle in transaction help

2009-07-10 Thread John R Pierce

Scot Kreienkamp wrote:


Hi everyone,

I need some help with tracking down idle in transaction problems. We 
have a custom application that is leaving queries in idle in 
transaction status for unknown reasons. The developers are working on 
ways to track it down, but right now the options on their end are 
limited and it will be at least 1-2 months until they have something 
working. I am trying to track down the problem from the PG end in the 
meantime. Is there any way to tell what query is hanging in idle in 
transaction status? Or what the current or previous query was/is, 
since idle in transaction doesn’t tell me anything? I’m kind of at a 
loss on what if anything I can do from the database end to help (read 
push) the programmers to find and fix this problem.





There is no active query, that's why it's idle. They did a "BEGIN" to 
start a transaction, then left the connection idle.


Is this software, by any chance, Java based? Older versions of the 
Postgres JDBC module had a nasty habit of doing this, as JDBC 
autogenerates the BEGIN if it's not in autocommit mode. The older version 
would generate the BEGIN immediately after a COMMIT or ROLLBACK to 
prepare for the next transaction, and if the app simply stopped using 
the connection, it was left IDLE IN TRANSACTION. The updated version 
postpones the BEGIN until you issue your first query.


If you enable statement logging and set up a log prefix to show the 
process ID (and I usually prefix with a timestamp, database name and 
other useful stuff), then you can grep the logs for the PID of the IDLE 
IN TRANSACTION process. Note logging all statements is pretty CPU and 
disk intensive, so it will likely impact your system performance and 
should only be done for debug purposes.
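
A sketch of that logging setup in postgresql.conf (the prefix escapes
chosen here are illustrative; %p is the backend PID):

```
log_statement = 'all'               # log every statement (expensive!)
log_line_prefix = '%t %d %p %u '    # timestamp, database, PID, user
```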









Re: [GENERAL] Inserted data is disappearing

2009-07-10 Thread Bill Moran
In response to "James B. Byrne" :
> 
> This is a portion of the log for the most recent run that exhibits
> the problem:
> 
> ...
> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOG:  0: duration: 0.446 ms  statement:
> INSERT INTO "currencies" ("is_invoicable", "is_payable",
> "changed_by", "created_by", "premium_factor", "discount_factor",
> "effective_from", "currency_name", "superseded_after", "changed_at",
> "currency_code", "created_at") VALUES('f', 'f', E'not available',
> E'not available', 0.9, 1.1, '1785-07-06 04:56:02.00', E'United
> States Dollar', NULL, '2009-07-10 19:59:17', E'USD', '2009-07-10
> 19:59:17.634473')
> 
> That seems a valid insert.
> 
> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOCATION:  exec_simple_query,
> postgres.c:1105
> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOG:  0: duration: 0.172 ms  statement:
> SELECT currval('currencies_id_seq')
> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOCATION:  exec_simple_query,
> postgres.c:1105
> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOG:  0: duration: 0.067 ms  statement:
> RELEASE SAVEPOINT active_record_1
> 
> This seems ok but the absence of proof does not ensure the absence
> of error.  If the insert failed would I see this fact reflected in a
> log entry?

Unless you have some really bizarre config in your postgresql.conf,
then a failure of that insert would result in a logged error message.

Why not just intentionally try an invalid insert statement to be
sure that it logs?
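
For instance, a deliberately broken statement like the following (the
column name is made up) should immediately produce an ERROR line in the
server log if error logging is working:

```sql
INSERT INTO currencies (no_such_column) VALUES ('x');
-- should log roughly: ERROR:  column "no_such_column" of relation
-- "currencies" does not exist
```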

> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOCATION:  exec_simple_query,
> postgres.c:1105
> 2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
> hll_theheart_db_admin : LOG:  0: duration: 0.782 ms  statement:
> SELECT * FROM "currencies"
> 
> The client program that receives this result reports that there are
> no rows returned. So where did they go?

What happens between the INSERT and the SELECT?  Are there DELETE,
TRUNCATE, or ROLLBACK statements?

Also, look for a BEGIN statement that is never COMMITed.  If the client
starts a transaction, INSERTs a bunch of stuff, then disconnects without
issuing a COMMIT, Postgres will rollback the transaction, thus it will
be as if the data was never inserted.

HTH.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/



[GENERAL] BR/

2009-07-10 Thread James B. Byrne
I am sorry for this but I do not know how else to communicate what
is apparently happening:

This is a portion of the log for the most recent run that exhibits
the problem:

...
2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
hll_theheart_db_admin : LOG:  0: duration: 0.446 ms  statement:
INSERT INTO "currencies" ("is_invoicable", "is_payable",
"changed_by", "created_by", "premium_factor", "discount_factor",
"effective_from", "currency_name", "superseded_after", "changed_at",
"currency_code", "created_at") VALUES('f', 'f', E'not available',
E'not available', 0.9, 1.1, '1785-07-06 04:56:02.00', E'United
States Dollar', NULL, '2009-07-10 19:59:17', E'USD', '2009-07-10
19:59:17.634473')

That seems a valid insert.

2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
hll_theheart_db_admin : LOCATION:  exec_simple_query,
postgres.c:1105
2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
hll_theheart_db_admin : LOG:  0: duration: 0.172 ms  statement:
SELECT currval('currencies_id_seq')
2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
hll_theheart_db_admin : LOCATION:  exec_simple_query,
postgres.c:1105
2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
hll_theheart_db_admin : LOG:  0: duration: 0.067 ms  statement:
RELEASE SAVEPOINT active_record_1

This seems ok but the absence of proof does not ensure the absence
of error.  If the insert failed would I see this fact reflected in a
log entry?

2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
hll_theheart_db_admin : LOCATION:  exec_simple_query,
postgres.c:1105
2009-07-10 15:59:17 EDT hll_theheart_test 216.185.71.24(49133)
hll_theheart_db_admin : LOG:  0: duration: 0.782 ms  statement:
SELECT * FROM "currencies"

The client program that receives this result reports that there are
no rows returned. So where did they go?



-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne        mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3




[GENERAL] Using Postgres to store genetic data

2009-07-10 Thread Peter Hunsberger
We're looking at potentially using Postgres to store a variety of molecular
and genetic data.  At this point I have a bunch of general questions which I
can take to other lists if someone can tell me where
they would be most appropriate:
1) are there groups or individuals already doing this that have things they
can share?  I've seen the BLASTgres work and it needs rework to compile
against 8.3, but it might be potentially useful for some of our forms of data;

2) if we do decide to implement our own data types, where's the best place
for discussion about such efforts?

3) Can anyone talk about what it would take to add columnar indexes to
Postgres?  Where would be the best place for discussion about what this
would take?

I can go into more details here if appropriate and if that will help...
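
Not from the thread, but as one hedged illustration of option (2): before
writing a full C-level custom type, a CHECK-constrained domain is often
enough for sequence data (names and the base alphabet are assumptions):

```sql
-- A domain restricting text to DNA bases; a real custom type would
-- add compressed storage, operators, and index support on top.
CREATE DOMAIN dna AS text
    CHECK (VALUE ~ '^[ACGTN]*$');

CREATE TABLE sequences (
    id   serial PRIMARY KEY,
    seq  dna NOT NULL
);
```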

-- 
Peter Hunsberger


Re: [GENERAL] How to trace client sql requests?

2009-07-10 Thread hubert depesz lubaczewski
On Fri, Jul 10, 2009 at 03:45:35PM -0400, James B. Byrne wrote:
> I believe that this is what I want to examine.  Is there a server
> side technique that I can use which will tell me what data this
> statement returned or if it found nothing?

not really, sorry.

> In any case, I see the INSERTS and I can find NO DELETES at all.  Is
> there any other way to remove some or all data from a table?

truncate. but first simple question - did you commit the inserts?

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007



[GENERAL] Idle in transaction help

2009-07-10 Thread Scot Kreienkamp
Hi everyone,

 

I need some help with tracking down idle in transaction problems.  We
have a custom application that is leaving queries in idle in transaction
status for unknown reasons.  The developers are working on ways to track
it down, but right now the options on their end are limited and it will
be at least 1-2 months until they have something working.  I am trying
to track down the problem from the PG end in the meantime.  Is there any
way to tell what query is hanging in idle in transaction status?  Or
what the current or previous query was/is, since idle in transaction
doesn't tell me anything?  I'm kind of at a loss on what if anything I
can do from the database end to help (read push) the programmers to find
and fix this problem.   

 

My first priority is helping them find and fix the problem if I can.  My
second priority is finding an automated way to deal with the idle in
transactions as they are locking tables and rows, causing other
transactions to hang also.  None of the timeouts appear to affect idle
in transactions.  
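
One thing that can be done from the database end is to list the offending
backends; on 8.3 the monitoring view looks roughly like this (procpid and
current_query were later renamed pid and state in 9.2):

```sql
SELECT procpid, usename, client_addr, xact_start, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_start;
```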

 

I'm on PG 8.3.7.  Upgrading to 8.4 won't be an option for several
months, enough time for the developers to test and develop against 8.4.

 

Thanks for any help you can give me.

 

Scot Kreienkamp



Re: [GENERAL] How to trace client sql requests?

2009-07-10 Thread James B. Byrne

On Fri, July 10, 2009 14:58, hubert depesz lubaczewski wrote:
> You can enable by database:
>
> alter database x set log_min_duration_statement = 0;


Many, many thanks.  Now of course I need more help...

The situation is that data inserted into the DB is not being found
on a subsequent select and I am unaware of any deletes being done. 
So, I am hoping to find where the data is going or why the select is
not working.

This is the critical insert:

...
2009-07-10 15:13:00 EDT hll_theheart_test 216.185.71.24(57637)
hll_theheart_db_admin : LOCATION:  exec_simple_query,
postgres.c:1105

2009-07-10 15:13:00 EDT hll_theheart_test 216.185.71.24(57637)
hll_theheart_db_admin : LOG:  0: duration: 1.366 ms  statement:
INSERT INTO "currencies" ("is_invoicable", "is_payable",
"changed_by", "created_by", "premium_factor", "discount_factor",
"effective_from", "currency_name", "superseded_after", "changed_at",
"currency_code", "created_at") VALUES('f', 'f', E'not available',
E'not available', 1.0, 1.0, '1858-01-01 04:56:02.00', E'Canadian
Dollar', NULL, '2009-07-10 19:13:00', E'CAD', '2009-07-10
19:13:00.151885')

2009-07-10 15:13:00 EDT hll_theheart_test 216.185.71.24(57637)
hll_theheart_db_admin : LOCATION:  exec_simple_query,
postgres.c:1105

2009-07-10 15:13:00 EDT hll_theheart_test 216.185.71.24(57637)
hll_theheart_db_admin : LOG:  0: duration: 0.379 ms  statement:
SELECT currval('currencies_id_seq')

2009-07-10 15:13:00 EDT hll_theheart_test 216.185.71.24(57637)
hll_theheart_db_admin : LOCATION:  exec_simple_query,
postgres.c:1105

2009-07-10 15:13:00 EDT hll_theheart_test 216.185.71.24(57637)
hll_theheart_db_admin : LOG:  0: duration: 0.073 ms  statement:
RELEASE SAVEPOINT active_record_1
...

This seems to have worked.  Would the log show if it did not?

Then I see a bunch of these:

2009-07-10 15:13:02 EDT hll_theheart_test 216.185.71.24(57638)
hll_theheart_db_admin : LOG:  0: duration: 0.082 ms  statement:
SET client_min_messages TO 'notice'

2009-07-10 15:13:02 EDT hll_theheart_test 216.185.71.24(57638)
hll_theheart_db_admin : LOCATION:  exec_simple_query,
postgres.c:1105

2009-07-10 15:13:02 EDT hll_theheart_test 216.185.71.24(57638)
hll_theheart_db_admin : LOG:  0: duration: 6.155 ms  statement: 
   SELECT a.attname, format_type(a.atttypid, a.atttypmod),
d.adsrc, a.attnotnull

2009-07-10 15:13:02 EDT hll_theheart_test 216.185.71.24(57638)
hll_theheart_db_admin : LOCATION:  exec_simple_query,
postgres.c:1105

2009-07-10 15:13:02 EDT hll_theheart_test 216.185.71.24(57638)
hll_theheart_db_admin : LOG:  0: duration: 1.285 ms  statement: 
   SELECT a.attname, format_type(a.atttypid, a.atttypmod),
d.adsrc, a.attnotnull

and finally, I get a long list of these:

2009-07-10 15:13:02 EDT hll_theheart_test 216.185.71.24(57638)
hll_theheart_db_admin : LOG:  0: duration: 1.779 ms  statement:
SELECT * FROM "currencies" WHERE ("currencies"."currency_code" =
E'CAD')  LIMIT 1
...

I believe that this is what I want to examine.  Is there a server
side technique that I can use which will tell me what data this
statement returned or if it found nothing?

In any case, I see the INSERTS and I can find NO DELETES at all.  Is
there any other way to remove some or all data from a table?


-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne        mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3




Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Dan Armbrust
> Hm, I'm not sure I believe any of that except the last bit, seeing that
> he's got plenty of excess CPU capability.  But the last bit fits with
> the wimpy-I/O problem, and it also offers something we could test.
> Dan, please see what happens when you vary the wal_buffers setting.
> (Note you need a postmaster restart to change that.)
>
>                        regards, tom lane
>

Ok, I tried a few different values - 32kb, 64kb, 512kb, 2MB and 10MB.

I'm not seeing any highly noticeable change in behaviour with any
setting - it wasn't a scientific test, but I seem to have about the
same size hiccup with each setting.  The hiccup may be slightly
shorter with the 10MB setting, but barely, if it is.

Thanks,

Dan



Re: [GENERAL] How to trace client sql requests?

2009-07-10 Thread hubert depesz lubaczewski
On Fri, Jul 10, 2009 at 01:38:57PM -0400, James B. Byrne wrote:
> I have a situation with a Rails project where test data is
> mysteriously "disappearing" in the middle of a test run.  I would
> like to see the exact SQL of all client requests issued against a
> single table during a fixed time span.
> How can I best accomplish this in PostgreSQL?
> #client_min_messages = notice
> #log_min_messages = notice
> #log_min_duration_statement = -1
> ...
> #log_duration = off
> 
> Which of these, if any, should I alter; and to what?  Am I

I prefer to set log_min_duration_statement to 0. It will log all queries
and their running time.

> constrained to system wide logging or can this be enabled by
> database?

You can enable by database:

alter database x set log_min_duration_statement = 0;
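
A sketch of scoping this to one database and reverting it when done:

```sql
ALTER DATABASE x SET log_min_duration_statement = 0;   -- log everything
-- ... reproduce the problem, read the logs ...
ALTER DATABASE x RESET log_min_duration_statement;     -- back to default
```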

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007



Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Jeff Davis
On Fri, 2009-07-10 at 18:47 +0100, Greg Stark wrote:
> > -- foo has a primary key

> Well no, it's equivalent to SELECT DISTINCT * FROM foo;

I think you missed that "foo" has a primary key.

Regards,
Jeff Davis




Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Scott Marlowe
On Fri, Jul 10, 2009 at 11:47 AM, Greg Stark wrote:
> On Fri, Jul 10, 2009 at 6:37 PM, Jeff Davis wrote:
>>
>> -- foo has a primary key
>> SELECT * FROM foo UNION SELECT * FROM foo;
>>
>> That's logically equivalent to:
>>
>> SELECT * FROM foo;
>>
>> But postgresql will add a sort anyway.
>
>
> Well no, it's equivalent to SELECT DISTINCT * FROM foo;

And honestly, I'd rather see development effort go into making complex
queries run faster (the things like bitmap indexes on disk etc) rather
than optimising things that i can optimise myself.
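
The manual optimisation in question, sketched with hypothetical queries:
when you know the branches can't overlap, UNION ALL gives the same rows
as UNION while skipping the duplicate-elimination step entirely:

```sql
-- UNION must sort/hash the combined result to drop duplicates:
SELECT id FROM accounts WHERE region = 'EU'
UNION
SELECT id FROM accounts WHERE region = 'US';

-- The predicates are disjoint, so UNION ALL is equivalent here,
-- minus the extra sort/hash step:
SELECT id FROM accounts WHERE region = 'EU'
UNION ALL
SELECT id FROM accounts WHERE region = 'US';
```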



Re: [GENERAL] XML import with DTD

2009-07-10 Thread artacus
Post a snippet of the xml and xpath you are trying to use. 


Scott 

- Original Message - 
From: "Roy Walter"  
To: pgsql-general@postgresql.org 
Sent: Friday, July 10, 2009 7:49:00 AM GMT -08:00 US/Canada Pacific 
Subject: [GENERAL] XML import with DTD 

Hi 

I'm trying to use the XPath functionality of Postgres. 

I can populate a text field (unparsed) with XML data but as far as I can see 
the xpath() function [now] only works on the xml data type. 

When I try to populate a text field with XML data containing a DTD, however, 
the parser chokes. If I strip the DTD the parser chokes on undefined entities 
which are defined in the DTD. 

(I switched the app from MySQL to Postgres because, while MySQL works, it 
returns matches in undelimited form, which is next to useless if, for example, 
you return multiple attributes from a node.) 

Does anyone know of a solution to this problem? 

Windows 2000 Server 
Postgres 8.4 

Regards 
Roy Walter 


Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Greg Stark
On Fri, Jul 10, 2009 at 6:37 PM, Jeff Davis wrote:
>
> -- foo has a primary key
> SELECT * FROM foo UNION SELECT * FROM foo;
>
> That's logically equivalent to:
>
> SELECT * FROM foo;
>
> But postgresql will add a sort anyway.


Well no, it's equivalent to SELECT DISTINCT * FROM foo;


-- 
greg
http://mit.edu/~gsstark/resume.pdf



[GENERAL] How to trace client sql requests?

2009-07-10 Thread James B. Byrne
I have a situation with a Rails project where test data is
mysteriously "disappearing" in the middle of a test run.  I would
like to see the exact SQL of all client requests issued against a
single table during a fixed time span.

How can I best accomplish this in PostgreSQL?

#client_min_messages = notice
#log_min_messages = notice
#log_min_duration_statement = -1
...
#log_duration = off

Which of these, if any, should I alter; and to what?  Am I
constrained to system wide logging or can this be enabled by
database?

Regards,

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne        mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3




Re: [GENERAL] REINDEX "is not a btree"

2009-07-10 Thread decibel

On Jul 10, 2009, at 6:47 AM, Vanessa Lopez wrote:
I discovered the table that was causing the error, deleted it and  
created it again (I lost some data but at least everything else is  
working now)


Yes, for the backup we copy everything we had under /data (the  
directory containing "base", "global", and so on ... we do backups  
every day from the server), and then we restore the whole /data  
directory at once... but it did not solve the problem.



Given the problems you've had, I strongly suggest you take a pg_dump  
of the database, restore that dump, and use the restored copy. I bet  
there's probably other problems lurking in your database.

--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Jeff Davis
On Fri, 2009-07-10 at 14:22 +0100, Simon Riggs wrote:
> I mean it seems possible to prove that the distinct removal step is not
> necessary, by proving that the various sub-queries are already disjoint.
> It's a common manual optimization, so automating it seems a reasonable
> future goal.

There are even simpler cases that postgresql can't optimize. Consider:

-- foo has a primary key
SELECT * FROM foo UNION SELECT * FROM foo;

That's logically equivalent to:

SELECT * FROM foo;

But postgresql will add a sort anyway.

There are lots of optimizations along these lines. They seem obscure,
but these optimizations become much more useful when using views or
complex queries where the same table appears multiple times. For
instance, if you have two views that are projections of the same table
and you join the views together, you can optimize away the join in
some cases and just scan the original table.
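As a sketch of the kind of case meant here (table, view, and column names
are hypothetical):

```sql
-- Both views are projections of the same table. A join of the two views
-- on the primary key could, in principle, be collapsed by the planner
-- into a single scan of foo:
CREATE VIEW foo_names  AS SELECT id, name  FROM foo;
CREATE VIEW foo_prices AS SELECT id, price FROM foo;

SELECT n.name, p.price
FROM foo_names n
JOIN foo_prices p ON p.id = n.id;
-- logically equivalent to: SELECT name, price FROM foo;
```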

I think a lot of these optimizations depend on knowing which tables (or
subqueries) are relations in the relational theory sense; i.e.
unordered, distinct, and have no NULLs in the relevant attributes.

Regards,
Jeff Davis


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SELECT DISTINCT very slow

2009-07-10 Thread Jeff Davis
On Fri, 2009-07-10 at 01:36 +0100, Greg Stark wrote:
> Arguably the missing feature here is skip-scans where we scan the
> index but only pull out one record for each distinct value. I'm not
> sure there's anything particularly stopping Postgres from being able
> to do them, but it might be a lot of code for a narrow use case.

Hypothetically, would something like a "sort distinct" operator be of
any use? I wonder how much work it would save if the sort could save
steps by weeding out duplicate tuples while sorting. That might make
sort into a better plan in cases where we don't have a good estimate of
the number of distinct values.
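For what it's worth, until a skip scan exists, it can be emulated by hand
with a recursive CTE in 8.4 (table and column names are hypothetical;
assumes a btree index on col):

```sql
-- "Loose index scan": fetch each distinct value with one index probe
-- per value, instead of scanning and sorting the whole table.
WITH RECURSIVE t AS (
    (SELECT col FROM tbl ORDER BY col LIMIT 1)
    UNION ALL
    SELECT (SELECT col FROM tbl WHERE col > t.col ORDER BY col LIMIT 1)
    FROM t
    WHERE t.col IS NOT NULL
)
SELECT col FROM t WHERE col IS NOT NULL;
```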

Regards,
Jeff Davis


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UNION question

2009-07-10 Thread Brandon Metcalf
t == t...@sss.pgh.pa.us writes:

 t> Brandon Metcalf  writes:
 t> > I tried moving the last group of WHERE, GROUP BY, and ORDER BY before
 t> > the UNION with the query it belongs to, but that results in a
 t> > different syntax error.

 t> I think that's probably what you want to do.  What you're missing is
 t> you need parentheses to put an ORDER BY into an arm of a UNION:

 t> (SELECT ... ORDER BY ...) UNION SELECT ...

 t> Otherwise it wants to consider the ORDER BY as applying to the UNION
 t> output.


Indeed.  It was the fact that I was trying to move the ORDER BY along
with WHERE and GROUP BY that was giving me grief.

Thanks.

-- 
Brandon

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UNION question

2009-07-10 Thread Brandon Metcalf
M == matthew.hart...@krcc.on.ca writes:

 M> > > From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 M> > > ow...@postgresql.org] On Behalf Of Brandon Metcalf
 M> > > Sent: Friday, July 10, 2009 12:16 PM
 M> >
 M> > Change it to this:

 M> Sorry, I forgot that you need to split the GROUP BY clause as well in a
 M> similar manner to the WHERE clause. And unless you have duplicate rows
 M> to eliminate, use UNION ALL rather than UNION for a speed increase.


Thanks.  Got it to work.

-- 
Brandon

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UNION question

2009-07-10 Thread Tom Lane
Brandon Metcalf  writes:
> I tried moving the last group of WHERE, GROUP BY, and ORDER BY before
> the UNION with the query it belongs to, but that results in a
> different syntax error.

I think that's probably what you want to do.  What you're missing is
you need parentheses to put an ORDER BY into an arm of a UNION:

(SELECT ... ORDER BY ...) UNION SELECT ...

Otherwise it wants to consider the ORDER BY as applying to the UNION
output.
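For example, with hypothetical tables, the parenthesized form looks like:

```sql
-- The ORDER BY applies only to the first arm, not to the UNION output:
(SELECT part_number FROM current_parts ORDER BY part_number)
UNION
SELECT part_number FROM obsolete_parts;
```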

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UNION question

2009-07-10 Thread Hartman, Matthew
> > From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> > ow...@postgresql.org] On Behalf Of Brandon Metcalf
> > Sent: Friday, July 10, 2009 12:16 PM
> 
> Change it to this:

Sorry, I forgot that you need to split the GROUP BY clause as well in a
similar manner to the WHERE clause. And unless you have duplicate rows
to eliminate, use UNION ALL rather than UNION for a speed increase.
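A minimal illustration (table names are hypothetical):

```sql
-- UNION ALL simply concatenates the two result sets; UNION additionally
-- pays for a duplicate-removal (sort or hash) step:
SELECT part_number FROM list_a
UNION ALL
SELECT part_number FROM list_b;
```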

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UNION question

2009-07-10 Thread Hartman, Matthew
> -Original Message-
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> ow...@postgresql.org] On Behalf Of Brandon Metcalf
> Sent: Friday, July 10, 2009 12:16 PM

Change it to this:

>   SELECT t.name   AS machine_type_name,
>   j.workorder,
>   round(sum(EXTRACT(epoch FROM(j.clockout-
>   j.clockin))/3600/w.quantity_made)::numeric,2)
>   AS avgtime,
>   NULL    AS employees
>   FROM jobclock j
>   JOIN employee e ON e.employee_id=j.employee_id
>   JOIN machine m  ON m.machine_id=j.machine_id
>   JOIN machine_type t ON t.machine_type_id=m.machine_type_id
>   JOIN workorder w    ON w.workorder=j.workorder
>   JOIN part p ON p.part_id=w.part_id
>   WHERE p.part_id=379
>   UNION
>   SELECT t.name   AS machine_type_name,
>   NULL    AS workorder,
>   h.time  AS avgtime,
>   employees
>   FROM part_time_historical h
>   JOIN machine_type t ON
> t.machine_type_id=h.machine_type_id
>   WHERE h.part_id=379 AND h.machine_type_id=1
>   WHERE t.machine_type_id=1
>   GROUP BY t.name,j.workorder
>   ORDER BY avgtime

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] UNION question

2009-07-10 Thread Brandon Metcalf
Is the following even possible?  I keep getting a syntax error at the
last WHERE:

  ERROR:  syntax error at or near "WHERE"
  LINE 20: WHERE p.part_id=379 AND t.machine_type_id=1

The SQL is

  SELECT t.name   AS machine_type_name,
  j.workorder,
  round(sum(EXTRACT(epoch FROM(j.clockout-
  j.clockin))/3600/w.quantity_made)::numeric,2)
  AS avgtime,
  NULL    AS employees
  FROM jobclock j
  JOIN employee e ON e.employee_id=j.employee_id
  JOIN machine m  ON m.machine_id=j.machine_id
  JOIN machine_type t ON t.machine_type_id=m.machine_type_id
  JOIN workorder w    ON w.workorder=j.workorder
  JOIN part p     ON p.part_id=w.part_id
  UNION
  SELECT t.name   AS machine_type_name,
  NULL    AS workorder,
  h.time  AS avgtime,
  employees
  FROM part_time_historical h
  JOIN machine_type t ON t.machine_type_id=h.machine_type_id
  WHERE h.part_id=379 AND h.machine_type_id=1
  WHERE p.part_id=379 AND t.machine_type_id=1
  GROUP BY t.name,j.workorder
  ORDER BY avgtime

I tried moving the last group of WHERE, GROUP BY, and ORDER BY before
the UNION with the query it belongs to, but that results in a
different syntax error.

I'm basically looking to concatenate these two results.

-- 
Brandon

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Tom Lane
Simon Riggs  writes:
> I think it's a traffic jam.

> After checkpoint in XLogInsert(), we discover that we now have to back up
> a block that we previously thought didn't need it. So we have to drop the lock
> and then re-access WALInsertLock. So every backend has to go through the
> queue twice the first time it tries to write WAL immediately after a
> checkpoint. Also, suddenly, every block needs to be copied to WAL, so
> the CRC checks make each lock holder take longer than normal, so the
> whole queue begins to backup. Then, because of wal_buffers being small
> we find that the increased volume of WAL being written causes
> WALInsertLock to be held across I/O.

Hm, I'm not sure I believe any of that except the last bit, seeing that
he's got plenty of excess CPU capability.  But the last bit fits with
the wimpy-I/O problem, and it also offers something we could test.
Dan, please see what happens when you vary the wal_buffers setting.
(Note you need a postmaster restart to change that.)

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Simon Riggs

On Fri, 2009-07-10 at 10:27 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > ISTM more likely to be a problem with checkpointing clog or subtrans.
> > That would block everybody and the scale of the problem is about right.
> 
> That's what I had been thinking too, but the log_checkpoint output
> conclusively disproves it: those steps are taking less than 20msec.

OK, I was looking at total - write, not total - write - sync.

I think it's a traffic jam.

After checkpoint in XLogInsert(), we discover that we now have to back up
a block that we previously thought didn't need it. So we have to drop the lock
and then re-access WALInsertLock. So every backend has to go through the
queue twice the first time it tries to write WAL immediately after a
checkpoint. Also, suddenly, every block needs to be copied to WAL, so
the CRC checks make each lock holder take longer than normal, so the
whole queue begins to backup. Then, because of wal_buffers being small
we find that the increased volume of WAL being written causes
WALInsertLock to be held across I/O.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] REINDEX "is not a btree"

2009-07-10 Thread Alan Hodgson
On Friday 10 July 2009, Vanessa Lopez  wrote:
> What do you mean by we can't simply take a filesystem copy of a
> running database? :-O ... How should we then do the backups (so next
> time I will not have the same problem again) ?

There is extensive documentation on how to do backups. For filesystem 
backups, see PITR.

You might also want to examine all your backup strategies  - most running 
applications are not happy about being backed up without taking special 
steps to ensure data consistency.

-- 
Anyone who believes exponential growth can go on forever in a finite world, 
is either a madman or an economist.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] REINDEX "is not a btree"

2009-07-10 Thread Tom Lane
Vanessa Lopez  writes:
> What do you mean by we can't simply take a filesystem copy of a  
> running database? :-O ... How should we then do the backups (so next  
> time I will not have the same problem again) ?

Read the fine manual ...
http://www.postgresql.org/docs/8.3/static/backup.html
Section 24.2 explains the pitfalls of trying to use a filesystem-level
backup.  It is possible to do, but you have to be very very careful
to get a consistent snapshot.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SPI_ERROR_CONNECT within pl/pgsql, PG 8.4

2009-07-10 Thread Tom Lane
Marek Lewczuk  writes:
> I have made an upgrade to PG 8.4 and the following error was thrown during
> execution of a pl/pgsql function:
> ERROR:  XX000: SPI_connect failed: SPI_ERROR_CONNECT

Really?  Could we see a self-contained example?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] XML import with DTD

2009-07-10 Thread Roy Walter

Hi

I'm trying to use the XPath functionality of Postgres.

I can populate a text field (unparsed) with XML data but as far as I can 
see the xpath() function [now] only works on the xml data type.
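For reference, a minimal xpath() call on the xml type looks like this
(the document content is hypothetical):

```sql
-- xpath() takes an XPath expression and an xml value, returning xml[]:
SELECT xpath('/book/title/text()',
             '<book><title>Postgres</title></book>'::xml);
-- result: {Postgres}
```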


When I try to populate a text field with XML data containing a DTD, 
however, the parser chokes. If I strip the DTD the parser chokes on 
undefined entities which are defined in the DTD.


(I switched the app from MySQL to Postgres because, while MySQL works, 
it returns matches in undelimited form, which is next to useless if, for 
example, you return multiple attributes from a node.)


Does anyone know of a solution to this problem?

Windows 2000 Server
Postgres 8.4

Regards
Roy Walter


[GENERAL] SPI_ERROR_CONNECT within pl/pgsql, PG 8.4

2009-07-10 Thread Marek Lewczuk
Hello,
I have made an upgrade to PG 8.4 and the following error was thrown during
execution of a pl/pgsql function:
ERROR:  XX000: SPI_connect failed: SPI_ERROR_CONNECT
CONTEXT:  PL/pgSQL function "price_aftertrigger" line 30 at IF
SQL statement "update price set validFrom =  $1 , validTo =
$2  where id =  $3 [ $4 ][1]::Integer"
PL/pgSQL function "price_rebuildpricelistvalidity" line 54 at
SQL statement
SQL statement "SELECT  price_rebuildPriceListValidity( $1 )"
PL/pgSQL function "price_aftertrigger" line 54 at PERFORM
LOCATION:  plpgsql_call_handler, pl_handler.c:77


As I said, it was thrown within PG 8.4, but when it was executed within
PG 8.3.5 there were no problems at all. A second execution of the same
statement didn't throw an error either. Can anyone explain what this means?

Best regards,
ML

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] REINDEX "is not a btree"

2009-07-10 Thread Vanessa Lopez

Hello,

Thanks for all your answers!

I discovered the table that was causing the error, deleted it and  
created it again (I lost some data, but at least everything else is  
working now)


Yes, for the backup we copy everything we had under /data (the  
directory containing "base", "global", and so on ... we do backups  
every day from the server), and then we restore the whole /data  
directory at once ... but it did not solve the problem ..


What do you mean by we can't simply take a filesystem copy of a  
running database? :-O ... How should we then do the backups (so next  
time I will not have the same problem again) ?


Millions of thanks again!
Vanessa


On 10 Jul 2009, at 04:06, decibel wrote:


On Jul 4, 2009, at 8:06 AM, Craig Ringer wrote:

On Fri, 2009-07-03 at 15:00 +0100, Vanessa Lopez wrote:

I don't know much about Postgres, I have no clue what else I can do.
Please, please, any help is very much appreciated. I have lots of
databases and months of work in Postgres (also lots of backups for the
data in /data)


When you say "in /data", do you mean the directory that contains the
directories "pg_xlog", "base", "global", "pg_clog", etc ?

Did you back up and restore the WHOLE data directory at once? Or
did you
restore only parts of it?



And how exactly did you make the backups? You can't simply take a
filesystem copy of a running database; that won't work.
--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database storage

2009-07-10 Thread Scott Marlowe
On Fri, Jul 10, 2009 at 8:43 AM, John R Pierce wrote:
> nabble.30.miller_2...@spamgourmet.com wrote:
>>
>> The database server is a quad core machine, so it sounds as though
>> software RAID should work fine for the present setup. However, it
>> sounds as though I should put some money into a hardware RAID
>> controller if the database becomes more active. I had assumed RAID-5
>> would be fine, but please let me know if there is another RAID level
>> more appropriate for this implementation. Thanks for the valuable
>> insight!
>>
>
> raid-5 performs very poorly on random small block writes, which is the
> majority of what databases do.   raid10 is the preferred raid for databases.
>
>
>
> btw: re earlier discussion of raid controllers vs software... I'm surprised
> no one mentioned that a 'real' raid controller with battery-backed writeback
> cache can hugely speed up committed 8kbyte block random writes, which are
> quite often the big bottleneck in a transactional database.

Given that the OP's usage pattern was bulk imports and reporting
queries it didn't seem very important.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database storage

2009-07-10 Thread Bill Moran
In response to nabble.30.miller_2...@spamgourmet.com:

> > On Thu, Jul 9, 2009 at 7:29 PM, Greg Stark wrote:
> >> On Fri, Jul 10, 2009 at 1:28 AM, Scott Marlowe 
> >> wrote:
> >>>
> >>> $750 is about what a decent RAID controller would cost you, but again
> >>> it's likely that given your bulk import scenario,  you're probably ok
> >>> without one.  In this instance, you're probably best off with software
> >>> RAID than a cheap RAID card which will cost extra and probably be
> >>> slower than linux software RAID.
> ...
> >> The main advantage of hardware raid is the error handling. When you
> >> get low level errors or pull a drive a lot of consumer level
> >> controllers and their drivers don't respond very well and have long
> >> timeouts or keep retrying tragically unaware that the software raid
> >> would be able to handle recovering. A good server-class RAID
> >> controller should handle those situations without breaking a sweat.
> 
> > Definitely a big plus of a quality HW controller, and one of the
> > reasons I don't scrimp on the HW controllers I put in our 24/7
> > servers.  OTOH, if you can afford a bit of downtime to handle
> > failures, linux software RAID works pretty well, and since quad core
> > CPUs are now pretty much the standard, it's ok if parity calculation
> > uses up a bit of one core for lower performing servers like the
> > reporting server the OP was talking about.
> 
> The database server is a quad core machine, so it sounds as though
> software RAID should work fine for the present setup. However, it
> sounds as though I should put some money into a hardware RAID
> controller if the database becomes more active. I had assumed RAID-5
> would be fine, but please let me know if there is another RAID level
> more appropriate for this implementation. Thanks for the valuable
> insight!

RAID 10 is pretty much the fastest RAID level for disk IO.  Every pair
of disks you add to a RAID-10 array makes the array faster (assuming
you don't hit any controller bottlenecks).

Another advantage of hardware RAID controllers (that I'm surprised nobody
has mentioned) is battery-backed cache.  Using said cache, you can
configure the controller to lie about fsyncs, which make them essentially
free from PostgreSQL's standpoint.  Since the cache is backed by a
battery, your concerns about data loss in the event of power failure are
much less.  The cache doesn't usually increase the overall throughput of
the system, but it usually improves peak load performance by deferring
writes until things are calmer.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database storage

2009-07-10 Thread John R Pierce

nabble.30.miller_2...@spamgourmet.com wrote:

The database server is a quad core machine, so it sounds as though
software RAID should work fine for the present setup. However, it
sounds as though I should put some money into a hardware RAID
controller if the database becomes more active. I had assumed RAID-5
would be fine, but please let me know if there is another RAID level
more appropriate for this implementation. Thanks for the valuable
insight!
  


raid-5 performs very poorly on random small block writes, which is the 
majority of what databases do.   raid10 is the preferred raid for databases.




btw: re earlier discussion of raid controllers vs software... I'm 
surprised no one mentioned that a 'real' raid controller with battery-
backed writeback cache can hugely speed up committed 8kbyte block random 
writes, which are quite often the big bottleneck in a transactional 
database.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database storage

2009-07-10 Thread nabble . 30 . miller_2555
> On Thu, Jul 9, 2009 at 7:29 PM, Greg Stark wrote:
>> On Fri, Jul 10, 2009 at 1:28 AM, Scott Marlowe 
>> wrote:
>>>
>>> $750 is about what a decent RAID controller would cost you, but again
>>> it's likely that given your bulk import scenario,  you're probably ok
>>> without one.  In this instance, you're probably best off with software
>>> RAID than a cheap RAID card which will cost extra and probably be
>>> slower than linux software RAID.
...
>> The main advantage of hardware raid is the error handling. When you
>> get low level errors or pull a drive a lot of consumer level
>> controllers and their drivers don't respond very well and have long
>> timeouts or keep retrying tragically unaware that the software raid
>> would be able to handle recovering. A good server-class RAID
>> controller should handle those situations without breaking a sweat.

> Definitely a big plus of a quality HW controller, and one of the
> reasons I don't scrimp on the HW controllers I put in our 24/7
> servers.  OTOH, if you can afford a bit of downtime to handle
> failures, linux software RAID works pretty well, and since quad core
> CPUs are now pretty much the standard, it's ok if parity calculation
> uses up a bit of one core for lower performing servers like the
> reporting server the OP was talking about.

The database server is a quad core machine, so it sounds as though
software RAID should work fine for the present setup. However, it
sounds as though I should put some money into a hardware RAID
controller if the database becomes more active. I had assumed RAID-5
would be fine, but please let me know if there is another RAID level
more appropriate for this implementation. Thanks for the valuable
insight!


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Tom Lane
Simon Riggs  writes:
> ISTM more likely to be a problem with checkpointing clog or subtrans.
> That would block everybody and the scale of the problem is about right.

That's what I had been thinking too, but the log_checkpoint output
conclusively disproves it: those steps are taking less than 20msec.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] psql language

2009-07-10 Thread Hiroshi Saito

Hi Stuart-san.

Is this what you are looking for?

 example 
C:\Program Files\PostgreSQL\8.4\bin>psql -p 5433 postgres postgres
psql (8.4.0)
"help" でヘルプを表示します.

C:\Program Files\PostgreSQL\8.4\bin>set LANG=C

C:\Program Files\PostgreSQL\8.4\bin>psql -p 5433 postgres postgres
psql (8.4.0)
Type "help" for help.

Have I missed something?

Regards,
Hiroshi Saito

- Original Message - 
From: "Stuart McGraw" 

To: 
Sent: Friday, July 10, 2009 11:00 AM
Subject: [GENERAL] psql language



Hello,

I just installed pg-8.4 on Windows XP but ran into
some unexpected problems.

I am working on some tools to aid English-speaking learners of Japanese.
This of course requires me to regularly display and enter Japanese text
on my machine, so I have the Regional setting "Language for non-unicode
programs" set to Japanese although the locale language is English.  This
allows me to work with both English and Japanese text in the Windows
console (cmd.exe) just fine.  Psql also worked fine until 8.4.

With 8.4 though, psql presents messages in Japanese.
Since I am still a beginner at Japanese myself, this
is a problem.

FWIW, I tried doing "SET LANGUAGE en_US", and the same with every other
locale-related variable ("LC_ALL", "LANG", "LC_MESSAGES", etc.) I could
think of, before running psql, but with no effect.

How can I tell psql (and any other command-line tools) to use English messages?


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Simon Riggs

On Fri, 2009-07-10 at 09:46 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > or a query like this
> > 
> >  Select '1', ...
> >  ...
> >  union
> >  Select status, ...
> >  ...
> >  where status != '1';
> >  ;
> > 
> > then it is clear that we could automatically prove that the distinct
> > step is redundant and so we could either hash or sort. This is the same
> > as replacing the UNION with UNION ALL.
> 
> In the last example, how do you know that status != '1' produces unique
> output?  

You don't. I was assuming that you could already prove that each
subquery was distinct in itself.

It's one for the TODO, that's all. I see it often, but I'm not planning
to work on the code for this myself.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Bruce Momjian
Simon Riggs wrote:
> or a query like this
> 
>  Select '1', ...
>  ...
>  union
>  Select status, ...
>  ...
>  where status != '1';
>  ;
> 
> then it is clear that we could automatically prove that the distinct
> step is redundant and so we could either hash or sort. This is the same
> as replacing the UNION with UNION ALL.

In the last example, how do you know that status != '1' produces unique
output?  I assumed UNION gave distinct for the entire output, not just
removed duplicates from the two UNION branches;  that's how Postgres
behaves now:

test=> SELECT 1 UNION (SELECT 2 UNION ALL SELECT 2);
 ?column?
--
1
2
(2 rows)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Simon Riggs

On Fri, 2009-07-10 at 09:28 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > 
> > On Fri, 2009-07-10 at 08:59 -0400, Bruce Momjian wrote:
> > 
> > > > I think it should be possible to use predtest theorem proving to
> > > discard
> > > > the sort/hash step in cases where we can prove the sets are
> > > disjoint.
> > > > Often there are top-level quals that can be compared in the WHERE
> > > > clauses of the sub-queries, so a shallow search could be quite
> > > > profitable in allowing us to rewrite a UNION into a UNION ALL.
> > > 
> > > I assume we would still need the distinct removal step;  we just avoid
> > > the sort/hash.
> > 
> > I mean it seems possible to prove that the distinct removal step is not
> > necessary, by proving that the various sub-queries are already disjoint.
> > It's a common manual optimization, so automating it seems a reasonable
> > future goal.
> 
> I am confused about which sub-queries produce _distinct_ output.  I know there
> are some that produce _ordered_ output.

None, that was not my point.

If you have a query like this

 Select ..., status, ...
 ...
 where status = '1'
 union
 Select ..., status, ...
 ...
 where status = '2';

or a query like this

 Select '1', ...
 ...
 union
 Select '2', ...
 ...
 ;

or a query like this

 Select '1', ...
 ...
 union
 Select status, ...
 ...
 where status != '1';
 ;

then it is clear that we could automatically prove that the distinct
step is redundant and so we could either hash or sort. This is the same
as replacing the UNION with UNION ALL.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Bruce Momjian
Simon Riggs wrote:
> 
> On Fri, 2009-07-10 at 08:59 -0400, Bruce Momjian wrote:
> 
> > > I think it should be possible to use predtest theorem proving to
> > discard
> > > the sort/hash step in cases where we can prove the sets are
> > disjoint.
> > > Often there are top-level quals that can be compared in the WHERE
> > > clauses of the sub-queries, so a shallow search could be quite
> > > profitable in allowing us to rewrite a UNION into a UNION ALL.
> > 
> > I assume we would still need the distinct removal step;  we just avoid
> > the sort/hash.
> 
> I mean it seems possible to prove that the distinct removal step is not
> necessary, by proving that the various sub-queries are already disjoint.
> It's a common manual optimization, so automating it seems a reasonable
> future goal.

I am confused about which sub-queries produce _distinct_ output.  I know there
are some that produce _ordered_ output.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Simon Riggs

On Fri, 2009-07-10 at 08:59 -0400, Bruce Momjian wrote:

> > I think it should be possible to use predtest theorem proving to
> discard
> > the sort/hash step in cases where we can prove the sets are
> disjoint.
> > Often there are top-level quals that can be compared in the WHERE
> > clauses of the sub-queries, so a shallow search could be quite
> > profitable in allowing us to rewrite a UNION into a UNION ALL.
> 
> I assume we would still need the distinct removal step;  we just avoid
> the sort/hash.

I mean it seems possible to prove that the distinct removal step is not
necessary, by proving that the various sub-queries are already disjoint.
It's a common manual optimization, so automating it seems a reasonable
future goal.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL and Poker

2009-07-10 Thread Jasen Betts
On 2009-07-08, Massa, Harald Armin  wrote:
> a quite interesting read.
>
> http://www.codingthewheel.com/archives/stranger-than-fiction-story-online-poker-tracker-postgresql
>
>
> especially as an explanation of the growing number of questions from
> Windows-Users of PostgreSQL
>
> And ... for a tag line: "PostgreSQL. Thousands bet on it."

I liked this bit:

. "Product Manager: So wait, let me get this straight. You want us to
. force our users, who are some of the laziest people on the planet, to
. install a full-fledged relational database management system??? On
. their home computer??? Like what, they're going to become DBAs? And
. you're calling that a feature? Well, why stop there? Why not just ship
. them our source code directly and force them to compile it on the
. COMMAND LINE? Every user is a programmer, right? Well? ARE YOU OUT OF
. YOUR F--KING MIND??"

We do that!  We were using the non-interactive MSI installer; those who
know what's going on seem pleased to see a real database, and those who
don't, trust us :)

It started life as a Linux-only application, but then someone ported PG
and GTK to Windows, and NSIS to Linux. Now we build Windows installer
CDs on Linux.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Best practices to WorkFlow design?

2009-07-10 Thread Andre Lopes
Hi,

I will be developing a WorkFlow Application, but I don't know the best
practices on how to design a WorkFlow on a Database.

Can you give me some clues? Books, links on the Internet, etc...


Best Regards,
André.


Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Bruce Momjian
Simon Riggs wrote:
> 
> On Thu, 2009-07-09 at 20:41 -0600, Scott Marlowe wrote:
> > On Thu, Jul 9, 2009 at 7:58 PM, Bruce Momjian wrote:
> > > Scott Bailey wrote:
> > >> Alvaro Herrera wrote:
> > >> > Tim Keitt wrote:
> > >> >> I am combining query results that I know are disjoint. I'm wondering
> > >> >> how much overhead there is in calling union versus union all. (Just
> > >> >> curious really; I can't see a reason not to use union all.)
> > >> >
> > >> > UNION needs to uniquify the output, for which it plasters an additional
> > >> > sort step, whereas UNION ALL does not need to uniquify its output and
> > >> > thus it can avoid the sort step.  Using UNION ALL is recommended
> > >> > wherever possible.
> > >> >
> > >> I think I read somewhere that as of 8.4 it no longer required the sort
> > >> step, due to the improvements in hashing. Here it is
> > >>
> > >> http://wiki.postgresql.org/wiki/WhatsNew84#Performance
> > >
> > > Oh, yea, hashing is used in some cases rather than sort.  I assume sort
> > > is still used if the hash exceeds work_mem size.
> > 
> > The important point being that it's still more expensive than a plain
> > union all, though, right?
> 
> I think it should be possible to use predtest theorem proving to discard
> the sort/hash step in cases where we can prove the sets are disjoint.
> Often there are top-level quals that can be compared in the WHERE
> clauses of the sub-queries, so a shallow search could be quite
> profitable in allowing us to rewrite a UNION into a UNION ALL.

I assume we would still need the distinct removal step;  we just avoid
the sort/hash.
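The semantic difference the thread keeps returning to is easy to see in a
quick sketch. This is an illustration only, using Python's built-in sqlite3
module as a stand-in for PostgreSQL with invented table names; the SQL
behaves the same way on both engines:

```python
# Illustration of UNION vs UNION ALL semantics, using the stdlib sqlite3
# module as a convenient SQL engine; table names here are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (3), (4);
""")

# UNION must uniquify the combined output (the sort/hash step the thread
# discusses), so the duplicate 3 appears only once.
union = [r[0] for r in conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x")]

# UNION ALL just concatenates the inputs, keeping both 3s and skipping
# the uniquify step entirely.
union_all = sorted(r[0] for r in conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b"))

print(union)      # [1, 2, 3, 4]
print(union_all)  # [1, 2, 3, 3, 4]
```

When the inputs are known to be disjoint, as in the original question, both
forms return the same rows, which is exactly why skipping the uniquify step
is safe there.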

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SELECT DISTINCT very slow

2009-07-10 Thread Greg Stark
On Fri, Jul 10, 2009 at 1:41 PM, Ben Harper wrote:
>
> Unfortunately I can't use GROUP BY, because what I'm really doing is
> SELECT DISTINCT ON(unique_field) id FROM table;

You could do that using GROUP BY if you define a first() aggregate. In
this case that would just be SELECT first(id) AS id from (select *
from table ORDER BY unique_field, ...) GROUP BY unique_field. In cases
with more fields it gets tiresome fast.
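The first()-over-a-sorted-subquery rewrite relies on the input order
surviving the GROUP BY. The same one-row-per-group pick can also be spelled
with a deterministic min-per-group join; a sketch using sqlite3 purely as a
handy SQL engine, with invented table and column names (it assumes
(unique_field, ord) is unique):

```python
# Portable rewrite of "SELECT DISTINCT ON (unique_field) id FROM t
# ORDER BY unique_field, ord": join each group to its minimum ordering
# key. Names (t, unique_field, ord) are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, unique_field TEXT, ord INTEGER);
    INSERT INTO t VALUES (10, 'a', 2), (11, 'a', 1), (12, 'b', 1);
""")

rows = conn.execute("""
    SELECT t.id
    FROM t
    JOIN (SELECT unique_field, MIN(ord) AS m
          FROM t GROUP BY unique_field) g
      ON t.unique_field = g.unique_field AND t.ord = g.m
    ORDER BY t.unique_field
""").fetchall()

print(rows)  # [(11,), (12,)] -- one id per unique_field, smallest ord wins
```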

In this case 8.4 won't actually help you. It only uses hash aggregates
for DISTINCT not DISTINCT ON.


> I'm not familiar with the Postgres internals, but in my own DB system
> that I have written, I do the skip-scanning thing, and for my system
> it was a really trivial optimization to code.

Well things get tricky quickly when you have to deal with concurrent
inserts and potentially page splits from other transactions. Also
consider how hard it is to prove that the query falls into this
category of queries.


> Inside a GIS application, the user wants to categorize the display of
> some information based on, in this case, the suburb name.
> He clicks a button that says "Add All Unique Categories". This is a
> very common operation in this domain.

That doesn't look like what this query is doing to me. It's taking one
exemplar from each suburb based on some other constraint (the minimum
of whatever your order by key is) and taking the id of that data
point. If the order by key doesn't specify a specific data point then
it's a non-deterministic record.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SELECT DISTINCT very slow

2009-07-10 Thread Ben Harper
Thanks for all the feedback.

Using GROUP BY is indeed much faster (about 1 second).

Unfortunately I can't use GROUP BY, because what I'm really doing is
SELECT DISTINCT ON(unique_field) id FROM table;

I'm not familiar with the Postgres internals, but in my own DB system
that I have written, I do the skip-scanning thing, and for my system
it was a really trivial optimization to code. I know, I'm always free
to submit a patch, and hopefully someday I will, if it hasn't already
been done by then.

I can't comment on whether this skip-scan optimization is general
enough to warrant the lines of code, but I might as well explain my
use case:
Inside a GIS application, the user wants to categorize the display of
some information based on, in this case, the suburb name.
He clicks a button that says "Add All Unique Categories". This is a
very common operation in this domain.
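As noted upthread, a plain GROUP BY returns the same category list as
SELECT DISTINCT and was the faster plan on pre-8.4 servers. A small
illustration of the equivalence, again with sqlite3 as the engine and
invented suburb data:

```python
# DISTINCT and GROUP BY produce the same category list; on pre-8.4
# PostgreSQL the GROUP BY form could use the faster hashed plan.
# sqlite3 is used only as a handy SQL engine; the data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parcels (id INTEGER, suburb TEXT);
    INSERT INTO parcels VALUES
        (1, 'Claremont'), (2, 'Claremont'), (3, 'Newlands'),
        (4, 'Rondebosch'), (5, 'Newlands');
""")

distinct = conn.execute(
    "SELECT DISTINCT suburb FROM parcels ORDER BY suburb").fetchall()
grouped = conn.execute(
    "SELECT suburb FROM parcels GROUP BY suburb ORDER BY suburb").fetchall()

print([s for (s,) in distinct])  # ['Claremont', 'Newlands', 'Rondebosch']
print(distinct == grouped)       # True
```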

Again, thanks for all the feedback. I'll upgrade to 8.4 soon.
Ben Harper

On Fri, Jul 10, 2009 at 2:50 AM, Tom Lane wrote:
> Greg Stark  writes:
>> Not really. The OP doesn't say how wide the record rows are but unless
>> they're very wide it wouldn't pay to use an index for this even if you
>> didn't have to access the heap also. It's going to be faster to scan
>> the whole heap and either sort or use a hash. Currently there aren't
>> many cases where a btree with 6,000 copies of 111 distinct keys is
>> going to be useful.
>
> It was 600,000 not 6,000 ... so a skip-scan might be worth the trouble,
> but as you say we haven't done it.
>
> In any case I think the real issue is that the OP is probably using a
> pre-8.4 release which will always do SELECT DISTINCT via sort-and-unique.
> Hash aggregation would be a whole lot faster for these numbers, even
> if not exactly instantaneous.  He could update to 8.4, or go over to
> using GROUP BY as was recommended upthread.
>
>                        regards, tom lane
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ubuntu packages for 8.4

2009-07-10 Thread Tim Uckun
On Fri, Jul 10, 2009 at 9:22 PM, Stuart Bishop wrote:
> On Fri, Jul 10, 2009 at 8:28 AM, Tim Uckun wrote:
>> I don't see any ubuntu packages for 8.4 in the default repositories.
>>
>> Does anybody know if they will be upgrading the postgresql package to
>> 8.4 or creating a new package for it.
>
> The postgresql-8.4 packages arrived in 9.10 (Karmic) about two hours
> ago. https://launchpad.net/ubuntu/+source/postgresql-8.4
>
> Assuming you are not running the 9.10 prerelease, you may want to
> request a backport to whatever Ubuntu release you are running -
> https://help.ubuntu.com/community/UbuntuBackport. This should be no
> drama.
>
> Major postgresql releases always get a new package. This allows you to
> have multiple major versions installed and running simultaneously.
>


Awesome. Thanks a lot for the information.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Checkpoint Tuning Question

2009-07-10 Thread Simon Riggs

On Wed, 2009-07-08 at 18:22 -0400, Tom Lane wrote:

> As Greg commented upthread, we seem to be getting forced to the
> conclusion that the initial buffer scan in BufferSync() is somehow
> causing this.  There are a couple of things it'd be useful to try
> here:

Not sure why you're forced to that conclusion?

ISTM more likely to be a problem with checkpointing clog or subtrans.
That would block everybody and the scale of the problem is about right.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ubuntu packages for 8.4

2009-07-10 Thread Stuart Bishop
On Fri, Jul 10, 2009 at 8:28 AM, Tim Uckun wrote:
> I don't see any ubuntu packages for 8.4 in the default repositories.
>
> Does anybody know if they will be upgrading the postgresql package to
> 8.4 or creating a new package for it.

The postgresql-8.4 packages arrived in 9.10 (Karmic) about two hours
ago. https://launchpad.net/ubuntu/+source/postgresql-8.4

Assuming you are not running the 9.10 prerelease, you may want to
request a backport to whatever Ubuntu release you are running -
https://help.ubuntu.com/community/UbuntuBackport. This should be no
drama.

Major postgresql releases always get a new package. This allows you to
have multiple major versions installed and running simultaneously.


-- 
Stuart Bishop 
http://www.stuartbishop.net/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Overhead of union versus union all

2009-07-10 Thread Simon Riggs

On Thu, 2009-07-09 at 20:41 -0600, Scott Marlowe wrote:
> On Thu, Jul 9, 2009 at 7:58 PM, Bruce Momjian wrote:
> > Scott Bailey wrote:
> >> Alvaro Herrera wrote:
> >> > Tim Keitt wrote:
> >> >> I am combining query results that I know are disjoint. I'm wondering
> >> >> how much overhead there is in calling union versus union all. (Just
> >> >> curious really; I can't see a reason not to use union all.)
> >> >
> >> > UNION needs to uniquify the output, for which it plasters an additional
> >> > sort step, whereas UNION ALL does not need to uniquify its output and
> >> > thus it can avoid the sort step.  Using UNION ALL is recommended
> >> > wherever possible.
> >> >
> >> I think I read somewhere that as of 8.4 it no longer required the sort
> >> step, due to the improvements in hashing. Here it is
> >>
> >> http://wiki.postgresql.org/wiki/WhatsNew84#Performance
> >
> > Oh, yea, hashing is used in some cases rather than sort.  I assume sort
> > is still used if the hash exceeds work_mem size.
> 
> The important point being that it's still more expensive than a plain
> union all, though, right?

I think it should be possible to use predtest theorem proving to discard
the sort/hash step in cases where we can prove the sets are disjoint.
Often there are top-level quals that can be compared in the WHERE
clauses of the sub-queries, so a shallow search could be quite
profitable in allowing us to rewrite a UNION into a UNION ALL.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance problem with low correlation data

2009-07-10 Thread m_lists


> > testinsert contains t values between '2009-08-01' and '2009-08-09', and
> > ne_id from 1 to 2. But only 800 out of 2 ne_id have to be read;
> > there's no need for a table scan!
> > I guess this is a reflection of the poor "correlation" on ne_id; but, as
> > I said, I don't really think ne_id is so badly correlated.
> > In fact, doing a "select ne_id, t from testinsert limit 10" I can see
> > that data is laid out pretty much by "ne_id, t", grouped by day (that is,
> > same ne_id for one day, then the next ne_id, and so on until the next day).
> > How is the "correlation" calculated? Can someone explain to me why, after
> > the procedure above, the correlation is so low?
> 
> Did you run ANALYZE after the procedure above?

Yes I did; the correlation on that column stays low.
Of course, I didn't expect a correlation = 1, since data is laid out
(pretty much) like this:
(ne_id1) (t1 day1)
(ne_id1) (t2 day1)
...
(ne_id1) (tn day1)
(ne_id2) (t1 day1)
(ne_id2) (t2 day1)
...
(ne_id2) (tn day1)
...
(pretty much all the ne_ids)
(ne_id1) (t1 day2)
(ne_id1) (t2 day2)
...
(ne_id1) (tn day2)
(ne_id2) (t1 day2)
(ne_id2) (t2 day2)
...
(ne_id2) (tn day2)
... and so on
so ne_id is not strictly incrementing, but it stays pretty much the same
(sequentially) for a whole day...
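The pg_stats "correlation" statistic is essentially the statistical
correlation between a column's values and the physical positions of its
rows. A back-of-the-envelope sketch with invented sizes (100 ne_ids over 9
days) shows why the day-major layout described above scores so low even
though rows are locally ordered:

```python
# Why a day-major layout gives a low correlation on ne_id: the column
# value restarts from 1 every day, so it correlates poorly with physical
# row position. Sizes (100 ne_ids, 9 days) are invented for illustration.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

n_ids, n_days = 100, 9
# all ne_ids for day 1, then all ne_ids for day 2, and so on
values = [ne for _day in range(n_days) for ne in range(1, n_ids + 1)]
positions = list(range(len(values)))

corr = pearson(positions, values)
print(round(corr, 2))  # about 0.11 -- far from 1 despite the local ordering
```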





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general