Re: [GENERAL] pgpool2 vs sequoia

2007-08-03 Thread mljv
On Thursday, 2 August 2007 22:37, you wrote:
> On Thu, Aug 02, 2007 at 11:58:40AM +0200, [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > I would like to use statement replication for PostgreSQL
>
> Why?

I have read
http://www.postgresql.org/docs/8.2/interactive/high-availability.html

I want 4 synchronous databases with load balancing, so my application has high
performance and high availability.

I am using "Data Partitioning" at this time. Each db has the same schema but
different data at the moment. But I am not very happy with it, as a few tables
must be kept synchronized and I don't have any failover or load balancing.
"Statement replication" is the only way to get synchronous replication with
load balancing, isn't it? At least it's the one I understand best.

kind regards,
Janning



[GENERAL] PG for DataWarehouse type Queries

2007-08-03 Thread Ow Mun Heng
Can anyone shed some light on this? I just would like to know if queries
for raw data (not aggregates) are expected to take a long time.
Running times are between 30 minutes and 2 hours for large dataset pulls.

They involve lots of joins on very large tables (min 1 million rows each
table, 300 columns per table).

Joins are done in the form of left joins (sometimes on the same tables,
due to normalisation).

Is 30 minutes - 2 hours too long, or is this considered "normal"?





Re: [GENERAL] GiST index implementation

2007-08-03 Thread Elena Camossi
Hi Gregory,

Thank you very much for your answer!

>
> > what is the default implementation for GiST index? B-Tree or R-Tree?
> > That is, if i execute the following SQL command:
> >
> >  CREATE index ON table USING Gist (column)
> >
> > what is the type of the index that is actually built?
>
> uhm, GIST. GIST is a particular type of index just like btree.


According to the documentation (ch 11.2), "GiST indexes are not a single kind
of index, but rather an infrastructure within which many different indexing
strategies can be implemented", and (ch 50.1) "B-Tree, R-Tree and many other
indexing schemes can be implemented in GiST".
Moreover, according to what I found on the web (see
http://www.sai.msu.su/~megera/postgres/gist/ ), some of these implementations
are now included in the core of Postgres.
I supposed there was a way to specify one of the implementations
provided.

> What are you actually trying to do? Do you have a particular problem you're
> trying to solve?


I wanted to test the suitability and the efficiency of R-Tree/GiST for
queries involving standard PostgreSQL temporal column data.

> > How can I specify in SQL one of the two implementations provided (e.g.
> > R-Tree)?
>
> R-Tree indexes don't exist in Postgres any more. GIST indexes are very
> similar, only more general. They handle 2D geometric data types like RTree
> did previously, as well as n-dimensional data types and other more exotic
> things like intarrays and full text search.


Are these functionalities all included by default in the standard GiST
indexing?

> There are different "kinds" of GIST indexes which you can specify by
> specifying an operator class when you define the index.
>
> But most data types only have a single operator class available to them, so
> that's probably not what you need unless you're trying to do something
> unusual.
>
> An "operator class" defines a set of operators which can be optimized by an
> index organized the same way, and usually corresponds to a particular
> interpretation of the data type.


Is the specification of the operator class done when I create the index?
Or is the suitable operator class automatically loaded according to the
data type of the column on which I build the index?
And if I wanted to specify a different operator class, how could I do it?
May I provide a different operator class beyond the ones provided?

Thank you very much.

Regards,
-Elena


[GENERAL] invalid page header

2007-08-03 Thread Markus Schiltknecht

Hi,

I'm in the unfortunate position of having an "invalid page header in
block 58591 of relation "pg_toast_302599"". I'm well aware that the
hardware in question isn't the most reliable. Nonetheless, I'd
like to restore as much of the data as possible.


A pg_filedump analysis of the file in question spits out the following 
information (relation pg_toast_302599 has oid 302602):



# pg_filedump -R 58590 58592 base/296788/302602

***
* PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1
*
* File: base/296788/302602
* Options used: -R 58590 58592
*
* Dump created on: Fri Aug  3 10:34:23 2007
***

Block 58590 
 -
 Block Offset: 0x1c9bc000 Offsets: Lower  36 (0x0024)
 Block: Size 8192  Version3Upper  48 (0x0030)
 LSN:  logid  4 recoff 0x24b6ee88  Special  8192 (0x2000)
 Items:4   Free Space:   12
 Length (including item array): 40

 --
 Item   1 -- Length: 2034  Offset: 6156 (0x180c)  Flags: USED
 Item   2 -- Length: 2034  Offset: 4120 (0x1018)  Flags: USED
 Item   3 -- Length: 2034  Offset: 2084 (0x0824)  Flags: USED
 Item   4 -- Length: 2034  Offset:   48 (0x0030)  Flags: USED


Block 58591 
 -
 Block Offset: 0x1c9be000 Offsets: Lower12858 (0x323a)
 Block: Size 28160  Version   73Upper14900 (0x3a34)
 LSN:  logid 627535472 recoff 0x3a693b22  Special  9506 (0x2522)
 Items: 3209   Free Space: 2042
 Length (including item array): 8192

 Error: Invalid header information.

 Error: End of block encountered within the header. Bytes read: 8192.


Block 58592 
 -
 Block Offset: 0x1c9c Offsets: Lower  36 (0x0024)
 Block: Size 8192  Version3Upper  48 (0x0030)
 LSN:  logid  4 recoff 0x24b73110  Special  8192 (0x2000)
 Items:4   Free Space:   12
 Length (including item array): 40

 --
 Item   1 -- Length: 2034  Offset: 6156 (0x180c)  Flags: USED
 Item   2 -- Length: 2034  Offset: 4120 (0x1018)  Flags: USED
 Item   3 -- Length: 2034  Offset: 2084 (0x0824)  Flags: USED
 Item   4 -- Length: 2034  Offset:   48 (0x0030)  Flags: USED


*** End of Requested Range Encountered. Last Block Read: 58592 ***


It seems obvious that the block in question is absolutely screwed, as
both the block size and the version are wrong.


What's the best cure? Can I just wipe out the block with something like:

# dd if=/dev/zero of=base/296788/302602 seek=58591 bs=8192 count=1



This is PostgreSQL 8.1.9 (from Debian 8.1.9-0etch1), but it has been 
running earlier 8.1.x versions in the past.



Regards

Markus





[GENERAL] Latin1 to UTF-8 ?

2007-08-03 Thread Aarni Ruuhimäki
Hi,

I've set up a new CentOS server with PostgreSQL 8.2.4 and initdb'ed it with
UTF-8.

OK, and it runs fine.

I have a problem with encodings, however, mainly with Russian Cyrillic
characters.

When I test-dumped some dbs from the old FC / Pg 8.0.2 machine, all Latin1, I
noticed that some of the dumps show up in the Konqueror file browser as 'Plain
Text Documents' and some as 'C++ Source Files'. Both have Latin1 as the client
encoding at the top of the files. Changing that gives errors, as expected.

Looking into the plain text dumps I see all Cyrillic characters as Р...,
and these go in and display fine from the new server's UTF-8 environment.

Some of the 'C++' files have the Cyrillics as 'îñåòèòåëåé'. Some have both
'îñåòèòåëåé' and Р..., and of course the 'îñåò' characters come out wrong
and unreadable in the browser. (Not sure if you can see the single-quoted
ones, but they look something like Hebrew or similar.)

I have no idea what browsers / encodings or even keyboard layouts were
used when the data was inserted by users through their web
interfaces...

I tried the -F p switch, as the earlier version has no -E for dumps. Same
output. Also with pg_dumpall.

I tried various encodings with iconv too.
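For instance, something along these lines (the file names are made up):

  iconv -f ISO-8859-1 -t UTF-8 olddb.dump > olddb.utf8.dump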

So, what would be the proper way to convert the dumps to UTF-8? Or is there
any other solution? Any other tool to work with the problem files?

BR,

Aarni
-- 
Aarni Ruuhimäki




Re: [GENERAL] PG for DataWarehouse type Queries

2007-08-03 Thread Josh Tolley
On 8/3/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
> Can anyone shed some light on this? I just would like to know if queries
> for raw data (not aggregates) are expected to take a long time.
> Running times are between 30 minutes and 2 hours for large dataset pulls.
>
> They involve lots of joins on very large tables (min 1 million rows each
> table, 300 columns per table).
>
> Joins are done in the form of left joins (sometimes on the same tables,
> due to normalisation).
>
> Is 30 minutes - 2 hours too long, or is this considered "normal"?

That depends entirely on your query, your dataset, the machine you're
using, etc. Your best bet is probably to post an EXPLAIN ANALYZE of
your query along with some description of what's going on and what
you're running this all on (pg version, machine specs, etc.) and see
if someone can come up with an acceptable way to make the query
faster.

- Josh



Re: [GENERAL] Suse RPM's

2007-08-03 Thread Devrim GÜNDÜZ
Hi,

On Fri, 2007-08-03 at 11:38 +0100, Chris Coleman wrote:
> Hi,
> 
> Does anyone know of anywhere to get Suse 10.1 RPMs of recent (8.2 and
> 8.3) versions of postgres?

Complain to Reinhard; he is CC'ed on this e-mail.

ftp://ftp.suse.com/pub/projects/postgresql/

has only 8.2.0...

BTW, I promised to build SuSE RPMs some time ago; however, Reinhard
said that they would keep the packages up to date.

Reinhard, if SuSE is busy and/or not willing to keep the packages
updated, I can also build SuSE RPMs for the community...

Kind regards,
-- 
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/






Re: [GENERAL] PG for DataWarehouse type Queries

2007-08-03 Thread Tom Lane
Ow Mun Heng <[EMAIL PROTECTED]> writes:
> Can anyone shed some light on this? I just would like to know if queries
> for raw data (not aggregates) are expected to take a long time.
> Running times are between 30 minutes and 2 hours for large dataset pulls.

> They involve lots of joins on very large tables (min 1 million rows each
> table, 300 columns per table).

> Joins are done in the form of left joins (sometimes on the same tables,
> due to normalisation).

> Is 30 minutes - 2 hours too long, or is this considered "normal"?

This question is nearly content-free.  How many joins is "lots"?  How
many result rows are you expecting?  What PG version are you using?
What have you got work_mem set to?  What does EXPLAIN say the plan is
(EXPLAIN ANALYZE output would be better, but you'll have to wait for the
results...)?

regards, tom lane



Re: [GENERAL] GiST index implementation

2007-08-03 Thread Gregory Stark
"Elena Camossi" <[EMAIL PROTECTED]> writes:

> Hi Gregory,
>
> Thank you very much for your answer!
>
>>
>> > what is the default implementation for GiST index? B-Tree or R-Tree?
>> > That is, if i execute the following SQL command:
>> >
>> >  CREATE index ON table USING Gist (column)
>> >
>> > what is the type of the index that is actually built?
>>
>> uhm, GIST. GIST is a particular type of index just like btree.
>
>
> According to the documentation (ch 11.2), "GiST indexes are not a single kind
> of index, but rather an infrastructure within which many different indexing
> strategies can be implemented", and (ch 50.1) "B-Tree, R-Tree and many other
> indexing schemes can be implemented in GiST".

Hm, well that's kind of true from an abstract point of view. But from a
practical point of view it's not really relevant. What you get when you use
GIST indexing, Postgres calls a GIST index, regardless of what operator class
you use.

Most datatypes only have one GIST operator class. The geometric data types
have a class that is most similar to the old RTree implementation. There is a
GIST operator class for integers which implements an ordered btree-style index
-- but you would probably just use a Postgres BTree index if you wanted such
an index.


> I wanted to test the suitability and the efficiency of R-Tree/GiST for
> queries involving standard PostgreSQL temporal column data.
...
> Are these functionalities all included by default in the standard GiST
> indexing?

The only GIST operator classes which come with the Postgres core are box_ops,
poly_ops, and circle_ops, which are the equivalents of the old RTree operator
classes. There are a bunch more in the contrib modules, but I don't see any
related to temporal data.
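For what it's worth, the operator class can be named explicitly when you
create the index. A minimal sketch using a made-up table with a box column
(box_ops is the core GiST opclass for box, so naming it here is optional --
it's what you'd get by default anyway):

  CREATE TABLE regions (id serial, area box);
  -- explicitly select the box_ops GiST operator class:
  CREATE INDEX regions_area_gist ON regions USING gist (area box_ops);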

You might want to look at the seg contrib module and see if it can be adapted
to work with temporal data instead. You might also look on PgFoundry; there
might be something there.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [GENERAL] Suse RPM's

2007-08-03 Thread Jorge Godoy
On Friday 03 August 2007 07:38:19 Chris Coleman wrote:
> Hi,
>
> Does anyone know of anywhere to get Suse 10.1 RPMs of recent (8.2 and
> 8.3) versions of postgres?
>
> The postgres website only has Fedora and Red Hat ones listed, and using
> rpmfind.net I can only find 8.0.13 ones for 10.0.

I usually get a .src.rpm and the tarball and run rpmbuild -bb.  It works fine,
and I have new packages very quickly.
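Roughly like this (the package and spec file names are hypothetical;
/usr/src/packages is the SuSE build tree):

  rpm -ivh postgresql-8.2.4-1.src.rpm               # unpack spec and sources
  rpmbuild -bb /usr/src/packages/SPECS/postgresql.spec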

-- 
Jorge Godoy  <[EMAIL PROTECTED]>




Re: [GENERAL] invalid page header

2007-08-03 Thread Tom Lane
Markus Schiltknecht <[EMAIL PROTECTED]> writes:
>> Block 58591 
>>  -
>> Block Offset: 0x1c9be000 Offsets: Lower12858 (0x323a)
>> Block: Size 28160  Version   73Upper14900 (0x3a34)
>> LSN:  logid 627535472 recoff 0x3a693b22  Special  9506 (0x2522)
>> Items: 3209   Free Space: 2042
>> Length (including item array): 8192
>> 
>> Error: Invalid header information.

Hm, looks suspiciously ASCII-like.  If you examine the page as text,
is it recognizable?  We've seen cases where, for instance, a page
of a system's mail spool had replaced a database page.  It's hard to
be sure whether that sort of thing is a hardware error or an operating
system bug.
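One way to dump the suspect block as text (a sketch, assuming the default
8 kB block size):

  dd if=base/296788/302602 bs=8192 skip=58591 count=1 | strings | less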

> What's the best cure? Can I just wipe out the block with something like:
> # dd if=/dev/zero of=base/296788/302602 seek=58591 bs=8192 count=1

Yeah, try that.  I think the net effect will be that some wide (toasted)
values will be truncated.

regards, tom lane



Re: [GENERAL] pgpool2 vs sequoia

2007-08-03 Thread David Fetter
On Fri, Aug 03, 2007 at 09:25:41AM +0200, [EMAIL PROTECTED] wrote:
> On Thursday, 2 August 2007 22:37, you wrote:
> > On Thu, Aug 02, 2007 at 11:58:40AM +0200, [EMAIL PROTECTED] wrote:
> > > Hi,
> > >
> > > I would like to use statement replication for PostgreSQL
> >
> > Why?
> 
> I have read
> http://www.postgresql.org/docs/8.2/interactive/high-availability.html
> 
> I want 4 synchronous databases with load balancing, so my
> application has high performance and high availability.

Very few people actually need synchronous replication, and those who
do buy Oracle's RAC (and curse it) or use DB2's offering (and also
curse it ;).  For most purposes, fast asynchronous replication is good
enough.

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate



Re: [GENERAL] parsed queries (cursors) cashing issues

2007-08-03 Thread Sibte Abbas
On 8/3/07, Sergey Moroz <[EMAIL PROTECTED]> wrote:
> No, that is not what I meant. The problem with prepared statements is that
> you have to determine the SQL inside the function. I want to pass a query as
> a parameter, as well as the query's parameters.
> For example (I want to create a function like the following):
>
> select *
>   from exec_query(
>   /*query text  => */  'select f1, f2 from table
> where f3 = $1' ,
>/*param1  => */  1::integer
>   )
>  as (f1 integer, f2 text)
>
> so the function exec_query gets the query text as a parameter, plus query
> parameters, executes it, and returns the result as a SETOF. In case such a
> query has been executed at least once, the prepare step should be skipped
> (the stored execution plan should be used).
>

In this case you need to store the query text along with its plan name.
This will allow you to simply execute the plan each time a previously
parsed/planned query is executed.

However, storing raw queries can be a *very* expensive operation, not
to mention the high cost of performing comparisons on them. Due to the
associated cost, I'd recommend using (and storing) hashes of the query
text.

If I were you, I'd write the hash calculation, storage and retrieval
functions in C and the top-level function in PL/pgSQL.
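A minimal sketch of the lookup side in plain SQL (the table and column names
are made up, and md5() merely stands in for whatever hash function you pick):

  CREATE TABLE plan_cache (
      query_hash text PRIMARY KEY,  -- hash of the raw query text
      plan_name  text NOT NULL      -- name under which the plan was prepared
  );

  -- on each call: hash the incoming query text and check for a cached plan
  SELECT plan_name
    FROM plan_cache
   WHERE query_hash = md5('select f1, f2 from table where f3 = $1');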

Hope that helps.

regards,
-- Sibte



Re: [GENERAL] index bloat WAS: reindexing pg_shdepend

2007-08-03 Thread Joseph Shraibman

Tom Lane wrote:

> Joseph S <[EMAIL PROTECTED]> writes:
>> ... and when I notice that the tuplesperpage for the indexes is low (or
>> that the indexes are bigger than the tables themselves) I know it is
>> time for a VACUUM FULL and REINDEX on that table.
>
> If you are taking the latter as a blind must-be-wrong condition, you are
> fooling yourself -- it's not true for small tables.

I know it isn't true for small tables.  Tables can have a size of 0, but
the minimum size for an index seems to be two pages.  Indexes can also
rival the size of the table when the index is on all the columns of the
table.  But most of the time, having an index bigger than the table
itself means I need a REINDEX.
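A rough way to spot this from the catalogs (a sketch; reltuples and relpages
are planner statistics, so the numbers are only approximate):

  SELECT relname, relpages, reltuples,
         reltuples / relpages AS tuples_per_page
    FROM pg_class
   WHERE relkind = 'i' AND relpages > 1
   ORDER BY tuples_per_page;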





> Have you checked whether the VACUUM FULL + REINDEX actually makes
> anything smaller?

Yes.  I'm mostly seeing the problem on tables of counts that are updated
frequently by triggers on other tables.  It seems autovacuum can't keep
up with the frequency of updates.  The table size itself can shrink by
50%, but the indexes can shrink by 90%.


I just ran my VACUUM FULL/REINDEX script at 11am.  The last time I ran it
was 9:30pm last night.  Some before/after numbers:


BEFORE (schema, relation, kind, size in bytes, tuples per page):

pg_catalog  pg_class                    table   172,032   19.476
pg_catalog  pg_class_oid_index          index    57,344   58.429
pg_catalog  pg_class_relname_nsp_index  index   180,224   18.591

AFTER:

pg_catalog  pg_class                    table    90,112   41.3
pg_catalog  pg_class_oid_index          index    32,768  103
pg_catalog  pg_class_relname_nsp_index  index    73,728   59

BEFORE:

public      acount                      table   434,176  119.302
public      acount_pkey                 index   172,032  301.095
public      ad_x_idx                    index   638,976   36.551

AFTER:

public      acount                      table   335,872  155.561
public      acount_pkey                 index   163,840  318.9
public      a_x_idx                     index   131,072  221.143



Re: [GENERAL] pgpool2 vs sequoia

2007-08-03 Thread Joshua D. Drake

David Fetter wrote:

> On Fri, Aug 03, 2007 at 09:25:41AM +0200, [EMAIL PROTECTED] wrote:
>> On Thursday, 2 August 2007 22:37, you wrote:
>>> On Thu, Aug 02, 2007 at 11:58:40AM +0200, [EMAIL PROTECTED] wrote:
>>>> Hi,
>>>>
>>>> I would like to use statement replication for postgresql
>>>
>>> Why?
>>
>> I have read
>> http://www.postgresql.org/docs/8.2/interactive/high-availability.html
>>
>> I want 4 synchronous databases with load balancing, so my
>> application has high performance and high availability.

synchronous replication <> high performance without lots of $$

Use a caching server or something like plproxy.

> Very few people actually need synchronous replication, and those who
> do buy Oracle's RAC (and curse it) or use DB2's offering (and also
> curse it ;).  For most purposes, fast asynchronous replication is good
> enough.
>
> Cheers,
> David.



--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




[GENERAL] Suse RPM's

2007-08-03 Thread Chris Coleman
Hi,

Does anyone know of anywhere to get Suse 10.1 RPMs of recent (8.2 and
8.3) versions of postgres?

The postgres website only has Fedora and Red Hat ones listed, and using
rpmfind.net I can only find 8.0.13 ones for 10.0.

Thanks
Chris Coleman


Chris Coleman
Programmer 
Information Systems
Room PKL1 Phone 369




Re: [GENERAL] invalid page header

2007-08-03 Thread Markus Schiltknecht

Hi,

Tom Lane wrote:

> Hm, looks suspiciously ASCII-like.  If you examine the page as text,
> is it recognizable?

Doh! Yup, it's recognizable. It looks like some PHP serialized output:

png%";i:84;s:24:"%InfoToolIconActive.png%";i:85;s:29:"%InfoToolIconHighlighted.png%";i:86;s:26:"%InfoToolIconInactive.png%";i:87;

We do store serialized PHP objects like the above one in the database,
so it's probably not a mail spool.

>> What's the best cure? Can I just wipe out the block with something like:
>> # dd if=/dev/zero of=base/296788/302602 seek=58591 bs=8192 count=1
>
> Yeah, try that.  I think the net effect will be that some wide (toasted)
> values will be truncated.

Thank you for your guidance. I've done that and hope the running pg_dump
goes through fine.


Regards

Markus




Re: index bloat WAS: [GENERAL] reindexing pg_shdepend

2007-08-03 Thread Michael Fuhr
On Thu, Aug 02, 2007 at 10:40:24PM -0400, Joseph S wrote:
> Tom Lane wrote:
> >Heavy use of temp tables would expand pg_class, pg_type, and especially
> >pg_attribute, but as long as you have a decent vacuuming regimen (do you
> >use autovac?) they shouldn't get out of hand.
>
> I do use autovac.  Like I said they don't get really out of hand, only 
> up to 20 megs or so before I noticed that it was weird.  The large 
> indexes are what tipped me off that something strange was going on.

Unexpected bloat in pg_shdepend led me to discover a problem with
statistics for shared tables a couple of months ago:

http://archives.postgresql.org/pgsql-hackers/2007-06/msg00190.php
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00245.php

> I only noticed this because I was making an effort to monitor index 
> bloat on my regular tables.  It could be there are a lot of people out 
> there who are experiencing this but don't notice because 20 megs here 
> and there don't cause any noticeable problems.

Anybody making heavy use of temporary tables and relying on autovacuum
is probably suffering bloat in pg_shdepend because no released
version of PostgreSQL has the fix for the statistics bug (it has
been fixed in CVS, however).  As I mention in the second message
above, vacuuming pg_shdepend resulted in an immediate performance
improvement in an application I was investigating.

-- 
Michael Fuhr



Re: [GENERAL] PG for DataWarehouse type Queries

2007-08-03 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Ow Mun Heng <[EMAIL PROTECTED]> writes:
>> Can anyone shed some light on this? I just would like to know if queries
>> for raw data (not aggregates) are expected to take a long time.
>> Running times are between 30 minutes and 2 hours for large dataset pulls.
>
>> They involve lots of joins on very large tables (min 1 million rows each
>> table, 300 columns per table).

One more question to pile on to the others: what version of Postgres is this?

I seem to recall some older versions had trouble with tables with hundreds of
columns, but that's just guessing in the dark without actual facts to look at.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




[GENERAL] Table queue (locking)

2007-08-03 Thread Tomas Simonaitis
Hello,

I've got the following two-table event queue implementation
(the general idea is that multiple writers post events, while multiple
readers retrieve and handle them in order):

Table events:
ev_id: SERIAL
ev_data: bytea -- serialized event details

Table eventsconsumers:
con_name: text UNIQUE -- consumer name
con_lastevent: integer
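
In DDL form, roughly (a sketch; the types follow the description above):

  CREATE TABLE events (
      ev_id   SERIAL,    -- monotonically increasing event id
      ev_data bytea      -- serialized event details
  );

  CREATE TABLE eventsconsumers (
      con_name      text UNIQUE,  -- consumer name
      con_lastevent integer       -- last ev_id this consumer has handled
  );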

Consumers issue:
SELECT * FROM events WHERE ev_id > "con_lastevent" LIMIT XX
to fetch new events[1].
Once an event is handled (or ignored) by a reader, he sets con_lastevent to
the handled ev_id.

The obvious problem with this simple implementation is the following race condition:

-- Writer1
BEGIN;
INSERT INTO events (ev_id = 1)
-- Writer2
BEGIN;
INSERT INTO events... (ev_id = 2)
COMMIT;
--Reader1
SELECT * FROM events WHERE ev_id > 0; -- first round
UPDATE eventsconsumers SET con_lastevent = 2 WHERE con_name = 'Reader1';
--Writer1
COMMIT; -- Reader1 missed ev_id = 1

I've got two ideas to solve it:

1- BEGIN; LOCK TABLE events IN ACCESS EXCLUSIVE MODE; INSERT INTO events...;
COMMIT;
This doesn't seem too bright: events might get posted at the beginning of a
(rather long) transaction, and there are many active writers.

2-
retry:
{
 BEGIN;
 LOCK TABLE events IN SHARE MODE NOWAIT; -- block writers
}
SELECT * FROM events
COMMIT;
Intuitively I believe backing off with NOWAIT is better (since readers'
performance is not that important).

Could you suggest better ways to solve this problem?
Maybe I'm missing something obvious here.

Thanks,
Tomas


[1] (LISTEN/NOTIFY is used for "new-event-arrived" notifications)
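
For reference, the notification side is just (a sketch with a made-up
channel name):

  LISTEN new_event;   -- each reader, once per connection
  NOTIFY new_event;   -- each writer, after inserting; delivered at commit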



Re: [GENERAL] PG for DataWarehouse type Queries

2007-08-03 Thread Simon Riggs
On Fri, 2007-08-03 at 15:12 +0800, Ow Mun Heng wrote:

> Is 30 minutes - 2 hours too long, or is this considered "normal"?

Yes.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [GENERAL] PG for DataWarehouse type Queries

2007-08-03 Thread André Volpato

Ow Mun Heng wrote:

> Can anyone shed some light on this? I just would like to know if queries
> for raw data (not aggregates) are expected to take a long time.
> Running times are between 30 minutes and 2 hours for large dataset pulls.
>
> They involve lots of joins on very large tables (min 1 million rows each
> table, 300 columns per table).
>
> Joins are done in the form of left joins (sometimes on the same tables,
> due to normalisation).
>
> Is 30 minutes - 2 hours too long, or is this considered "normal"?

What do you mean by 'data warehouse queries'?
Is there an OLAP server between your app and Postgres?

--
[]´s,

André Volpato
ECOM Tecnologia Ltda
[EMAIL PROTECTED]
(41) 3014 2322





Re: [GENERAL] What do people like to monitor (or in other words, what might be nice in pgsnmpd)?

2007-08-03 Thread Gavin M. Roy
Hmm... also data such as: what the background writer is currently doing, where
we are in checkpoint segments, how close to checkpoint timeouts we are,
etc.
On 8/2/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
>
> Josh Tolley wrote:
> > On 8/2/07, Gavin M. Roy <[EMAIL PROTECTED]> wrote:
> > > Are you contemplating providing access to data that's currently not
> stored
> > > in the pg_ catalog tables?  I currently monitor the statio data,
> > > transactions per second, and active/idle backends.  Things that I
> think
> > > would be useful would be average query execution time, longest
> execution
> > > time, etc.  Other pie in the sky ideas would include current level of
> total
> > > bloat in a database, total size on disk of a database broken down by
> tables,
> > > indexes, etc.
> >
> > My own goal is to have pgsnmpd able, as much as possible, to fill the
> > same role the set of scripts an arbitrary PostgreSQL DBA sets up on a
> > typical production server. That includes statistics tables and catalog
> > tables, but certainly isn't limited to just that. So doing things like
> > categorizing total sessions in interesting and useful ways (for
> > instance, # of idle connections, # of active connections, max
> > transaction length, etc.) are certainly within pgsnmpd's purview.
>
> More ideas: autovacuum metrics, for example how long since the last
> vacuum of tables, age(pg_class.relfrozenxid), how many dead tuples there
> are, pg_class.relpages (do tables shrink, grow or stay constant-size?),
> etc.
>
> --
> Alvaro Herrera  Developer,
> http://www.PostgreSQL.org/
> "La felicidad no es mañana. La felicidad es ahora"
>


Re: index bloat WAS: [GENERAL] reindexing pg_shdepend

2007-08-03 Thread Tom Lane
Michael Fuhr <[EMAIL PROTECTED]> writes:
> On Thu, Aug 02, 2007 at 10:40:24PM -0400, Joseph S wrote:
>> I do use autovac.  Like I said they don't get really out of hand, only 
>> up to 20 megs or so before I noticed that it was weird.  The large 
>> indexes are what tipped me off that something strange was going on.

> Unexpected bloat in pg_shdepend led me to discover a problem with
> statistics for shared tables a couple of months ago:

> http://archives.postgresql.org/pgsql-hackers/2007-06/msg00190.php
> http://archives.postgresql.org/pgsql-hackers/2007-06/msg00245.php

Hmm ... that problem would have caused autovac to mostly ignore the
shared tables, but in such a scenario you'd expect the table itself
and both indexes to all be bloated.  The thing that struck me about
Joseph's report was that the one index was so much more bloated than
the other.  The index entries are only slightly larger (3 OIDs not 2)
so there's no obvious reason for this.

The fact that the indexes are bloated and the table itself not can be
explained by a history of manual VACUUM FULLs, but that should have
had similar effects on both indexes.

We know that vacuum's inability to merge mostly-but-not-entirely-empty
index pages can lead to index bloat given a sufficiently unfriendly
usage pattern, and I think that must be what happened here, but I'm not
clear what that usage pattern is.  If we had those details we could
possibly work around it by changing the column ordering in the index
--- AFAIR there isn't any particular reason for
pg_shdepend_depender_index to have one column order rather than another.

regards, tom lane



Re: [GENERAL] What do people like to monitor (or in other words, what might be nice in pgsnmpd)?

2007-08-03 Thread Jeff Davis
On Wed, 2007-08-01 at 20:41 -0600, Josh Tolley wrote:
> So please respond, if you feel so inclined, describing things you like
> to monitor in your PostgreSQL instances as well as things you would
> like to be able to easily monitor in a more ideal world. Many thanks,
> and apologies for any breach of netiquette I may have committed in
> posting to two lists simultaneously.

I think there's also a related question here: can we develop
implementations of these measurements that satisfy a lot of DBAs?

For instance, when I measure idle transactions, I poll periodically for
any transactions that have been idle for more than 1 minute. That's
simple and probably useful to a lot of DBAs to catch certain types of
problems. This would probably be useful as a trap, or could be polled. 
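
A sketch of such a poll against pg_stat_activity (the column names are from
the 8.x catalogs, and stats_command_string must be enabled for current_query
to be populated):

  SELECT procpid, datname, usename, query_start
    FROM pg_stat_activity
   WHERE current_query = '<IDLE> in transaction'
     AND query_start < now() - interval '1 minute';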

However, some of the ideas, like trying to come up with numbers that
represent the amount of time queries are waiting on locks, or the
behavior of checkpoints/bgwriter, aren't as obvious to me. If one person
posts their script to monitor one of these things, will other DBAs want
to use the same instrumentation, or would they end up reinventing it
anyway? Can the numbers be effectively graphed with something like
OpenNMS on a 5-minute poll interval, and maybe have effective thresholds
for notifications?

I think -- even aside from pgsnmpd -- a lot of people would be
interested in seeing a variety of monitoring/notification scripts used
by other DBAs.

Also, here are some relevant pgfoundry projects:
http://pgfoundry.org/projects/nagiosplugins/
http://pgfoundry.org/projects/pgtools/

Regards,
Jeff Davis




[GENERAL] virtual database

2007-08-03 Thread Farhan Mughal

Does PostgreSQL support a Virtual Database like Oracle?

--Farhan







Re: [GENERAL] index bloat WAS: reindexing pg_shdepend

2007-08-03 Thread Tom Lane
OK, as far as I saw you never mentioned what PG version you are running,
but if it's 8.2.x then I think I know what's going on.  The thing that
was bothering me was the discrepancy in size of the two indexes.  Now
the entries in pg_shdepend_reference_index are all going to be
references to roles, and the index dump you sent me showed that they're
all references to the *same* role, ie, there's only one user ID doing
all the creation and deletion of temp tables.  On the other hand, the
entries in pg_shdepend_depender_index will be pg_class and pg_type
references with all different OIDs as the OID counter advances over
time.  So the keys in pg_shdepend_depender_index are all distinct
whereas the ones in pg_shdepend_reference_index are mostly the same.

What apparently is happening is that the 8.2 optimization to avoid
splitting btree pages when we are able to make room by removing
LP_DELETED index tuples kicks in for pg_shdepend_reference_index but not
for pg_shdepend_depender_index.  I think this is precisely because the
keys are all the same in the former: when we scan to mark the latest
temp table's entry deleted, we will visit all the other keys that aren't
yet LP_DELETED, and if they are now dead they'll get marked, and then
the next time we fill up the page we will find we can remove them.  But
in pg_shdepend_reference_index there is not any reason for the system to
revisit an old index entry and discover that it can be marked LP_DELETED.
So those entries will stay there until they get vacuumed.

In short, I think what happened is that pg_shdepend bloated because
autovacuum wasn't touching it (because of that statistics bug), and
pg_shdepend_depender_index bloated right along with it, but
pg_shdepend_reference_index didn't bloat because it was able to recycle
tuples sooner.  The current state you report (in which the table is
small too) would have been reached after a manual VACUUM or VACUUM FULL.

I was able to duplicate this behavior by having an unprivileged user
create and immediately drop a temp table, several thousand times in
succession, and then finally vacuuming pg_shdepend (autovac was off to
prevent bollixing the experiment).  The vacuum shows

regression=# vacuum verbose pg_shdepend;
INFO:  vacuuming "pg_catalog.pg_shdepend"
INFO:  scanned index "pg_shdepend_depender_index" to remove 4000 row versions
DETAIL:  CPU 0.00s/0.03u sec elapsed 0.02 sec.
INFO:  scanned index "pg_shdepend_reference_index" to remove 4000 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_shdepend": removed 4000 row versions in 26 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_shdepend_depender_index" now contains 1 row versions in 19 
pages
DETAIL:  4000 index row versions were removed.
15 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_shdepend_reference_index" now contains 1 row versions in 2 
pages
DETAIL:  346 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_shdepend": found 4000 removable, 1 nonremovable row versions in 26 
pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
26 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.03u sec elapsed 0.02 sec.
INFO:  "pg_shdepend": truncated 26 to 1 pages
DETAIL:  CPU 0.01s/0.00u sec elapsed 0.01 sec.
VACUUM

Notice that pg_shdepend_reference_index never actually split at all ---
it's still just the metapage and the root page.  So every time the root
filled, it was able to clean most of it out.

In short: nothing much to see here after all.  We'll just have to keep
in mind that the LP_DELETED recycling patch can have drastically
different effectiveness in different indexes of the same table, and
so it's very possible now for indexes to be of much different sizes.

regards, tom lane



Re: [GENERAL] Suse RPM's

2007-08-03 Thread Jorge Godoy
On Friday 03 August 2007 10:04:27 Devrim GÜNDÜZ wrote:
> Complain to Reinhard, he is CC'ed to this e-mail.
>
> ftp://ftp.suse.com/pub/projects/postgresql/
>
> has only 8.2.0...
>
> BTW, I have promised to build SuSE RPMs some time ago; however Reinhard
> said that they will keep the packages up2date.
>
> Reinhard, if SuSE is busy and/or is not willing to keep the packages
> updated, I can also build SuSE RPMs for the community...
>
> Kind regards,

Having updated packages would be great!  Especially if they would be updated
for, e.g., OpenSuSE 10.1 and OpenSuSE 10.2 (both already released) and not
just for OpenSuSE 10.3 (still under "development").

Having packages for some of the older releases makes using both PostgreSQL and
OpenSuSE easier (for PostgreSQL it doesn't make much difference, since we can
compile things by hand, but it would definitely draw more attention to
OpenSuSE...  I had a few cases where it had to be turned down because of the
lack of updates -- not security related, of course -- to "older" releases).

-- 
Jorge Godoy  <[EMAIL PROTECTED]>




[GENERAL] Select question..... is there a way to do this?

2007-08-03 Thread Karl Denninger

Assume the following tables:

Table ITEM (user text, subject text, number integer, changed timestamp);
table SEEN (user text, number integer, lastviewed timestamp);

OK, now the data in the "SEEN" table will have one tuple for each user
and number in the table ITEM which a user has viewed, and the last time
they looked at it.


It will also have one entry per user with a NULL number, which will be
written to the table when the user scans the table and finds no new
records (to keep the "SEEN" table from becoming exponentially large, as
otherwise it would require USERS x ITEMS tuples!); that record marks the
last time the user was there and found no newer ITEMs.


If I do the following query:

select item.user, item.subject, item.number from item, seen where 
(item.user = seen.user) and (item.number = seen.number) and 
(item.changed > seen.lastviewed);


I get all items which have a record in SEEN.  So far so good.

But what I want is all items which EITHER (1) have a record in SEEN
which matches (and is older), OR (2) are NEWER than the SEEN record
with a NULL "number".


That I'm having trouble doing.

The following does not do what I want:

select item.user, item.subject, item.number from item, seen where 
(item.user = seen.user and item.number = seen.number and item.changed > 
seen.lastviewed) OR (item.user = seen.user and item.changed > 
seen.lastviewed and seen.number is null);


That second query returns TWO entries for a SEEN record (which I can
control with "select distinct") BUT it breaks in another nasty way:
if I have an item with a CHANGED time that is later than the NULL
record, *IT GETS SELECTED EVEN IF IT HAS A DISTINCT RECORD*.  That's no
good.


Ideas?

(I attempted to use "AND NOT" as a conditional on the second clause of the
OR, and that didn't work; it excluded all of the NULL records.)


--
Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Select question..... is there a way to do this?

2007-08-03 Thread Rodrigo De León
On Aug 3, 11:17 pm, [EMAIL PROTECTED] (Karl Denninger) wrote:
> Ideas?

SELECT item.user, item.subject, item.number
  FROM item, seen
 WHERE item.user = seen.user
   AND item.number = seen.number
   AND item.changed > seen.lastviewed
UNION
SELECT item.user, item.subject, item.number
  FROM item, seen
 WHERE item.user = seen.user
   AND seen.number IS NULL
   AND item.changed > seen.lastviewed;
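
(A note on the design choice: since this uses UNION rather than UNION ALL,
it also removes the duplicate rows the single-query OR version returned.)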

