[PERFORM] pls reply ASAP

2006-04-09 Thread Chethana, Rao (IE10)
Hello!

Kindly go through the following.

I wanted to know whether the command line arguments (function
arguments) -- $1 $2 $3 -- can be used as in the following, like:

CREATE TYPE TT AS (something, something, ... etc ...);

CREATE OR REPLACE FUNCTION f1(varchar, varchar, varchar, varchar)
RETURNS ... (something) ...

BEGIN

    SELECT a1, a2, a3, a4, a5, a6
    FROM (SELECT * FROM T1, T2 ... WHERE ... etc ...
          Flag = 0 $1 $2 $3 $4)
    ORDER BY ...

    RETURN NEXT ...;

END LOOP;

RETURN;

END;

' LANGUAGE 'plpgsql';

NOTE: The values for $1 $2 $3 $4 will be passed when the function
is invoked (called) from the command prompt.

I tried implementing the above, but this type of usage is not
supported. How should I use it?

I am converting from sprintf'd SELECT query statements (which use
%s %s %s %s ...) to functions.

Any help will be deeply appreciated. Thank you.

Kind regards,

Chethana.

[PERFORM]

2006-04-09 Thread Doron Baranes

Hi,

I am new at postgres and I'm having performance issues.

I am running postgres 7.4.6 on a pineapp with 512MB RAM.

I did a database vacuum analyze and rebuilt my indexes.

When I perform queries on tables of 2M-10M rows it takes several minutes, and
I see in sar and top that the cpu and memory are heavily used.

I would be glad for guidance on server parameters or other configurations which
would help.

10x.
Doron.

Re: [PERFORM]

2006-04-09 Thread Luckys
It'd be helpful if you posted the EXPLAIN output of the slow-running queries on the respective tables.

cool.
L.

On 4/9/06, Doron Baranes [EMAIL PROTECTED] wrote:



Hi

I am new at postgres and I'm having performance issues.
I am running on postgres 7.4.6 on a pineapp with 512MB RAM.
I did a database vacuum analyze and rebuild my indexes.
When I perform queries on tables of 2M-10M of rows it takes several minutes and

I see at sar and top that the cpu and memory is heavily used.

I would be glad for guidance on server parameters or other configurations which would help.


10x.
Doron.


Re: [PERFORM]

2006-04-09 Thread Ragnar
On sun, 2006-04-09 at 12:47 +0200, Doron Baranes wrote:
 Hi
 

 I am running on postgres 7.4.6 on a pineapp with 512MB RAM.
 
 I did a database vacuum analyze and rebuild my indexes.

If you have previously done a lot of deletes or updates
without regular vacuums, you may have to do a
  VACUUM FULL ANALYZE
once to get the table back into a normal state.

After this, regular normal VACUUM ANALYZE should be
enough.
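
For instance (a sketch only -- the one-off full vacuum takes exclusive locks
on each table, so run it during a quiet period):

  -- one-off, to shrink tables that are already bloated
  VACUUM FULL ANALYZE;

  -- afterwards, on a regular schedule
  VACUUM ANALYZE;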

 When I perform queries on tables of 2M-10M of rows it takes several
 minutes and

We would need to see the output of EXPLAIN ANALYZE
for your query, along with some information about
the schema of the tables involved, such as what indexes
have been created.
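
As an example of what to post (the query below is only a placeholder --
substitute one of your actual slow queries):

  EXPLAIN ANALYZE
  SELECT some_column, count(*)
    FROM some_big_table
   WHERE some_date >= '2006-01-01'
   GROUP BY some_column;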

Also, let us know about any non-default configuration. 

gnari





Re: [PERFORM] pls reply ASAP

2006-04-09 Thread Rajesh Kumar Mallah
On 4/9/06, Chethana, Rao (IE10) [EMAIL PROTECTED] wrote:
 Hello!
 Kindly go through the following.
 I wanted to know whether the command line arguments (function
 arguments) -- $1 $2 $3 -- can be used as in the following, like:
 CREATE TYPE TT AS (something, something, ... etc ...);
 CREATE OR REPLACE FUNCTION f1(varchar, varchar, varchar, varchar)
 RETURNS ... (something).


The overall idea expressed is doable.
Following are comments (a minimal sketch follows below):

1. You have to declare RETURNS SETOF TT (if you plan to return TT), since you used RETURN NEXT.
2. You have to use SELECT INTO rec inside the function, where rec is of rowtype TT.
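
For illustration only, a minimal sketch of a set-returning function along
these lines (here using a FOR ... IN SELECT loop to populate rec). The type
columns and the table t1 with its columns flag, c1..c4 are hypothetical
placeholders, not the original poster's schema:

  CREATE TYPE tt AS (a1 integer, a2 text);   -- hypothetical columns

  CREATE OR REPLACE FUNCTION f1(varchar, varchar, varchar, varchar)
  RETURNS SETOF tt AS '
  DECLARE
      rec tt;
  BEGIN
      FOR rec IN
          SELECT a1, a2
            FROM t1
           WHERE flag = 0
             AND c1 = $1 AND c2 = $2 AND c3 = $3 AND c4 = $4
           ORDER BY a1
      LOOP
          RETURN NEXT rec;
      END LOOP;
      RETURN;
  END;
  ' LANGUAGE 'plpgsql';

  -- then, from psql:
  SELECT * FROM f1('a', 'b', 'c', 'd');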

hope it helps

--- non-technical comments ---
3. It's not a performance question; it should have been posted to pgsql-sql, I think.
4. It's not good etiquette to address email to someone and Cc a list.

kind regds
mallah.
 BEGIN
 SELECT a1, a2, a3, a4, a5, a6
 FROM (SELECT * FROM T1, T2 ... WHERE ... etc ... Flag = 0 $1 $2 $3 $4)
 ORDER BY ...
 RETURN NEXT ...;
 END LOOP;
 RETURN;
 END;
 ' LANGUAGE 'plpgsql';

 NOTE: The values for $1 $2 $3 $4 will be passed when the function is
 invoked (called) from the command prompt.

 I tried implementing the above, but this type of usage is not
 supported. How should I use it?

 I am converting from sprintf'd SELECT query statements (which use
 %s %s %s %s ...) to functions.

 Any help will be deeply appreciated. Thank you.

 Kind regards,
 Chethana.


[PERFORM] serious problems with vacuuming databases

2006-04-09 Thread Tomas Vondra
Hello,

we have some performance problems with postgres 8.0.4, more precisely
with vacuuming a 'large' database with a lot of deleted rows.

We had a 3.2 GB database, consisting mainly of 4 large tables: two of
them (say tables A and B) have about 14.000.000 rows and 1 GB of data
each, and two (say C and D) have about 4.000.000 rows and 500
MB each. The rest of the database is not important.

We've decided to remove unneeded 'old' data, which means removing about
99.999% of rows from tables A, C and D (about 2 GB of data). At the
beginning, the B table (containing data aggregated from A, C and D) was
emptied (dropped and created) and filled in with current data. Then,
before the deletion, the data from tables A, C and D were backed up into
other tables (say A_old, C_old, D_old), filled in using

   INSERT INTO A_old SELECT * FROM A ...

and fixed so there are no duplicates (rows existing both in A and A_old). Then
these data were deleted from A, C and D, and the tables A_old, C_old and D_old
were dumped, truncated, and all the tables were vacuumed (with the FULL
ANALYZE options). So the procedure was this:

1) drop, create and fill table B (aggregated data from A, C, D)
2) copy 'old' data from A, C and D to A_old, C_old a D_old
3) delete old data from A, C, D
4) dump data from A_old, C_old and D_old
5) truncate tables A, C, D
6) vacuum full analyze tables A, C, D, A_old, C_old and D_old

So the dump of the database is about 1.2 GB, of which about
1 GB is in the B table (the one rebuilt in step 1). This was done yesterday.

The problem is this - today we ran the scheduled VACUUM FULL ANALYZE for
the whole database, and it has been running for about 10 hours already, which is
much more than usual (and it is still running).

The hardware is not too bad - it's a Dell server with 2 x 3.0 GHz P4 HT,
4GB of RAM, 2x15k SCSI drives in hw RAID, etc.

The question is why this happens and how to get around it. I guess it's
caused by the huge amount of data deleted yesterday, but on the other hand
all the modified tables were vacuumed at the end. But I guess dropping
and reloading the whole database would be much faster (at most 1.5 hours
including creating indexes etc.).

thanks for your advice
Tomas



Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-04-09 Thread Bruce Momjian

Added to TODO:

* Experiment with multi-threaded backend for better resource utilization
  This would allow a single query to make use of multiple CPU's or
  multiple I/O channels simultaneously.


---

Chris Browne wrote:
 [EMAIL PROTECTED] (Michael Stone) writes:
 
  On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
 A naive read on this is that you might start with one backend process,
 which then spawns 16 more.  Each of those backends is scanning through
 one of those 16 files; they then throw relevant tuples into shared
 memory to be aggregated/joined by the central one.
 
  Of course, table scanning is going to be IO limited in most cases, and
  having every query spawn 16 independent IO threads is likely to slow
  things down in more cases than it speeds them up. It could work if you
  have a bunch of storage devices, but at that point it's probably
  easier and more direct to implement a clustered approach.
 
 All stipulated, yes.  It obviously wouldn't be terribly useful to scan
 more aggressively than I/O bandwidth can support.  The point is that
 this is one of the kinds of places where concurrent processing could
 do some good...
 -- 
 let name=cbbrowne and tld=acm.org in name ^ @ ^ tld;;
 http://cbbrowne.com/info/spiritual.html
 Save the whales. Collect the whole set. 
 
 

-- 
  Bruce Momjian   http://candle.pha.pa.us

  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] serious problems with vacuuming databases

2006-04-09 Thread Tomas Vondra
Tom Lane wrote:
 Tomas Vondra [EMAIL PROTECTED] writes:
 1) drop, create and fill table B (aggregated data from A, C, D)
 2) copy 'old' data from A, C and D to A_old, C_old a D_old
 3) delete old data from A, C, D
 4) dump data from A_old, C_old and D_old
 5) truncate tables A, C, D
 6) vacuum full analyze tables A, C, D, A_old, C_old and D_old
 
 Steps 3/5/6 make no sense at all to me: why bother deleting data retail
 when you are about to truncate the tables, and why bother vacuuming a
 table you just truncated?  Is the above *really* what you did?

Yes, the above is exactly what I did, with the exception that there's an
error in step (5) - it should be truncation of the _old tables.
The reasons that led me to these particular steps are two:

(a) I don't want to delete all the data, just data older than two days.
Until today we've kept all the data (two years of access logs
for one of our production websites), but now we've decided to remove
the data we don't need and leave just the aggregated version. That's
why I have used DELETE rather than TRUNCATE.

(b) I want to create 'incremental' backups, so that once I need the data
I can take several packages (dumps of the _old tables) and import them
one after another. A plain pg_dump doesn't allow me this - dumping the
whole tables A, C and D is not an option, because I want to leave
some of the data in the tables.

From now on, the tables will be cleared on a daily (or maybe weekly)
basis, which means a much smaller amount of data (about 50.000 rows
a day).
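
For illustration, a minimal sketch of such a daily run for one table,
assuming a hypothetical timestamp column log_time and database name mydb
(the names, the file name and the two-day cutoff are placeholders):

  BEGIN;
  -- move rows older than two days into the backup table
  INSERT INTO a_old SELECT * FROM a WHERE log_time < now() - interval '2 days';
  DELETE FROM a WHERE log_time < now() - interval '2 days';
  COMMIT;

  -- dump just the backup table as one incremental package, then empty it
  -- (from the shell:  pg_dump -t a_old -f a_old_backup.sql mydb)
  TRUNCATE a_old;

  -- keep planner statistics and free space map information up to date
  VACUUM ANALYZE a;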
 
 The problem is this - today, we run a scheduled VACUUM FULL ANALYZE for
 the whole database, and it runs for about 10 hours already, which is
 much more than usual (and it is still running).
 
 Is it actually grinding the disk, or is it just blocked waiting for
 someone's lock?  If it's actually doing work, which table is it working
 on?  (You should be able to figure that out by looking in pg_locks,
 or by strace'ing the process to see which files it's touching.)

Thanks for the hint, I'll try to figure that out in case the dump/reload
recommended by Alvaro Herrera doesn't help. But as far as I know the
disks are not grinding right now, so I guess it's a problem with the indexes.

t.v.



Re: [PERFORM] serious problems with vacuuming databases

2006-04-09 Thread Tomas Vondra
 Probably the indexes are bloated after the vacuum full.  I think the
 best way to get rid of the fat is to recreate both tables and indexes
 anew.  For this the best tool would be to CLUSTER the tables on some
 index, probably the primary key.  This will be much faster than
 VACUUMing the tables, and the indexes will be much smaller as result.

I guess you're right. I forgot to mention there are 12 composite indexes
on the largest (and not deleted) table B, which has about 14.000.000 rows
and 1 GB of data. I'll try to dump/reload the database ...

t.v.



Re: [PERFORM] serious problems with vacuuming databases

2006-04-09 Thread Alvaro Herrera
Tomas Vondra wrote:
  Probably the indexes are bloated after the vacuum full.  I think the
  best way to get rid of the fat is to recreate both tables and indexes
  anew.  For this the best tool would be to CLUSTER the tables on some
  index, probably the primary key.  This will be much faster than
  VACUUMing the tables, and the indexes will be much smaller as result.
 
 I guess you're right. I forgot to mention there are 12 composed indexes
 on the largest (and not deleted) table B, having about 14.000.000 rows
 and 1 GB of data. I'll try to dump/reload the database ...

Huh, I didn't suggest to dump/reload.  I suggested CLUSTER.  You need to
apply it only to tables where you have lots of dead tuples, which IIRC
are A, C and D.
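
For example (a sketch; a_pkey stands for whatever the primary key index of
table A is actually called):

  -- rewrites the table in index order and rebuilds all of its indexes
  CLUSTER a_pkey ON a;
  ANALYZE a;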

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] serious problems with vacuuming databases

2006-04-09 Thread Tomas Vondra
 I guess you're right. I forgot to mention there are 12 composed indexes
 on the largest (and not deleted) table B, having about 14.000.000 rows
 and 1 GB of data. I'll try to dump/reload the database ...

Aaargh, the problem probably is not caused by the largest table, as it
was dropped, filled in with the data and after that all the indexes were
created. The problem could be caused by the tables with deleted data, of
course.

t.v.



Re: [PERFORM] serious problems with vacuuming databases

2006-04-09 Thread Tomas Vondra
 Huh, I didn't suggest to dump/reload.  I suggested CLUSTER.  You need to
 apply it only to tables where you have lots of dead tuples, which IIRC
 are A, C and D.

Sorry, I should read more carefully. Will clustering a table according
to one index solve the problems with all the indexes on the table (if the
table has, for example, two indexes)?

t.v.



Re: [PERFORM] serious problems with vacuuming databases

2006-04-09 Thread Alvaro Herrera
Tomas Vondra wrote:
  Huh, I didn't suggest to dump/reload.  I suggested CLUSTER.  You need to
  apply it only to tables where you have lots of dead tuples, which IIRC
  are A, C and D.
 
 Sorry, I should read more carefully. Will clustering a table according
 to one index solve problems with all the indexes on the table (if the
 table has for example two indexes?).

Yes, it will rebuild all indexes.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] slow IN clause

2006-04-09 Thread Qingqing Zhou

[EMAIL PROTECTED] wrote
 I have a slow sql:
 SELECT * FROM mytable WHERE id IN (1,3,5,7,3k here...);
 mytable is about 10k rows.

 If I don't use the IN clause, it will cost 0.11 second; otherwise it
 will cost 2.x seconds.
 I guess pg uses a linear search to deal with the IN clause. Is there any
 way to let pg use another search method with the IN clause (e.g. binary
 search or hash search)?


If you can put (1, 3, .., 3k) in a table, PG may choose a hash join.
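
A sketch of that idea (names are illustrative; with a 3k-element list you
would more likely load it with COPY than with individual INSERTs):

  -- load the id list into a temporary table, then join instead of using IN
  CREATE TEMP TABLE wanted_ids (id integer);
  INSERT INTO wanted_ids VALUES (1);
  INSERT INTO wanted_ids VALUES (3);
  -- ... and so on for the remaining ids
  ANALYZE wanted_ids;

  SELECT m.*
    FROM mytable m JOIN wanted_ids w ON m.id = w.id;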

Regards,
Qingqing





Re: [PERFORM] OT: Data structure design question: How do they count so fast?

2006-04-09 Thread Brendan Duddridge

Hi Brandon,

Thanks for your suggestion. I'll think about that one. Part of the  
problem is also trying to figure out what the remaining set of  
attributes and attribute values are, so that slows it down  
considerably too. There are many many combinations of attribute  
values that can be clicked on.


More work to do!

Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 9, 2006, at 9:56 PM, Brandon Hines wrote:


Brendan,

I have a number of applications that require similar functionality.
What I typically do is create a count table that gets updated
with a trigger.  But instead of keeping absolute counts, I keep
counts of the smallest necessary element.  For example, I have a
table with approx 12 million elements, where each element belongs to one
of a thousand classes of elements.  The materialized view table,
with only about a thousand rows, is small enough that sum() queries
over the various classes are fast enough for web pages.
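
A minimal sketch of that kind of trigger-maintained count table, assuming a
hypothetical table elements(class_id, ...) and one pre-existing counts row
per class; UPDATEs that move an element between classes are left out for
brevity:

  CREATE TABLE class_counts (
      class_id integer PRIMARY KEY,
      n        bigint NOT NULL DEFAULT 0
  );

  CREATE OR REPLACE FUNCTION maintain_class_counts() RETURNS trigger AS '
  BEGIN
      IF TG_OP = ''INSERT'' THEN
          UPDATE class_counts SET n = n + 1 WHERE class_id = NEW.class_id;
      ELSIF TG_OP = ''DELETE'' THEN
          UPDATE class_counts SET n = n - 1 WHERE class_id = OLD.class_id;
      END IF;
      RETURN NULL;   -- return value is ignored for AFTER triggers
  END;
  ' LANGUAGE 'plpgsql';

  CREATE TRIGGER elements_count_trg
      AFTER INSERT OR DELETE ON elements
      FOR EACH ROW EXECUTE PROCEDURE maintain_class_counts();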


-Brandon

Brendan Duddridge wrote:

Hi,
First of all, the reason I'm posting on the PostgreSQL Performance  
list is we have a performance issue with one of our applications  
and it's related to the speed at which PostgreSQL can do counts.  
But it's also related to the data structure we've designed to  
develop our comparison shopping engine. It's very normalized and  
the speed of our queries is slowing down considerably as we add
more and more data.
So I've been looking at some of the other comparison shopping  
engines and I'm trying to figure out how they manage to get the  
counts of their products for a set of attributes and attribute  
values so quickly.

For example, on the following page, they have 1,260,658 products:
http://www.mysimon.com/Home-Furnishings/9000-10975_8-0.html?tag=glnav
They display 3 attributes with values on the page: Price Range,  
Home Furnishing Type, and Store. Plus there are a selection of  
other attributes not displaying the value choices.
For Price Range, they have the following values and product counts  
(in brackets):

# Below $20 (204,315)
# $20 - $50 (234,694)
# $50 - $80 (188,811)
# $80 - $130 (182,721)
# $130 - $240 (222,519)
For Home Furnishing Type they have:
# Wall Art and Decor (438,493)
# Lighting (243,098)
# Bathroom Furnishings (132,441)
# Rugs (113,216)
# Decorative Accents (65,418)
And for Store they have:
# Art.com (360,933)
# HomeAnnex (130,410)
# AllPosters.com (72,529)
# HomeClick.com (61,423)
# 1STOPlighting Superstore (32,074)
Now, initially I thought they would just pre-compute these counts,  
but the problem is, when you click on any of the above attribute  
values, they reduce the remaining possible set of matching  
products (and set of possible remaining attributes and attribute  
values) by the amount displayed next to the attribute value  
selected. You can click on any combination of attribute values to  
filter down the remaining set of matching products, so there's a  
large combination of paths you can take to arrive at a set of  
products you might be interested in.
Do you think they are pre-computed? Or do you think they might use  
a query similar to the following?:

select pav.attribute_value_id, count(p.product_id)
  from product_attribute_value pav,
       attribute a,
       product p
 where a.attribute_id in (some set of attribute ids)
   and pav.product_id = p.product_id
   and pav.attribute_id = a.attribute_id
   and p.product_id in (select product_id
                          from category_product
                         where category_id = some category id)
   and p.is_active = 'true'
 group by pav.attribute_value_id;
It would seem to me that although the above query suggests a
normalized database structure, joining 3 tables plus a
4th table in the sub-query with an IN qualifier and grouping to
get the product counts would take a VERY long time, especially on  
a possible result set of 1,260,658 products. The other issue is  
trying to figure out what the remaining set of possible attribute  
and attribute values are in order to reach the remaining set of  
products.
Does anyone have any insights into what kind of data structures  
would be necessary to accomplish such a feat? I know that  
PostgreSQL has performance issues with counting rows, so I can't  
imagine being able to use the above kind of query to get the  
results we need. I don't know what kind of database backend  
mysimon is using either. It would also seem logical that a flattened
data structure would be necessary in order to get the performance
required. Their pages seem to load
pretty fast.
We are possibly considering some kind of pre-computed decision-tree
type data structure to get the counts of the set of products
that could be reached by selecting any combination of attributes  
and attribute values. Does this seem like a reasonable idea?