Re: [PERFORM] auto vacuum, not working?

2012-01-13 Thread Mario Weilguni

On 13.01.2012 13:08, Anibal David Acosta wrote:


Hi,

yesterday I delete about 200 million rows of a table (about 150GB of 
data), after delete completes the autovacuum process start.


The autovacuum is running for about 11 hours but no space is released

Autovacuum parameters are with default values in postgresql.conf

The postgres version is 9.0.3

The pg activity reports:

select (now()-query_start) as duration, waiting, current_query from 
pg_stat_activity where current_query ilike '%auto%'


10:42:19.829   f  "autovacuum: VACUUM ANALYZE 
public.myTable"


How can I release the space used by deleted rows, without blocking the table?

Thanks!



Plain vacuum does not return the space to the operating system; it just marks the space of dead tuples as reusable within the table. To actually shrink the table you need vacuum full (note that it takes an exclusive lock on the table).
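
As a minimal illustration (the table name is taken from the original post; this is a sketch of the standard commands, not the poster's exact procedure):

-- plain VACUUM: marks dead-tuple space reusable, does not shrink the file
VACUUM ANALYZE public.myTable;

-- VACUUM FULL: rewrites the table and gives space back to the OS,
-- but holds an ACCESS EXCLUSIVE lock for the whole run
VACUUM FULL public.myTable;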


Re: [PERFORM] Postgresql 9.0.6 Raid 5 or not please help.

2011-12-23 Thread Mario Weilguni

On 23.12.2011 08:05, Scott Marlowe wrote:

On Thu, Dec 22, 2011 at 11:18 PM, tuanhoanganh  wrote:

Thanks for your answer. But how performance between raid5 and one disk.

One disk will usually win, and 2 disks (in a mirror) will definitely win.
RAID-5 has the highest overhead and the poorest performance, especially
when it's degraded (one drive out), a penalty that simple mirroring
methods don't suffer from. But even in an undegraded state it is
usually the slowest method. RAID-10 is generally the fastest option with
redundancy, and of course pure RAID-0 is fastest of all but has no
redundancy.

You should do some simple benchmarks with something like pgbench and
various configs to see for yourself.  For extra bonus points, break a
mirror (2 disk ->  1 disk) and compare it to RAID-5 (3 disk ->  2 disk
degraded) for performance.  The change in performance for a RAID-1 to
single disk degraded situation is usually reads are half as fast and
writes are just as fast.  For RAID-5 expect to see it drop by a lot.

I'm not so confident that a RAID-1 will win over a single disk. When it 
comes to writes, the latency should be ~50% higher (if both disks must 
sync), since the spindles are not running synchronously. This applies to 
software RAID, not to something like a battery-backed RAID controller, of course.


Or am I wrong here?




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large number of short lived connections - could a connection pool help?

2011-11-15 Thread Mario Weilguni

On 15.11.2011 01:42, Cody Caughlan wrote:

We have anywhere from 60-80 background worker processes connecting to
Postgres, performing a short task and then disconnecting. The lifetime
of these tasks averages 1-3 seconds.

I know that there is some connection overhead to Postgres, but I don't
know what would be the best way to measure this overhead and/or to
determine if it's currently an issue at all.

If there is a substantial overhead, I would think that employing a
connection pool like pgbouncer to keep a static list of these
connections and dole them out to the transient workers on demand would help.

The overall cumulative number of connections wouldn't change; I
would just attempt to avoid setting them up and tearing them down so quickly.

Is this something that I should look into or is it not much of an
issue? Whats the best way to determine if I could benefit from using a
connection pool?

Thanks.

I had a case where a pooler (in this case pgpool) resulted in a 140% 
application performance improvement - so yes, it is probably a win to use a 
pooling solution.
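
For reference, a minimal pgbouncer configuration in transaction-pooling mode might look roughly like this (illustrative values only, not taken from this thread):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20

The workers would then connect to port 6432 instead of directly to Postgres, so the connection setup/teardown is absorbed by the pooler.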




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance Problem with postgresql 9.03, 8GB RAM,Quadcore Processor Server--Need help!!!!!!!

2011-11-08 Thread Mario Weilguni

On 08.11.2011 13:15, Mohamed Hashim wrote:

Hi Sam,Tomas

In my first post I mentioned the memory settings (shared_buffers, 
effective_cache_size, work_mem, etc.), my OS and hardware 
information, and the basic settings I have changed.


Regarding EXPLAIN ANALYZE, I gave one sample query because I thought 
that if I tune that particular table, which has almost 16 crore (160 million) 
records, my problem will be solved...


Just curious: are those array items [1] and [2] just samples, or do you 
actually use more of them as performance-relevant conditions? If 
just those two are relevant, I would change the schema to use real 
columns instead. And you seem to use partitioning, but you have no 
partition condition?


Re: [PERFORM] Performance Problem with postgresql 9.03, 8GB RAM,Quadcore Processor Server--Need help!!!!!!!

2011-11-03 Thread Mario Weilguni

On 03.11.2011 17:08, Tomas Vondra wrote:

On 3 November 2011, 16:02, Mario Weilguni wrote:

No doubt about that, querying tables using conditions on array columns is
not the best direction in most cases, especially when those tables are
huge.

Still, the interesting part here is that the OP claims this worked just
fine in the older version and after an upgrade the performance suddenly
dropped. This could be caused by many things, and we're just guessing
because we don't have any plans from the old version.

Tomas




Not really - Mohamed always said he has 9.0.3; it was Marcus Engene who wrote about 
problems after the migration from 8.x to 9.x. Or did I miss something here?


Regards,
Mario


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance Problem with postgresql 9.03, 8GB RAM,Quadcore Processor Server--Need help!!!!!!!

2011-11-03 Thread Mario Weilguni

On 02.11.2011 08:12, Mohamed Hashim wrote:

Dear All

Thanks for your suggestions & replies.

The below are the sample query which i put for particular one bill_id

EXPLAIN ANALYZE SELECT abd.bill_no as 
bill_no,to_char(abd.bill_date,'dd/mm/') AS date,mp.product_desc as 
product_desc,std.quantity,std.area,rip.price AS rate

FROM acc_bill_items_106 abi
JOIN acc_bill_details_106 abd ON abd.bill_id=abi.bill_id
JOIN stk_source ss ON  ss.source_detail[1]=1 and 
ss.source_detail[2]=abi.item_id

JOIN stock_transaction_detail_106 std ON std.stock_id=ss.stock_id
JOIN stock_details_106 sd106 ON sd106.stock_id=std.stock_id
JOIN master_product_106 mp ON mp.product_id= sd106.product_id
JOIN receipt_item_price_106 rip ON rip.receipt_item_id=abi.item_id
WHERE abi.bill_id=12680;


First I would try this:
explain analyze select * from stk_source where source_detail[1] = 1;
explain analyze select * from stk_source where source_detail[2] = 12356;

Both times you'll get sequential scans, and that's the root of the 
problem. Oh, you mentioned that you use partitioning, but there seems to 
be no condition for that.


You should really rethink your database schema: at least try to pull all 
indexable fields out of that int[] into real columns, and create indexes on 
those fields.
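
As a rough sketch of both options (the new column names are assumptions based on the query above, not your actual schema; on a partitioned setup each child table needs its own index):

-- Stopgap: expression indexes on the array elements used in the join
CREATE INDEX stk_source_sd1_idx ON stk_source ((source_detail[1]));
CREATE INDEX stk_source_sd2_idx ON stk_source ((source_detail[2]));

-- Better: real columns that can be indexed and used as partition conditions
ALTER TABLE stk_source ADD COLUMN source_type integer;
ALTER TABLE stk_source ADD COLUMN item_id integer;
UPDATE stk_source SET source_type = source_detail[1], item_id = source_detail[2];
CREATE INDEX stk_source_item_idx ON stk_source (source_type, item_id);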


Regards
Mario







Re: [PERFORM] Which RAID Controllers to pick/avoid?

2011-02-03 Thread Mario Weilguni

On 03.02.2011 00:15, Dan Birken wrote:
I'm setting up a dedicated linux postgres box with 2x300GB 15k SAS 
drive in a RAID 1, though if future load dictates we would like to be 
able to upgrade to RAID 10.  The hosting provider offers the following 
options for a RAID controller (all are the same price):
Adaptec at least has good tools for managing the controller, and 
performance in our RAID-1 (DB) and RAID-5 setups (Files) is very good. I 
don't think you can do wrong with the Adaptec controllers.


Can't say much regarding LSI, but avoid cheap HP controllers.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-10 Thread Mario Weilguni

Aidan Van Dyk wrote:

* Mario Weilguni <[EMAIL PROTECTED]> [081210 07:31]:

  
Why not? I know it's not performing as good as RAID-10, but it does not  
waste 50% diskspace. RAID-6 is no option, because the performance is  
even worse. And, on another system with RAID-5 + spare and SAS drives,  
the same controller is working very well.



Like Scott said, it's all about trade-offs.

With raid5, you get abysmal write performance, "make me not sleep at
night" inconsistent-parity issues, and a degraded mode that will be a
nightmare  ...

... and as a trade-off you save a little money, and get good "read only"
performance ...

... as long as you don't ever have a disk or system crash ...

... or can afford to rebuild if you do ...

... etc ...
  


In fact, for this system we're now going to RAID10; I'm convinced. With 
other systems we have, RAID5 is a safe option for one reason: the machines 
are clustered, so we have (sort of) RAID50 here:

Machine A/RAID5 <-- DRBD --> Machine B/RAID5

That seems reliable enough for me. But in this case the machine will be 
standalone, so RAID5 might really not be the best choice.



However, I'm pretty sure we'd have the same problems with RAID10; the 
problem seems to be related to the P400 and/or the SATA drives.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-10 Thread Mario Weilguni

Scott Marlowe wrote:

On Wed, Dec 10, 2008 at 12:45 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
  

A customer of ours uses the P400 on a different machine, 8 SAS drives (RAID 5
as well), and the performance is very, very good. So we thought it was a good
choice. Maybe the SATA drives are the root of this problem?



What tests have you or the customer done to confirm that performance
is very very good?  A lot of times the system is not as fast as the
customer thinks, it's just faster than it was before and they're
happy.  Also, there could be problems in the driver or firmware on
your P400 versus the customer one.  I'd look for those differences as
well.  I doubt SATA versus SAS is the problem, but who knows...

  
Well, I cannot take the box offline to run useful tests like tiobench 
or bonnie, but even with the current service running I get between 270 
and 340 MB/sec sustained read from a simple dd over 30% of the disk.


It also performed extremely well when I put the box into production; 
the pg_bench values were impressive, but I do not have them at hand.


However, we are currently seriously considering dropping RAID5 in favor 
of RAID10; we will test this week whether it performs better.


Regards
Mario

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-10 Thread Mario Weilguni

Aidan Van Dyk wrote:

* Joshua D. Drake <[EMAIL PROTECTED]> [081209 11:01]:
 
  

Yes the SmartArray series is quite common and actually know to perform
reasonably well, in RAID 10. You still appear to be trying RAID 5.



*boggle* 


Are people *still* using raid5?

/me gives up!

  


Why not? I know it's not performing as good as RAID-10, but it does not 
waste 50% diskspace. RAID-6 is no option, because the performance is 
even worse. And, on another system with RAID-5 + spare and SAS drives, 
the same controller is working very well.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Mario Weilguni

Scott Marlowe wrote:

On Tue, Dec 9, 2008 at 5:17 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
  

Alan Hodgson wrote:


Mario Weilguni <[EMAIL PROTECTED]> wrote:

  

strange values. An individual drive is capable of delivering 91 MB/sec
sequential read performance, and we get values ~102MB/sec out of an
8-drive RAID5, which seems ridiculously slow.

What command are you using to test the reads?

Some recommendations to try:

1) /sbin/blockdev --setra 2048 device (where device is the partition or
LVM volume)

2) Use XFS, and make sure your stripe settings match the RAID.

Having said that, 102MB/sec sounds really low for any modern controller
with 8 drives, regardless of tuning or filesystem choice.


  

First, thanks a lot for this and all the other answers.

I measured the raw device performance:
dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null

I get poor performance when all 8 drives are configured as one, large
RAID-5, and slightly poorer performance when configured as JBOD. In
production, we use XFS as FS, but I doubt this has anything to do with FS
tuning.



Yeah, having just trawled the pgsql-performance archives, there are
plenty of instances of people having terrible performance from HP
smart array controllers before the P800.  Is it possible for you to
trade up to a better RAID controller?  Whichever salesman sold you the
P400 should take one for the team and make this right for you.

  
A customer of ours uses the P400 on a different machine, 8 SAS drives 
(RAID 5 as well), and the performance is very, very good. So we thought 
it was a good choice. Maybe the SATA drives are the root of this problem?




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Mario Weilguni

Alan Hodgson wrote:

Mario Weilguni <[EMAIL PROTECTED]> wrote:
  

strange values. An individual drive is capable of delivering 91 MB/sec
sequential read performance, and we get values ~102MB/sec out of an
8-drive RAID5, which seems ridiculously slow.



What command are you using to test the reads?

Some recommendations to try:

1) /sbin/blockdev --setra 2048 device (where device is the partition or LVM 
volume)


2) Use XFS, and make sure your stripe settings match the RAID.

Having said that, 102MB/sec sounds really low for any modern controller with 
8 drives, regardless of tuning or filesystem choice.


  


First, thanks a lot for this and all the other answers.

I measured the raw device performance:
dd if=/dev/cciss/c0d0 bs=64k count=10 of=/dev/null

I get poor performance when all 8 drives are configured as one, large 
RAID-5, and slightly poorer performance when configured as JBOD. In 
production, we use XFS as FS, but I doubt this has anything to do with 
FS tuning.




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Mario Weilguni

Kevin Grittner wrote:

Mario Weilguni <[EMAIL PROTECTED]> wrote:

Has anyone benchmarked this controller (PCIe/4x, 512 MB BBC)? We try to
use it with 8x SATA 1TB drives in RAID-5 mode under Linux, and measure
strange values. An individual drive is capable of delivering 91 MB/sec
sequential read performance, and we get values ~102MB/sec out of an
8-drive RAID5, which seems ridiculously slow. Write performance seems to
be much better, ~300 MB/sec - seems ok to me.

What's your stripe size?

-Kevin

We used the default settings, it's 64k. Might a bigger value help here?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-09 Thread Mario Weilguni

Scott Marlowe wrote:

On Tue, Dec 2, 2008 at 2:22 AM, Mario Weilguni <[EMAIL PROTECTED]> wrote:
  

Has anyone benchmarked this controller (PCIe/4x, 512 MB BBC)? We try to use
it with 8x SATA 1TB drives in RAID-5 mode under Linux, and measure strange
values. An individual drive is capable of delivering 91 MB/sec sequential
read performance, and we get values ~102MB/sec out of a 8-drive RAID5, seems
to be ridiculous slow. Write performance seems to be much better, ~300 MB
/sec - seems ok to me.

I guess I must be doing something wrong, I cannot believe that a 500 €
controller is delivering such poor performance.



A few suggestions...  Try to find the latest driver for your card, try
using the card as nothing but a caching controller and run your RAID
in software in linux (or whatever OS you're on).  Test a 2 drive
RAID-0 to see what kind of performance increase you get.  If you can't
dd a big file off of a RAID-0 at about 2x the rate of a single drive
then something is wrong with it. Try RAID 10.  Try RAID-1 sets on the
controller and RAID 0 over that in software.

  


I've already tried software RAID with individual drives, and it performs much 
better. However, using softraid is not an option here, so I'm stuck. The card 
has the latest firmware installed, and there are no drivers needed; 
they're already included in the Linux kernel.


I still think we must be doing something wrong here; I googled the 
controller and Linux and did not find anything indicating a problem. 
The HP SmartArray series is quite common, so a lot of users would 
otherwise have the same problem.


Thanks!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Experience with HP Smart Array P400 and SATA drives?

2008-12-02 Thread Mario Weilguni
Has anyone benchmarked this controller (PCIe/4x, 512 MB BBC)? We try to 
use it with 8x SATA 1TB drives in RAID-5 mode under Linux, and measure 
strange values. An individual drive is capable of delivering 91 MB/sec 
sequential read performance, and we get values of ~102MB/sec out of an 
8-drive RAID5, which seems ridiculously slow. Write performance seems to 
be much better, ~300 MB/sec - seems ok to me.


I guess I must be doing something wrong, I cannot believe that a 500 € 
controller is delivering such poor performance.




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deteriorating performance when loading large objects

2008-11-28 Thread Mario Weilguni

Tom Lane wrote:

"=?iso-8859-1?Q?Vegard_B=F8nes?=" <[EMAIL PROTECTED]> writes:
  

Running VACUUM VERBOSE pg_largeobject took quite some time. Here's the
output:



  

INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  index "pg_largeobject_loid_pn_index" now contains 11060658 row
versions in 230587 pages
DETAIL:  178683 index pages have been deleted, 80875 are currently reusable.
CPU 0.92s/0.10u sec elapsed 199.38 sec.
INFO:  "pg_largeobject": found 0 removable, 11060658 nonremovable row
versions in 6849398 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 84508215 unused item pointers.
0 pages are entirely empty.
CPU 0.98s/0.10u sec elapsed 4421.17 sec.
VACUUM



Hmm ... although you have no dead rows now, the very large number of
unused item pointers suggests that there were times in the past when
pg_largeobject didn't get vacuumed often enough.  You need to look at
your vacuuming policy.  If you're using autovacuum, it might need to have
its parameters adjusted.  Otherwise, how often are you vacuuming, and
are you doing it as superuser?

  

I will try to run VACUUM FULL ANALYZE after the next delete tonight, as
suggested by Ivan Voras in another post.



Actually, a CLUSTER might be more effective.

regards, tom lane

  


Does CLUSTER really help here? On my 8.2 database, I get:
CLUSTER pg_largeobject_loid_pn_index on pg_largeobject ;
ERROR:  "pg_largeobject" is a system catalog


Has this changed in >= 8.3?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance tuning queries

2008-11-27 Thread Mario Weilguni

Kevin Kempter wrote:

Hi All;

I'm looking for tips / ideas for performance-tuning some specific queries. 
These are generally large tables on a highly active OLTP system 
(100,000 - 200,000 plus queries per day).


First off, any thoughts on tuning inserts into large tables? I have a large 
table with an insert like this:


insert into public.bigtab1 (text_col1, text_col2, id) values ...

QUERY PLAN
--

 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)

The query cost is low, but according to pgfouine this is one of the slowest statements.
  
Do you insert multiple values in one transaction, or one transaction per 
insert?
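
That distinction matters because each single-row INSERT committed in its own transaction pays a commit (WAL flush) cost. A hedged sketch of the two batching options, reusing the table and columns from the statement above:

-- one transaction around many inserts avoids a commit per row
BEGIN;
INSERT INTO public.bigtab1 (text_col1, text_col2, id) VALUES ('a', 'b', 1);
INSERT INTO public.bigtab1 (text_col1, text_col2, id) VALUES ('c', 'd', 2);
-- ... many more ...
COMMIT;

-- or a single multi-row INSERT (supported since 8.2)
INSERT INTO public.bigtab1 (text_col1, text_col2, id)
VALUES ('a', 'b', 1),
       ('c', 'd', 2),
       ('e', 'f', 3);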



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Increasing pattern index query speed

2008-11-26 Thread Mario Weilguni

Andrus wrote:

Richard,


These are the same but the times are different. I'd be very surprised if
you can reproduce these times reliably.


I re-tried it again today and got the same results: in the production database 
the pattern query is many times slower than the equality query.

The toode and rid tables contain only a single product starting with 9910,
so both queries should scan exactly the same number of rows.


Can I give you some wider-ranging suggestions Andrus?
1. Fix the vacuuming issue in your hash-join question.


I have run VACUUM FULL VERBOSE ANALYSE and set max_fsm_pages=15,
so the issue was fixed before those tests.

2. Monitor the system to make sure you know if/when disk activity is 
high.


I optimized this system. Now there are short (a few seconds) sales 
queries roughly every 5 - 300 seconds, which cause little disk 
activity and add a few new rows to some tables.

I haven't seen this activity affect the test results.


3. *Then* start to profile individual queries and look into their plans.
Change the queries one at a time and monitor again.


How can I make the pattern matching query faster?

Andrus.

Btw.

I tried to reproduce this big difference on a test server running 8.3, using 
the sample data script below, and got a big difference, but in the opposite 
direction.


explain analyze   SELECT sum(1)
FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>'2006-01-01' and ...

different where clauses produce different results:

AND orders_products.product_id = '3370'  -- 880 .. 926 ms
AND  orders_products.product_id like '3370%' -- 41 ..98 ms

So the pattern index is always 10 .. 20 times (!) faster.
No idea why.

Test data creation script:

begin;
CREATE OR REPLACE FUNCTION Counter() RETURNS int IMMUTABLE AS
$_$
SELECT 350;
$_$ LANGUAGE SQL;

CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE 
NOT NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name 
char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, 
product_id CHAR(20),

 id serial, price numeric(12,2) default 1 );

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
  'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders
SELECT n,'2005-01-01'::date + (4000.0 * n/Counter() * '1 DAY'::interval)
FROM generate_series(0, Counter()/3 ) AS n;

SET work_mem TO 2097151;

INSERT INTO orders_products SELECT
  generate_series/3 as  order_id,
  ( (1+ (generate_series % 13410))*power( 10,13))::INT8::CHAR(20) AS 
product_id

FROM generate_series(1, Counter());

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);

ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES 
products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES 
orders(order_id) ON DELETE CASCADE;


CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );


COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;


No wonder that = compares badly; you created the index this way:
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );

why not:
CREATE INDEX order_product_pattern_idx ON orders_products( product_id);

explain analyze   SELECT sum(1)
FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>'2006-01-01'
AND orders_products.product_id = '3370';

                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3013.68..3013.69 rows=1 width=0) (actual time=8.206..8.207 rows=1 loops=1)
   ->  Nested Loop  (cost=10.83..3013.21 rows=185 width=0) (actual time=2.095..7.962 rows=189 loops=1)
         ->  Index Scan using products_pkey on products  (cost=0.00..8.27 rows=1 width=18) (actual time=0.036..0.038 rows=1 loops=1)
               Index Cond: ((product_id)::text = '3370'::text)
         ->  Nested Loop  (cost=10.83..3003.09 rows=185 width=18) (actual time=2.052..7.474 rows=189 loops=1)
               ->  Bitmap Heap Scan on orders_products  (cost=10.83..949.68 rows=253 width=22) (actual time=0.161..0.817 rows=261 loops=1)
                     Recheck Cond: ((product_id)::text = '3370'::text)
                     ->  Bitmap Index Scan on foo  (cost=0.00..10.76 rows=253 width=0) (actual time=0.116..0.116 rows=261 loops=1)
                           Index Cond: ((product_id)::text = '3370'::text)
               ->  Index Scan using orders_pkey on orders  (cost=0.00..8.10 rows=1 width=4) (actual time=0.020..0

Re: [PERFORM] Using PK value as a String

2008-08-12 Thread Mario Weilguni

Valentin Bogdanov wrote:

--- On Mon, 11/8/08, Gregory Stark <[EMAIL PROTECTED]> wrote:

  

From: Gregory Stark <[EMAIL PROTECTED]>
Subject: Re: [PERFORM] Using PK value as a String
To: "Jay" <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Date: Monday, 11 August, 2008, 10:30 AM
"Jay" <[EMAIL PROTECTED]> writes:



I have a table named table_Users:

CREATE TABLE table_Users (
   UserID   character(40)  NOT NULL default
  

'',


   Username   varchar(256)  NOT NULL default
  

'',


   Email  varchar(256) NOT NULL default
  

''


   etc...
);

The UserID is a character(40) and is generated using
  

UUID function. We


started making making other tables and ended up not
  

really using


UserID, but instead using Username as the unique
  

identifier for the


other tables. Now, we pass and insert the Username to
  

for discussions,


wikis, etc, for all the modules we have developed. I
  

was wondering if


it would be a performance improvement to use the 40
  

Character UserID


instead of Username when querying the other tables, or
  

if we should


change the UserID to a serial value and use that to
  

query the other


tables. Or just keep the way things are because it
  

doesn't really make


much a difference.
  

Username would not be any slower than UserID unless you
have a lot of
usernames longer than 40 characters.

However making UserID an integer would be quite a bit more
efficient. It would
take 4 bytes instead of as the length of the Username which
adds up when it's
in all your other tables... Also internationalized text
collations are quite a
bit more expensive than a simple integer comparison.

But the real question here is what's the better design.
If you use Username
you'll be cursing if you ever want to provide a
facility to allow people to
change their usernames. You may not want such a facility
now but one day...




I don't understand Gregory's suggestion about the design. I thought using 
natural primary keys as opposed to surrogate ones is a better design strategy, 
even when it comes to performance considerations and even more so if there are 
complex relationships within the database.

Regards,
Valentin

  
A UUID is already a surrogate key, not a natural key; it is in no aspect better 
than a numeric key, it just takes a lot more space.


So why not use int4/int8?
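
A rough sketch of the int-keyed variant (hypothetical table and column names, loosely following the schema quoted above):

CREATE TABLE table_users (
    user_id  serial PRIMARY KEY,              -- 4-byte surrogate key
    username varchar(256) NOT NULL UNIQUE,    -- still enforced unique, still usable for lookups
    email    varchar(256) NOT NULL DEFAULT ''
);

-- other modules reference the compact integer instead of repeating a 40-char string
CREATE TABLE wiki_pages (
    page_id  serial PRIMARY KEY,
    owner_id integer NOT NULL REFERENCES table_users(user_id),
    title    text NOT NULL
);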




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] file system and raid performance

2008-08-07 Thread Mario Weilguni

Mark Kirkwood wrote:

Mark Kirkwood wrote:
You are right, it does (I may be recalling performance from my other 
machine that has a 3Ware card - this was a couple of years ago...) 
Anyway, I'm thinking for the Hardware raid tests they may need to be 
specified.





FWIW - of course this is somewhat academic given that the single disk xfs 
test failed! I'm puzzled - having a Gentoo system of similar 
configuration (2.6.25-gentoo-r6) and running the fio tests slightly 
modified for my config (2-CPU PIII, 2G RAM with 4x ATA disks in RAID0 and 
all xfs filesystems - I changed the file sizes to 4G and the number of 
processes to 4), all tests that failed on Mark's HP work on my Supermicro 
P2TDER + Promise TX4000. In fact the performance is pretty reasonable on the 
old girl as well (seq read is 142Mb/s and the random read/write is 
12.7/12.0 Mb/s).


I certainly would like to see some more info on why the xfs tests were 
failing - as on most systems I've encountered xfs is a great performer.


regards

Mark

I can second this; we use XFS on nearly all our database servers and 
have never encountered the problems mentioned.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Less rows -> better performance?

2008-07-21 Thread Mario Weilguni

Andreas Hartmann wrote:

Mario Weilguni wrote:

Andreas Hartmann wrote:


[…]


I just verified that the autovacuum property is enabled.


[…]


Did you have:
stats_start_collector = on
stats_block_level = on
stats_row_level = on

Otherwise autovacuum won't run IMO.


Thanks for the hint! The section looks like this:

stats_start_collector = on
#stats_command_string = off
#stats_block_level = off
stats_row_level = on
#stats_reset_on_server_start = off


I'll check the logs if the vacuum really runs - as soon as I find them :)

-- Andreas

You might want to use these entries in your config:
redirect_stderr = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d

Fit those to your needs, then you will find log entries in $PGDATA/pg_log/

And BTW, I was wrong: you just need to have stats_row_level=on, 
stats_block_level doesn't matter. But in fact it's simple: if you don't 
have 24x7 requirements, run VACUUM FULL ANALYZE; and check whether your DB 
becomes smaller. I really doubt you have so many indexes that a 27MB 
dump could use 2.3 GB on disk.


You can check this too:
select relname, relpages, reltuples, relkind
 from pg_class
where relkind in ('r', 'i')
order by relpages desc limit 20;

This will give you the top-20 tables/indexes and their sizes. One page is 
typically 8KB, so you can cross-check whether relpages/reltuples is completely 
off; that is a good indicator of table/index bloat.


Regards,
Mario


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Altering a column type - Most efficient way

2008-07-11 Thread Mario Weilguni

Ow Mun Heng wrote:

On Thu, 2008-07-10 at 09:57 -0400, Alvaro Herrera wrote:
  

Ow Mun Heng wrote:



This is what I see on the table

NEW attypmod = -1
OLD attypmod =  8
  

8 means varchar(4) which is what you said you had (4+4)
-1 means unlimited size.




This is cool. 


If it were this simple a change, I'm not certain why (I believe) PG is
checking each and every row to see if it will fit into the new column
definition/type. 


Thus, I'm still a bit hesitant to do the change, although it is
definitely a very enticing thing to do. ( I presume also that this
change will be instantaneous and does not need to check on each and
every row of the table?)

Thanks./

  


It should be safe, because the length limit is checked at insert/update 
time, and internally a varchar(20) is treated as something like this:

foo  varchar check (length(foo) <= 20)


The change is done without re-checking all rows, and will not fail IF 
the new size is longer than the old size.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Altering a column type - Most efficient way

2008-07-10 Thread Mario Weilguni

Ow Mun Heng wrote:

Are there any quick hacks to do this quickly? There are around
20-30 million rows of data.


I want to change a column type from varchar(4) to varchar()

table size is ~10-15GB (and another 10-15G for indexes)

What would be the preferred way of doing it? Should I be dropping the
indexes first to make things faster? Would it matter?

The punch line is that since the databases are connected via slony, this
makes it even harder to pull it off. My last try got the DDL change
completed in like 3 hours (smallest table of the bunch) and it hung
everything
  
Before Postgresql supported "alter table ... type ... " conversions, I 
did it a few times when I detected later that my varchar() fields were 
too short, and it worked perfectly.


Example:
{OLDLEN} = 4
{NEWLEN} = 60

update pg_attribute
  set atttypmod={NEWLEN}+4
where attname='the-name-of-the-column'
  and attrelid=(select oid from pg_class where 
relname='the-name-of-the-table')

  and atttypmod={OLDLEN}+4;


This worked very well when you want to increase the maximum length, 
don't try to reduce the maximum length this way!


Disclaimer: I do not know if slony might be have a problem with this.




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow INFORMATION_SCHEMA

2008-05-05 Thread Mario Weilguni
1.22..1.22 rows=22 width=68) (actual time=0.049..0.049 rows=22 loops=1)
               ->  Seq Scan on pg_namespace nc  (cost=0.00..1.22 rows=22 width=68) (actual time=0.008..0.022 rows=22 loops=1)
         ->  Subquery Scan "*SELECT* 2"  (cost=125.45..601.21 rows=745 width=138) (actual time=2.621..29.380 rows=3120 loops=1)
               ->  Hash Join  (cost=125.45..593.76 rows=745 width=138) (actual time=2.618..24.938 rows=3120 loops=1)
                     Hash Cond: (a.attrelid = r.oid)
                     ->  Seq Scan on pg_attribute a  (cost=0.00..419.55 rows=5551 width=6) (actual time=0.009..8.111 rows=3399 loops=1)
                           Filter: (attnotnull AND (attnum > 0) AND (NOT attisdropped))
                     ->  Hash  (cost=121.78..121.78 rows=294 width=136) (actual time=2.578..2.578 rows=458 loops=1)
                           ->  Hash Join  (cost=1.46..121.78 rows=294 width=136) (actual time=0.073..2.085 rows=458 loops=1)
                                 Hash Cond: (r.relnamespace = nr.oid)
                                 ->  Seq Scan on pg_class r  (cost=0.00..115.76 rows=431 width=72) (actual time=0.011..1.358 rows=458 loops=1)
                                       Filter: ((relkind = 'r'::"char") AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT'::text) OR has_table_privilege(oid, 'INSERT'::text) OR has_table_privilege(oid, 'UPDATE'::text) OR has_table_privilege(oid, 'DELETE'::text) OR has_table_privilege(oid, 'REFERENCES'::text) OR has_table_privilege(oid, 'TRIGGER'::text)))
                                 ->  Hash  (cost=1.27..1.27 rows=15 width=68) (actual time=0.051..0.051 rows=15 loops=1)
                                       ->  Seq Scan on pg_namespace nr  (cost=0.00..1.27 rows=15 width=68) (actual time=0.010..0.033 rows=15 loops=1)
                                             Filter: (NOT pg_is_other_temp_schema(oid))
   ->  Nested Loop  (cost=1.34..130.80 rows=6 width=321) (actual time=0.040..52.244 rows=1845 loops=554)
         ->  Nested Loop  (cost=1.34..128.42 rows=8 width=261) (actual time=0.017..16.949 rows=1251 loops=554)
               Join Filter: (pg_has_role(r.relowner, 'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT'::text) OR has_table_privilege(c.oid, 'INSERT'::text) OR has_table_privilege(c.oid, 'UPDATE'::text) OR has_table_privilege(c.oid, 'REFERENCES'::text))
               ->  Hash Join  (cost=1.34..109.27 rows=46 width=193) (actual time=0.009..5.149 rows=1251 loops=554)
                     Hash Cond: (c.connamespace = nc.oid)
                     ->  Seq Scan on pg_constraint c  (cost=0.00..103.69 rows=1008 width=133) (actual time=0.007..2.765 rows=1302 loops=554)
                           Filter: (contype = ANY ('{p,u,f}'::"char"[]))
                     ->  Hash  (cost=1.33..1.33 rows=1 width=68) (actual time=0.022..0.022 rows=1 loops=1)
                           ->  Seq Scan on pg_namespace nc  (cost=0.00..1.33 rows=1 width=68) (actual time=0.016..0.019 rows=1 loops=1)
                                 Filter: ('public'::text = ((nspname)::information_schema.sql_identifier)::text)
               ->  Index Scan using pg_class_oid_index on pg_class r  (cost=0.00..0.39 rows=1 width=76) (actual time=0.005..0.006 rows=1 loops=693054)
                     Index Cond: (r.oid = c.conrelid)
                     Filter: (relkind = 'r'::"char")
         ->  Index Scan using pg_namespace_oid_index on pg_namespace nr  (cost=0.00..0.28 rows=1 width=68) (actual time=0.004..0.005 rows=1 loops=693054)
               Index Cond: (nr.oid = r.relnamespace)
               Filter: (NOT pg_is_other_temp_schema(oid))
   ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.00..4.27 rows=1 width=70) (actual time=0.006..0.007 rows=1 loops=4020)
         Index Cond: ((ss.roid = a.attrelid) AND (a.attnum = (ss.x).x))
         Filter: (NOT attisdropped)
 Total runtime: 30346.174 ms
(60 rows)

This is Postgresql 8.2.4, on a Dual-Core XEON 3.6GHz. With nested_loops 
off, I get a very fast response (330ms).


Regards,
   Mario Weilguni

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Curious about dead rows.

2007-11-14 Thread Mario Weilguni

Jean-David Beyer wrote:

I am doing lots of INSERTs on a table that starts out empty (I did a
TRUNCATE on it). I am not, AFAIK, doing DELETEs or UPDATEs. Autovacuum is
on. I moved logging up to debug2 level to see what was going on, and I get
things like this:

 "vl_as": scanned 3000 of 5296 pages, containing 232944 live rows and 1033
dead rows; 3000 rows in sample, 411224 estimated total rows

A little later, it says:

"vl_as": scanned 3000 of 6916 pages, containing 233507 live rows and 493
dead rows; 3000 rows in sample, 538311 estimated total rows

(I suppose that means autovacuum is working.) Is this normal, or have I got
something wrong? Why so many dead rows when just doing inserts? It is not
that I think the number is too high, considering the number of rows in the
table at the point where I copied this line. It is just that I do not
understand why there are any.

  
Did you roll back some transactions? That will generate dead rows too - at 
least I think so.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] efficient pattern queries (using LIKE, ~)

2007-09-11 Thread Mario Weilguni

Fernan Aguero wrote:

Hi,

I have a table containing some ~13 million rows. Queries on
indexed fields run fast, but unanchored pattern queries on a
text column are slow. Indexing the column doesn't help
(this is already mentioned in the manual).
http://www.postgresql.org/docs/8.2/interactive/indexes-types.html

However, no alternative solution is mentioned for indexing
and/or optimizing queries based on unanchored patterns: 
i.e. description LIKE '%kinase%'.


Maybe a trigram search (the pg_trgm contrib module) might help you? I have never 
tried it myself, but it seems to be able to handle substring searches.
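
As a hedged sketch (not from the original thread): on current PostgreSQL releases (9.1 and later) a pg_trgm index can accelerate unanchored LIKE patterns directly; on the 8.2-era versions discussed here it only offered similarity search. Assuming a hypothetical table and column:

CREATE EXTENSION pg_trgm;   -- on old releases: run the contrib/pg_trgm SQL script instead

CREATE INDEX proteins_desc_trgm_idx
    ON proteins USING gin (description gin_trgm_ops);

-- on 9.1+ this index can be used for the unanchored pattern
SELECT * FROM proteins WHERE description LIKE '%kinase%';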



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] DRBD and Postgres: how to improve the perfomance?

2007-09-11 Thread Mario Weilguni

Simon Riggs wrote:

On Fri, 2007-09-07 at 11:37 +0200, Maila Fatticcioni wrote:


 protocol C;


Try protocol B instead.



Are you sure? I've always heard that there has yet to be a case found where B 
is better than C. We use DRBD with protocol C and are quite happy with it.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [pgsql-advocacy] [PERFORM] 8.2 -> 8.3 performance numbers

2007-07-25 Thread Mario Weilguni
On Wednesday, 25 July 2007, Simon Riggs wrote:
> I have reasonable evidence that Referential Integrity is the major
> performance bottleneck and would like some objective evidence that this
> is the case.

Just curious, will 8.3 still check FK constraints (and use locks) even if the 
referencing column value does not change?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] index over timestamp not being used

2007-07-25 Thread Mario Weilguni
On Tuesday, 24 July 2007, Tom Lane wrote:
> > I thought the
> > to_char/to_date/to_timestamp functions were intented for this purposes
>
> No, they're intended for dealing with wacky formats that the regular
> input/output routines can't understand or produce.

Really? I use them a lot because of possible problems with different date 
formats. 20070503 means May 3, 2007 to Germans; I don't know what it means 
to US citizens, but following the strange logic of having the month first 
(8/13/2005), it might mean March 5, 2007 too. Therefore, using to_timestamp 
seems to be a safe choice for me, working in any environment regardless of 
the "DateStyle" setting.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Database Statistics???

2007-07-13 Thread Mario Weilguni
On Friday, 13 July 2007, smiley2211 wrote:
> Hello all,
>
> I am a bit confused...I have a database which was performing very POORLY
> selecting from a view (posted earlier) on one server but extremely fast on
> another server...
>
> I just backed up the database from the FAST server and loaded to the SLOW
> server and it ran just as fast as it originally did...my questions are:
>
> Are STATISTICS some how saved with the database??  if so, how do I UPDATE
> view or update them?
>
> Should I backup the data \ drop the database and reload it to make it get
> new stats?? (vacuum analyze does nothing for this poor performing database)
>
> Thanks-a-bunch.

Try this on both machines:
select relname, relpages, reltuples
  from pg_class
 where relkind='i'
   order by relpages desc limit 20;

Compare the results, are relpages much higher on the slow machine?

If so, REINDEX DATABASE slow_database;


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] postgres: 100% CPU utilization

2007-04-23 Thread Mario Weilguni
On Thursday, 19 April 2007, Sergey Tsukinovsky wrote:
> 2. What would be the recommended set of parameters to tune up in order
> to improve the performance over the time, instead of considering an
> option to vacuum every 30 minutes or so?
>
> 3. Is it safe to run 'vacuum' as frequently as every 15-30 minutes?
No problem.

>
> 4. Suggestions?
Do yourself a favor and upgrade at least to 8.1.x and use autovacuum.

Best regards
Mario Weilguni

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Mario Weilguni
On Thursday, 22 March 2007 16:17, Andreas Kostyrka wrote:
> * Mario Weilguni <[EMAIL PROTECTED]> [070322 15:59]:
> > On Thursday, 22 March 2007 15:33, Jonah H. Harris wrote:
> > > On 3/22/07, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> > > > As others suggest select count(*) from table is very special case
> > > > which non-mvcc databases can optimize for.
> > >
> > > Well, other MVCC database still do it faster than we do.  However, I
> > > think we'll be able to use the dead space map for speeding this up a
> > > bit wouldn't we?
> >
> > Which MVCC DB do you mean? Just curious...
>
> Well, mysql claims InnoDB to be mvcc ;)

Ok, but last time I tried, count(*) with InnoDB tables took roughly(*) the 
same time - because InnoDB has the same problem as postgres 
and has to do a seqscan too (I think it's mentioned somewhere in their docs).

(*) in fact, postgres was faster, but the values were comparable, 40 seconds 
vs. 48 seconds 

Maybe the InnoDB have made some progress here, I tested it with MySQL 5.0.18.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Mario Weilguni
On Thursday, 22 March 2007 15:33, Jonah H. Harris wrote:
> On 3/22/07, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> > As others suggest select count(*) from table is very special case
> > which non-mvcc databases can optimize for.
>
> Well, other MVCC database still do it faster than we do.  However, I
> think we'll be able to use the dead space map for speeding this up a
> bit wouldn't we?

Which MVCC DB do you mean? Just curious...

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Performance of count(*)

2007-03-22 Thread Mario Weilguni
On Thursday, 22 March 2007 12:30, [EMAIL PROTECTED] wrote:
> approximated count?
>
> why? who would need it? where can you use it?
>
> calculating costs and deciding how to execute a query needs an
> approximated count, but it's totally worthless information for any user
> IMO.

No, it is not useless. Try: 
http://www.google.com/search?hl=de&q=test&btnG=Google-Suche&meta=

Do you really think Google counted each of those individual 895 million 
results? It doesn't. In fact, Google's estimate can be off by an order 
of magnitude, and still nobody complains...
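
In the same spirit, a hedged sketch of how to get a cheap approximate row count from PostgreSQL's own statistics instead of a full count(*) (the table name is a placeholder):

SELECT reltuples::bigint AS approx_rows
  FROM pg_class
 WHERE relname = 'mytable';   -- planner estimate, refreshed by ANALYZE / autovacuum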


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Optimization of this SQL sentence

2006-10-18 Thread Mario Weilguni
On Tuesday, 17 October 2006 17:50, Alexander Staubo wrote:
> On Oct 17, 2006, at 17:29 , Mario Weilguni wrote:
> >
> > Enforcing length constraints with varchar(xyz) is good database
> > design, not a
> > bad one. Using text everywhere might be tempting because it works,
> > but it's
> > not a good idea.
>
> Enforcing length constraints is generally a bad idea because it
> assumes you know the data domain as expressed in a quantity of
> characters. Off the top of your head, do you know the maximum length
> of a zip code? A street address? The name of a city?

It's not a bad idea. Usually I use postal codes with 25 chars, and never had 
any problem. With text, the limit would be ~1 GB. No matter how much testing 
in the application happens, the varchar(25) as last resort is a good idea.

And in most cases, the application itself limits the length, and thus it's 
good to reflect this in the database design.

Feel free to use text anywhere for your application, and feel free to use 
numeric(1000) instead of numeric(4) if you want to be prepared for really 
long numbers, but don't tell other people it's bad database design - it 
isn't.






---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Optimization of this SQL sentence

2006-10-17 Thread Mario Weilguni
On Tuesday, 17 October 2006 11:52, Alexander Staubo wrote:
> Lastly, note that in PostgreSQL these length declarations are not  
> necessary:
>
>    contacto varchar(255),
>    fuente varchar(512),
>    prefijopais varchar(10)
>
> Instead, use:
>
>    contacto text,
>    fuente text,
>    prefijopais text
>
> See the PostgreSQL manual for an explanation of varchar vs. text.

Enforcing length constraints with varchar(xyz) is good database design, not a 
bad one. Using text everywhere might be tempting because it works, but it's 
not a good idea.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] optimizing LIKE '%2345' queries

2006-07-03 Thread Mario Weilguni
On Sunday, 2 July 2006 23:50, Gene wrote:
> Is there any way to create a reverse index on string columns so that
> queries of the form:
>
> where column like '%2345';
>
> can use an index and perform as fast as searching with like '2345%'?
>
> Is the only way to create a reverse function and create an index using
> the reverse function and modify queries to use:
>
> where reverse(column) like reverse('%2345') ?
>
> thanks

create a trigger that computes this at insert/update time, index this field, and 
rewrite the query this way:
where inverted_column like '5432%';
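
A hedged sketch of that approach (table and column names are placeholders; reverse() is built in from PostgreSQL 9.1, older releases need a small user-defined function for it):

ALTER TABLE mytable ADD COLUMN code_rev text;

CREATE OR REPLACE FUNCTION mytable_fill_rev() RETURNS trigger AS $$
BEGIN
    -- keep a reversed copy of the searched column up to date
    NEW.code_rev := reverse(NEW.code);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_rev_trg
    BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_fill_rev();

CREATE INDEX mytable_code_rev_idx ON mytable (code_rev text_pattern_ops);

-- the suffix search becomes an anchored prefix search that can use the index
SELECT * FROM mytable WHERE code_rev LIKE '5432%';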


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Physical column size

2006-01-26 Thread Mario Weilguni
On Thursday, 26 January 2006 11:06, Paul Mackay wrote:
> Hi,
>
> I've created a table like this :
> CREATE TABLE tmp_A (
> c "char",
> i int4
> );
>
> And another one
> CREATE TABLE tmp_B (
> i int4,
> ii int4
> );
>
> I then inserted a bit more than 19 million rows in each table (exactly the
> same number of rows in each).
>
> The end result is that the physical size on disk used by table tmp_A is
> exactly the same as table tmp_B (as revealed by the pg_relation_size
> function)! Given that a "char" field is supposed to be 1 byte in size and
> an int4 4 bytes, shouldn't tmp_A use less disk space? Or is it
> that any value, whatever the type, requires at least 4 bytes to be stored?

I think this is caused by alignment: the int4 needs 4-byte alignment, so the single "char" byte is padded and both row layouts end up the same size.
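
A quick way to see this (a hedged illustration, not from the original mail; pg_column_size() is available from 8.1 on):

-- both composite rows report the same size, because the "char" is padded
-- to the 4-byte alignment required by the following int4
SELECT pg_column_size(ROW('a'::"char", 1::int4)) AS tmp_a_row,
       pg_column_size(ROW(1::int4,    1::int4))  AS tmp_b_row;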


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Order by behaviour

2005-12-23 Thread Mario Weilguni
Whatever the reasons for the different query plans are (and whether that 
can be fixed) - you CANNOT assume that data comes back in sorted order when you do 
not use ORDER BY. That is how every database behaves. So use ORDER BY, 
or you'll be in trouble sooner or later.

Best regards,
    Mario Weilguni


On Friday, 23 December 2005 13:34, Carlos Benkendorf wrote:
> Hi,
>
>   We have more than 200 customers running 8.0.3 and two weeks ago started
> migration project to 8.1.1.After the first migration to 8.1.1 we had to
> return back to 8.0.3 because some applications were not working right.
>
>   Our users told me that records are no longer returned in the correct
> order, so I started logging and saw that the SELECT was being issued without
> an ORDER BY clause. It seemed a simple problem to solve.
>
>   I asked the programmers that they should add the ORDER BY clause if they
> need the rows in a certain order and they told me they could not do it
> because it will cost too much and the response time is bigger than not
> using ORDER BY. I disagreed with them because there was an index with the
> same order needed for the order by. Before starting a fight we decided to
> explain analyze both select types and discover who was right. To my
> surprise the select with order by was really more expensive than the select
> without the order by. I will not bet any more...;-)
>
>   For some implementation reason in 8.0.3 the query is returning the rows
> in the correct order even without the order by but in 8.1.1 probably the
> implementation changed and the rows are not returning in the correct order.
>
>   We need the 8.1 for other reasons but this order by behavior stopped the
> migration project.
>
>   Some friends of the list tried to help us and I did some configuration
> changes like increased work_mem and changed the primary columns from
> numeric types to smallint/integer/bigint but even so the runtime and costs
> are far from the ones from the selects without the ORDER BY clause.
>
>   What I can not understand is why the planner is not using the same
> retrieving method with the order by clause as without the order by clause.
> All the rows are retrieved in the correct order in both methods but one is
> much cheaper (without order by) than the other (with order by). Should not
> the planner choice that one?
>
>   Can someone explain me why the planner is not choosing the same method
> used with the selects without the order by clause instead of using a sort
> that is much more expensive?
>
>   Without order by:
> explain analyze
> SELECT * FROM iparq.ARRIPT
> where
> (ANOCALC =  2005
> and CADASTRO =  19
> and CODVENCTO =  00
> and PARCELA >=  00 )
> or
> (ANOCALC =  2005
> and CADASTRO =  19
> and CODVENCTO >  00 )
> or
> (ANOCALC =  2005
> and CADASTRO >  19 )
> or
> (ANOCALC >  2005 );
>  Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript 
> (cost=0.00..122255.35 rows=146602 width=897) (actual time=9.303..1609.987
> rows=167710 loops=1) Index Cond: (((anocalc = 2005::numeric) AND (cadastro
> = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR
> ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto >
> 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR
> (anocalc > 2005::numeric)) Total runtime: 1712.456 ms
> (3 rows)
>
>
> With order by:
> explain analyze
> SELECT * FROM iparq.ARRIPT
> where
> (ANOCALC =  2005
> and CADASTRO =  19
> and CODVENCTO =  00
> and PARCELA >=  00 )
> or
> (ANOCALC =  2005
> and CADASTRO =  19
> and CODVENCTO >  00 )
> or
> (ANOCALC =  2005
> and CADASTRO >  19 )
> or
> (ANOCALC >  2005 )
> order by ANOCALC asc, CADASTRO asc, CODVENCTO asc, PARCELA asc;
>  Sort  (cost=201296.59..201663.10 rows=146602 width=897) (actual
> time=9752.555..10342.363 rows=167710 loops=1) Sort Key: anocalc, cadastro,
> codvencto, parcela
>->  Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on
> arript  (cost=0.00..122255.35 rows=146602 width=897) (actual
> time=0.402..1425.085 rows=167710 loops=1) Index Cond: (((anocalc =
> 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric)
> AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro =
> 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric)
> AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric)) Total runtime:
> 10568.290 ms
> (5 rows)
>
>   Table definition:
>  Table "iparq.arript"
>   Column   | Type  | Modifiers
> ---+

Re: [PERFORM] 8.1 iss

2005-11-08 Thread Mario Weilguni
On Monday, 7 November 2005 18:22, PostgreSQL wrote:
> My most humble apologies to the pg development team (pg_lets?).
>
> I took Greg Stark's advice and set:
>
> shared_buffers = 1  # was 5
> work_mem = 1048576# 1Gb - was 16384
>
> Also, I noticed that the EXPLAIN ANALYZE consistently thought reads would
> take longer than they actually did, so I decreased random_page_cost down to
> 1 (the server has a SATA Raid at level 10).

Don't do that; use 1.5 or 2. Setting it to 1 will only work well if you have 
small databases fitting completely in memory.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] What's better: Raid 0 or disk for seperate pg_xlog

2005-03-10 Thread Mario Weilguni
On Thursday, 10 March 2005 08:44, Karim Nassar wrote:
> From rom http://www.powerpostgresql.com/PerfList/
>
> "even in a two-disk server, you can put the transaction log onto the
> operating system disk and reap some benefits."
>
> Context: I have a two disk server that is about to become dedicated to
> postgresql (it's a sun v40z running gentoo linux).
>
> What's "theoretically better"?
>
> 1) OS and pg_xlog on one disk, rest of postgresql on the other? (if I
>understand the above correctly)
> 2) Everything striped Raid 0?
> 3) 

Because of hard disk seeking times, a separate disk for WAL will be a lot 
better.

regards

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] 8rc2 & BLCKSZ

2004-12-22 Thread Mario Weilguni
On Wednesday, 22 December 2004 22:04, Tom Lane wrote:
> Vivek Khera  writes:
> > One of the suggestions handed to me a long time ago for speeding up PG
> > on FreeBSD was to double the default blocksize in PG.  I tried it, but
> > found not a significant enough speed up to make it worth the trouble
> > to remember to patch every version of Pg during the upgrade path (ie,
> > 7.4.0 -> 7.4.2 etc.)  Forgetting to do that would be disastrous!
>
> Not really --- the postmaster will refuse to start if the BLCKSZ shown
> in pg_control doesn't match what is compiled in.  I concur though that
> there may be no significant performance gain.  For some workloads there
> may well be a performance loss from increasing BLCKSZ.

I have several databases of the same version 7.2 with block sizes of 8k and 32k 
and the same workload (a content management system). The performance of 
the 32k variants is slightly better for a few queries, but overall responsiveness 
seems to be better with 8k (maybe because the 8k variant has 4x more buffers).

Regards,
 Mario Weilguni

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] vacuum locking

2003-10-23 Thread Mario Weilguni
On Thursday, 23 October 2003 15:26, Tom Lane wrote:
> ... if all tuples are the same size, and if you never have any
> transactions that touch enough tuples to overflow your undo segment
> (or even just sit there for a long time, preventing you from recycling
> undo-log space; this is the dual of the VACUUM-can't-reclaim-dead-tuple
> problem).  And a few other problems that any Oracle DBA can tell you about.
> I prefer our system.

Of course both approaches have advantages; it simply depends on the usage 
pattern. A case where Oracle really beats PostgreSQL is m<-->n link 
tables where each record consists of just two foreign keys; the 
overwrite approach is a big win there.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] vacuum locking

2003-10-22 Thread Mario Weilguni
On Thursday, 23 October 2003 01:32, Rob Nagler wrote:
> The concept of vacuuming seems to be problematic.  I'm not sure why
> the database simply can't garbage collect incrementally.  AGC is very
> tricky, especially AGC that involves gigabytes of data on disk.
> Incremental garbage collection seems to be what other databases do,
> and it's been my experience that other databases don't have the type
> of unpredictable behavior I'm seeing with Postgres.  I'd rather the
> database be a little bit slower on average than have to figure out the
> best time to inconvenience my users.

I think Oracle does not do garbage collection; it overwrites the tuples directly 
and stores the old versions in undo buffers. Since most transactions are 
commits, this is a big win.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly