[PERFORM] Performance delay

2005-01-13 Thread Hasnul Fadhly bin Hasan
Hi,
just want to share with all of you a weird thing that I found when I
tested it.

I was doing a query that calls a function, long2ip, to convert bigints
to IP addresses.

so the query looks something like this:

select id, long2ip(srcip), long2ip(dstip) from sometable
where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59'
order by id limit 30;

for your info, there are about 300k rows for that timeframe.
It took about 57+ secs to get the list,
which is about the same as when I query

select id, long2ip(srcip), long2ip(dstip) from sometable
where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59'

which also takes about 57+ secs.
Now if I do this:

select id, long2ip(srcip), long2ip(dstip) from
(select * from sometable
 where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59'
 order by id limit 30) as t;

it takes only about 3+ secs.
Does anyone know why this is the case?
Hasnul




Re: [PERFORM] Postgres Optimizer is not smart enough?

2005-01-13 Thread Mark Kirkwood
Tom Lane wrote:
> Mark Kirkwood <[EMAIL PROTECTED]> writes:
>> the costs of paths using these indexes are
>> quite similar, so are quite sensitive to (some) parameter values.
>
> They'll be exactly the same, actually, as long as the thing predicts
> exactly one row retrieved.  So it's quasi-random which plan you get.
> btcostestimate needs to be improved to understand that in multicolumn
> index searches with inequality conditions, we may have to scan through
> tuples that don't meet all the qualifications.  It's not accounting for
> that cost at the moment, which is why the estimates are the same.

I see some small differences in the numbers - I am thinking that these
are due to the calculations etc. in cost_index(), e.g.:
create_index_paths : index oid 12616389 (test_id2)
cost_index : cost=2.839112 (startup_cost=0.00 run_cost=2.839112)
   : tuples=1.00 cpu_per_tuple=0.017500
   : selectivity=0.02
   : run_index_tot_cost=2.003500 run_io_cost=0.818112
create_index_paths : index oid 12616388 (test_id1)
cost_index : cost=2.933462 (startup_cost=0.002500 run_cost=2.930962)
   : tuples=1.00 cpu_per_tuple=0.01
   : selectivity=0.02
   : run_index_tot_cost=2.008500 run_io_cost=0.912462
Where:
run_index_tot_cost=indexTotalCost - indexStartupCost;
run_io_cost=max_IO_cost + csquared * (min_IO_cost - max_IO_cost)
selectivity=indexSelectivity
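(As a sanity check, the numbers above are consistent with run_cost being
assembled as run_index_tot_cost + run_io_cost + cpu_per_tuple * tuples:
2.003500 + 0.818112 + 0.017500 = 2.839112 for test_id2, and
2.008500 + 0.912462 + 0.010000 = 2.930962 for test_id1.)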
Hmmm ... so it's only the selectivity that is the same (sourced from
index->amcostestimate which I am guessing points to btcostestimate), is
that correct?
cheers
Mark



Re: [PERFORM] Performance delay

2005-01-13 Thread Richard Huxton
Hasnul Fadhly bin Hasan wrote:
Hi,
just want to share with all of you a weird thing that i found when i 
tested it.

i was doing a query that will call a function long2ip to convert bigint 
to ips.

so the query looks something like this.
select id, long2ip(srcip), long2ip(dstip) from sometable
where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 
23:59' order by id limit 30;

for your info, there are about 300k rows for that timeframe.
it cost me about 57+ secs to get the list.
which is about the same if i query
select id, long2ip(srcip), long2ip(dstip) from sometable
where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 
23:59'

it will cost me about 57+ secs also.
Now if i did this
select id, long2ip(srcip), long2ip(dstip) from
(select * from sometable
where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 
23:59' order by id limit 30) as t;

it will cost me about 3+ secs
The difference will be that in the final case you only make 30 calls to 
long2ip() whereas in the first two you call it 300,000 times and then 
throw away most of them.
Try running EXPLAIN ANALYSE ... for both - that will show how PG is 
planning the query.
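For example (these are just the queries from above, prefixed with EXPLAIN
ANALYSE; comparing the actual times and row counts at each plan node should
make the difference visible):

EXPLAIN ANALYSE
select id, long2ip(srcip), long2ip(dstip) from sometable
where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59'
order by id limit 30;

EXPLAIN ANALYSE
select id, long2ip(srcip), long2ip(dstip) from
(select * from sometable
 where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59'
 order by id limit 30) as t;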
--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Performance delay

2005-01-13 Thread Hasnul Fadhly bin Hasan
Hi Richard,
Thanks for the reply. Is that the case? I thought it would apply the
where condition first, and after that format the output to what we want.

Hasnul
Richard Huxton wrote:
Hasnul Fadhly bin Hasan wrote:
Hi,
just want to share with all of you a weird thing that i found when i 
tested it.

i was doing a query that will call a function long2ip to convert 
bigint to ips.

so the query looks something like this.
select id, long2ip(srcip), long2ip(dstip) from sometable
where timestamp between timestamp '01-10-2005' and timestamp 
'01-10-2005 23:59' order by id limit 30;

for your info, there are about 300k rows for that timeframe.
it cost me about 57+ secs to get the list.
which is about the same if i query
select id, long2ip(srcip), long2ip(dstip) from sometable
where timestamp between timestamp '01-10-2005' and timestamp 
'01-10-2005 23:59'

it will cost me about 57+ secs also.
Now if i did this
select id, long2ip(srcip), long2ip(dstip) from
(select * from sometable
where timestamp between timestamp '01-10-2005' and timestamp 
'01-10-2005 23:59' order by id limit 30) as t;

it will cost me about 3+ secs

The difference will be that in the final case you only make 30 calls 
to long2ip() whereas in the first two you call it 300,000 times and 
then throw away most of them.
Try running EXPLAIN ANALYSE ... for both - that will show how PG is 
planning the query.
--
  Richard Huxton
  Archonet Ltd





[PERFORM] MOVE command

2005-01-13 Thread PFC
Hello,
	Here I'm implementing session management, with a connections table
partitioned between active and archived connections. A connection
represents the link between a user and a chatroom.

I use partitioning for performance reasons.
	The active table contains all the data for the active session: user_id,
chatroom_id, session start time, and other information.
	The archive table contains just the user_id, chatroom_id, and session start
and end times, for logging purposes and for displaying on the site which
user was logged into which chatroom and from when to when.

	Thus, when a user disconnects from a chatroom, I must move one row from
the active to the archive table. This poses no problem as there is a
UNIQUE index (user_id, chatroom_id), so I select the row FOR UPDATE, insert
it into the archive table, then delete it.

	Now, when a user logs out from the site, or when his session is purged by  
the auto-expiration cron job, I must also expire ALL his open chatroom  
connections.
	INSERT INTO archive (...) SELECT ... FROM active WHERE user_id = ...;
	DELETE FROM active WHERE user_id = ...;

	Now, if the user inserts a connection between the two queries above, the
thing will fail (the connection will just be deleted). I know that there
are many ways to do it right:
	- LOCK the table in exclusive mode (sketched below)
	- use an additional primary key on the active table which is not related
to the user_id and the chatroom_id, select the ids of the sessions to
expire into a temporary table, and use that
	- use an extra field in the table to mark that the rows are being
processed
	- use transaction isolation level SERIALIZABLE
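
	For instance, the table-lock variant might look something like this (only a
sketch -- the archive column names and the literal user id are placeholders
following the description above):

BEGIN;
-- blocks concurrent INSERTs into active (but not plain SELECTs)
LOCK TABLE active IN EXCLUSIVE MODE;
INSERT INTO archive (user_id, chatroom_id, session_start, session_end)
    SELECT user_id, chatroom_id, session_start, now()
    FROM active WHERE user_id = 42;
DELETE FROM active WHERE user_id = 42;
COMMIT;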

	However, all these methods somehow don't feel right, and as this is an  
often encountered problem, I'd really like to have a sql command, say  
MOVE, or SELECT AND DELETE, whatever, which acts like a SELECT, returning  
the rows, but deleting them as well. Then I'd just do INSERT INTO archive  
(...) SELECT ... AND DELETE FROM active WHERE user_id = ...;

which would have the following advantages:
	- No worries about locks:
		- less chance of bugs
		- higher performance, because locks by definition have to be waited on
	- No need to do the request twice (so it is twice as fast!)
	- Simplicity and elegance
	There would be a hidden bonus: if you acquire locks, you had better
COMMIT the transaction as soon as possible to release them, whereas here
you can happily continue in the transaction.

	I think this command would make a nice cousin to the also very popular
INSERT... OR UPDATE, which tries to insert a row and, if it already exists,
UPDATEs it instead of inserting it!

What do you think?




Re: [PERFORM] Performance delay

2005-01-13 Thread Jim C. Nasby
On Thu, Jan 13, 2005 at 07:14:10PM +0800, Hasnul Fadhly bin Hasan wrote:
> Hi Richard,
> 
> Thanks for the reply. Is that the case? I thought it would apply the
> where condition first, and after that format the output to what we want.

That is in fact exactly what it's doing. The second query is faster not
because of the where clause, but because of the limit clause. The first
query builds a list of id, long2ip(srcip), long2ip(dstip) for the
timestamp range, then it orders that list and gives you the first 30.
The second query builds a list of everything from sometable for the
timestamp range, orders it, keeps the first 30, THEN it calculates
long2ip based on that list of 30 items.
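
(Just for illustration, the cost of those 300,000 calls depends heavily on how
long2ip is written; its definition isn't shown in this thread. A plain-SQL
version of such a function might look roughly like the sketch below -- this is
an assumption for illustration, not the original poster's code:

CREATE OR REPLACE FUNCTION long2ip(bigint) RETURNS text AS '
    -- split the 32-bit value into four octets and join them with dots
    SELECT (($1 >> 24) & 255)::text || ''.'' ||
           (($1 >> 16) & 255)::text || ''.'' ||
           (($1 >>  8) & 255)::text || ''.'' ||
           ($1 & 255)::text;
' LANGUAGE sql IMMUTABLE;

Whether the function is a simple SQL expression like this or, say, a plpgsql
function makes a big difference to how much those 300,000 calls cost.)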

> Hasnul
> 
> Richard Huxton wrote:
> 
> >Hasnul Fadhly bin Hasan wrote:
> >
> >>Hi,
> >>
> >>just want to share with all of you a weird thing that i found when i 
> >>tested it.
> >>
> >>i was doing a query that will call a function long2ip to convert 
> >>bigint to ips.
> >>
> >>so the query looks something like this.
> >>
> >>select id, long2ip(srcip), long2ip(dstip) from sometable
> >>where timestamp between timestamp '01-10-2005' and timestamp 
> >>'01-10-2005 23:59' order by id limit 30;
> >>
> >>for your info, there are about 300k rows for that timeframe.
> >>
> >>it cost me about 57+ secs to get the list.
> >>
> >>which is about the same if i query
> >>select id, long2ip(srcip), long2ip(dstip) from sometable
> >>where timestamp between timestamp '01-10-2005' and timestamp 
> >>'01-10-2005 23:59'
> >>
> >>it will cost me about 57+ secs also.
> >>
> >>Now if i did this
> >>select id, long2ip(srcip), long2ip(dstip) from
> >>(select * from sometable
> >>where timestamp between timestamp '01-10-2005' and timestamp 
> >>'01-10-2005 23:59' order by id limit 30) as t;
> >>
> >>it will cost me about 3+ secs
> >
> >
> >The difference will be that in the final case you only make 30 calls 
> >to long2ip() whereas in the first two you call it 300,000 times and 
> >then throw away most of them.
> >Try running EXPLAIN ANALYSE ... for both - that will show how PG is 
> >planning the query.
> >-- 
> >  Richard Huxton
> >  Archonet Ltd
> >
> >
> 
> 
> 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-13 Thread Alex Turner
This is somewhat correct, and somewhat unfair - bear in mind that
Postgresql doesn't have the equivalent features of Oracle Enterprise
Edition, including RAC and Enterprise Manager.

You can use Oracle Personal Edition for development, or pay a per-head
cost of $149/user for your dev group for Standard One, which if
you have a small team isn't really that much.

If you want commercial support for Postgresql, you have to pay for that too.

It's $5k/CPU for the Standard One edition, so $10k for a dual-CPU box.

Upgrades are free - once you have an Oracle license it is pretty much
good for any version on your platform with your number of CPUs.

I'm not advocating that people switch to Oracle at all; it's still
much more expensive than Postgresql, and for most small and medium
applications Postgresql is much easier to manage and maintain.  I
would just like to make sure people get their facts straight.  I
worked for a company that selected MS SQL Server because it was
'cheaper' than Oracle, when in fact with the correct Oracle pricing,
Oracle was cheaper, and had superior features.  I would have preferred
that they use Postgresql, which for the project in question would have
been more appropriate and cost much less in hardware and software
requirements, but they had to have 'Industry Standard'.  Oracle ended
up costing <$10k with licenses at $149 ea for 25 users, and the
support contract wasn't that much of a bear - I can't remember exactly
how much, I think it was around $1800/yr.


Alex Turner
NetEconomist
--
Remember, what most consider 'conventional wisdom' is neither wise nor
the convention.  Don't speculate, educate.

On Wed, 12 Jan 2005 22:51:24 -0800, Joe Conway <[EMAIL PROTECTED]> wrote:
> Greg Sabino Mullane wrote:
> > Don't forget your support contract cost, as well as licenses for each
> > of your servers: development, testing, QA, etc.
> >
> > Is it really as "cheap" as 5K? I've heard that for any fairly modern
> > system, it's much more, but that may be wrong.
> >
> 
> Sort of -- see:
> http://oraclestore.oracle.com/OA_HTML/ibeCCtpSctDspRte.jsp?section=15105
>"It is available on single server systems supporting up to a maximum
> of 2 CPUs"
> 
> Also note that most industrial strength features (like table
> partitioning, RAC, OLAP, Enterprise Manager plugins, etc, etc) are high
> priced options (mostly $10K to $20K per CPU) and they can only be used
> with the Enterprise edition (which is $40K/CPU *not* $2.5K/CPU).
> http://oraclestore.oracle.com/OA_HTML/ibeCCtpSctDspRte.jsp?section=10103
> 
> And you are correct, they expect to be paid for each dev, test, and QA
> machine too.
> 
> The $5K edition is just there to get you hooked ;-) By the time you add
> up what you really want/need, figure you'll spend a couple of orders of
> magnitude higher, and then > 20% per year for ongoing
> maintenance/upgrades/support.
> 
> Joe
> 
>



Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-13 Thread Joe Conway
Alex Turner wrote:
I'm not advocating that people switch to Oracle at all; it's still
much more expensive than Postgresql, and for most small and medium
applications Postgresql is much easier to manage and maintain.  I
would just like to make sure people get their facts straight.  I
worked for a company that selected MS SQL Server because it was
'cheaper' than Oracle, when in fact with the correct Oracle pricing,
Oracle was cheaper, and had superior features.  I would have preferred
that they use Postgresql, which for the project in question would have
been more appropriate and cost much less in hardware and software
requirements, but they had to have 'Industry Standard'.  Oracle ended
up costing <$10k with licenses at $149 ea for 25 users, and the
support contract wasn't that much of a bear - I can't remember exactly
how much, I think it was around $1800/yr.
My facts were straight, and they come from firsthand experience. The 
point is, it is easy to get trapped into thinking to yourself, "great, I 
can get a dual CPU oracle server for ~$10K, that's not too bad...". But 
then later you figure out you really need table partitioning or RAC, and 
suddenly you have to jump directly to multiple 6 figures. The entry 
level Oracle pricing is mainly a marketing gimmick -- it is intended to 
get you hooked.

Also note that the per named user license scheme is subject to per CPU 
minimums that guarantee you'll never spend less than half the per CPU 
price. Oracle's licensing is so complex that there are businesses out 
there that subsist solely on helping companies figure it out to save 
money, and they take a cut of the savings. Oracle's own account reps had 
a hard time answering this question -- does a hyperthreaded Intel CPU 
count as 1 or 2 CPUs from a licensing standpoint? We were eventually 
told 1, but that the decision was "subject to change in the future".

Joe


Re: [PERFORM] Postgres Optimizer is not smart enough?

2005-01-13 Thread Tom Lane
Mark Kirkwood <[EMAIL PROTECTED]> writes:
> Hmmm ... so it's only the selectivity that is the same (sourced from
> index->amcostestimate which I am guessing points to btcostestimate), is
> that correct?

No, the point is that btcostestimate will compute not only the same
selectivities but the identical index access cost values, because it
thinks that only one index entry will be fetched in both cases.  It
needs to account for the fact that the inequality condition will cause a
scan over a larger range of the index than is actually returned.  See
_bt_preprocess_keys() and _bt_checkkeys().

The small differences you are showing have to do with different
assumptions about where the now() function will get evaluated (once per
row or once at scan start).  That's not the effect that I'm worried
about.

regards, tom lane



Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql

2005-01-13 Thread William Yu
Gavin Sherry wrote:
There is no problem with free Linux distros handling > 4 GB of memory. The
problem is that 32-bit hardware must make use of some less-than-efficient
mechanisms to be able to address the memory.

The threshold for using PAE is actually far lower than 4GB. 4GB is the
total memory address space -- split that in half: 2GB for userspace,
2GB for the kernel. The OS cache resides in kernel space -- after you take
away the memory allocation for devices, you're left with a window of
roughly 900MB.

Since the optimal state is to allocate a small amount of memory to 
Postgres and leave a huge chunk to the OS cache, this means you are 
already hitting the PAE penalty at 1.5GB of memory.



Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-13 Thread Joe Conway
Alex Turner wrote:
> I appreciate your information, but it's not valid.  Most people don't
> need RAC or table partitioning.

From a small company perspective, maybe, but not in the least invalid
for larger companies.

> Many of the features in Oracle EE are just not available in Postgresql
> at all, and many aren't available in any version of SQL Server (table
> partitioning, bitmap indexes and others).

I never claimed otherwise. I said the low end product gets you hooked.
Once you're hooked, you'll start to wish for all the wiz-bang features
-- after all, that's why you picked Oracle in the first place.

> Just because Oracle reps are a little clueless sometimes doesn't mean
> that the product pricing sucks.
> The minimum user requirement for standard one is 5 users.  5*149=$745,
> much less than half the price of a dual or single CPU config.

And what happens once you need a quad server?

> I'm sorry that you had a bad experience with Oracle, but Oracle is a
> fine product, that is available for not a lot of $$ if you are willing
> to use a bit of elbow grease to learn how it works and don't need
> enterprise features, which many other database products simply don't
> have, or work very poorly.

I never said I had a "bad experience" with Oracle. I pointed out the
gotchas. We have several large Oracle boxes running, several MSSQL, and
several Postgres -- they all have their strengths and weaknesses.

Nuff said -- this thread is way off topic now...

Joe


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-13 Thread Alex Turner
Joe,
I appreciate your information, but it's not valid.  Most people don't
need RAC or table partitioning.  Many of the features in Oracle EE are
just not available in Postgresql at all, and many aren't available in
any version of SQL Server (table partitioning, bitmap indexes and
others).  If you want all the wiz-bang features, you have to pay the
wiz-bang price.  Just because Oracle reps are a little clueless
sometimes doesn't mean that the product pricing sucks.
The minimum user requirement for standard one is 5 users.  5*149=$745,
much less than half the price of a dual or single CPU config.

I'm sorry that you had a bad experience with Oracle, but Oracle is a
fine product, that is available for not a lot of $$ if you are willing
to use a bit of elbow grease to learn how it works and don't need
enterprise features, which many other database products simply don't
have, or work very poorly.

Alex Turner
NetEconomist


On Thu, 13 Jan 2005 06:56:52 -0800, Joe Conway <[EMAIL PROTECTED]> wrote:
> Alex Turner wrote:
> > I'm not advocating that people switch to Oracle at all; it's still
> > much more expensive than Postgresql, and for most small and medium
> > applications Postgresql is much easier to manage and maintain.  I
> > would just like to make sure people get their facts straight.  I
> > worked for a company that selected MS SQL Server because it was
> > 'cheaper' than Oracle, when in fact with the correct Oracle pricing,
> > Oracle was cheaper, and had superior features.  I would have preferred
> > that they use Postgresql, which for the project in question would have
> > been more appropriate and cost much less in hardware and software
> > requirements, but they had to have 'Industry Standard'.  Oracle ended
> > up costing <$10k with licenses at $149 ea for 25 users, and the
> > support contract wasn't that much of a bear - I can't remember exactly
> > how much, I think it was around $1800/yr.
> 
> My facts were straight, and they come from firsthand experience. The
> point is, it is easy to get trapped into thinking to yourself, "great, I
> can get a dual CPU oracle server for ~$10K, that's not too bad...". But
> then later you figure out you really need table partitioning or RAC, and
> suddenly you have to jump directly to multiple 6 figures. The entry
> level Oracle pricing is mainly a marketing gimmick -- it is intended to
> get you hooked.
> 
> Also note that the per named user license scheme is subject to per CPU
> minimums that guarantee you'll never spend less than half the per CPU
> price. Oracle's licensing is so complex that there are businesses out
> there that subsist solely on helping companies figure it out to save
> money, and they take a cut of the savings. Oracle's own account reps had
> a hard time answering this question -- does a hyperthreaded Intel CPU
> count as 1 or 2 CPUs from a licensing standpoint? We were eventually
> told 1, but that the decision was "subject to change in the future".
> 
> Joe
> 
>



[PERFORM] sum of all values

2005-01-13 Thread Madison Kelly
Hi all,
  Is there a fast(er) way to get the sum of all integer values for a 
certain condition over many thousands of rows? What I am currently doing 
is this (which takes ~5-10sec.):

SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE 
a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND 
a.file_type=b.fs_type AND b.fs_backup='t';

  I need to keep parts of the data in two tables. I currently use 
'file_name/fs_name', 'file_parent_dir/fs_parent_dir' and 
'file_type/fs_type' to match the entries in the two tables. The 
'file_info_#' table is frequently dropped and re-created so this was the 
only way I could think to match the data.

  I am hoping that maybe there is something I can do differently that 
will return this value a lot faster (ideally within a second). I know 
that this is heavily dependent on the system underneath, but the program 
is designed for Joe/Jane User, so I am trying to do what I can in the 
script and within my DB calls to make this as efficient as possible. I 
realise that my goal may not be viable.

  Here are the schemas, in case they help:
tle-bu=> \d file_info_1
             Table "public.file_info_1"
     Column      |  Type   |          Modifiers
-----------------+---------+-----------------------------
 file_acc_time   | bigint  | not null
 file_group_name | text    | not null
 file_group_uid  | integer | not null
 file_mod_time   | bigint  | not null
 file_name       | text    | not null
 file_parent_dir | text    | not null
 file_perm       | text    | not null
 file_size       | bigint  | not null
 file_type       | text    | not null default 'f'::text
 file_user_name  | text    | not null
 file_user_uid   | integer | not null
Indexes:
    "file_info_1_display_idx" btree (file_parent_dir, file_name, file_type)
    "file_info_1_search_idx" btree (file_parent_dir, file_name, file_type)
tle-bu=> \d file_set_1
            Table "public.file_set_1"
    Column     |  Type   |          Modifiers
---------------+---------+-----------------------------
 fs_backup     | boolean | not null default true
 fs_display    | boolean | not null default false
 fs_name       | text    | not null
 fs_parent_dir | text    | not null
 fs_restore    | boolean | not null default false
 fs_type       | text    | not null default 'f'::text
Indexes:
    "file_set_1_sync_idx" btree (fs_parent_dir, fs_name, fs_type)
  Thanks all!
Madison


Re: [ADMIN] [PERFORM] Assimilation of these "versus" and hardware

2005-01-13 Thread Matthew T. O'Connor
Josh Berkus wrote:
> Matt,
>
>> I had one comment on the pg_autovacuum section.  Near the bottom it
>> lists some of its limitations, and I want to clarify the 1st one: "Does
>> not reset the transaction counter".  I assume this is talking about the
>> xid wraparound problem?  If so, then that bullet can be removed.
>> pg_autovacuum does check for xid wraparound and perform a database-wide
>> vacuum analyze when it's needed.
>
> Keen.   That's an 8.0 fix?

Nope, been there since before 7.4 was released.
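
(If you want to check where a database stands, something like this shows how
old each database's frozen transaction id is -- a simple query against the
system catalog:)

SELECT datname, age(datfrozenxid) FROM pg_database;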


[PERFORM] Best filesystem for PostgreSQL Database Cluster under Linux

2005-01-13 Thread Pete de Zwart
Greetings to one and all,

I've been trying to find some information on selecting an optimal 
filesystem setup for a volume that will only contain a PostgreSQL Database 
Cluster under Linux. Searching through the mailing list archive showed some 
promising statistics on the various filesystems available to Linux, ranging 
from ext2 through reiserfs and xfs.

I have come to understand that PostgreSQL's Write Ahead Logging (WAL) 
performs a lot of the journal functionality provided by the majority of 
contemporary filesystems, and that having both WAL and filesystem journalling 
can degrade performance.

Could anyone point me in the right direction so that I can read up some 
more on this issue, to discern which filesystem to choose and how to tune 
both the FS and PostgreSQL so that they complement each other? I've 
attempted to find this information via the FAQ, Google and the mailing list 
archives, but have had no luck so far.

Regards,

Pete de Zwart. 





Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql

2005-01-13 Thread Gavin Sherry
On Wed, 12 Jan 2005 [EMAIL PROTECTED] wrote:

> I would like to increase RAM from 4 Gb. to 6 Gb. [which I hope will give
> more performance]. I now use RH 9 and Pgsql 7.3.2 on a dual Xeon 3.0 server
> that has the limitation of 4 Gb. of RAM. Which OS should I use: FC 2-3 or
> RedHat EL 3 [which was claimed to support 64 Gb. of RAM]? May I use FC 2
> [which is freely downloadable] with 6 Gb. and Pgsql 7.4?
> Amrit
> Thailand

Try 7.4 before the memory upgrade. If you still have performance issues,
try optimising your queries. As I mentioned before, you can join the
#postgresql channel on irc.freenode.net and we can assist.

Gavin




Re: [PERFORM] which dual-CPU hardware/OS is fastest for PostgreSQL?

2005-01-13 Thread Jan Dittmer
Joshua D. Drake wrote:
>>RAID controllers tend to use i960 or StrongARM CPUs that run at speeds
>>that _aren't_ all that impressive.  With software RAID, you can take
>>advantage of the _enormous_ increases in the speed of the main CPU.
>>
>>I don't know so much about FreeBSD's handling of this, but on Linux,
>>there's pretty strong indication that _SOFTWARE_ RAID is faster than
>>hardware RAID.
>> 
>>
> 
> Unless something has changed though, you can't run raid 10
> with linux software raid and raid 5 sucks for heavy writes.

You could always do raid 1 over raid 0; with newer kernels (2.6-ish)
there is even a dedicated raid10 driver.
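
Something along these lines should work with a recent mdadm (just a sketch;
the device names are placeholders):

# two mirror pairs, striped together by hand
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1

# or, with the dedicated raid10 driver on a 2.6 kernel
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1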

Jan



Re: [PERFORM] Best filesystem for PostgreSQL Database Cluster under Linux

2005-01-13 Thread Pete de Zwart
Thanks for the info.

I managed to pull out some archived posts to this list from the PostgreSQL 
web site about this issue which have helped a bit.

Unfortunately, the FS was chosen before considering its impact on 
I/O for PostgreSQL. As the cluster is sitting on its own 200GB IDE drive for 
the moment and the system is partially live, it's not feasible to change the 
underlying file system without great pain and suffering.

In the great fsync debates that I've seen, the pervasive opinion about 
journalling file systems under Linux and PostgreSQL is to have the 
filesystem mount option data=writeback, assuming that fsync in PostgreSQL 
will handle coherency of the file data and the FS will handle metadata.
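
For example, the relevant fstab entry would look something like this (only a
sketch; the device and mount point are placeholders for wherever the cluster
actually lives, and the data= option applies to ext3):

/dev/hdb1   /var/lib/pgsql/data   ext3   defaults,data=writeback   0  2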

This is all academic to a point, as tuning the FS will get only a small 
improvement in I/O compared to the potential improvement of moving to 
SCSI/FCAL, and of getting more memory.

Regards,

Pete de Zwart.

"Christopher Browne" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Your understanding of the impact of filesystem journalling isn't
> entirely correct.  In the cases of interest, journalling is done on
> metadata, not on the contents of files, with the result that there
> isn't really that much overlap between the two forms of "journalling"
> that are taking place.
>
> I did some benchmarking last year that compared, on a write-heavy
> load, ext3, XFS, and JFS.
>
> I found that ext3 was materially (if memory serves, 15%) slower than
> the others, and that there was a persistent _slight_ (a couple
> percent) advantage to JFS over XFS.
>
> This _isn't_ highly material, particularly considering that I was
> working with a 100% Write load, whereas "real world" work is likely to
> have more of a mixture.
>
> If you have reason to consider one filesystem or another better
> supported by your distribution vendor, THAT is a much more important
> reason to pick a particular filesystem than 'raw speed.'


