Re: [PERFORM] Postmaster using only 4-5% CPU

2006-03-23 Thread Edoardo Serra

At 18.44 21/03/2006, Scott Marlowe wrote:

Here's what's happening.  On the fast machine, you are almost
certainly using IDE drives.


Oh yes, the fast machine has IDE drives, you got it ;)


Meanwhile, back in the jungle...  The machine with IDE drives operates
differently.  Most, if not all, IDE drives, when told by the OS to
fsync() tell the OS immediately that the fsync() call has completed, and
the data is written to the drive.  Shortly thereafter, the drive
actually commences to write the data out.  When it gets a chance.


I really didn't know about this behaviour of IDE drives.
I was stracing the postmaster while investigating the problem and noticed
many fsync syscalls (one after each INSERT).

I was investigating it, but I couldn't work out why SCSI was slower.
You helped me a lot ;) tnx


For PostgreSQL, the way IDE drives operate is dangerous.  Write data
out, call fsync(), get an immediate return, mark the data as committed,
move on to the next operation, operator trips over power cord / power
conditioner explodes, power supply dies, brownout causes the machine to
reboot, etc., and when the machine comes up, PostgreSQL politely
informs you that your database is corrupt, and you come to the
pgsql-general group asking how to get your database back online.  Very
bad.
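You can watch this effect directly. The sketch below, in the spirit of PostgreSQL's test_fsync utility (the probe file name is arbitrary), times repeated fsync() calls; on a drive that honestly flushes its cache each call typically costs milliseconds, while near-instant returns usually mean a write cache is silently absorbing them:

```python
import os
import tempfile
import time

# Write an 8KB block (roughly one WAL page) and fsync it repeatedly,
# timing the whole loop.  A lying IDE drive acknowledges each fsync
# immediately; an honest drive pays roughly a platter rotation per call.
path = os.path.join(tempfile.gettempdir(), "fsync_probe.dat")
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
n = 100
start = time.time()
for _ in range(n):
    os.write(fd, b"x" * 8192)   # queue the data in the OS cache
    os.fsync(fd)                # ask for it to reach stable storage
elapsed = time.time() - start
os.close(fd)
os.remove(path)
print("%d fsyncs in %.3fs (%.2f ms each)" % (n, elapsed, elapsed * 1000 / n))
```

A 7200 RPM disk that really flushes should show several milliseconds per fsync; sub-millisecond averages are a strong hint the cache is lying.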


Yes, it sounds very bad... what about SATA drives?
I heard about command queueing in SATA, but I don't know if the kernel
handles it properly.



Try wrapping the inserts in the sql file in begin; / commit; statements,
like so:

begin;
insert into table ...
(100,000 inserts here)
insert into table ...
commit;

and it should fly.


Oh yes, with the inserts wrapped in a transaction the import times are as follows:
- SCSI: 35 secs
- IDE: 50 secs
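The effect is easy to reproduce outside PostgreSQL as well. Here is a sketch using SQLite (chosen only because it ships with Python's standard library, not because it behaves identically to PostgreSQL) comparing a commit per row against one commit for the whole batch:

```python
import sqlite3
import time

def timed_insert(commit_each_row, rows=2000):
    """Insert `rows` rows, committing per row or once at the end."""
    con = sqlite3.connect(":memory:")  # a file-backed DB shows a larger gap
    con.execute("CREATE TABLE t (id INTEGER, val TEXT)")
    start = time.time()
    for i in range(rows):
        con.execute("INSERT INTO t VALUES (?, ?)", (i, "x"))
        if commit_each_row:
            con.commit()               # one transaction per row
    if not commit_each_row:
        con.commit()                   # a single commit for the batch
    elapsed = time.time() - start
    count = con.execute("SELECT count(*) FROM t").fetchone()[0]
    con.close()
    return elapsed, count

per_row, n1 = timed_insert(True)
batched, n2 = timed_insert(False)
print("per-row commits: %.3fs, one transaction: %.3fs" % (per_row, batched))
```

With an on-disk database and honest fsync, the per-row variant is dramatically slower, which is exactly the 10,000-row import story below.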


When a good friend of mine first started using PostgreSQL, he was a
total MySQL bigot.  He was importing a 10,000 row dataset, and made a
smartassed remark after 10 minutes about how it would have imported in minutes
on MySQL.  It was a test database, so I had him stop the import, delete
all the imported rows, and wrap the whole import inside begin; and
commit;

The import took about 20 seconds or so.


;)


Now, for the interesting test.  Run the import on both machines, with
the begin; commit; pairs around it.  Halfway through the import, pull
the power cord, and see which one comes back up.  Don't do this to
servers with data you like, only test machines, obviously.  For an even
more interesting test, do this with MySQL, Oracle, DB2, etc...


I will surely run a test like this ;)

Tnx a lot again for help

Regards

Edoardo Serra


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Indexes with descending date columns

2006-03-23 Thread Theo Kramer
On Fri, 2006-03-17 at 08:25, [EMAIL PROTECTED] wrote:
  I have a performance problem when traversing a table in index order with
  multiple columns including a date column in date reverse order. Below
  follows a simplified description of the table, the index and the
  associated query
  
  \d prcdedit
   prcdedit_prcd   | character(20)   |
   prcdedit_date   | timestamp without time zone |
  
  Indexes:
  prcdedit_idx btree (prcdedit_prcd, prcdedit_date)
 
 Depending on how you use the table, there are three possible solutions.
 
 First, if it makes sense in the domain, using an ORDER BY where _both_ 
 columns are used descending will make PG search the index in reverse and will 
 be just as fast as when both as searched by the default ascending.
 
 Second possibility: Create a dummy column whose value depends on the negative 
 of prcdedit_date, e.g., -extract(epoch from prcdedit_date), keep the dummy 
 column in sync with the original column using triggers, and rewrite your 
 queries to use ORDER BY prcdedit_prod, dummy_column.
 
 Third: Create an index on a function which sorts in the order you want, and 
 then always sort using the function index (you could use the 
 -extract(epoch...) gimmick for that, among other possibilities.)
 
 HTH.
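A minimal sketch of the functional-index variant of that advice, using the prcdedit table from the original post (untested; relies on extract(epoch FROM ...) being immutable, which holds for timestamp without time zone):

```sql
-- Expression index on the negated epoch: ascending order on this
-- expression corresponds to descending order on prcdedit_date.
CREATE INDEX prcdedit_desc_idx
    ON prcdedit (prcdedit_prcd, (-extract(epoch FROM prcdedit_date)));

-- The query must ORDER BY the identical expression for the planner
-- to consider the index:
SELECT *
  FROM prcdedit
 ORDER BY prcdedit_prcd, -extract(epoch FROM prcdedit_date);
```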

All good input - thanks, however, before I start messing with my stuff
which I know will be complex - some questions to any of the developers
on the list.

i  Is it feasible to extend index creation to support descending 
   columns? ... this is supported on other commercial and non
   commercial databases, but I do not know if this is a SQL standard.

ii If no to i, is it feasible to extend PostgreSQL to traverse an index
   with some columns ascending and some descending - assuming an ORDER BY
   on more than one column where the columns' directions differ and a
   suitable index exists? ... if that makes sense.

-- 
Regards
Theo




[PERFORM] Problem with query, server totally unresponsive

2006-03-23 Thread Bendik Rognlien Johansen
Hello, I have a big problem with one of my databases. When I run my
query, after a few minutes the postmaster shows 99% mem in top, and
the server becomes totally unresponsive.


I get this message when I try to cancel the query:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


This works fine on a different machine with the same database
settings and about 30% fewer records. The other machine is running
PostgreSQL 8.0.3.

The troubled one is running 8.1.2


Any help is greatly appreciated!

Thanks





The machine has 2x Intel dual-core processors (3 GHz) and 2 GB of RAM.

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 8192                   # min 16 or max_connections*2, 8KB each
#temp_buffers = 1000                    # min 100, 8KB each
#max_prepared_transactions = 5          # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 4096                         # min 64, size in KB
maintenance_work_mem = 262144           # min 1024, size in KB
#max_stack_depth = 2048                 # min 100, size in KB





My query:

SELECT r.id AS id, max(r.name) AS name, max(companyid) AS companyid,
       max(extract(epoch from r.updated)) AS r_updated,
       hydra.join(co.value) AS contacts,
       hydra.join(ad.postalsite) AS postalsites
  FROM records r
  LEFT OUTER JOIN contacts co ON (r.id = co.record AND co.type IN (1,11,101,3))
  LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
 WHERE r.original IS NULL
 GROUP BY r.id;



The hydra.join function
-- Aggregates a column to an array

DROP FUNCTION hydra.join_aggregate(text, text) CASCADE;
DROP FUNCTION hydra.join_aggregate_to_array(text);

CREATE FUNCTION hydra.join_aggregate(text, text) RETURNS text
  AS 'select $1 || ''|'' || $2'
  LANGUAGE sql IMMUTABLE STRICT;

CREATE FUNCTION hydra.join_aggregate_to_array(text) RETURNS text[]
  AS 'SELECT string_to_array($1, ''|'')'
  LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE hydra.join (
  BASETYPE = text
,SFUNC = hydra.join_aggregate
,STYPE = text
,FINALFUNC = hydra.join_aggregate_to_array
);





Tables:
records: 757278 rows
contacts: 2256253 rows
addresses: 741536 rows








Explain:

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=636575.63..738618.40 rows=757278 width=75)
   ->  Merge Left Join  (cost=636575.63..694469.65 rows=1681120 width=75)
         Merge Cond: ("outer".id = "inner".record)
         ->  Merge Left Join  (cost=523248.93..552247.54 rows=1681120 width=63)
               Merge Cond: ("outer".id = "inner".record)
               ->  Sort  (cost=164044.73..165937.93 rows=757278 width=48)
                     Sort Key: r.id
                     ->  Seq Scan on records r  (cost=0.00..19134.78 rows=757278 width=48)
                           Filter: (original IS NULL)
               ->  Sort  (cost=359204.20..363407.00 rows=1681120 width=19)
                     Sort Key: co.record
                     ->  Seq Scan on contacts co  (cost=0.00..73438.06 rows=1681120 width=19)
                           Filter: ((type = 1) OR (type = 11) OR (type = 101) OR (type = 3))
         ->  Sort  (cost=113326.70..115180.54 rows=741536 width=16)
               Sort Key: ad.record
               ->  Seq Scan on addresses ad  (cost=0.00..20801.36 rows=741536 width=16)
(16 rows)







se_companies=# \d records;
                                  Table "public.records"
     Column      |           Type           |                      Modifiers
-----------------+--------------------------+------------------------------------------------------
 id              | integer                  | not null default nextval('records_id_seq'::regclass)
 companyid       | character varying(16)    | default ''::character varying
 categories      | integer[]                |
 nace            | integer[]                |
 name            | character varying(255)   | default ''::character varying
 updated         | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone
 updater         | integer                  |
 owner           | integer                  |
 loaner          | integer                  |
 info            | text                     |
 original        | integer                  |
 active          | boolean                  | default true
 categoryquality | integer                  | not null default 0
 searchwords     | character varying(128)[] |
 priority        | integer                  |
 categorized

Re: [PERFORM] Indexes with descending date columns

2006-03-23 Thread Alvaro Herrera
Theo Kramer wrote:

 All good input - thanks, however, before I start messing with my stuff
 which I know will be complex - some questions to any of the developers
 on the list.
 
 i  Is it feasible to extend index creation to support descending 
columns? ... this is supported on other commercial and non
commercial databases, but I do not know if this is a SQL standard.

This can be done.  You need to create an operator class which specifies
the reverse sort order (i.e. reverse the operators), and then use it in
the new index.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core Powered Servers

2006-03-23 Thread Christopher Browne
[EMAIL PROTECTED] (Jojo Paderes) wrote:
 I'd like to know if the latest PostgreSQL release can scale up by
 utilizing multiple CPUs or dual-core CPUs to speed up SQL
 execution.

 I already did some research in the PostgreSQL mailing list archives and
 only found old threads dating back to 2000. A lot of things have improved
 in PostgreSQL, and hopefully support for multiple CPUs or dual cores is
 already there.

If you submit multiple concurrent queries, they can be concurrently
processed on separate CPUs; that has long been supported, and people
have been using SMP systems to this end for years.
-- 
let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];;
http://cbbrowne.com/info/spreadsheets.html
In other words  -- and this is the rock solid  principle on which the
whole  of the Corporation's  Galaxy-wide success  is founded  -- their
fundamental design  flaws are  completely hidden by  their superficial
design flaws. -- HHGTG

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Problem with query, forget previous message

2006-03-23 Thread Bendik Rognlien Johansen
Seems the problem was with the custom aggregate function not being  
able to handle thousands of rows.
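If the bottleneck is the text-concatenation state in the hydra.join aggregate posted earlier in the thread (each row reallocates an ever-growing string), one possible workaround is to accumulate into a text[] state directly. This is a sketch only; hydra.join_accum and hydra.join2 are illustrative names, not part of the poster's schema:

```sql
-- State function appends each value to an array instead of
-- concatenating strings, avoiding repeated reallocation of one
-- ever-larger text value.
CREATE FUNCTION hydra.join_accum(text[], text) RETURNS text[]
  AS 'SELECT array_append($1, $2)'
  LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE hydra.join2 (
   BASETYPE = text
  ,SFUNC = hydra.join_accum
  ,STYPE = text[]
  ,INITCOND = '{}'
);
```

Note that unlike the original STRICT version this appends NULL inputs too; add a CASE in the state function if that matters.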


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Frank Wiles
On Thu, 23 Mar 2006 14:19:24 +0800
Jojo Paderes [EMAIL PROTECTED] wrote:

 I'd like to know if the latest PostgreSQL release can scale up by
 utilizing multiple CPUs or dual-core CPUs to speed up SQL
 execution.
 
 I already did some research in the PostgreSQL mailing list archives and
 only found old threads dating back to 2000. A lot of things have improved
 in PostgreSQL, and hopefully support for multiple CPUs or dual cores is
 already there.

  Yes, PostgreSQL can take advantage of multiple CPUs and cores, and has
  been able to for quite some time.

 -
   Frank Wiles [EMAIL PROTECTED]
   http://www.wiles.org
 -




Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Scott Marlowe
On Thu, 2006-03-23 at 00:19, Jojo Paderes wrote:
 I'd like to know if the latest PostgreSQL release can scale up by
 utilizing multiple CPUs or dual-core CPUs to speed up SQL
 execution.
 
 I already did some research in the PostgreSQL mailing list archives and
 only found old threads dating back to 2000. A lot of things have improved
 in PostgreSQL, and hopefully support for multiple CPUs or dual cores is
 already there.

Can a single query be split up into parts and run on separate processors
at the same time?  No.

Can multiple incoming queries be run on different processors for better
performance?  Yes.

Has someone been working on the problem of splitting a query into pieces
and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
done that.  



Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Joshua D. Drake



Has someone been working on the problem of splitting a query into pieces
and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
done that.  


I believe that is limited to Bizgress MPP yes?






Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Scott Marlowe
On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
  Has someone been working on the problem of splitting a query into pieces
  and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
  done that.  
 
 I believe that is limited to Bizgress MPP yes?

Yep.  I hope that someday it will be released to the postgresql global
dev group for inclusion.  Or at least parts of it.



Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core

2006-03-23 Thread Christopher Browne
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) 
wrote:
 On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
  Has someone been working on the problem of splitting a query into pieces
  and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
  done that.  
 
 I believe that is limited to Bizgress MPP yes?

 Yep.  I hope that someday it will be released to the postgresql global
 dev group for inclusion.  Or at least parts of it.

Question: Does the Bizgress/MPP use threading for this concurrency?
Or forking?

If it does so via forking, that's more portable, and less dependent on
specific complexities of threading implementations (which amounts to
non-portability ;-)).

Most times Jan comes to town, we spend a few minutes musing about the
splitting queries across threads problem, and dismiss it again; if
there's the beginning of a split across processes, that's decidedly
neat :-).
-- 
output = reverse(moc.liamg @ enworbbc)
http://linuxfinances.info/info/internet.html
Why do we put suits in a garment bag, and put garments in a suitcase? 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Indexes with descending date columns

2006-03-23 Thread Theo Kramer
On Thu, 2006-03-23 at 16:16, Alvaro Herrera wrote:
 Theo Kramer wrote:
 
  All good input - thanks, however, before I start messing with my stuff
  which I know will be complex - some questions to any of the developers
  on the list.
  
  i  Is it feasible to extend index creation to support descending 
 columns? ... this is supported on other commercial and non
 commercial databases, but I do not know if this is a SQL standard.
 
 This can be done.  You need to create an operator class which specifies
 the reverse sort order (i.e. reverse the operators), and then use it in
 the new index.

Hmmm, would that then result in the following syntax being valid?

  create index my_idx on my_table (c1, c2 desc, c3, c4 desc) ;

where my_table is defined as

  create table my_table (
c1 text,
c2 timestamp,
c3 integer,
c4 integer
  );

If so, I would appreciate any pointers on where to start on this -
already fumbling my way through Interfacing Extensions To Indexes in the
manual...

Regards
Theo
-- 
Regards
Theo




Re: [PERFORM] Indexes with descending date columns

2006-03-23 Thread Tom Lane
Theo Kramer [EMAIL PROTECTED] writes:
 If so, I would appreciate any pointers on where to start on this -
 already fumbling my way through Interfacing Extensions To Indexes in the
 manual...

Search the PG list archives for discussions of reverse-sort opclasses.
It's really pretty trivial, once you've created a negated btree
comparison function for the datatype.
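Following that recipe, a reverse-sort opclass for timestamp might look like the sketch below (8.x-era syntax; timestamp_cmp is the built-in btree comparison support function, but verify it against your server version before relying on this):

```sql
-- Negated comparison: swapping the arguments reverses the sort order.
CREATE FUNCTION timestamp_cmp_desc(timestamp, timestamp) RETURNS integer
  AS 'SELECT timestamp_cmp($2, $1)'
  LANGUAGE sql IMMUTABLE STRICT;

-- Operator class with the strategy operators reversed to match.
CREATE OPERATOR CLASS timestamp_desc_ops
  FOR TYPE timestamp USING btree AS
    OPERATOR 1 > ,
    OPERATOR 2 >= ,
    OPERATOR 3 = ,
    OPERATOR 4 <= ,
    OPERATOR 5 < ,
    FUNCTION 1 timestamp_cmp_desc(timestamp, timestamp);

-- Then an index such as:
--   CREATE INDEX my_idx ON my_table (c1, c2 timestamp_desc_ops);
-- scans c2 in descending order when the index is walked forward.
```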

This is the sort of thing that we are almost but not quite ready to put
into the standard distribution.  The issues that are bugging me have to
do with whether NULLs sort low or high --- right now, if you make a
reverse-sort opclass, it will effectively sort NULLs low instead of
high, and that has some unpleasant consequences because the rest of the
system isn't prepared for variance on the point (in particular I'm
afraid this could break mergejoins).  I'd like to see us make NULLs
low vs NULLs high be a defined property of opclasses, and deal with
the fallout from that, and then we could put reverse-sort opclasses for
all the standard datatypes into the regular distribution.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly