Re: [PERFORM] How to determine cause of performance problem?

2005-09-23 Thread Ulrich Wisser

Hi Joost,

why do you convert programmatically? I would do something like

create sequence s_objectid;

insert into prototype.orders (objectid, ordernumber, orderdate, customernumber)
select nextval('s_objectid'), ordernummer, orderdatum, klantnummer
from odbc.orders;


Sounds a lot faster to me.
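
A possible follow-up, purely as a sketch (this assumes objectid should keep
drawing its values from the sequence after the conversion; that is my
assumption, not part of the original suggestion):

-- hypothetical: let future inserts pick up the sequence automatically
alter table prototype.orders
alter column objectid set default nextval('s_objectid');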


/Ulrich



[PERFORM] Need for speed 3

2005-09-01 Thread Ulrich Wisser

Hi again,

first I want to say ***THANK YOU*** to everyone who kindly shared their
thoughts on my hardware problems. I really appreciate it. I started to
look for a new server and I am quite sure we'll get a serious hardware
update. As suggested by some people, I would now like to take a closer
look at possible algorithmic improvements.


My application basically imports Apache log files into a Postgres
database. Every row in the log file gets imported into one of three (raw
data) tables. The columns match the log file exactly. The import runs
approx. every five minutes, and we import about two million rows a month.
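
Purely for illustration, a minimal sketch of what one such raw table and
its periodic import could look like (table and column names are made up
here, not the actual schema, and it assumes the logfile has already been
parsed into tab-separated values):

create table raw_clicks (
    logdate     timestamp,
    ip          inet,
    request     text,
    status      integer,
    referer     text,
    user_agent  text
);

-- bulk-load the parsed logfile every five minutes; COPY is much faster
-- than row-by-row inserts for this kind of import
copy raw_clicks from '/var/log/apache/access.tsv';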


Between 30 and 50 users are using the reporting at the same time.

Because reporting became so slow, I created a reporting table. In that
table the data is aggregated by dropping time (the date is preserved),
ip, referer and user-agent. And although it breaks normalization, some
data from a master table is copied in, so no joins are needed anymore.


After every import the data from the current day is deleted from the 
reporting table and recalculated from the raw data table.
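
A sketch of that refresh step, again with made-up names (raw_clicks and
reporting are assumptions, not the real schema):

-- throw away today's aggregates ...
delete from reporting where day = current_date;

-- ... and rebuild them from the raw data, collapsing time, ip,
-- referer and user-agent into one row per day and request
insert into reporting (day, request, status, hits)
select logdate::date, request, status, count(*)
from raw_clicks
where logdate::date = current_date
group by logdate::date, request, status;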



Is this description understandable? If so:

What do you think of this approach? Are there better ways to do it? Is 
there some literature you recommend reading?


TIA

Ulrich




Re: [PERFORM] Need for speed 3

2005-09-01 Thread Ulrich Wisser

Hi Merlin,

> schemas would be helpful.


Right now I would like to know whether my approach to the problem makes
sense, or whether I should rework the whole import-and-aggregate procedure.



> Just a thought: have you considered having apache logs write to a
> process that immediately makes insert query(s) to postgresql?


Yes, we have considered that, but dismissed the idea very soon. We need
Apache to be as responsive as possible. It's a two-server setup with
load balancer and failover, serving about one thousand domains and
counting. It needs to be as failsafe as possible, and under no
circumstances can any request be lost. (The click counting is core
business and relates directly to our income.)
That said, it seemed quite safe to let Apache write logfiles and import
them later. That way a database downtime wouldn't be mission critical.




> You could write small C program which executes advanced query interface
> call to the server.


How would that improve performance?

Ulrich



[PERFORM] Need for speed 2

2005-08-25 Thread Ulrich Wisser

Hello,

I realize I need to be much more specific. Here is a more detailed
description of my hardware and system design.


Pentium 4 2.4GHz
Memory 4x DIMM DDR 1GB PC3200 400MHZ CAS3, KVR
Motherboard chipset 'I865G', two IDE channels on board
2x SEAGATE BARRACUDA 7200.7 80GB 7200RPM ATA/100
(software raid 1, system, swap, pg_xlog)
ADAPTEC SCSI RAID 2100S ULTRA160 32MB 1-CHANNEL
2x SEAGATE CHEETAH 15K.3 73GB ULTRA320 68-PIN WIDE
(raid 1, /var/lib/pgsql)

Database size on disc is 22GB (without pg_xlog).

Please find my postgresql.conf below.

Putting pg_xlog on the IDE drives gave about 10% performance
improvement. Would faster disks give more performance?

What my application does:

Every five minutes a new logfile is imported. Depending on the source of
the request it is imported into one of three raw click tables. (These hold
data from two months back, so that customer complaints can be verified.)
For reporting I have a set of tables. These contain data from the last
two years. My app deletes all entries from today and reinserts updated
data calculated from the raw data tables.

The queries contain no joins, only aggregates. I have several indexes to
speed up different kinds of queries.
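
As an illustration only, indexes along these lines would match the access
patterns described above (the names are hypothetical, following the
sketched schema from the earlier mail, not the real one):

-- reporting queries are restricted by date, so index the date column
create index reporting_day_idx on reporting (day);

-- the per-import refresh deletes and recalculates the current day,
-- so the raw tables benefit from a date index as well
create index raw_clicks_logdate_idx on raw_clicks (logdate);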


My problems occur when one user runs a report that contains too much old
data. In that case all cache mechanisms fail and disc io is the
limiting factor.

If one query contains so much data that a full table scan is needed, I
do not care if it takes two minutes to answer. But all other queries
with less data (running at the same time) still have to be fast.


I can not stop users doing that kind of reporting. :(

I need more speed by orders of magnitude. Will more disks / more memory
do the trick?

Money is of course a limiting factor, but it doesn't have to be really cheap.

Ulrich





# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#---
# CONNECTIONS AND AUTHENTICATION
#---

# - Connection Settings -

tcpip_socket = true
max_connections = 100
# note: increasing max_connections costs about 500 bytes of shared
# memory per connection slot, in addition to costs from shared_buffers
# and max_locks_per_transaction.
#superuser_reserved_connections = 2
#port = 5432
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#virtual_host = ''  # what interface to listen on; defaults to any
#rendezvous_name = ''   # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60# 1-600, in seconds
#ssl = false
#password_encryption = true
#krb_server_keyfile = ''
#db_user_namespace = false


#---
# RESOURCE USAGE (except WAL)
#---

# - Memory -

shared_buffers = 2  # min 16, at least max_connections*2, 8KB each

sort_mem = 4096 # min 64, size in KB
vacuum_mem = 8192   # min 1024, size in KB

# - Free Space Map -

max_fsm_pages = 20  # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1   # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''


#---
# WRITE AHEAD LOG
#---

# - Settings -

fsync = false   # turns forced synchronization on or off
#wal_sync_method = fsync    # the default varies across platforms:
                            # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 128   # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 16# in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300   # range 30-3600, in seconds
#checkpoint_warning = 30# 0 is off, in seconds
#commit_delay = 0   # range 0-10, in microseconds
#commit_siblings = 5# range 1-1000


#---
# QUERY TUNING
#---

# - Planner Method Enabling -

#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
#enable_seqscan = true
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -

#effective_cache_size = 1000# typically 8KB each
#random_page_cost = 4   # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01  # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025 # (same)

# - Genetic Query Optimizer -

#geqo = true
#geqo_threshold = 11
#geqo_effort = 1
#geqo_generations = 0

Re: [PERFORM] Need for speed

2005-08-17 Thread Ulrich Wisser

Hello,

thanks for all your suggestions.

I can see that the Linux system is 90% waiting for disc io. At that time
all my queries are *very* slow. My scsi raid controller and discs are
already the fastest available. The query plans use indexes and vacuum
analyze is run once a day.
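
For reference, this is the kind of check meant above; the query itself is
only a hypothetical example of a report (using the made-up reporting
schema sketched earlier), not the actual one:

-- confirm that a typical report query really uses the date index
-- and see where the time is spent
explain analyze
select request, count(*) as hits
from reporting
where day between '2005-07-01' and '2005-07-31'
group by request;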


To avoid aggregating too many rows, I already made some aggregation
tables which are updated after the import from the Apache logfiles.

That did help, but only to a certain level.

I believe the biggest problem is disc io. Reports for very recent data
are quite fast; these are used very often and are therefore already in the
cache. But reports can contain (and regularly do) very old data. In that
case the whole system slows down. To me this sounds like the recent data
is flushed out of the cache and now all data for all queries has to be
fetched from disc.


My machine has 2GB memory, please find postgresql.conf below.

Ulrich


#---
# RESOURCE USAGE (except WAL)
#---

# - Memory -

shared_buffers = 2  # min 16, at least max_connections*2, 8KB each
sort_mem = 4096 # min 64, size in KB

vacuum_mem = 8192   # min 1024, size in KB

# - Free Space Map -

max_fsm_pages = 5   # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 3000# min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''


#---
# WRITE AHEAD LOG
#---

# - Settings -

fsync = false   # turns forced synchronization on or off
#wal_sync_method = fsync    # the default varies across platforms:
wal_buffers = 128   # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 16# in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300   # range 30-3600, in seconds
#checkpoint_warning = 30# 0 is off, in seconds
#commit_delay = 0   # range 0-10, in microseconds
#commit_siblings = 5# range 1-1000




[PERFORM] Need for speed

2005-08-16 Thread Ulrich Wisser

Hello,

one of our services is click counting for online advertising. We do
this by importing Apache log files every five minutes. This results in a
lot of insert and delete statements. At the same time our customers
must be able to do online reporting.


We have a box with
Linux Fedora Core 3, Postgres 7.4.2
Intel(R) Pentium(R) 4 CPU 2.40GHz
2 scsi 76GB disks (15.000RPM, 2ms)

I did put pg_xlog on another file system on other discs.

Still, when several users are online, the reporting gets very slow.
Queries can take more than 2 minutes.


I need some ideas on how to improve performance by some orders of
magnitude. I already thought of a box with the whole database on a ram
disc. So really any idea is welcome.


Ulrich



--
Ulrich Wisser  / System Developer

RELEVANT TRAFFIC SWEDEN AB, Riddarg 17A, SE-114 57 Sthlm, Sweden
Direct (+46)86789755 || Cell (+46)704467893 || Fax (+46)86789769

http://www.relevanttraffic.com



Re: [PERFORM] Query tuning help

2005-05-11 Thread Ulrich Wisser
Hi Dan,
I tried to understand your query, but I couldn't get my understanding of 
the query and your description in sync.

Why do you use sub selects? Wouldn't a simple recordtext like '%RED%' 
do the trick too?

You combine all your where conditions with AND. To me this looks like
you only get rows containing both RED and CORVETTE.

From your description I would rewrite the query as

explain analyze
select distinct
    em.incidentid,
    ea.recordtext as retdata,
    eg.long,
    eg.lat,
    em.entrydate  -- included so that order by works together with distinct
from
    ea
    join em using (incidentid)
    join eg using (incidentid)
where
    em.entrydate >= '2005-1-1 00:00'::date
    and em.entrydate <= '2005-5-9 00:00'::date
    and (recordtext like '%RED%' or recordtext like '%CORVETTE%')
order by em.entrydate;

That should give you all rows containing at least one of the words.
Does it work?
Is it faster? Is it fast enough?

Ulrich


[PERFORM] insert

2004-08-13 Thread Ulrich Wisser
Hi,
is there anything I can do to speed up inserts? One of my tables gets
about 100 new rows every five minutes, and somehow the inserts tend to
take more and more time.

Any suggestions welcome.
TIA
Ulrich