[GENERAL] Querying a Large Partitioned DB

2009-04-10 Thread Justin Funk
Team Amazing,

I am building a massive database for storing the syslogs of a room of
servers. The database gets about 25 million entries a day, and need to keep
them for 180 days. So the total size of the database will be about 4.5
billion records.

I need to be able to do full text searches on the message field, and of
course, it needs to be reasonably fast.

The table is partitioned daily and has this structure:
syslog=# \d
  List of relations
 Schema |Name | Type  | Owner
+-+---+---
 public | systemevents| table | pgsql
 public | systemevents_032909 | table | pgsql
 public | systemevents_033009 | table | pgsql
 public | systemevents_033109 | table | pgsql
 public | systemevents_040109 | table | pgsql
 public | systemevents_040209 | table | pgsql
 public | systemevents_040309 | table | pgsql
 public | systemevents_040409 | table | pgsql
 public | systemevents_040509 | table | pgsql
 public | systemevents_040609 | table | pgsql
 public | systemevents_040709 | table | pgsql
 public | systemevents_040909 | table | pgsql
 public | systemevents_041009 | table | pgsql
(13 rows)

syslog=# \d systemevents
 Table "public.systemevents"
   Column   |Type | Modifiers
+-+---
 message| character varying   |
 facility   | integer |
 fromhost   | character varying(80)   |
 priority   | integer |
 devicereportedtime | timestamp without time zone |
 receivedat | timestamp without time zone |
 infounitid | integer |
 syslogtag  | character varying(80)   |
 message_index_col  | tsvector|
Rules:
systemevents_insert_032909 AS
ON INSERT TO systemevents
   WHERE new.devicereportedtime > '2009-03-28 23:59:59'::timestamp without
time zone AND new.devicereportedtime <= '2009-03-29 23:59:59'::timestamp
without time zone DO INSTEAD  INSERT INTO systemevents_032909 (message,
facility, fromhost, priority, devicereportedtime, receivedat, infounitid,
syslogtag, message_index_col)
.. [there are rules like that for each partition]



My typical query looks like this:
SELECT * FROM SystemEvents WHERE message_index_col @@ to_tsquery('english',
'Term')  LIMIT 25 OFFSET 0;

Here is an explain analyze:
 Limit  (cost=2422393.00..2422393.06 rows=25 width=153) (actual
time=93363.496..93363.610 rows=25 loops=1)
   ->  Sort  (cost=2422393.00..2422933.05 rows=216019 width=153) (actual
time=93363.490..93363.532 rows=25 loops=1)
 Sort Key: public.systemevents.devicereportedtime
 Sort Method:  top-N heapsort  Memory: 22kB
 ->  Result  (cost=0.00..2416297.10 rows=216019 width=153) (actual
time=20567.267..93362.574 rows=163 loops=1)
   ->  Append  (cost=0.00..2415217.01 rows=216019 width=153)
(actual time=20567.244..93361.582 rows=163 loops=1)
 ->  Seq Scan on systemevents  (cost=0.00..1750240.39
rows=30891 width=153) (actual time=20567.238..91580.249 rows=24 loops=1)
   Filter: (message_index_col @@
'''funkju'''::tsquery)
 ->  Bitmap Heap Scan on systemevents_040309
systemevents  (cost=1168.86..54860.45 rows=15253 width=152) (actual
time=82.429..275.589 rows=20 loops=1)
   Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
   ->  Bitmap Index Scan on
systemevents_msg_idx_040309  (cost=0.00..1165.04 rows=15253 width=0) (actual
time=50.029..50.029 rows=20 loops=1)
 Index Cond: (message_index_col @@
'''funkju'''::tsquery)
 ->  Bitmap Heap Scan on systemevents_040409
systemevents  (cost=1038.56..52300.49 rows=14601 width=147) (actual
time=68.006..68.006 rows=0 loops=1)
   Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
   ->  Bitmap Index Scan on
systemevents_msg_idx_040409  (cost=0.00..1034.91 rows=14601 width=0) (actual
time=67.999..67.999 rows=0 loops=1)
 Index Cond: (message_index_col @@
'''funkju'''::tsquery)
 ->  Bitmap Heap Scan on systemevents_040509
systemevents  (cost=1055.06..52482.72 rows=14644 width=150) (actual
time=63.257..63.257 rows=0 loops=1)
   Recheck Cond: (message_index_col @@
'''funkju'''::tsquery)
   ->  Bitmap Index Scan on
systemevents_msg_idx_040509  (cost=0.00..1051.40 rows=14644 width=0) (actual
time=63.251..63.251 rows=0 loops=1)
 Index Cond: (message_index_col @@
'''funkju'''::tsquery)
 ->  Bitmap Heap Scan on systemevents_040609
systemevents  (cost=1842.50..88135.00 rows=24506 width=152) (actual
time=117.747..355.043 rows=34 loops=1)
   Recheck Cond: (me

[GENERAL] Retain PREPARE or CONNECT TRIGGER

2009-04-01 Thread Justin Funk
I am using rsyslog (a syslog to database application) to connect to my
postgresql database. It then executes Insert after Insert 100s a
minute I have been reading about the PREPARE statement and think
that could dramatically increase my insert speed. The problem is, as
you all know, that PREPARE only works for the session that it was
executed. I cannot make rsyslog prepare the statement when it start's
it's session, so how can I use the ability of "PREPARE" in this
situation?

I was thinking maybe that there would be away to trigger the prepare
statement when the rsyslog user connects. Or is there another option
like PREPARE that persists?

Thanks for your help.

Justin Funk

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Retain PREPARE or connect trigger

2009-04-01 Thread Justin Funk
I am using rsyslog (a syslog to database application) to connect to my
postgresql database. It then executes Insert after Insert 100s a
minute I have been reading about the PREPARE statement and think
that could dramatically increase my insert speed. The problem is, as
you all know, that PREPARE only works for the session that it was
executed. I cannot make rsyslog prepare the statement when it start's
it's session, so how can I use the ability of "PREPARE" in this
situation?

I was thinking maybe that there would be away to trigger the prepare
statement when the rsyslog user connects. Or is there another option
like PREPARE that persists?

Thanks for your help.

Justin Funk

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Partitioned tabled not using indexes for full text search

2009-03-27 Thread Justin Funk
I have a table that is partitioned on a daily basis.

Full text searches used to be respectably fast with large tables (40
million + records) but insert speed would slow down.  So I went with a
partitioned approach. But now, it doesn't seem like the indexes are
being used.

Any idea why it would not be using the indexes?

Here are appropriate descriptions and Explains:

syslog=# \d systemevents;
 Table "public.systemevents"
   Column   |Type | Modifiers
+-+---
 message| character varying   |
 facility   | integer |
 fromhost   | character varying(80)   |
 priority   | integer |
 devicereportedtime | timestamp without time zone |
 receivedat | timestamp without time zone |
 infounitid | integer |
 syslogtag  | character varying(80)   |
 message_index_col  | tsvector|
Rules:
systemevents_insert_032509 AS
ON INSERT TO systemevents
   WHERE new.devicereportedtime > '2009-03-24 23:59:59'::timestamp
without time zone AND new.devicereportedtime <= '2009-03-25
23:59:59'::timestamp without time zone DO INSTEAD  INSERT INTO
systemevents_032509 (message, facility, fromhost, priority,
devicereportedtime, receivedat, infounitid, syslogtag,
message_index_col)
  VALUES (new.message, new.facility, new.fromhost, new.priority,
new.devicereportedtime, new.receivedat, new.infounitid, new.syslogtag,
new.message_index_col)
systemevents_insert_032609 AS
ON INSERT TO systemevents
   WHERE new.devicereportedtime > '2009-03-25 23:59:59'::timestamp
without time zone AND new.devicereportedtime <= '2009-03-26
23:59:59'::timestamp without time zone DO INSTEAD  INSERT INTO
systemevents_032609 (message, facility, fromhost, priority,
devicereportedtime, receivedat, infounitid, syslogtag,
message_index_col)
  VALUES (new.message, new.facility, new.fromhost, new.priority,
new.devicereportedtime, new.receivedat, new.infounitid, new.syslogtag,
new.message_index_col)
systemevents_insert_032709 AS
ON INSERT TO systemevents
   WHERE new.devicereportedtime > '2009-03-26 23:59:59'::timestamp
without time zone AND new.devicereportedtime <= '2009-03-27
23:59:59'::timestamp without time zone DO INSTEAD  INSERT INTO
systemevents_032709 (message, facility, fromhost, priority,
devicereportedtime, receivedat, infounitid, syslogtag,
message_index_col)
  VALUES (new.message, new.facility, new.fromhost, new.priority,
new.devicereportedtime, new.receivedat, new.infounitid, new.syslogtag,
new.message_index_col)
systemevents_insert_032809 AS
ON INSERT TO systemevents
   WHERE new.devicereportedtime > '2009-03-27 23:59:59'::timestamp
without time zone AND new.devicereportedtime <= '2009-03-28
23:59:59'::timestamp without time zone DO INSTEAD  INSERT INTO
systemevents_032809 (message, facility, fromhost, priority,
devicereportedtime, receivedat, infounitid, syslogtag,
message_index_col)
  VALUES (new.message, new.facility, new.fromhost, new.priority,
new.devicereportedtime, new.receivedat, new.infounitid, new.syslogtag,
new.message_index_col)

syslog=# EXPLAIN ANALYZE SELECT DeviceReportedTime,
REPLACE(REPLACE(Message,'<','<'),'>','>') as Message, Facility,
FromHost, Priority, ReceivedAt, InfoUnitID ,SysLogTag FROM
SystemEvents WHERE message_index_col @@ to_tsquery('english', 'MAIL')
ORDER BY DeviceReportedTime DESC LIMIT 25 OFFSET 0;

QUERY PLAN
-
 Limit  (cost=61548.87..61548.93 rows=25 width=176) (actual
time=31933.287..31933.425 rows=25 loops=1)
   ->  Sort  (cost=61548.87..61551.59 rows=1091 width=176) (actual
time=31933.280..31933.327 rows=25 loops=1)
 Sort Key: public.systemevents.devicereportedtime
 Sort Method:  top-N heapsort  Memory: 29kB
 ->  Result  (cost=0.00..61518.08 rows=1091 width=176) (actual
time=43.351..28941.144 rows=21307 loops=1)
   ->  Append  (cost=0.00..61512.62 rows=1091 width=176)
(actual time=43.337..23706.264 rows=21307 loops=1)
 ->  Seq Scan on systemevents  (cost=0.00..13.00
rows=1 width=151) (actual time=0.007..0.007 rows=0 loops=1)
   Filter: (message_index_col @@ '''mail'''::tsquery)
 ->  Seq Scan on systemevents_032609 systemevents
(cost=0.00..27869.42 rows=494 width=152) (actual time=43.318..6153.645
rows=9309 loops=1)
   Filter: (message_index_col @@ '''mail'''::tsquery)
 ->  Seq Scan on systemevents_032509 systemevents
(cost=0.00..19001.65 rows=339 width=153) (actual time=0.611..5861.674
rows=6239 loops=1)
   Filter: (message_index_col @@ '''mail'''::tsquery)
 ->  Seq Scan on systemevents_03

[GENERAL] Get IP addresses from tsvectors

2009-03-09 Thread Justin Funk
Greetings,

I have a table with a column with type tsvector. It contains the
result of to_tsvector() of varchar field in the table. What I'd like
to do is be able to search through the table and find all of the
distinct IP addresses. Any idea how to turn:

SELECT message_index_col FROM systemevents LIMIT 10;
message_index_col
-
 'leas':4 'return':2
 'leas':2 'found':1 'address':5 'hardwar':4 '65.110.236.113':6
 '00':3 '17':4 '1d':7 '27':6 '4c':8 'f2':5 'via':9 'free':14 'leas':15
'martin':12 'network':11 'dhcpdiscov':1 '10.10.94.126':10
 'leas':2 'found':1 'address':5 'request':4 '65.110.236.113':6
 'ip':4 'leas':2,5 'ident':7 'hardwar':1
 'leas':2 'choos':1 'address':5 'request':4
 'leas':2 'return':1 '65.110.236.113':3
 '00':3 '17':4 '1d':7 '27':6 '4c':8 'f2':5 'via':9 'dhcpdiscov':1
'10.10.94.126':10
 '00':5 '17':6 '1d':9 '27':8 '4c':10 'f2':7 'via':11 'dhcpoffer':1
'10.10.94.126':12 '65.110.236.113':3
 '451':6 'tri':9 '4.7.1':7 'later':11 'pleas':8 'milter':2 'reject':5
'tempfail':12 'n29c3q08020087':1 'kgan...@iastate.edu':4

into

IP_ADDRESSES
-
65.110.236.113
10.10.94.126

Thanks for the help...

Justin Funk

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general