[GENERAL] Querying a Large Partitioned DB
Team Amazing, I am building a massive database for storing the syslogs of a room of servers. The database gets about 25 million entries a day, and need to keep them for 180 days. So the total size of the database will be about 4.5 billion records. I need to be able to do full text searches on the message field, and of course, it needs to be reasonably fast. The table is partitioned daily and has this structure: syslog=# \d List of relations Schema |Name | Type | Owner +-+---+--- public | systemevents| table | pgsql public | systemevents_032909 | table | pgsql public | systemevents_033009 | table | pgsql public | systemevents_033109 | table | pgsql public | systemevents_040109 | table | pgsql public | systemevents_040209 | table | pgsql public | systemevents_040309 | table | pgsql public | systemevents_040409 | table | pgsql public | systemevents_040509 | table | pgsql public | systemevents_040609 | table | pgsql public | systemevents_040709 | table | pgsql public | systemevents_040909 | table | pgsql public | systemevents_041009 | table | pgsql (13 rows) syslog=# \d systemevents Table "public.systemevents" Column |Type | Modifiers +-+--- message| character varying | facility | integer | fromhost | character varying(80) | priority | integer | devicereportedtime | timestamp without time zone | receivedat | timestamp without time zone | infounitid | integer | syslogtag | character varying(80) | message_index_col | tsvector| Rules: systemevents_insert_032909 AS ON INSERT TO systemevents WHERE new.devicereportedtime > '2009-03-28 23:59:59'::timestamp without time zone AND new.devicereportedtime <= '2009-03-29 23:59:59'::timestamp without time zone DO INSTEAD INSERT INTO systemevents_032909 (message, facility, fromhost, priority, devicereportedtime, receivedat, infounitid, syslogtag, message_index_col) .. [there are rules like that for each partition] My typical query looks like this: SELECT * FROM SystemEvents WHERE message_index_col @@ to_tsquery('english', 'Term') LIMIT 25 OFFSET 0; Here is an explain analyze: Limit (cost=2422393.00..2422393.06 rows=25 width=153) (actual time=93363.496..93363.610 rows=25 loops=1) -> Sort (cost=2422393.00..2422933.05 rows=216019 width=153) (actual time=93363.490..93363.532 rows=25 loops=1) Sort Key: public.systemevents.devicereportedtime Sort Method: top-N heapsort Memory: 22kB -> Result (cost=0.00..2416297.10 rows=216019 width=153) (actual time=20567.267..93362.574 rows=163 loops=1) -> Append (cost=0.00..2415217.01 rows=216019 width=153) (actual time=20567.244..93361.582 rows=163 loops=1) -> Seq Scan on systemevents (cost=0.00..1750240.39 rows=30891 width=153) (actual time=20567.238..91580.249 rows=24 loops=1) Filter: (message_index_col @@ '''funkju'''::tsquery) -> Bitmap Heap Scan on systemevents_040309 systemevents (cost=1168.86..54860.45 rows=15253 width=152) (actual time=82.429..275.589 rows=20 loops=1) Recheck Cond: (message_index_col @@ '''funkju'''::tsquery) -> Bitmap Index Scan on systemevents_msg_idx_040309 (cost=0.00..1165.04 rows=15253 width=0) (actual time=50.029..50.029 rows=20 loops=1) Index Cond: (message_index_col @@ '''funkju'''::tsquery) -> Bitmap Heap Scan on systemevents_040409 systemevents (cost=1038.56..52300.49 rows=14601 width=147) (actual time=68.006..68.006 rows=0 loops=1) Recheck Cond: (message_index_col @@ '''funkju'''::tsquery) -> Bitmap Index Scan on systemevents_msg_idx_040409 (cost=0.00..1034.91 rows=14601 width=0) (actual time=67.999..67.999 rows=0 loops=1) Index Cond: (message_index_col @@ '''funkju'''::tsquery) -> Bitmap Heap Scan on systemevents_040509 systemevents (cost=1055.06..52482.72 rows=14644 width=150) (actual time=63.257..63.257 rows=0 loops=1) Recheck Cond: (message_index_col @@ '''funkju'''::tsquery) -> Bitmap Index Scan on systemevents_msg_idx_040509 (cost=0.00..1051.40 rows=14644 width=0) (actual time=63.251..63.251 rows=0 loops=1) Index Cond: (message_index_col @@ '''funkju'''::tsquery) -> Bitmap Heap Scan on systemevents_040609 systemevents (cost=1842.50..88135.00 rows=24506 width=152) (actual time=117.747..355.043 rows=34 loops=1) Recheck Cond: (me
[GENERAL] Retain PREPARE or CONNECT TRIGGER
I am using rsyslog (a syslog to database application) to connect to my postgresql database. It then executes Insert after Insert 100s a minute I have been reading about the PREPARE statement and think that could dramatically increase my insert speed. The problem is, as you all know, that PREPARE only works for the session that it was executed. I cannot make rsyslog prepare the statement when it start's it's session, so how can I use the ability of "PREPARE" in this situation? I was thinking maybe that there would be away to trigger the prepare statement when the rsyslog user connects. Or is there another option like PREPARE that persists? Thanks for your help. Justin Funk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Retain PREPARE or connect trigger
I am using rsyslog (a syslog to database application) to connect to my postgresql database. It then executes Insert after Insert 100s a minute I have been reading about the PREPARE statement and think that could dramatically increase my insert speed. The problem is, as you all know, that PREPARE only works for the session that it was executed. I cannot make rsyslog prepare the statement when it start's it's session, so how can I use the ability of "PREPARE" in this situation? I was thinking maybe that there would be away to trigger the prepare statement when the rsyslog user connects. Or is there another option like PREPARE that persists? Thanks for your help. Justin Funk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Partitioned tabled not using indexes for full text search
I have a table that is partitioned on a daily basis. Full text searches used to be respectably fast with large tables (40 million + records) but insert speed would slow down. So I went with a partitioned approach. But now, it doesn't seem like the indexes are being used. Any idea why it would not be using the indexes? Here are appropriate descriptions and Explains: syslog=# \d systemevents; Table "public.systemevents" Column |Type | Modifiers +-+--- message| character varying | facility | integer | fromhost | character varying(80) | priority | integer | devicereportedtime | timestamp without time zone | receivedat | timestamp without time zone | infounitid | integer | syslogtag | character varying(80) | message_index_col | tsvector| Rules: systemevents_insert_032509 AS ON INSERT TO systemevents WHERE new.devicereportedtime > '2009-03-24 23:59:59'::timestamp without time zone AND new.devicereportedtime <= '2009-03-25 23:59:59'::timestamp without time zone DO INSTEAD INSERT INTO systemevents_032509 (message, facility, fromhost, priority, devicereportedtime, receivedat, infounitid, syslogtag, message_index_col) VALUES (new.message, new.facility, new.fromhost, new.priority, new.devicereportedtime, new.receivedat, new.infounitid, new.syslogtag, new.message_index_col) systemevents_insert_032609 AS ON INSERT TO systemevents WHERE new.devicereportedtime > '2009-03-25 23:59:59'::timestamp without time zone AND new.devicereportedtime <= '2009-03-26 23:59:59'::timestamp without time zone DO INSTEAD INSERT INTO systemevents_032609 (message, facility, fromhost, priority, devicereportedtime, receivedat, infounitid, syslogtag, message_index_col) VALUES (new.message, new.facility, new.fromhost, new.priority, new.devicereportedtime, new.receivedat, new.infounitid, new.syslogtag, new.message_index_col) systemevents_insert_032709 AS ON INSERT TO systemevents WHERE new.devicereportedtime > '2009-03-26 23:59:59'::timestamp without time zone AND new.devicereportedtime <= '2009-03-27 23:59:59'::timestamp without time zone DO INSTEAD INSERT INTO systemevents_032709 (message, facility, fromhost, priority, devicereportedtime, receivedat, infounitid, syslogtag, message_index_col) VALUES (new.message, new.facility, new.fromhost, new.priority, new.devicereportedtime, new.receivedat, new.infounitid, new.syslogtag, new.message_index_col) systemevents_insert_032809 AS ON INSERT TO systemevents WHERE new.devicereportedtime > '2009-03-27 23:59:59'::timestamp without time zone AND new.devicereportedtime <= '2009-03-28 23:59:59'::timestamp without time zone DO INSTEAD INSERT INTO systemevents_032809 (message, facility, fromhost, priority, devicereportedtime, receivedat, infounitid, syslogtag, message_index_col) VALUES (new.message, new.facility, new.fromhost, new.priority, new.devicereportedtime, new.receivedat, new.infounitid, new.syslogtag, new.message_index_col) syslog=# EXPLAIN ANALYZE SELECT DeviceReportedTime, REPLACE(REPLACE(Message,'<','<'),'>','>') as Message, Facility, FromHost, Priority, ReceivedAt, InfoUnitID ,SysLogTag FROM SystemEvents WHERE message_index_col @@ to_tsquery('english', 'MAIL') ORDER BY DeviceReportedTime DESC LIMIT 25 OFFSET 0; QUERY PLAN - Limit (cost=61548.87..61548.93 rows=25 width=176) (actual time=31933.287..31933.425 rows=25 loops=1) -> Sort (cost=61548.87..61551.59 rows=1091 width=176) (actual time=31933.280..31933.327 rows=25 loops=1) Sort Key: public.systemevents.devicereportedtime Sort Method: top-N heapsort Memory: 29kB -> Result (cost=0.00..61518.08 rows=1091 width=176) (actual time=43.351..28941.144 rows=21307 loops=1) -> Append (cost=0.00..61512.62 rows=1091 width=176) (actual time=43.337..23706.264 rows=21307 loops=1) -> Seq Scan on systemevents (cost=0.00..13.00 rows=1 width=151) (actual time=0.007..0.007 rows=0 loops=1) Filter: (message_index_col @@ '''mail'''::tsquery) -> Seq Scan on systemevents_032609 systemevents (cost=0.00..27869.42 rows=494 width=152) (actual time=43.318..6153.645 rows=9309 loops=1) Filter: (message_index_col @@ '''mail'''::tsquery) -> Seq Scan on systemevents_032509 systemevents (cost=0.00..19001.65 rows=339 width=153) (actual time=0.611..5861.674 rows=6239 loops=1) Filter: (message_index_col @@ '''mail'''::tsquery) -> Seq Scan on systemevents_03
[GENERAL] Get IP addresses from tsvectors
Greetings, I have a table with a column with type tsvector. It contains the result of to_tsvector() of varchar field in the table. What I'd like to do is be able to search through the table and find all of the distinct IP addresses. Any idea how to turn: SELECT message_index_col FROM systemevents LIMIT 10; message_index_col - 'leas':4 'return':2 'leas':2 'found':1 'address':5 'hardwar':4 '65.110.236.113':6 '00':3 '17':4 '1d':7 '27':6 '4c':8 'f2':5 'via':9 'free':14 'leas':15 'martin':12 'network':11 'dhcpdiscov':1 '10.10.94.126':10 'leas':2 'found':1 'address':5 'request':4 '65.110.236.113':6 'ip':4 'leas':2,5 'ident':7 'hardwar':1 'leas':2 'choos':1 'address':5 'request':4 'leas':2 'return':1 '65.110.236.113':3 '00':3 '17':4 '1d':7 '27':6 '4c':8 'f2':5 'via':9 'dhcpdiscov':1 '10.10.94.126':10 '00':5 '17':6 '1d':9 '27':8 '4c':10 'f2':7 'via':11 'dhcpoffer':1 '10.10.94.126':12 '65.110.236.113':3 '451':6 'tri':9 '4.7.1':7 'later':11 'pleas':8 'milter':2 'reject':5 'tempfail':12 'n29c3q08020087':1 'kgan...@iastate.edu':4 into IP_ADDRESSES - 65.110.236.113 10.10.94.126 Thanks for the help... Justin Funk -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general