[GENERAL] tsearch2 parser configuration
Is it possible to configure the tsearch2 parser? I'd like a very dumb parser that splits on everything that is not [a-zA-Z0-9.]. The default parser seems to work well on my dataset except for the '/' character: it doesn't split mike/john into two lexemes. Any ideas? Thanks!
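If the parser itself can't easily be changed, one workaround (a sketch only, not from this post; it assumes the tsearch2 contrib functions and its stock 'default' configuration are installed) is to normalize the offending character before the text reaches the parser:

    -- Replace '/' with a space so the default parser sees two words.
    SELECT to_tsvector('default', replace('mike/john', '/', ' '));
    -- The same transform would go into whatever functional index or
    -- trigger maintains the tsvector column, and into the strings
    -- passed to to_tsquery, so indexing and querying stay consistent.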
[GENERAL] Large IN query optimization
I'm wondering (given what I have read about PG and IN) if there is a better way to execute this query. I have (essentially) two tables in a read-only DB, ip_info and events, where many events can map to one ip_info. I'd like to get all the ip_info.info columns where a specific event occurred. This is about what I have come up with (with indexes on all the columns):

CREATE TABLE ip_info (
    ip IP4,
    info VARCHAR
);

CREATE TABLE events (
    ip IP4,
    event_name VARCHAR,
    event_type VARCHAR
);

SELECT ip, info FROM ip_info
WHERE ip IN (SELECT ip FROM events WHERE event_name = 'somename');

This works fine when there are few events named 'somename'; however, the subquery can return a fairly large set of rows (>1 million) and then the query takes quite a long time to execute. Is there a better way to write this query? What would be the optimal plan for this query, given a roughly even distribution of event_name? My current plan looks something like this (I have partitioned the events table by ip):

Hash Join
  Hash Cond: ("outer".ip = "inner".ip)
  -> Seq Scan on ip_info
  -> Hash
       -> HashAggregate
            -> Append
                 -> Index Scan using "events_ip_01_event_name_idx" on events_ip_01 events
                      Index Cond ...
                 -> Index Scan using "events_ip_02_event_name_idx" on events_ip_02 events
                      Index Cond ...

Is this the optimal plan for this query? BTW, ip_info has about 5M rows, and the collective events tables have about 50M rows. Also, slightly off-topic: are there any optimizations that I can use to tell PG that this is a read-only database? PG 8.1.3 on RHEL4.3 x86_64 ... thinking about upgrading to 8.2 when I get the time. Thanks!
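One rewrite worth trying (a sketch only, not from this post) is a plain join with DISTINCT, which expresses the same result without the IN machinery and gives the planner a free choice of join method:

    -- Equivalent to the IN form as long as ip is unique in ip_info;
    -- DISTINCT collapses the many-events-per-ip duplicates.
    SELECT DISTINCT i.ip, i.info
    FROM ip_info i
    JOIN events e ON e.ip = i.ip
    WHERE e.event_name = 'somename';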
[GENERAL] Trapping PL/Perl spi_exec_query errors
This is probably a really basic question, but how do I trap an error from spi_exec_query in PL/Perl? I would expect to be able to use the return value to determine an error condition; however, I get the message:

ERROR: error from Perl function: relation "foo" does not exist at line 7.

where line 7 looks like:

my $rv = spi_exec_query('select count(*) from foo');

Alternatively, I would expect to be able to trap it with eval; however, trusted PL/Perl does not support eval. Thanks!
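If the error can't be trapped inside trusted PL/Perl, one workaround (a sketch, not from this post; the table and function names are made up) is to trap it one level up in a PL/pgSQL wrapper, since PL/pgSQL has had EXCEPTION blocks since 8.0:

    -- Returns -1 instead of raising an error when "foo" is missing.
    CREATE OR REPLACE FUNCTION safe_count_foo() RETURNS integer AS $$
    BEGIN
        RETURN (SELECT count(*) FROM foo);
    EXCEPTION
        WHEN undefined_table THEN
            RETURN -1;  -- sentinel for "relation does not exist"
    END;
    $$ LANGUAGE plpgsql;

The PL/Perl code can then call safe_count_foo() and test the returned value.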
[GENERAL] "Broken" Plan? Nested Loop IN Join vs. Nested Loop/Hash Aggregate
I'm having an issue with a query plan that seems to be taking *far* longer than it should. I have the following schema (note: I had to retype everything from an isolated lab, so I'm hoping that it's correct enough to show my problem):

CREATE TABLE ip_profiles (
    ip IP4 PRIMARY KEY
    --more columns
);

CREATE TABLE events (
    ip IP4 NOT NULL REFERENCES ip_profiles (ip),
    content BYTEA
    --more columns
);

CREATE INDEX events_ip_idx ON events USING btree (ip);

CREATE TABLE events_ip1 (
    CONSTRAINT partition_ip1 CHECK ((ip >= '1.0.0.0') AND (ip <= '1.255.255.255'))
) INHERITS (events);

The events table is partitioned on the first octet of the ip, the same indexes are created across all the partitions, and constraint exclusion is turned on. I would like to get all the ip_profiles that have an event with a given content string within an IP range. My first try was the following (again, retyped):

EXPLAIN ANALYZE
SELECT count(*) FROM ip_profiles
WHERE ip BETWEEN '1.0.0.0' AND '1.255.255.255'
  AND ip IN (SELECT ip FROM events
             WHERE ip >= '1.0.0.0' AND ip <= '1.255.255.255'
               AND content = 'bytes');

Aggregate (cost=2047.15..2047.16 rows=1 width=0) (actual time=7801833.540 rows=1 loops=1)
  -> Nested Loop IN Join (cost=1.02..2047.15 rows=1 width=0) (actual time=7801833.505..7801833.505 rows=0 loops=1)
       Join Filter: ("outer".ip = "inner".ip)
       -> Index Scan using ip_profiles_pkey on ip_profiles (cost=0.00..3.37 rows=1 width=4) (actual time=0.035..1195.567 rows=15 loops=1)
            Index Cond: ((ip >= '1.0.0.0'::ip4) AND (ip <= '1.255.255.255'::ip4))
       -> Append (cost=1.02..2043.48 rows=24 width=4) (actual time=53.988..53.988 rows=0 loops=15)
            -> Bitmap Heap Scan on events (cost=1.02..4.96 rows=1 width=4) (actual time=0.010..0.010 rows=0 loops=15)
                 Recheck Cond: ((ip >= '1.0.0.0'::ip4) AND (ip <= '1.255.255.255'::ip4))
                 Filter: (content = 'bytes'::bytea)
                 -> Bitmap Index Scan on events_ip_idx (cost=0.00..1.02 rows=3 width=0) (actual time=0.006..0.006 rows=0 loops=15)
                      Index Cond: ((ip >= '1.0.0.0'::ip4) AND (ip <= '1.255.255.255'::ip4))
            -> Seq Scan on events_ip1 events (cost=0.00 rows=23 width=4) (actual time=53.972..53.972 rows=0 loops=15)
                 Filter: ((ip >= '1.0.0.0'::ip4) AND (ip <= '1.255.255.255'::ip4) AND (content = 'bytes'::bytea))
Total runtime: 7801834.104 ms

It looks like the "Nested Loop IN Join" is taking a *lot* longer than the planner estimated and, needless to say, is unusable. Taking out the range condition on the ip_profiles table should lead to a query producing identical results, right? I did this ...

SELECT count(*) FROM ip_profiles
WHERE ip IN (SELECT ip FROM events
             WHERE ip >= '1.0.0.0' AND ip <= '1.255.255.255'
               AND content = 'random byte string');

... which runs *much* faster, with a plan that looks like this (I will retype it fully if useful):

Nested Loop (cost=2187.94..2289.54 rows=30 width=67) (actual time=223.681..225.693 rows=84 loops=1)
  -> HashAggregate
       -> Append
            -> Bitmap Heap Scan on events
            -> Seq Scan on events_ip1 events
  -> Index Scan using ip_profiles_pkey on ip_profiles
Total runtime: 226.170 ms

I realize that the immediate fix is "don't do that", but I'd like to get to the bottom of this so that I'm not surprised in the future. I have seen several threads on the performance of IN (SELECT ...), but this doesn't seem to be the culprit. Any ideas? The tables are recently vacuumed and analyzed, the indexes were just created, and I dropped and recreated the database with the same effect.
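An EXISTS form may also be worth trying here (a sketch only, not from this post), since 8.1 plans EXISTS differently from IN and it lets the range filter stay on ip_profiles:

    SELECT count(*)
    FROM ip_profiles p
    WHERE p.ip BETWEEN '1.0.0.0' AND '1.255.255.255'
      AND EXISTS (SELECT 1 FROM events e
                  WHERE e.ip = p.ip
                    -- constant range repeated so constraint exclusion
                    -- can still prune the events partitions
                    AND e.ip BETWEEN '1.0.0.0' AND '1.255.255.255'
                    AND e.content = 'bytes');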
I am a bit behind the latest, PG 8.1.3 on RHEL 4.3 x86_64, but I'd like to know that upgrading will fix the problem before going through the hassle of getting new stuff into the lab. I plan on upgrading to 8.2 whenever that is "released". Thanks!
Re: [GENERAL] DELETE eats up all memory and crashes box
> Note that whether you have CASCADE or not is not the issue --- if you
> are doing a delete in a foreign-key-referenced relation at all, you are
> going to have a trigger event per deleted row no matter what the
> details of the FK are.

So the best/fastest* way to do this would be to remove the FK relationship from the tables, delete all my rows with DELETE ... WHERE ip IN (SELECT ...) in the previously FK-ed tables, delete all the rows in the PK table, then recreate the FK relationships? I tried this and it was pretty snappy, assuming that all the indexes are built.

*note: loading/creating a data warehouse, guaranteed exclusive access. Current DW size is about 10 GB.
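Spelled out against the schema from the original post (a sketch only; the constraint name is hypothetical, and it assumes the exclusive access noted above), that sequence looks roughly like:

    ALTER TABLE flow DROP CONSTRAINT flow_fm_ip_fkey;  -- hypothetical name

    DELETE FROM flow
    WHERE fm_ip IN (SELECT ip FROM ip_info
                    WHERE country IN ('Canada', 'Yugoslavia'));

    DELETE FROM ip_info
    WHERE country IN ('Canada', 'Yugoslavia');

    ALTER TABLE flow ADD CONSTRAINT flow_fm_ip_fkey
        FOREIGN KEY (fm_ip) REFERENCES ip_info (ip) ON DELETE CASCADE;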
Re: [GENERAL] DELETE eats up all memory and crashes box
On 10/6/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Worky Workerson" <[EMAIL PROTECTED]> writes:
> > When I issue a fairly large DELETE query which has multiple tables
> > with FOREIGN KEY CASCADE on them, Postgres eats up *all* the
> > memory on my system and the system crashes.
>
> Well, the memory eating is easy to explain: pending-trigger-event list.
> System crash sounds like a kernel bug or misconfiguration. You might
> want to make sure you have "strict" memory overcommit mode set, else
> the problem might just be an inopportune choice of target by the OOM
> killer.

You were right ... I had my vm.overcommit_memory set to 0 (the default). Now the process gets killed soon after it starts. Is there any way to tune PG to execute such a query, or am I forced to forgo the convenience of ON DELETE CASCADE and manually delete the records with a subselect?
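Short of dropping the FK, one low-tech way to keep the pending-trigger-event list bounded (a sketch only, not from this thread; the batch size is arbitrary) is to delete in batches, each in its own transaction:

    -- Each run cascades only one batch's worth of trigger events.
    -- Repeat (from a shell loop or a plpgsql loop) until it reports
    -- DELETE 0.
    DELETE FROM ip_info
    WHERE ip IN (SELECT ip FROM ip_info
                 WHERE country IN ('Canada', 'Yugoslavia')
                 LIMIT 100000);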
[GENERAL] DELETE eats up all memory and crashes box
When I issue a fairly large DELETE query which has multiple tables with FOREIGN KEY CASCADE on them, Postgres eats up *all* the memory on my system and the system crashes. I figure that there are two problems: one is PG eating up all of the memory, the other is the system crashing without telling me anything (neither /var/log/messages nor kernel logging shows anything). What could cause PG to eat up all the memory on a system? Any ideas and/or monitoring methods that I could use to figure out what is going wrong? I had top running, and the last thing I saw was PG using about 22GB of memory, with postmaster and kswapd working hardest. Perhaps I have also set some of my postgresql.conf numbers incorrectly? Thanks!

Simplified schema:

CREATE TABLE ip_info (
    ip IP4R PRIMARY KEY,
    country VARCHAR,
    ...
);

CREATE TABLE flow (
    fm_ip IP4R NOT NULL REFERENCES ip_info (ip) ON DELETE CASCADE,
    ...
);

Offending query:

DELETE FROM ip_info WHERE country IN ('Canada', 'Yugoslavia', ...);

Hardware: Quad dual-core Opteron, 16GB RAM, 8GB swap
Software: PostgreSQL 8.1.3 on RHEL4 x86_64
Purpose: Dedicated PG data-warehouse server

Changed config settings:

shared_buffers = 6
temp_buffers = 1
work_mem = 524288
maintenance_work_mem = 524288
max_fsm_pages = 200
max_fsm_relations = 10
wal_buffers = 128
checkpoint_segments = 128
checkpoint_timeout = 3000
effective_cache_size = 120
random_page_cost = 2
Re: [GENERAL] Dynamic pgplsql triggers
> I don't think it's possible. However, what is possible and achieves
> roughly the same effect is to query the system catalogs (or information
> schema) and, via dynamic SQL, cut trigger functions/procedures by
> looping over the results of your query. Non-dynamic SQL will usually be
> a bit faster than dynamic as a bonus; the only downside is you are
> creating a lot of functions, albeit in an easy-to-manage fashion. If
> you are really clever, you can put your trigger functions in a special
> schema for organizational purposes.

Thanks for that hint. I've been thinking about that ... I do similar things to create and trigger all the partitions, and since I have a regular naming scheme for the partitions, I don't have to go to the catalogs. It seems a little excessive to have to create a different function for each of the triggers, however, when each one is basically the same.
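For partitions with a regular naming scheme, the generate-one-function-per-partition approach might look roughly like this (a sketch only, not from this thread; it assumes the db_1 .. db_100 partitions and merge trigger described elsewhere in this thread):

    -- Stamps out one merge trigger per partition via dynamic SQL.
    CREATE OR REPLACE FUNCTION make_partition_triggers() RETURNS void AS $$
    DECLARE
        i integer;
    BEGIN
        FOR i IN 1..100 LOOP
            EXECUTE 'CREATE OR REPLACE FUNCTION merge_db_' || i || '() RETURNS trigger AS $f$
                     BEGIN
                         UPDATE db_' || i || ' SET number_seen = number_seen + NEW.number_seen
                             WHERE id = NEW.id AND content = NEW.content;
                         IF FOUND THEN
                             RETURN NULL;
                         END IF;
                         RETURN NEW;
                     END;
                     $f$ LANGUAGE plpgsql';
            EXECUTE 'CREATE TRIGGER db_merge_db_' || i
                 || ' BEFORE INSERT ON db_' || i
                 || ' FOR EACH ROW EXECUTE PROCEDURE merge_db_' || i || '()';
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

Each generated function names its partition directly, so the UPDATE inside stays static SQL and FOUND works normally.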
Re: [GENERAL] Dynamic pgplsql triggers
I have scanned the archives and found the following message from 2004 dealing with v7.4; however, it doesn't solve all my problems: http://archives.postgresql.org/pgsql-sql/2004-07/msg00208.php

Here is a brief example. I have a table db with the merge trigger given below, shamelessly stolen from Example 36-1 in the docs. When an insert occurs, number_seen is updated if the id/content are the same; otherwise the new record is inserted.

-- Note that id is not unique
CREATE TABLE db (
    id INTEGER,
    content BYTEA,
    number_seen INTEGER
);

CREATE OR REPLACE FUNCTION merge_db() RETURNS TRIGGER AS $$
BEGIN
    UPDATE db SET number_seen = number_seen + NEW.number_seen
        WHERE id = NEW.id AND content = NEW.content;
    IF FOUND THEN  -- updated an existing row
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE PLPGSQL;

CREATE TRIGGER db_merge_db BEFORE INSERT ON db
    FOR EACH ROW EXECUTE PROCEDURE merge_db();

This works like a charm. My 'db' table is getting rather large, however, and I started to partition it based on the non-unique column 'id':

CREATE TABLE db_1 (
    CONSTRAINT partition_id_1 CHECK (id = 1)
) INHERITS (db);

-- ... and likewise for db_2 through db_100

However, now I am at a loss as to how to create a single trigger function that I can use to trigger all of the partition tables. Basically, how do I make the table name dynamic? I tried the naive approach of just replacing 'db' with TG_RELNAME, i.e.:

UPDATE TG_RELNAME SET number_seen = number_seen + NEW.number_seen
    WHERE id = NEW.id AND content = NEW.content;

but of course this gives me an error about $1, which is what I figured, given that the table name can't be a parameter (right?):

ERROR: syntax error at or near "$1" at character 9

So then I tried to make the statement dynamic, i.e.:

EXECUTE 'UPDATE ' || TG_RELNAME || ' SET number_seen = number_seen + NEW.number_seen WHERE id = NEW.id AND content = NEW.content';

but this gives the error 'ERROR: NEW used in query that is not in a rule'. This seems a little confusing, as NEW should be visible to trigger functions, but I assume that it has something to do with EXECUTE and how the planner can't pre-plan the SQL. So I plod on, and try to make all the NEW args dynamic, i.e.:

EXECUTE 'UPDATE ' || TG_RELNAME || ' SET number_seen = number_seen + ' || NEW.number_seen || ' WHERE id = ' || NEW.id || ' AND content = ' || NEW.content;

However, now I get the error 'ERROR: operator does not exist: bytea || ip4'. I think I understand what is going on ... that bytea doesn't have a text representation, right? So I'm not quite sure where to go from here. How do I make the UPDATE statement in the trigger function operate against the table on which the trigger was fired, while at the same time passing in the values in NEW? Any help would be greatly appreciated. Thanks!

On 7/31/06, Worky Workerson <[EMAIL PROTECTED]> wrote:
> I'm trying to trigger a whole bunch of partitions at once (initial DB
> setup) using the same plpgsql trigger. The trigger is basically the
> merge trigger in the docs (i.e. UPDATE, IF NOT FOUND RETURN NEW ...).
> I need to use the TG_RELNAME variable within the UPDATE in the trigger
> so that I can use the same function to trigger all of the partitions
> (correct?); the problem is that I can't quite figure out how.
> I figure that I will have to use EXECUTE on a string that I build up,
> right? The problem that I'm having with this approach is that some of
> the columns of NEW don't have a text conversion, and I'm getting an
> error whenever the trigger fires. Is there a way around this and/or a
> better way to trigger a bunch of partitions with the same function?
> Thanks! -Worky
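One way to finish the dynamic version (a sketch only, not from this thread) is to quote each NEW value explicitly: bytea can be round-tripped through encode()/decode(), and note that EXECUTE does not set FOUND, so GET DIAGNOSTICS is needed to detect the update. Assumes id, number_seen and content are non-null, since a NULL would make the assembled string NULL:

    CREATE OR REPLACE FUNCTION merge_db() RETURNS TRIGGER AS $$
    DECLARE
        rows_updated integer;
    BEGIN
        EXECUTE 'UPDATE ' || quote_ident(TG_RELNAME)
             || ' SET number_seen = number_seen + ' || NEW.number_seen
             || ' WHERE id = ' || NEW.id
             || ' AND content = decode('
             || quote_literal(encode(NEW.content, 'hex')) || ', ''hex'')';
        -- EXECUTE changes ROW_COUNT but not FOUND.
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        IF rows_updated > 0 THEN
            RETURN NULL;  -- merged into an existing row
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;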
[GENERAL] Dynamic pgplsql triggers
I'm trying to trigger a whole bunch of partitions at once (initial DB setup) using the same plpgsql trigger. The trigger is basically the merge trigger in the docs (i.e. UPDATE, IF NOT FOUND RETURN NEW ...). I need to use the TG_RELNAME variable within the UPDATE in the trigger so that I can use the same function to trigger all of the partitions (correct?); the problem is that I can't quite figure out how. I figure that I will have to use EXECUTE on a string that I build up, right? The problem that I'm having with this approach is that some of the columns of NEW don't have a text conversion, and I'm getting an error whenever the trigger fires. Is there a way around this and/or a better way to trigger a bunch of partitions with the same function? Thanks! -Worky
Re: [GENERAL] Exporting data from view
On 6/20/06, Aaron Koning <[EMAIL PROTECTED]> wrote:
> google pg_dump

Really? What command do you use? I've tried the following:

pg_dump -t viewname dbname

and I get the view definition, whereas I would like the data. Is there an option to pg_dump that I'm missing?
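Since pg_dump only dumps a view's definition, one workaround (a sketch only, not from this thread; the names are hypothetical) is to materialize the view into a real table first:

    -- Snapshot the view, dump the snapshot, then drop it.
    CREATE TABLE viewname_snapshot AS SELECT * FROM viewname;
    -- then, from the shell:
    --   pg_dump -t viewname_snapshot dbname > viewname_data.sql
    DROP TABLE viewname_snapshot;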
[GENERAL] Exporting data from view
I read recently about the efforts underway to COPY from a view; however, I was wondering what the current best practices are for copying out of a view and importing that data into an actual table elsewhere. I am currently doing psql -c "SELECT ..." and then using a bit of Perl to transform that into something copyable (i.e. CSV), but is there a way to directly export the data in an easily importable form? Thanks!
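Pending real COPY-from-view support, a pure-SQL route (a sketch only, not from this thread; the view name is hypothetical, and COPY TO a server-side file requires superuser) is to stage the view in a temporary table and COPY that:

    -- Export the view's rows as CSV via a temporary table.
    CREATE TEMP TABLE myview_export AS SELECT * FROM myview;
    COPY myview_export TO '/tmp/myview.csv' WITH CSV;
    -- Load elsewhere with:
    --   COPY target_table FROM '/tmp/myview.csv' WITH CSV;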