[GENERAL] tsearch2 parser configuration

2007-02-06 Thread Worky Workerson

Is it possible to configure the tsearch2 parser?  I'd like a very dumb
parser that splits on everything that is not [a-zA-Z0-9.].  The
default parser seems to work well on my dataset except for the '/'
character ... it doesn't split mike/john into two lexemes.  Any ideas?
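
The workaround I'm considering in the meantime (a sketch, not a parser
reconfiguration -- it assumes 8.1's regexp_replace() and tsearch2's
to_tsvector(); normalize the text before handing it to the parser):

    SELECT to_tsvector(regexp_replace('mike/john', '[^a-zA-Z0-9.]', ' ', 'g'));
    -- every character outside [a-zA-Z0-9.] becomes a space, so the
    -- default parser sees "mike john" and emits two lexemes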

Thanks!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] Large IN query optimization

2006-12-13 Thread Worky Workerson

I'm wondering, given what I have read about PG and IN, whether there
is a better way to execute this query.  I have (essentially) two
tables in a read-only DB, ip_info and events, where many events can
map to one ip_info row.  I'd like to get all the ip_info.info columns
where a specific event occurred.  This is about what I have come up
with (with indexes on all the columns):

CREATE TABLE ip_info ( ip IP4, info VARCHAR );
CREATE TABLE events ( ip IP4, event_name VARCHAR, event_type VARCHAR);

SELECT ip, info
 FROM ip_info
 WHERE ip IN (SELECT ip FROM events WHERE event_name = 'somename');

This works fine when there are few events named 'somename'; however,
the subquery can return a fairly large set of rows (>1 million), and
then the query takes quite a long time to execute.  Is there a better
way to write this query?  What would be the optimal plan for it,
given a roughly even distribution of event_name?  My current plan
looks something like this (I have partitioned the events table by ip):

Hash Join
  Hash Cond: ("outer".ip = "inner".ip)
  ->  Seq Scan on ip_info
  ->  Hash
        ->  HashAggregate
              ->  Append
                    ->  Index Scan using events_ip_01_event_name_idx on events_ip_01 events
                          Index Cond: ...
                    ->  Index Scan using events_ip_02_event_name_idx on events_ip_02 events
                          Index Cond: ...

Is this the optimal plan for this query?  BTW, ip_info has about 5M
rows, and the collective events tables have about 50M rows.
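
One rewrite I plan to compare with EXPLAIN ANALYZE (a sketch; it should
be equivalent here since ip_info.ip is unique, with DISTINCT collapsing
the duplicate matches that the join produces):

    SELECT DISTINCT i.ip, i.info
      FROM ip_info i
      JOIN events e ON e.ip = i.ip
     WHERE e.event_name = 'somename';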

Also, slightly off-topic ... are there any optimizations that I can
use to tell PG that this is a read-only database?

PG 8.1.3 on RHEL4.3 x86_64 ... thinking about upgrading to 8.2 when I
get the time.

Thanks!

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Trapping PL/Perl spi_exec_query errors

2006-11-22 Thread Worky Workerson

This is probably a really basic question, but how do I trap it when
spi_exec_query returns an error in PL/Perl?  I would expect to be able
to use the return value to determine an error condition, but instead I
get the message:

ERROR:  error from Perl function: relation "foo" does not exist at line 7.

where line 7 looks like:

my $rv = spi_exec_query('select count(*) from foo');

Alternatively, I would expect to be able to trap it with eval, however
the trusted PL/Perl does not support eval.
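
For reference, a sketch of what I'd try with untrusted PL/Perl
(plperlu), where the Safe compartment doesn't restrict eval -- this is
unverified on my end:

    CREATE OR REPLACE FUNCTION try_count() RETURNS integer AS $$
      # block eval traps the error instead of aborting the function
      my $rv = eval { spi_exec_query('select count(*) from foo') };
      if ($@) {
          elog(NOTICE, "query failed: $@");
          return -1;
      }
      return $rv->{rows}[0]->{count};
    $$ LANGUAGE plperlu;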

Thanks!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


[GENERAL] "Broken" Plan? Nested Loop IN Join vs. Nested Loop/Hash Aggregate

2006-11-08 Thread Worky Workerson

I'm having an issue with a query plan that seems to be taking *far*
longer than it should.  I have the following schema (note: I had to
retype everything from an isolated lab, so I'm hoping that it's correct
enough to show my problem):

CREATE TABLE ip_profiles (
  ip IP4 PRIMARY KEY,
  -- more columns
);

CREATE TABLE events (
  ip IP4 NOT NULL REFERENCES ip_profiles (ip),
  content BYTEA,
  -- more columns
);
CREATE INDEX events_ip_idx ON events USING btree (ip);

CREATE TABLE events_ip1 (
  CONSTRAINT partition_ip1 CHECK ((ip >= '1.0.0.0') AND (ip <= '1.255.255.255'))
) INHERITS (events);

The events table is partitioned on the first octet of the ip, the same
indexes are generated across all the partitions, and constraint
exclusion is turned on.

I would like to get all the ip_profiles that have an event with a
given content string within an IP range.  My first try was the
following (again, retyped):

EXPLAIN ANALYZE SELECT count(*) FROM ip_profiles WHERE ip BETWEEN
'1.0.0.0' AND '1.255.255.255' AND ip IN (SELECT ip FROM events WHERE
ip >= '1.0.0.0' AND ip <= '1.255.255.255' AND content = 'bytes');

Aggregate  (cost=2047.15..2047.16 rows=1 width=0) (actual time=7801833.540 rows=1 loops=1)
  ->  Nested Loop IN Join  (cost=1.02..2047.15 rows=1 width=0) (actual time=7801833.505..7801833.505 rows=0 loops=1)
        Join Filter: ("outer".ip = "inner".ip)
        ->  Index Scan using ip_profiles_pkey on ip_profiles  (cost=0.00..3.37 rows=1 width=4) (actual time=0.035..1195.567 rows=15 loops=1)
              Index Cond: ((ip > '1.0.0.0'::ip4) AND (ip <= '1.255.255.255'::ip4))
        ->  Append  (cost=1.02..2043.48 rows=24 width=4) (actual time=53.988..53.988 rows=0 loops=15)
              ->  Bitmap Heap Scan on events  (cost=1.02..4.96 rows=1 width=4) (actual time=0.010..0.010 rows=0 loops=15)
                    Recheck Cond: ((ip > '1.0.0.0'::ip4) AND (ip <= '1.255.255.255'::ip4))
                    Filter: (content = 'bytes'::bytea)
                    ->  Bitmap Index Scan on events_ip_idx  (cost=0.00..1.02 rows=3 width=0) (actual time=0.006..0.006 rows=0 loops=15)
                          Index Cond: ((ip > '1.0.0.0'::ip4) AND (ip <= '1.255.255.255'::ip4))
              ->  Seq Scan on events_ip1 events  (cost=0.00 rows=23 width=4) (actual time=53.972..53.972 rows=0 loops=15)
                    Filter: ((ip > '1.0.0.0'::ip4) AND (ip <= '1.255.255.255'::ip4) AND (content = 'bytes'::bytea))
Total runtime: 7801834.104 ms

It looks like the "Nested Loop IN Join" is taking a *lot* longer
than the planner estimated and, needless to say, is unusable.  Taking
out the range condition on the ip_profiles table should lead to a
query producing identical results, right?  I did this ...

SELECT count(*) FROM ip_profiles WHERE ip IN (SELECT ip FROM events
WHERE ip >= '1.0.0.0' AND ip <= '1.255.255.255' AND content =
'bytes');

... which results in a *much* faster time, with a plan that looks like
this (I will retype it fully if useful):

Nested Loop  (cost=2187.94..2289.54 rows=30 width=67) (actual time=223.681..225.693 rows=84 loops=1)
  ->  HashAggregate
        ->  Append
              ->  Bitmap Heap Scan on events
              ->  Seq Scan on events_ip1 events
  ->  Index Scan using ip_profiles_pkey on ip_profiles
Total runtime: 226.170 ms

I realize that the immediate fix is "don't do that", but I'd like to
get to the bottom of this so that I'm not surprised in the future.  I
have seen several threads on the performance of IN (SELECT ...), but
this doesn't seem to be the culprit.  Any ideas?
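
One probe I intend to try (a diagnostic sketch only, not a fix --
enable_nestloop is a standard planner setting):

    SET enable_nestloop = off;
    EXPLAIN ANALYZE
    SELECT count(*) FROM ip_profiles
     WHERE ip BETWEEN '1.0.0.0' AND '1.255.255.255'
       AND ip IN (SELECT ip FROM events
                   WHERE ip >= '1.0.0.0' AND ip <= '1.255.255.255'
                     AND content = 'bytes');
    RESET enable_nestloop;

If that flips to the fast HashAggregate plan, it would point at a
costing problem rather than bad statistics.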

The tables were recently vacuumed and analyzed, the indexes were just
created, and I have dropped and recreated the database with the same
effect.  I am a bit behind the latest, PG 8.1.3 on RHEL 4.3 x86_64,
but I'd like to know that upgrading will fix the problem before going
through the hassle of getting new stuff into the lab.  I plan on
upgrading to 8.2 whenever that is "released".

Thanks!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] DELETE eats up all memory and crashes box

2006-10-06 Thread Worky Workerson

Note that whether you have CASCADE or not is not the issue --- if you
are doing a delete in a foreign-key-referenced relation at all, you
are going to have a trigger event per deleted row no matter what the
details of the FK are.


So the best/fastest* way to do this would be to remove the FK
relationships from the tables, delete the rows in the previously
FK-ed tables with DELETE ... WHERE ip IN (SELECT ...), delete all the
rows in the PK table, and then recreate the FK relationships?  I tried
this and it was pretty snappy, assuming that all the indexes are
already built.

*note: loading/creating a datawarehouse, guaranteed exclusive access.
Current DW size is about 10 GB.
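
A sketch of that sequence, using the simplified flow/ip_info schema
from my original post below (the constraint name flow_fm_ip_fkey is my
guess at the auto-generated default; adjust for the real tables):

    ALTER TABLE flow DROP CONSTRAINT flow_fm_ip_fkey;
    DELETE FROM flow
     WHERE fm_ip IN (SELECT ip FROM ip_info
                      WHERE country IN ('Canada', 'Yugoslavia'));
    DELETE FROM ip_info WHERE country IN ('Canada', 'Yugoslavia');
    ALTER TABLE flow ADD CONSTRAINT flow_fm_ip_fkey
      FOREIGN KEY (fm_ip) REFERENCES ip_info (ip) ON DELETE CASCADE;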

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] DELETE eats up all memory and crashes box

2006-10-06 Thread Worky Workerson

On 10/6/06, Tom Lane <[EMAIL PROTECTED]> wrote:

"Worky Workerson" <[EMAIL PROTECTED]> writes:
> When I issue a fairly large DELETE query which has multiple tables
> with FOREIGN KEY  CASCADE on them, Postgres eats up *all* the
> memory on my system and the system crashes.

Well, the memory eating is easy to explain: pending-trigger-event list.
System crash sounds like a kernel bug or misconfiguration.  You might
want to make sure you have "strict" memory overcommit mode set, else the
problem might just be an inopportune choice of target by the OOM killer.


You were right ... I had vm.overcommit_memory set to 0 (the default).
Now the process gets killed soon after it starts.
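
For anyone following along: strict overcommit on Linux is mode 2 (this
is standard kernel behavior, not specific to Postgres), e.g. in
/etc/sysctl.conf:

    vm.overcommit_memory = 2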

Is there any way to tune PG to execute such a query, or am I forced to
forgo the convenience of the "ON DELETE CASCADE" and manually delete
the records with a subselect?

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[GENERAL] DELETE eats up all memory and crashes box

2006-10-06 Thread Worky Workerson

When I issue a fairly large DELETE query against multiple tables
with FOREIGN KEY ... ON DELETE CASCADE on them, Postgres eats up *all*
the memory on my system and the system crashes.  I figure that there
are two problems: one is PG eating up all of the memory, the other is
the system crashing without telling me anything (neither
/var/log/messages nor the kernel log shows anything).  What could
cause PG to eat up all the memory on a system?

Any ideas and/or monitoring methods that I could use to figure out
what is going wrong?  I had top running, and the last thing I saw
was PG using about 22GB of memory, with postmaster and kswapd
working the hardest.  Perhaps I have also set some of my
postgresql.conf numbers incorrectly?

Thanks!



Simplified schema:

CREATE TABLE ip_info (
   ip IP4R PRIMARY KEY,
   country VARCHAR,
   ...
);

CREATE TABLE flow (
   fm_ip IP4R NOT NULL REFERENCES ip_info (ip) ON DELETE CASCADE,
   ...
);

Offending query: DELETE FROM ip_info WHERE country IN ('Canada',
'Yugoslavia', ...);

Hardware: Quad Dual-core Opteron, 16GB RAM, 8GB swap
Software: PostgreSQL 8.1.3 on RHEL4 x64_64
Purpose:  Dedicated PG data-warehouse server

Changed config settings:
 shared_buffers = 6
 temp_buffers = 1
 work_mem = 524288
 maintenance_work_mem = 524288
 max_fsm_pages = 200
 max_fsm_relations = 10
 wal_buffers = 128
 checkpoint_segments = 128
 checkpoint_timeout = 3000
 effective_cache_size = 120
 random_page_cost = 2

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Dynamic plpgsql triggers

2006-08-01 Thread Worky Workerson

I don't think it's possible.  However, what is possible, and achieves
roughly the same effect, is to query the system catalogs (or
information schema) and, via dynamic SQL, cut trigger
functions/procedures by looping over the results of your query.  As a
bonus, non-dynamic SQL will usually be a bit faster than dynamic; the
only downside is you are creating a lot of functions, albeit in an
easy-to-manage fashion.  If you are really clever, you can put your
trigger functions in a special schema for organizational purposes.


Thanks for that hint.  I've been thinking about that ... I do similar
things to create and trigger all of the partitions.  And, since I have
a regular naming scheme for the partitions, I don't have to go to the
catalogs.  It seems a little excessive to have to create a different
function for each of the triggers, however, when each one is basically
the same.
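
A sketch of such a generation loop (my guess at a workable shape,
relying on my regular db_N naming; the helper function is hypothetical,
not from this thread):

CREATE OR REPLACE FUNCTION create_partition_triggers() RETURNS void AS $$
DECLARE
  i INTEGER;
BEGIN
  FOR i IN 1..100 LOOP
    -- one copy of the merge function per partition, specialized to db_<i>
    EXECUTE 'CREATE OR REPLACE FUNCTION merge_db_' || i || '() RETURNS TRIGGER AS $f$
      BEGIN
        UPDATE db_' || i || ' SET number_seen = number_seen + NEW.number_seen
          WHERE id = NEW.id AND content = NEW.content;
        IF FOUND THEN
          RETURN NULL;
        END IF;
        RETURN NEW;
      END;
      $f$ LANGUAGE plpgsql';
    EXECUTE 'CREATE TRIGGER db_merge_db_' || i ||
            ' BEFORE INSERT ON db_' || i ||
            ' FOR EACH ROW EXECUTE PROCEDURE merge_db_' || i || '()';
  END LOOP;
END;
$$ LANGUAGE plpgsql;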

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Dynamic plpgsql triggers

2006-08-01 Thread Worky Workerson

I have scanned the archives and found the following message from 2004
dealing with v7.4, however it doesn't solve all my problems:

http://archives.postgresql.org/pgsql-sql/2004-07/msg00208.php

Here is a brief example.  I have a table db with a merge trigger given
below, shamelessly stolen from Example 36-1 in the docs.  When an
insert occurs, the number_seen is updated if the id/content are the
same, otherwise the new record is inserted.

-- Note that id is not unique
CREATE TABLE db (
   id INTEGER,
   content BYTEA,
   number_seen INTEGER
);

CREATE OR REPLACE FUNCTION merge_db() RETURNS TRIGGER AS $$
  BEGIN
    UPDATE db SET number_seen = number_seen + NEW.number_seen
      WHERE id = NEW.id AND content = NEW.content;
    IF FOUND THEN
      -- an existing row was updated; suppress the INSERT
      RETURN NULL;
    END IF;

    RETURN NEW;
  END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER db_merge_db BEFORE INSERT ON db FOR EACH ROW EXECUTE
PROCEDURE merge_db();

This works like a charm.  My 'db' table is getting rather large,
however, and I have started to partition it based on the non-unique
column 'id':

CREATE TABLE db_1 (
  CONSTRAINT partition_id_1 CHECK (id = 1)
) INHERITS (db);

-- ... and likewise for db_2 through db_100

However now I am at a loss as to how to create a single trigger
function that I can use to trigger all of the partition tables.
Basically, how do I set the db name to be dynamic?  I tried the naive
approach of just replacing 'db' with TG_RELNAME, i.e.:

   UPDATE TG_RELNAME SET number_seen = number_seen + NEW.number_seen
 WHERE id = NEW.id AND content = NEW.content;

but of course this gives me an error about $1, which is what I
figured, given that the table name can't be a parameter (right?):
'ERROR:  syntax error at or near "$1" at character 9'

So then I tried to make the statement dynamic, i.e.:

    EXECUTE 'UPDATE ' || TG_RELNAME || '
      SET number_seen = number_seen + NEW.number_seen
      WHERE id = NEW.id AND content = NEW.content';

but this gives the error: 'ERROR:  NEW used in query that is not in a
rule'.  This seems a little confusing as NEW should be visible to
trigger functions, but I assume that it has something to do with the
EXECUTE and how the planner couldn't pre-plan the SQL.

So I plod on, and try and make all the NEW args dynamic, i.e.:

   EXECUTE 'UPDATE ' || TG_RELNAME || '
 SET number_seen = number_seen + ' || NEW.number_seen || '
 WHERE id = ' || NEW.id || ' AND content = ' || NEW.content;

However now I get the error: 'ERROR:  operator does not exist:  bytea
|| ip4'.  I think I understand what is going on ... that bytea doesn't
have a text representation, right?

So I'm not quite sure where to go from here.  How do I make the UPDATE
statement in the trigger function operate against the table on which
the trigger was fired, while at the same time passing in the values in
NEW?

Any help would be greatly appreciated.
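
Since posting, I've been eyeing one possible direction (a sketch --
quote_ident(), quote_literal(), encode(), and decode() are standard
functions, but I haven't verified this end-to-end):

    EXECUTE 'UPDATE ' || quote_ident(TG_RELNAME) || '
      SET number_seen = number_seen + ' || NEW.number_seen || '
      WHERE id = ' || NEW.id || '
        AND content = decode('
          || quote_literal(encode(NEW.content, 'escape'))
          || ', ''escape'')';
    -- encode() turns the bytea into quotable text; decode() converts
    -- it back to bytea inside the generated statement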

Thanks!

On 7/31/06, Worky Workerson <[EMAIL PROTECTED]> wrote:

I'm trying to trigger a whole bunch of partitions at once (initial DB
setup) using the same plpgsql trigger.  The trigger is basically the
merge trigger in the docs (i.e. UPDATE IF NOT FOUND RETURN NEW ...).

I need to use the TG_RELNAME variable within the "UPDATE" in the
trigger so that I can use the same function to trigger all of the
partitions (correct?); the problem is that I can't quite figure out
how.  I figure that I will have to use EXECUTE on a string that I
build up, right?  The problem that I'm having with this approach is
that some of the columns of NEW don't have a text conversion, and I'm
getting an error whenever the trigger fires.  Is there a way around
this and/or a better way to trigger a bunch of partitions with the
same function?

Thanks!
-Worky



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[GENERAL] Dynamic plpgsql triggers

2006-07-31 Thread Worky Workerson

I'm trying to trigger a whole bunch of partitions at once (initial DB
setup) using the same plpgsql trigger.  The trigger is basically the
merge trigger in the docs (i.e. UPDATE IF NOT FOUND RETURN NEW ...).

I need to use the TG_RELNAME variable within the "UPDATE" in the
trigger so that I can use the same function to trigger all of the
partitions (correct?); the problem is that I can't quite figure out
how.  I figure that I will have to use EXECUTE on a string that I
build up, right?  The problem that I'm having with this approach is
that some of the columns of NEW don't have a text conversion, and I'm
getting an error whenever the trigger fires.  Is there a way around
this and/or a better way to trigger a bunch of partitions with the
same function?

Thanks!
-Worky

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Exporting data from view

2006-06-20 Thread Worky Workerson

On 6/20/06, Aaron Koning <[EMAIL PROTECTED]> wrote:

google pg_dump


Really?  What command do you use?  I've tried the following:

pg_dump -t viewname dbname

and I get the view definition, whereas I would like the data.  Is
there an option to pg_dump that I'm missing?

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] Exporting data from view

2006-06-20 Thread Worky Workerson

I read recently about the efforts underway to COPY from a view,
and I was wondering what the current best practices are for copying
out of a view and importing that data into an actual table elsewhere.
I am currently doing psql -c "SELECT ..." and then using a bit of
Perl to transform that into something copyable (i.e. CSV), but is
there a way to directly export the data in an easily importable
form?
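
The workaround I may fall back on (a sketch for pre-8.2, before COPY
can take a query; 'myview' is a placeholder, and both lines must run
in the same psql session since the table is temporary):

    CREATE TEMP TABLE export_tmp AS SELECT * FROM myview;
    \copy export_tmp TO 'myview.csv' WITH CSV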

Thanks!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster