Re: [GENERAL] Can dml realize the partition table's rule and make good execution plan?

2012-11-11 Thread Craig Ringer
On 11/12/2012 02:24 PM, 高健 wrote:
> Hi Craig:
> Thank you for your reply.
> I checked constraint_exclusion; in my sample, 'on' and 'partition'
> behave the same (I have no data in the parent table), and it does work
> for me.
> I found that constraint_exclusion works for simple conditions, but not
> for complicated ones such as "id=a or id=b".
> And the UNION ALL of two simple queries indeed produced a lower cost.
For now you'll probably need to use that approach - UNION ALL of simpler
queries. It's cumbersome and annoying, though.
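
Concretely, the rewrite looks like this (a sketch against the ptest
schema from the original post):

-- The OR form is planned against every partition:
select * from ptest where id=5000 or id=600;

-- The manual rewrite gives constraint exclusion one simple condition per
-- branch, so each branch can be pruned down to a single child table:
select * from ptest where id=5000
UNION ALL
select * from ptest where id=600;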

It'd be really interesting to enhance the query planner to be smarter
about this particular case, but the planner is way past my
scary-code-voodoo level so I can't really help there; I'm more
interested in usability issues in the tools for any development time I get.

--
Craig Ringer




[GENERAL] Enabling unaccent module on all schemas

2012-11-11 Thread Bernardo Carneiro
Hello,

I've installed unaccent on the public schema using CREATE EXTENSION.
However, I have multiple schemas and they don't have the public schema on
their search path. I've tried calling "public".unaccent(), but it fails with
'text search dictionary "unaccent" does not exist'. I suppose it is trying
to locate the unaccent dictionary via the search path, which does not
contain public. Is there any way to overcome this issue?
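
For reference, two workarounds that look plausible (a sketch; the
two-argument form of unaccent() takes an explicit dictionary, and
"mydb"/"myschema" are hypothetical names):

-- 1. Schema-qualify both the function and the dictionary it should use:
SELECT public.unaccent('public.unaccent'::regdictionary, 'Café');

-- 2. Or append public to the database's search path:
ALTER DATABASE mydb SET search_path = "$user", myschema, public;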

Many thanks in advance!

Cheers,
Bernardo


Re: [GENERAL] Can dml realize the partition table's rule and make good execution plan?

2012-11-11 Thread 高健
Hi Craig:
Thank you for your reply.
I checked constraint_exclusion; in my sample, 'on' and 'partition' behave
the same (I have no data in the parent table), and it does work for me.
I found that constraint_exclusion works for simple conditions, but not for
complicated ones such as "id=a or id=b".
And the UNION ALL of two simple queries indeed produced a lower cost.

postgres=# show constraint_exclusion;
 constraint_exclusion
----------------------
 partition
(1 row)

postgres=# explain select * from ptest where id=5000;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Result  (cost=0.00..13.75 rows=2 width=36)
   ->  Append  (cost=0.00..13.75 rows=2 width=36)
         ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
               Filter: (id = 5000)
         ->  Index Scan using ctest01_id_idx on ctest01 ptest  (cost=0.00..13.75 rows=1 width=9)
               Index Cond: (id = 5000)
(6 rows)

postgres=#

postgres=# explain select * from ptest where id=60;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Result  (cost=0.00..13.75 rows=2 width=36)
   ->  Append  (cost=0.00..13.75 rows=2 width=36)
         ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
               Filter: (id = 60)
         ->  Index Scan using ctest01_id_idx on ctest01 ptest  (cost=0.00..13.75 rows=1 width=9)
               Index Cond: (id = 60)
(6 rows)

postgres=#


postgres=# explain select * from ptest where id=5000
UNION ALL
select * from ptest where id=600;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Result  (cost=0.00..27.55 rows=4 width=36)
   ->  Append  (cost=0.00..27.55 rows=4 width=36)
         ->  Result  (cost=0.00..13.75 rows=2 width=36)
               ->  Append  (cost=0.00..13.75 rows=2 width=36)
                     ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
                           Filter: (id = 5000)
                     ->  Index Scan using ctest01_id_idx on ctest01 ptest  (cost=0.00..13.75 rows=1 width=9)
                           Index Cond: (id = 5000)
         ->  Result  (cost=0.00..13.75 rows=2 width=36)
               ->  Append  (cost=0.00..13.75 rows=2 width=36)
                     ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
                           Filter: (id = 600)
                     ->  Index Scan using ctest02_id_idx on ctest02 ptest  (cost=0.00..13.75 rows=1 width=9)
                           Index Cond: (id = 600)
(14 rows)

postgres=#


2012/11/12 Craig Ringer 

> On 11/12/2012 10:39 AM, 高健 wrote:
> > The query applied the WHERE condition to every partition table, which
> > is not what I want; my rule is based only on the id column value.
> > And my SELECT statement's WHERE condition is also on the id column
> > value.
> After re-reading your question I see what you're getting at. You want
> the query planner to rewrite it as if it were:
>
> explain select * from ptest where id=5000
> UNION ALL
> select * from ptest WHERE id=600
>
> and produce a plan like this:
>
>
> regress=> explain select * from ptest where id=5000 UNION ALL select *
> from ptest WHERE id=600;
>                                      QUERY PLAN
> --------------------------------------------------------------------------------------
>  Result  (cost=0.00..25.58 rows=10 width=62)
>    ->  Append  (cost=0.00..25.58 rows=10 width=62)
>          ->  Result  (cost=0.00..12.74 rows=5 width=62)
>                ->  Append  (cost=0.00..12.74 rows=5 width=62)
>                      ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
>                            Filter: (id = 5000)
>                      ->  Bitmap Heap Scan on ctest01 ptest  (cost=4.28..12.74 rows=4 width=62)
>                            Recheck Cond: (id = 5000)
>                            ->  Bitmap Index Scan on ctest01_id_idx  (cost=0.00..4.28 rows=4 width=0)
>                                  Index Cond: (id = 5000)
>          ->  Result  (cost=0.00..12.74 rows=5 width=62)
>                ->  Append  (cost=0.00..12.74 rows=5 width=62)
>                      ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
>                            Filter: (id = 600)
>                      ->  Bitmap Heap Scan on ctest02 ptest  (cost=4.28..12.74 rows=4 width=62)
>                            Recheck Cond: (id = 600)
>                            ->  Bitmap Index Scan on ctest02_id_idx  (cost=0.00..4.28 rows=4 width=0)
>                                  Index Cond: (id = 600)
> (18 rows)
>
>
> i.e. to scan ctest01 using ctest01_id_idx for 5000, and ctest02 using
> ctest02_id_idx for 600, then combine the results.
>
> If

Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-11 Thread Lists

On 11/09/2012 05:26 PM, Steve Crawford wrote:
Bloat in pg_attribute would correlate with A) (or any constant
creation/destruction of tables). You can vacuum and/or reindex the
system tables if you are connected as the superuser, but you are better
off preventing bloat by appropriate adjustment of your configuration
settings. Note, however, that if you do frequent bulk creation/destruction
of tables you could end up bloating the attribute table between vacuum
runs, and may need to periodically shrink it manually.
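
A minimal sketch of that manual shrink (must be run as superuser; VACUUM
FULL takes an exclusive lock on the catalog while it runs):

VACUUM FULL pg_catalog.pg_attribute;
REINDEX TABLE pg_catalog.pg_attribute;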

Steve,

Our system divides customers into distinct databases; however, customers
are often clustered (think: different locations of a 7-Eleven), so we
have to aggregate data from different databases. We do this with dblink
to get the data and temp tables to collate it, which appears to be a
cause of the bloat we're seeing.
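
The pattern looks roughly like this (a sketch with hypothetical names;
dblink returns a record set, so the column list has to be spelled out):

CREATE TEMPORARY TABLE rollup AS
SELECT * FROM dblink('dbname=customer_a',
                     'SELECT store_id, total FROM sales')
       AS t(store_id integer, total numeric);

INSERT INTO rollup
SELECT * FROM dblink('dbname=customer_b',
                     'SELECT store_id, total FROM sales')
       AS t(store_id integer, total numeric);

-- Each temp table created and dropped this way inserts rows into
-- pg_attribute and later leaves dead entries behind, which is where
-- the bloat accumulates.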


-Ben




Re: [GENERAL] Can dml realize the partition table's rule and make good execution plan?

2012-11-11 Thread Craig Ringer
On 11/12/2012 10:39 AM, 高健 wrote:
> The query applied the WHERE condition to every partition table, which
> is not what I want; my rule is based only on the id column value.
> And my SELECT statement's WHERE condition is also on the id column value.
After re-reading your question I see what you're getting at. You want
the query planner to rewrite it as if it were:

explain select * from ptest where id=5000
UNION ALL
select * from ptest WHERE id=600

and produce a plan like this:


regress=> explain select * from ptest where id=5000 UNION ALL select *
from ptest WHERE id=600;
                                     QUERY PLAN
--------------------------------------------------------------------------------------
 Result  (cost=0.00..25.58 rows=10 width=62)
   ->  Append  (cost=0.00..25.58 rows=10 width=62)
         ->  Result  (cost=0.00..12.74 rows=5 width=62)
               ->  Append  (cost=0.00..12.74 rows=5 width=62)
                     ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
                           Filter: (id = 5000)
                     ->  Bitmap Heap Scan on ctest01 ptest  (cost=4.28..12.74 rows=4 width=62)
                           Recheck Cond: (id = 5000)
                           ->  Bitmap Index Scan on ctest01_id_idx  (cost=0.00..4.28 rows=4 width=0)
                                 Index Cond: (id = 5000)
         ->  Result  (cost=0.00..12.74 rows=5 width=62)
               ->  Append  (cost=0.00..12.74 rows=5 width=62)
                     ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
                           Filter: (id = 600)
                     ->  Bitmap Heap Scan on ctest02 ptest  (cost=4.28..12.74 rows=4 width=62)
                           Recheck Cond: (id = 600)
                           ->  Bitmap Index Scan on ctest02_id_idx  (cost=0.00..4.28 rows=4 width=0)
                                 Index Cond: (id = 600)
(18 rows)


i.e. to scan ctest01 using ctest01_id_idx for 5000, and ctest02 using
ctest02_id_idx for 600, then combine the results.

If so: I'm not aware of any way to make the planner aware that that's
possible. It'd be an interesting enhancement, to apply constraint
exclusion to values pushed down into partitions, rather than simply to
include or exclude partitions based on constraint exclusion.

--
Craig Ringer




Re: [GENERAL] Can dml realize the partition table's rule and make good execution plan?

2012-11-11 Thread Craig Ringer
On 11/12/2012 10:39 AM, 高健 wrote:

> The query applied the WHERE condition to every partition table, which
> is not what I want; my rule is based only on the id column value.
Is `constraint_exclusion` turned on?

http://www.postgresql.org/docs/current/static/ddl-partitioning.html

http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION
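
A quick check from psql (a sketch; 'partition' has been the default since
8.4 and is the right setting for inheritance partitioning):

SHOW constraint_exclusion;
SET constraint_exclusion = partition;  -- 'on' applies it to all queries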


--
Craig Ringer


[GENERAL] Can dml realize the partition table's rule and make good execution plan?

2012-11-11 Thread 高健
Hi all:
I made partition tables:

postgres=# create table ptest(id integer, name varchar(20));
CREATE TABLE
postgres=# create table ctest01(CHECK(id<500)) inherits (ptest);
CREATE TABLE
postgres=# create table ctest02(CHECK(id>=500)) inherits (ptest);
CREATE TABLE
postgres=#
postgres=# create index on ctest01(id);
CREATE INDEX
postgres=# create index on ctest02(id);
CREATE INDEX
postgres=#
postgres=#

postgres=# CREATE OR REPLACE FUNCTION ptest_insert_trigger() RETURNS
TRIGGER AS $$
postgres$# BEGIN
postgres$#     IF ( NEW.id < 500 ) THEN
postgres$#         INSERT INTO ctest01 VALUES (NEW.*);
postgres$#     ELSIF ( NEW.id >= 500 ) THEN
postgres$#         INSERT INTO ctest02 VALUES (NEW.*);
postgres$#     ELSE
postgres$#         RAISE EXCEPTION 'Error while inserting data';
postgres$#     END IF;
postgres$#     RETURN NULL;
postgres$# END; $$ LANGUAGE plpgsql;
CREATE FUNCTION
postgres=#
postgres=# CREATE TRIGGER insert_ptest_trigger BEFORE INSERT ON ptest FOR
EACH ROW
postgres-#   EXECUTE PROCEDURE ptest_insert_trigger();
CREATE TRIGGER
postgres=#
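
A quick sanity check of the routing (a sketch, not part of the original
post):

INSERT INTO ptest VALUES (100, 'low');   -- routed to ctest01 (id < 500)
INSERT INTO ptest VALUES (900, 'high');  -- routed to ctest02 (id >= 500)
SELECT tableoid::regclass, * FROM ptest; -- shows which child holds each row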

And when executing this SQL statement, I got the following plan:

postgres=# explain select * from ptest where id=5000 or id=600;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..54.93 rows=5 width=20)
   ->  Append  (cost=0.00..54.93 rows=5 width=20)
         ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=62)
               Filter: ((id = 5000) OR (id = 600))
         ->  Bitmap Heap Scan on ctest01 ptest  (cost=19.49..27.46 rows=2 width=9)
               Recheck Cond: ((id = 5000) OR (id = 600))
               ->  BitmapOr  (cost=19.49..19.49 rows=2 width=0)
                     ->  Bitmap Index Scan on ctest01_id_idx  (cost=0.00..9.74 rows=1 width=0)
                           Index Cond: (id = 5000)
                     ->  Bitmap Index Scan on ctest01_id_idx  (cost=0.00..9.74 rows=1 width=0)
                           Index Cond: (id = 600)
         ->  Bitmap Heap Scan on ctest02 ptest  (cost=19.49..27.46 rows=2 width=9)
               Recheck Cond: ((id = 5000) OR (id = 600))
               ->  BitmapOr  (cost=19.49..19.49 rows=2 width=0)
                     ->  Bitmap Index Scan on ctest02_id_idx  (cost=0.00..9.74 rows=1 width=0)
                           Index Cond: (id = 5000)
                     ->  Bitmap Index Scan on ctest02_id_idx  (cost=0.00..9.74 rows=1 width=0)
                           Index Cond: (id = 600)
(18 rows)

postgres=#

The query applied the WHERE condition to every partition table, which is
not what I want; my partitioning rule is based only on the id column value,
and my SELECT statement's WHERE condition is also on the id column value.
Is there any way to make the database recognize the parent table's
partitioning rule when creating the execution plan?

Thanks in advance


Re: [GENERAL] explain plan visibility

2012-11-11 Thread 高健
Hi Laurenz:



Thank you for your kind reply.

Please let me dig into it a little more. I think that when a session is
accessing a PostgreSQL table, it will be influenced by the following:

Really old data (which needs to be vacuumed, e.g. an old row image from a
day ago).

Recent data (committed and uncommitted), because both live in the same
data blocks.

Isn't it strange that I have to access my data mixed in with somebody
else's undecided data?

What if a busy system has a table accessed by many sessions at the same
time? They will all be slowed down by uncommitted but already-flushed
data, I think.

I hope that in the future the PostgreSQL architecture can keep committed
and uncommitted data apart, or even put them on separate physical disks.
That would help improve performance, I think.

Jian Gao

2012/11/9 Albe Laurenz 

> 高健  wrote:
> > I have one question about the visibility of explain plans.
> >
> > Firstly, I was inserting data into a table. I used: [ insert into
> > ptest select * from test02; ]
> >
> > The test02 table has 10,000,000 records, and ptest is a parent table
> > which has two child partition tables --- ctest01 and ctest02.
> >
> > When I execute the above SQL statement, it takes some time to run
> > because of the data volume.
> >
> > Before the above SQL statement finishes, I open another session with
> > psql and execute: [ select count(*) from ptest; ]
> > Because the insert into statement in the other session has not
> > finished, I got a result of zero.
> >
> > Before the first session finishes, if I check the explain of the
> > select, I get:
> > postgres=# explain select count(*) from ptest;
> >                                 QUERY PLAN
> > ---------------------------------------------------------------------------
> >  Aggregate  (cost=55406.40..55406.41 rows=1 width=0)
> >    ->  Append  (cost=0.00..49601.92 rows=2321793 width=0)
> >          ->  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=0)
> >          ->  Seq Scan on ctest01 ptest  (cost=0.00..24776.52 rows=1159752 width=0)
> >          ->  Seq Scan on ctest02 ptest  (cost=0.00..24825.40 rows=1162040 width=0)
> > (5 rows)
> >
> > postgres=#
> >
> > I think that is because PostgreSQL holds committed and uncommitted
> > data together physically (vacuum is needed because of this?).
> >
> > Is there some method that allows a simple select such as select
> > count(*) to avoid such an expensive plan?
> > (I think the bigger the table is, the more sample data is needed.)
>
> "SELECT count(*) FROM ptest" may be simple, but it is expensive
> because it has to visit every tuple in the table.
> The first time you run it after the insert it might also trigger
> considerable write activity (hint bits), but that on the side.
>
> If you need only an estimate, try
> SELECT sum(reltuples) FROM pg_class WHERE relname IN ('ptest', 'ctest01',
> 'ctest02');
>
> Yours,
> Laurenz Albe
>
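
For reference, the same estimate can be written against pg_inherits so the
child tables don't have to be listed by hand (a sketch, not from the
original mail):

SELECT sum(c.reltuples)
FROM pg_class c
WHERE c.oid = 'ptest'::regclass
   OR c.oid IN (SELECT inhrelid FROM pg_inherits
                WHERE inhparent = 'ptest'::regclass);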


[GENERAL] Oracle to PostgreSQL replication with Goldengate

2012-11-11 Thread Jayadevan M
Hello all,
GoldenGate added PostgreSQL as a target database for replication. I tried
setting it up and, not finding any tutorial, put together a how-to here -
http://jayadevanm.wordpress.com/2012/11/07/goldengate-replication-from-oracle-to-postgresql/
I think near-real-time replication might have quite a few use cases - such
as trying out an application on PostgreSQL before really cutting over from
an Oracle database, or running reports off PostgreSQL, and so on.
Regards,
Jayadevan


Re: [GENERAL] Using COPY FROM on a subset of the file's column

2012-11-11 Thread Thomas Kellerer

Craig Ringer wrote on 11.11.2012 11:23:

It seems like you want to be able to say something like this (imaginary)
syntax:

\copy test_copy (id, col1) FROM 'test.csv' CSV COLUMNS(1,2,IGNORE)



or some similar way to provide a column mapping from the CSV columns to
the output of the COPY command.


Right, that was what I was hoping for.


- \copy into a view that had a view trigger (or possibly rules;
untested) to rewrite the incoming inserts and store them in the real
target table; or
- Just \copy into an UNLOGGED or TEMPORARY table then INSERT INTO ...
SELECT the data to the real destination.


Thanks for the tips. I do have the option of using an external tool that
will let me do this without problems.
The COPY solution would have been faster, but it's a one-off thing anyway.

Regards
Thomas



Re: [GENERAL] Using COPY FROM on a subset of the file's column

2012-11-11 Thread Craig Ringer
On 11/11/2012 05:33 PM, Thomas Kellerer wrote:
> Hello,
>
> I'm trying to import a text file into a table using the COPY command.
>
> The text file contains four columns: id, col_1, col_2, col_3 (in that
> order).
> My target table only has id and col_1.
>
> From the documentation I assumed that the following statement
>
>   copy foo (id, col_1)
>   from 'foo.txt'
>   with (format csv, delimiter ';', header true);
>
> would only import id and col_1, but instead it fails with "ERROR: 
> extra data after last expected column".
>
> Am I missing something, or is this a limitation of the COPY command?

\copy (VALUES (1,2,3),(1,4,5),(1,9,21),(42,42,42)) TO 'test.csv' CSV
CREATE TABLE test_copy (id integer, col_1 integer);
\copy test_copy (id, col_1) FROM 'test.csv' CSV
ERROR:  extra data after last expected column
CONTEXT:  COPY test_copy, line 1: "1,2,3"

Looks like a limitation in COPY. The column list specifies how the data
COPY reads should be inserted into the target table; the input file is
expected to have fields compatible with that column list.

It seems like you want to be able to say something like this (imaginary)
syntax:

\copy test_copy (id, col1) FROM 'test.csv' CSV COLUMNS(1,2,IGNORE)

or some similar way to provide a column mapping from the CSV columns to
the output of the COPY command.

That isn't directly possible at the moment. You could:

- \copy into a view that had a view trigger (or possibly rules;
untested) to rewrite the incoming inserts and store them in the real
target table; or
- Just \copy into an UNLOGGED or TEMPORARY table, then INSERT INTO ...
SELECT the data into the real destination (sketched below).
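
A sketch of the second workaround, using the four-column file from the
question (the staging table and its column types are hypothetical):

-- Stage the complete CSV into a temporary table with all four columns:
CREATE TEMPORARY TABLE staging
  (id integer, col_1 text, col_2 text, col_3 text);
\copy staging FROM 'foo.txt' WITH (format csv, delimiter ';', header true)

-- Then keep only the columns the real table wants:
INSERT INTO foo (id, col_1)
SELECT id, col_1 FROM staging;

DROP TABLE staging;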

--
Craig Ringer




[GENERAL] Using COPY FROM on a subset of the file's column

2012-11-11 Thread Thomas Kellerer

Hello,

I'm trying to import a text file into a table using the COPY command.

The text file contains four columns: id, col_1, col_2, col_3 (in that order).
My target table only has id and col_1.

From the documentation I assumed that the following statement

  copy foo (id, col_1)
  from 'foo.txt'
  with (format csv, delimiter ';', header true);

would only import id and col_1, but instead it fails with "ERROR:  extra data after 
last expected column".

Am I missing something, or is this a limitation of the COPY command?

Regards
Thomas
