Re: [GENERAL] explain plan visibility

2012-11-12 Thread Albe Laurenz
高健 wrote:
 Please let me dig it a little more:
 
 I think that  when a session is accessing a postgersql table. It will be 
 influenced by  the
 followings:
 
 
 
 Really old data (needed to be vacuumed, eg: old image at one day ago).
 
 Recent data (committed and uncommitted), because they are all in the data 
 block.
 
 
 
 Isn’t it strange that I have to access my data among somebody’s un-decided 
 data?
 
 How if there is a busy system having a table accessed by so many sessions at 
 same time?
 
 They will all be slowed down because of uncommitted but flushed out data, I 
 think.
 
 
 
 I hope in future the architecture of PostgreSQL can put the committed data  
 uncommitted data apart,
 
 Or even put them in separate physical disks.That will Help to improve 
 performance I think.

I don't think that this design choice will ever be changed.

You'll always be influenced by other people's data, no
matter how you handle MVCC.

Let's imagine a design where yet uncommitted data are
stored somewhere else.  How do you handle the following
problems:

1) The space for uncommitted data runs out.  That will keep you
   from doing any transaction exceeding the size of this storage
   space.  Note that Oracle suffers from that problem (though
   they do the opposite of what you propose:  they store uncommitted
   data in the table and move old data to the undo space).

2) Every COMMIT will require that everything is moved from the
   uncommited space to the regular table.  So you'd double the
   I/O cost for the normal case (I guess that's the reason why
   Oracle does it the other way round).

And you wouldn't even get rid of the most unpleasant side effect
of the way that PostgreSQL manages MVCC, the need to VACUUM away
old tuples.

As for Oracle's approach to MVCC, which I think is better than yours,
it has its pros and cons, but I don't think it is demonstrably better
than PostgreSQL's.  No matter how you implement MVCC, you'll have to
pay the price for the benefits somehow.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] explain plan visibility

2012-11-11 Thread 高健
Hi Laurenz:



Thank you for your kind reply.



Please let me dig it a little more:

I think that  when a session is accessing a postgersql table. It will be
influenced by  the followings:



Really old data (needed to be vacuumed, eg: old image at one day ago).

Recent data (committed and uncommitted), because they are all in the data
block.



Isn’t it strange that I have to access my data among somebody’s un-decided
data?

How if there is a busy system having a table accessed by so many sessions
at same time?

They will all be slowed down because of uncommitted but flushed out data, I
think.



I hope in future the architecture of PostgreSQL can put the committed data
 uncommitted data apart,

Or even put them in separate physical disks.That will Help to improve
performance I think.

Jian Gao

2012/11/9 Albe Laurenz laurenz.a...@wien.gv.at

 高健  wrote:
  I have one question about the  visibility of  explain plan.
 
  Firstly  ,  I was inserting into data to a table.   I use :   [ insert
 into ptest  select  *  from
  test02; ]
 
  And test02 table has 10,000,000  records.  And ptest is a parent table,
 which has two distribution
  child table --- ctest01 and ctest02.
 
  When I execute  the above sql statement, it takes some time to execute
 because of data volume.
 
  Before the above sql statement finish,  I open another session with
 psql, and execute:  [ select
  count(*) from  ptest; ]
  Because the insert into statement in other session has not finished, I
 got the result of zero.
 
  Before first session finish, If I check the explain of  select, I got:
  postgres=# explain select count(*) from ptest;
  QUERY PLAN
 
 ---
   Aggregate  (cost=55406.40..55406.41 rows=1 width=0)
 -  Append  (cost=0.00..49601.92 rows=2321793 width=0)
   -  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=0)
   -  Seq Scan on ctest01 ptest  (cost=0.00..24776.52
 rows=1159752 width=0)
   -  Seq Scan on ctest02 ptest  (cost=0.00..24825.40
 rows=1162040 width=0)
  (5 rows)postgres=#
 
  I think that is because  postgresql  is holding commited and uncommited
 data  together
  physically(vacuum is needed because of it?).
 
  Is there some method that allow  simple select such as select count(*)
 do not  activate  the explain
  plan  ?
  (I think the more table is bigger, the more sample data is needed)

 SELECT count(*) FROM ptest may be simple, but it is expensive
 because it has to visit every tuple in the table.
 The first time you run it after the insert it might also trigger
 considerable write activity (hint bits), but that on the side.

 If you need only an estimate, try
 SELECT sum(reltuples) FROM pg_class WHERE relname IN ('ptest', 'ctest01',
 'ctest02');

 Yours,
 Laurenz Albe



[GENERAL] explain plan visibility

2012-11-09 Thread 高健
Hi all:

I have one question about the  visibility of  explain plan.

Firstly  ,  I was inserting into data to a table.   I use :   [ insert into
ptest  select  *  from  test02; ]

And test02 table has 10,000,000  records.  And ptest is a parent table,
which has two distribution child table --- ctest01 and ctest02.

When I execute  the above sql statement, it takes some time to execute
because of data volume.

Before the above sql statement finish,  I open another session with psql,
and execute:  [ select count(*) from  ptest; ]
Because the insert into statement in other session has not finished, I got
the result of zero.

Before first session finish, If I check the explain of  select, I got:
postgres=# explain select count(*) from ptest;
QUERY PLAN
---
 Aggregate  (cost=55406.40..55406.41 rows=1 width=0)
   -  Append  (cost=0.00..49601.92 rows=2321793 width=0)
 -  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=0)
 -  Seq Scan on ctest01 ptest  (cost=0.00..24776.52 rows=1159752
width=0)
 -  Seq Scan on ctest02 ptest  (cost=0.00..24825.40 rows=1162040
width=0)
(5 rows)postgres=#

I think that is because  postgresql  is holding commited and uncommited
data  together physically(vacuum is needed because of it?).

Is there some method that allow  simple select such as select count(*)   do
not  activate  the explain plan  ?
(I think the more table is bigger, the more sample data is needed)


Re: [GENERAL] explain plan visibility

2012-11-09 Thread Albe Laurenz
高健  wrote:
 I have one question about the  visibility of  explain plan.
 
 Firstly  ,  I was inserting into data to a table.   I use :   [ insert into 
 ptest  select  *  from
 test02; ]
 
 And test02 table has 10,000,000  records.  And ptest is a parent table, which 
 has two distribution
 child table --- ctest01 and ctest02.
 
 When I execute  the above sql statement, it takes some time to execute 
 because of data volume.
 
 Before the above sql statement finish,  I open another session with psql, and 
 execute:  [ select
 count(*) from  ptest; ]
 Because the insert into statement in other session has not finished, I got 
 the result of zero.
 
 Before first session finish, If I check the explain of  select, I got:
 postgres=# explain select count(*) from ptest;
 QUERY PLAN
 ---
  Aggregate  (cost=55406.40..55406.41 rows=1 width=0)
-  Append  (cost=0.00..49601.92 rows=2321793 width=0)
  -  Seq Scan on ptest  (cost=0.00..0.00 rows=1 width=0)
  -  Seq Scan on ctest01 ptest  (cost=0.00..24776.52 rows=1159752 
 width=0)
  -  Seq Scan on ctest02 ptest  (cost=0.00..24825.40 rows=1162040 
 width=0)
 (5 rows)postgres=#
 
 I think that is because  postgresql  is holding commited and uncommited data  
 together
 physically(vacuum is needed because of it?).
 
 Is there some method that allow  simple select such as select count(*)   do 
 not  activate  the explain
 plan  ?
 (I think the more table is bigger, the more sample data is needed)

SELECT count(*) FROM ptest may be simple, but it is expensive
because it has to visit every tuple in the table.
The first time you run it after the insert it might also trigger
considerable write activity (hint bits), but that on the side.

If you need only an estimate, try
SELECT sum(reltuples) FROM pg_class WHERE relname IN ('ptest', 'ctest01', 
'ctest02');

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general