[HACKERS] strange cost for correlated subquery

2008-03-16 Thread Pavel Stehule
Hello

I tested the speed of a SELF JOIN versus a correlated subquery for counting subtotals:

It's strange that the correlated subquery is faster, but has a much higher cost:

postgres=# explain analyze select t1.id, t1.sale_date, t1.product,
t1.sale_price, sum(t2.sale_price) from
history t1 inner join history t2 on t1.id >= t2.id and t1.product = t2.product
group by t1.id, t1.sale_date, t1.product, t1.sale_price
order by t1.id
;
                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3678.85..3691.36 rows=5003 width=19) (actual time=1553.575..1560.618 rows=5003 loops=1)
   Sort Key: t1.id
   Sort Method:  quicksort  Memory: 480kB
   ->  HashAggregate  (cost=3308.91..3371.45 rows=5003 width=19) (actual time=1530.276..1540.206 rows=5003 loops=1)
         ->  Nested Loop  (cost=0.00..1708.29 rows=128050 width=19) (actual time=0.264..1034.048 rows=198333 loops=1)
               ->  Seq Scan on history t1  (cost=0.00..78.03 rows=5003 width=15) (actual time=0.077..8.835 rows=5003 loops=1)
               ->  Index Scan using fxxx on history t2  (cost=0.00..0.31 rows=1 width=11) (actual time=0.017..0.087 rows=40 loops=5003)
                     Index Cond: (((t2.product)::text = (t1.product)::text) AND (t1.id >= t2.id))
 Total runtime: 1567.125 ms
(9 rows)

postgres=# explain analyze SELECT sale_date, product, sale_price,
  COALESCE((SELECT SUM(sale_price)
   FROM history
  WHERE product = o.product
AND id <= o.id), 0) AS total
  FROM history o;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Seq Scan on history o  (cost=0.00..41532.29 rows=5003 width=15) (actual time=0.073..825.333 rows=5003 loops=1)
   SubPlan
     ->  Aggregate  (cost=8.28..8.29 rows=1 width=4) (actual time=0.158..0.159 rows=1 loops=5003)
           ->  Index Scan using fxxx on history  (cost=0.00..8.27 rows=1 width=4) (actual time=0.018..0.086 rows=40 loops=5003)
                 Index Cond: (((product)::text = ($0)::text) AND (id <= $1))
 Total runtime: 833.213 ms
(6 rows)

postgres=# show effective_cache_size ;
 effective_cache_size
-----------------------
 600MB
(1 row)

postgres=# SHOW shared_buffers ;
 shared_buffers
----------------
 300MB
(1 row)

Maybe my random_page_cost is too big?
postgres=# SHOW random_page_cost ;
 random_page_cost
------------------
 4
(1 row)

Time: 0,351 ms
postgres=# set random_page_cost to 2;
SET
Time: 0,330 ms
postgres=# SHOW random_page_cost ;
 random_page_cost
------------------
 2
(1 row)

Time: 0,320 ms
postgres=# explain analyze SELECT sale_date, product, sale_price,
  COALESCE((SELECT SUM(sale_price)
   FROM history
  WHERE product = o.product
AND id <= o.id), 0) AS total
  FROM history o;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Seq Scan on history o  (cost=0.00..21518.09 rows=5003 width=15) (actual time=0.132..809.701 rows=5003 loops=1)
   SubPlan
     ->  Aggregate  (cost=4.28..4.29 rows=1 width=4) (actual time=0.154..0.155 rows=1 loops=5003)
           ->  Index Scan using fxxx on history  (cost=0.00..4.27 rows=1 width=4) (actual time=0.020..0.088 rows=40 loops=5003)
                 Index Cond: (((product)::text = ($0)::text) AND (id <= $1))
 Total runtime: 817.358 ms

Regards
Pavel Stehule



Re: [HACKERS] Commit fest?

2008-03-16 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 I'm curious where the comments are being stored, since it's not in the html
 source. The javascript must be pulling them from another url?

 The comments  are stored at JS-Kit:

   http://js-kit.com/comments/

It's stored in their server?


 Well, I can't have @ in the URL because it is usually forbidden by
 browsers for phishing protection, so I was converting @ to '.' anyway. 
 What I have now done is display the real message id and MD5 permanent
 link at the top of each message, e.g.:

   http://momjian.us/mhonarc/patches/msg00054.html

Any chance we could put that on the actual listing page somehow, perhaps in a
tiny font? I want to be able to copy the thread and get enough information
for future reference.

Also, any chance you could use the permanent urls in the thread listing?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] Commit fest?

2008-03-16 Thread Bruce Momjian
Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  I'm curious where the comments are being stored, since it's not in the html
  source. The javascript must be pulling them from another url?
 
  The comments  are stored at JS-Kit:
 
  http://js-kit.com/comments/
 
 It's stored in their server?

Yes, that was the beauty of it --- I just add javascript with a tag and
all comments are handled by them.

  Well, I can't have @ in the URL because it is usually forbidden by
  browsers for phishing protection, so I was converting @ to '.' anyway. 
  What I have now done is display the real message id and MD5 permanent
  link at the top of each message, e.g.:
 
  http://momjian.us/mhonarc/patches/msg00054.html
 
 Any chance we could put that on the actual listing page somehow, perhaps in a
 tiny font? I want to be able to copy the thread and get enough information
 for future reference.

You want the message-id on the listing page?  Sure, I was doing that
before but I didn't know anyone wanted it and it looked a little
cluttered.  Let me know.

 Also, any chance you could use the permanent urls in the thread listing?

Uh, that would be a little tricky because the next/previous wants to go
by message number increment, I think.  I could post-process all the HTML
files to do a search/replace.

I think the bigger problem is the threads move around on the thread
pages, and that seems impossible to fix as items are added and removed.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Commit fest?

2008-03-16 Thread Gregory Stark
Bruce Momjian [EMAIL PROTECTED] writes:

 You want the message-id on the listing page?  Sure, I was doing that
 before but I didn't know anyone wanted it and it looked a little
 cluttered.  Let me know.

I agree it was too cluttered for normal use.

What I'm trying to do is get a page which has the message-id's of all the
messages and the comments on the same page. That way I can dump the data into
a text file to experiment with.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Commit fest?

2008-03-16 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 The comments  are stored at JS-Kit:
 http://js-kit.com/comments/
 
 It's stored in their server?

 Yes, that was the beauty of it --- I just add javascript with a tag and
 all comments are handled by them.

Beauty?  I don't think we want to rely on non-project-controlled
servers for anything that's part of our core infrastructure.  If/when
js-kit.com goes belly-up, what happens to that data?  Also, what kind of
privacy guarantees have we got?  (Admittedly, privacy may be moot for
information that was originally entered on a public web page, but the
whole idea of someone else controlling our data just makes me itch.)

I can go along with this as a jury-rig setup for our first commit fest,
but it just seems like another powerful argument for moving to something
wiki-based as soon as we can get that sorted.

regards, tom lane



Re: [HACKERS] strange cost for correlated subquery

2008-03-16 Thread Tom Lane
Pavel Stehule [EMAIL PROTECTED] writes:
 It's strange that the correlated subquery is faster, but has a much higher cost:

In the nestloop plan, the estimated cost for the indexscan is discounted
based on the knowledge that it'll be executed repeatedly:

->  Index Scan using fxxx on history t2  (cost=0.00..0.31 rows=1 width=11)
    (actual time=0.017..0.087 rows=40 loops=5003)

In the subplan case that doesn't happen:

->  Index Scan using fxxx on history  (cost=0.00..8.27 rows=1 width=4)
    (actual time=0.018..0.086 rows=40 loops=5003)
      Index Cond: (((product)::text = ($0)::text) AND (id <= $1))

Had the same discount been applied then the estimated costs would be
pretty nearly in line with reality, if I did the math right.
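
(For illustration, rough arithmetic from the two plans: the discounted
estimate works out to 0.31 per call x 5003 loops ~ 1551, which together with
the 78.03 outer seq scan accounts for most of the 1708.29 nested-loop total;
the undiscounted estimate works out to about 8.28 per call x 5003 loops
~ 41400, which accounts for essentially all of the 41532.29 total of the
subplan version.  The actual runtimes, 1567 ms vs 833 ms, show that the
undiscounted per-call figure is far too pessimistic here.)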

It'd be nice to do better but I'm not sure how; at the time that we
create plans for sub-queries we don't really have any way to know how
often they'll be called by the upper query.

regards, tom lane



Re: [HACKERS] Single table forcing sequential scans on query plans

2008-03-16 Thread Tom Lane
Cristian Gafton [EMAIL PROTECTED] writes:
 I have a weird query execution plan problem I am trying to debug on 
 Postgresql 8.2.6. I have a query that joins against a temporary table that 
 has very few rows.

Is it possible that the temp table ever has exactly zero rows?

 My questions are:
 - what would make the analyze operation fail in the eyes of the planner?
 - why joining to a single unanalyzed table disables any and all indexes from 
 the other tables referenced in the query?

That's entirely the wrong way to think about it.  The planner is
choosing a good plan based on its estimates of table sizes, which
are wildly different in the two cases:

->  Seq Scan on tmpinstanceid  (cost=0.00..1.02 rows=2 width=8) (actual time=0.005..0.007 rows=2 loops=1)

->  Seq Scan on tmpinstanceid  (cost=0.00..29.40 rows=1940 width=8)

If there actually were nearly 2000 rows in the temp table, that
nested-loops plan would take about a thousand times longer than
it does, and you'd not be nearly so pleased with it.  The
merge-and-hash-joins plan looks quite sane to me for that table size.

The larger estimate is coming from some heuristics that are applied
when the table size recorded in pg_class.relpages and reltuples is
exactly zero.  It's intentionally not small, to keep us from choosing
a plan with brittle performance behavior when we are looking at a
table that's never been vacuumed or analyzed.
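
(Illustrative arithmetic: the estimate above is consistent with a heuristic
of roughly ten pages -- 10 pages x seq_page_cost 1.0 plus 1940 rows x
cpu_tuple_cost 0.01 gives the 29.40 cost shown, and about 194 eight-byte-wide
rows per 8K page gives the 1940-row estimate.  The exact constant is an
assumption here, not something stated in this thread.)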

The only idea I have for how the planner could ignore a previous
analyze result is if the analyze found the table to be of zero size.
Then the heuristic would still be applied because relpages == 0.

regards, tom lane



[HACKERS] Hash index build patch has *worse* performance at small table sizes

2008-03-16 Thread Tom Lane
I've been reviewing the hash index build patch submitted here:
http://archives.postgresql.org/pgsql-patches/2007-10/msg00154.php

Although it definitely helps on large indexes, it's actually
counterproductive on not-so-large ones.  The test case I'm using
is random integers generated like this:
create table foo as select (random() * N)::int as f1
  from generate_series(1,N);
select count(*) from foo; -- force hint bit updates
checkpoint;
then timing
create index fooi on foo using hash(f1);

Using all-default configuration settings on some not-very-new hardware,
at N = 1E6 I see

8.3.1:                                  30 sec
With pre-expansion of index (CVS HEAD): 24 sec
With sorting:                           72 sec
To build a btree index on same data:    34 sec

Now this isn't amazingly surprising, because the original argument for
doing sorting was to improve locality of access to the index during
the build, and that only matters if you've got an index significantly
bigger than memory.  If the index fits in RAM then the sort is pure
overhead.

The obvious response to this is to use the sorting approach only when
the estimated index size exceeds some threshold.  One possible choice of
threshold would be shared_buffers (or temp_buffers for a temp index)
but I think that is probably too conservative, because in most scenarios
the kernel's disk cache is available too.  Plus you can't tweak that
setting without a postmaster restart.  I'm tempted to use
effective_cache_size, which attempts to measure an appropriate number
and can be set locally within the session doing the CREATE INDEX if
necessary.  Or we could invent a new GUC parameter, but that is probably
overkill.
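
For illustration, the kind of threshold test being discussed might look
something like this (a sketch only; the names are hypothetical, not actual
backend code, and effective_cache_size is assumed to be available as a page
count):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

static bool
hash_build_should_sort(BlockNumber est_index_pages,
                       BlockNumber effective_cache_size_pages)
{
    /* sorting the index tuples only pays off once the index won't fit in cache */
    return est_index_pages > effective_cache_size_pages;
}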

Comments?

regards, tom lane



[HACKERS] Rewriting Free Space Map

2008-03-16 Thread Heikki Linnakangas
I've started working on revamping Free Space Map, using the approach 
where we store a map of heap pages on every nth heap page. What we need 
now is discussion on the details of how exactly it should work.


Here's my rough plan on the implementation. It's long, sorry. I'm fairly 
confident with it, but please let me know if you see any problems or 
have any suggestions or better ideas.


Heap FSM
--------

The FSM is stored in the special area of every nth heap page. When 
extending the relation, the heapam checks whether the block number of the 
new page is one that belongs to the FSM. If it is, it lets the FSM 
initialize it by calling InitFSMPage() on it, and extends the relation 
again to get another, normal heap page.

I chose the "every nth page is an FSM page" approach, rather than using 
a separate relfilenode, which I also considered. The separate 
relfilenode approach has some advantages, like being able to scan all 
FSM pages in a sequential fashion, but it involves a fair amount of 
catalog and buffer manager changes.


It's convenient that the FSM uses up the whole page, leaving no room for 
heap tuples. It simplifies the locking, as we don't need to worry, in the 
FSM code, about the possibility that the caller is already holding a lock 
on the same page.


In an FSM page, there's one byte for each of the next N heap pages, 
starting from the FSM page. That one byte stores the amount of free 
space on the corresponding heap page, in BLCKSZ/256 byte precision (32 
bytes with default block size).
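
For illustration, the encoding amounts to something like this (function
names are illustrative, not from the patch):

#include <stdint.h>

#define BLCKSZ 8192

static uint8_t
fsm_space_to_category(unsigned freespace_bytes)
{
    unsigned cat = freespace_bytes / (BLCKSZ / 256);    /* 32-byte buckets */

    return (uint8_t) (cat > 255 ? 255 : cat);
}

static unsigned
fsm_category_to_space(uint8_t category)
{
    /* lower bound: a page in category N has at least N * 32 bytes free */
    return (unsigned) category * (BLCKSZ / 256);
}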


The mapping of free space to these 256 buckets wouldn't necessarily 
have to be a linear one, we could for example have a single bucket for 
pages with more than BLCKSZ/2 bytes of free space and divide the rest 
linearly into 16 byte buckets, but let's keep it simple for now. Of 
course, we could also just use 2 bytes per page, and store the page size 
exactly, but 32 byte precision seems like enough to me.



Index FSM
---------

Indexes use a similar scheme, but since we only need to keep track of 
whether a page is used or not, we only need one bit per page. If the 
amount of free space on pages is interesting for an indexam in the 
future, it can use the heap FSM implementation instead. Or no FSM at 
all, like the hash indexam.


To use the index FSM, the indexam needs to leave every nth page alone, 
like in the heap. The B-tree assumes that the metapage is at block 0, 
but that's also where we would like to store the first index FSM page. 
To overcome that, we can make the index FSM special area a little bit 
smaller, so that the B-tree metadata fits on the same page as the FSM 
information. That will be wasted space on other FSM pages than block 0, 
but we're only talking about 24 bytes per FSM page, and we only need one 
FSM page per ~512 MB of index pages (with default BLCKSZ).



Performance
-----------

The patch I'm working on currently uses a naive way to find a page in 
the FSM. To find a page with X bytes of free space, it scans the FSM 
pages until one is found. And it always starts the scan from the 
beginning, which is far from optimal. And when there's no page with enough 
free space, it still needs to scan all FSM pages just to find out that 
we need to extend the relation.


To speed things up, we're going to need some mechanism to avoid that. 
First of all, we need to somehow cache the information that there's no 
page with >= X bytes left, to avoid fruitless scanning. To speed up the 
case when there are only a few pages with enough free space, we can keep a 
limited-size list of such pages in addition to the map.


This information needs to be in shared memory, either on heap pages 
like the FSM pages and managed by the buffer manager, or in a separate 
shmem block. I would like to go with normal bufmgr-managed pages, as 
fixed-size memory blocks have their problems, and the cached 
information should be stored to disk as well.


Let's have one special page in the heap file, called the Free Space List 
(FSL) page, in addition to the normal FSM pages. It has the following 
structure:


struct {
    bit anypages[256];

    struct {
        BlockNumber blockno;
        uint8 freespace;
    } freespacelist[as large as fits on page];
}

Remember that we track the free space on each page using one byte; IOW, 
each page falls into one of 256 buckets of free space. In the anypages 
bitmap, we have one bit per bucket indicating whether there are any pages 
with that much free space. When we look for a page with X bytes, we check 
the bits from the bucket corresponding to X bytes upwards, and if no bits 
are set we know not to bother scanning the FSM pages.
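
As an illustration, the lookup could go roughly like this (names made up;
the 256-bit anypages bitmap is shown as 32 bytes):

#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192

static bool
fsl_may_have_page(const uint8_t anypages[32], unsigned needed_bytes)
{
    unsigned bucket = needed_bytes / (BLCKSZ / 256);

    if (bucket > 255)
        bucket = 255;

    /* any set bit at or above the requested bucket will do */
    for (unsigned b = bucket; b < 256; b++)
    {
        if (anypages[b / 8] & (1 << (b % 8)))
            return true;
    }
    return false;               /* nothing big enough: extend the relation */
}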


To speed up the scan where there is space, we keep a simple list of 
pages with free space. This list is actually like the current FSM, but 
here we only use it as a small cache of the FSM pages. VACUUM and any 
other operations that update the FSM can put pages into the list when 
there are free slots.


We can store the FSL page on a magic location, say block #100. For 
relations 

Re: [HACKERS] Single table forcing sequential scans on query plans

2008-03-16 Thread Cristian Gafton
On Sun, 16 Mar 2008, Tom Lane wrote:

  I have a weird query execution plan problem I am trying to debug on 
  Postgresql 8.2.6. I have a query that joins against a temporary table that 
  has very few rows.
 
 Is it possible that the temp table ever has exactly zero rows?

Ah, that is indeed a possibility. If I am to understand correctly, there is 
no way to represent the difference between an un-analyzed table and a 
zero-sized analyzed table as far as the query planner is concerned? Looks 
like I'll have to do a select count(*) before running the query to avoid 
entering this trap. (That feels a bit suboptimal since the conary repository 
code does extensive work with/through temporary tables, and this could very 
well end up not being the only section affected...)

 That's entirely the wrong way to think about it.  The planner is
 choosing a good plan based on its estimates of table sizes, which
 are wildly different in the two cases:
 
  ->  Seq Scan on tmpinstanceid  (cost=0.00..1.02 rows=2 width=8) (actual time=0.005..0.007 rows=2 loops=1)
 
  ->  Seq Scan on tmpinstanceid  (cost=0.00..29.40 rows=1940 width=8)

In this particular case it would be nice if there were a distinction 
between "estimated size 0" and "estimated size unknown".

 The only idea I have for how the planner could ignore a previous
 analyze result is if the analyze found the table to be of zero size.
 Then the heuristic would still be applied because relpages == 0.

For now I will try to run with the assumption that the massive sequential 
scans are caused by joining an empty table in the query and try to work my way 
around it - unless there is some trick to tell the planner that this is a 
query that would be much better optimized away instead of causing a massive 
IO storm.

Thanks,

Cristian
-- 
Cristian Gafton
rPath, Inc.




Re: [HACKERS] Rewriting Free Space Map

2008-03-16 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 I've started working on revamping Free Space Map, using the approach 
 where we store a map of heap pages on every nth heap page. What we need 
 now is discussion on the details of how exactly it should work.

You're cavalierly waving away a whole boatload of problems that will
arise as soon as you start trying to make the index AMs play along
with this :-(.  Hash for instance has very narrow-minded ideas about
page allocation within its indexes.

Also, I don't think that "use the special space" will scale to handle
other kinds of maps such as the proposed dead space map.  (This is
exactly why I said the other day that we need a design roadmap for all
these ideas.)

The idea that's becoming attractive to me while contemplating the
multiple-maps problem is that we should adopt something similar to
the old Mac OS idea of multiple forks in a relation.  In addition
to the main data fork which contains the same info as now, there could
be one or more map forks which are separate files in the filesystem.
They are named by relfilenode plus an extension, for instance a relation
with relfilenode NNN would have a data fork in file NNN (plus perhaps
NNN.1, NNN.2, etc) and a map fork named something like NNN.map (plus
NNN.map.1 etc as needed).  We'd have to add one more field to buffer
lookup keys (BufferTag) to disambiguate which fork the referenced page
is in.  Having bitten that bullet, though, the idea trivially scales to
any number of map forks with potentially different space requirements
and different locking and WAL-logging requirements.
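
For illustration, the extended tag might look something like this (the fork
enum and field name are hypothetical; the stub typedefs only mirror the
shape of the real definitions so the sketch stands alone):

#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;

typedef struct RelFileNode
{
    Oid         spcNode;        /* tablespace */
    Oid         dbNode;         /* database */
    Oid         relNode;        /* relation (relfilenode NNN) */
} RelFileNode;

typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,           /* data fork: files NNN, NNN.1, ... */
    MAP_FORKNUM                 /* a map fork: files NNN.map, NNN.map.1, ... */
} ForkNumber;

typedef struct BufferTag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;        /* which fork the page belongs to */
    BlockNumber blockNum;       /* block number within that fork */
} BufferTag;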

Another possible advantage is that a new map fork could be added to an
existing table without much trouble.  Which is certainly something we'd
need if we ever hope to get update-in-place working.

The main disadvantage I can see is that for very small tables, the
percentage overhead from multiple map forks of one page apiece is
annoyingly high.  However, most of the point of a map disappears if
the table is small, so we might finesse that by not creating any maps
until the table has reached some minimum size.

regards, tom lane



Re: [HACKERS] Single table forcing sequential scans on query plans

2008-03-16 Thread Tom Lane
Cristian Gafton [EMAIL PROTECTED] writes:
 On Sun, 16 Mar 2008, Tom Lane wrote:
 Is it possible that the temp table ever has exactly zero rows?

 Ah, that is indeed a possibility. If I am to understand correctly, there is 
 no way to represent the difference between an un-analyzed table and a 
 zero-sized analyzed table as far as the query planner is concerned?

While thinking about your report I was considering having VACUUM and
ANALYZE always set relpages to at least 1.  Then seeing relpages=0
would indeed indicate a never-analyzed table, whereas relpages=1
when physical table size is zero could be taken to indicate that
we should trust the table to be really empty.  I'm not sure though
whether this sort of convention would confuse any existing code.

Another possibility (though not a back-patchable solution) is that
we could just dispense with the heuristic size estimate and trust a
zero-sized table to stay zero-sized.  This would be relying on the
assumption that autovacuum will kick in and update the stats, leading
to invalidation of any existing plans that assume the table is small.
I don't feel very comfortable about that though --- throwing a few
hundred tuples into a table might not be enough to draw autovacuum's
attention, but it could surely be enough to create a performance
disaster for nestloop plans.

Could you confirm that your problem cases are actually caused by this
effect and not something else?

regards, tom lane



Re: [HACKERS] Rewriting Free Space Map

2008-03-16 Thread Alvaro Herrera
Tom Lane wrote:

 The idea that's becoming attractive to me while contemplating the
 multiple-maps problem is that we should adopt something similar to
 the old Mac OS idea of multiple forks in a relation.  In addition
 to the main data fork which contains the same info as now, there could
 be one or more map forks which are separate files in the filesystem.

I think something similar could be used to store tuple visibility bits
separately from heap tuple data itself, so +1 to this idea.

(The rough idea in my head was that you can do an indexscan and look
up visibility bits without having to pull the whole heap along; and
visibility updates are also cheaper, whether they come from indexscans
or heap scans.  Of course, the implicit cost is that a seqscan needs to
fetch the visibility pages, too; and the locking is more complex.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Commit fest?

2008-03-16 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Gregory Stark wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
  The comments  are stored at JS-Kit:
  http://js-kit.com/comments/
  
  It's stored in their server?
 
  Yes, that was the beauty of it --- I just add javascript with a tag and
  all comments are handled by them.
 
 Beauty?  I don't think we want to rely on non-project-controlled
 servers for anything that's part of our core infrastructure.  If/when
 js-kit.com goes belly-up, what happens to that data?  Also, what kind of
 privacy guarantees have we got?  (Admittedly, privacy may be moot for
 information that was originally entered on a public web page, but the
 whole idea of someone else controlling our data just makes me itch.)
 
 I can go along with this as a jury-rig setup for our first commit fest,
 but it just seems like another powerful argument for moving to something
 wiki-based as soon as we can get that sorted.

We don't need these comments for more than a few weeks --- I don't see a
problem with it.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Commit fest?

2008-03-16 Thread Bruce Momjian
Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  You want the message-id on the listing page?  Sure, I was doing that
  before but I didn't know anyone wanted it and it looked a little
  cluttered.  Let me know.
 
 I agree it was too cluttered for normal use.
 
 What I'm trying to do is get a page which has the message-id's of all the
 messages and the comments on the same page. That way I can dump the data into
 a text file to experiment with.

Why not grab the mbox file and grep out the message ids?

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Commit fest?

2008-03-16 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Gregory Stark wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
  The comments  are stored at JS-Kit:
  http://js-kit.com/comments/
  
  It's stored in their server?
 
  Yes, that was the beauty of it --- I just add javascript with a tag and
  all comments are handled by them.
 
 Beauty?  I don't think we want to rely on non-project-controlled
 servers for anything that's part of our core infrastructure.  If/when
 js-kit.com goes belly-up, what happens to that data?  Also, what kind of
 privacy guarantees have we got?  (Admittedly, privacy may be moot for
 information that was originally entered on a public web page, but the
 whole idea of someone else controlling our data just makes me itch.)
 
 I can go along with this as a jury-rig setup for our first commit fest,
 but it just seems like another powerful argument for moving to something
 wiki-based as soon as we can get that sorted.

We could move to a wiki if someone finds out how to dump emails into a
wiki, or if we decide to be more structured in our methods, like having
separate URLs for bugs, feature requests, and patches, and doing all
activity on those items at those URLs.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Commit fest?

2008-03-16 Thread Bruce Momjian
Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  You want the message-id on the listing page?  Sure, I was doing that
  before but I didn't know anyone wanted it and it looked a little
  cluttered.  Let me know.
 
 I agree it was too cluttered for normal use.
 
 What I'm trying to do is get a page which has the message-id's of all the
 messages and the comments on the same page. That way I can dump the data into
 a text file to experiment with.

Oh, what I would really like is to be able to pull up
archives.postgresql.org emails based on message id so I can link to the
entire thread.  Unfortunately, it doesn't work there, nor does Google or
any of the other Postgres email archive sites.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Commit fest?

2008-03-16 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I can go along with this as a jury-rig setup for our first commit fest,
 but it just seems like another powerful argument for moving to something
 wiki-based as soon as we can get that sorted.

 We could move to a wiki if someone finds out how to dump emails into a
 wiki, or if we decide to be more structured in our methods, like having
 separate URLs for bugs, feature requests, and patches, and doing all
 activity on those items at those URLs.

I don't think we want to dump emails into a wiki.  What we want is a
page with links into the mail archives plus commentary.

regards, tom lane



Re: [HACKERS] Commit fest?

2008-03-16 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  I can go along with this as a jury-rig setup for our first commit fest,
  but it just seems like another powerful argument for moving to something
  wiki-based as soon as we can get that sorted.
 
  We could move to a wiki if someone finds out how to dump emails into a
  wiki, or if we decide to be more structured in our methods, like having
  separate URLs for bugs, feature requests, and patches, and doing all
  activity on those items at those URLs.
 
 I don't think we want to dump emails into a wiki.  What we want is a
 page with links into the mail archives plus commentary.

So the thread titles get put into a wiki that supports comments with
URL links to our archives?

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] New style of hash join proposal

2008-03-16 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 We currently execute a lot of joins as Nested Loops which would be more
 efficient if we could batch together all the outer keys and execute a single
 inner bitmap index scan for all of them together.

Please give an example of what you're talking about that you think we
can't do now.

regards, tom lane



Re: [HACKERS] Single table forcing sequential scans on query plans

2008-03-16 Thread Cristian Gafton
On Sun, 16 Mar 2008, Tom Lane wrote:

  Ah, that is indeed a possibility. If I am to understand correctly, there is 
  no way to represent the difference between an un-analyzed table and a 
  zero-sized analyzed table as far as the query planner is concerned?
 
 While thinking about your report I was considering having VACUUM and
 ANALYZE always set relpages to at least 1.  Then seeing relpages=0
 would indeed indicate a never-analyzed table, whereas relpages=1
 when physical table size is zero could be taken to indicate that
 we should trust the table to be really empty.  I'm not sure though
 whether this sort of convention would confuse any existing code.

If having a discrepancy between relpages and table size is a concern,
could relpages be a negative value to mark a non-analyzed table?

 Another possibility (though not a back-patchable solution) is that
 we could just dispense with the heuristic size estimate and trust a
 zero-sized table to stay zero-sized.  This would be relying on the

I think improving the estimator would get us further, since in most cases it 
seems to get it relatively right.

 Could you confirm that your problem cases are actually caused by this
 effect and not something else?

Yes, confirmed. The runaway queries all are joining against an empty 
temporary table.

Thanks,

Cristian
-- 
Cristian Gafton
rPath, Inc.




Re: [HACKERS] Commit fest?

2008-03-16 Thread Greg Smith

On Sun, 16 Mar 2008, Bruce Momjian wrote:

Oh, what I would really like is to be able to pull up 
archives.postgresql.org emails based on message id so I can link to the 
entire thread.  Unfortunately, it doesn't work there, nor does Google or 
any of the other Postgres email archive sites.


This is something I've been looking into for my own organization.  The message 
ids are at the start of the archive web pages.  For example, the e-mail of 
yours I'm replying to begins like this if you look at the page source:


http://archives.postgresql.org/pgsql-hackers/2008-03/msg00554.php
<!-- MHonArc v2.6.16 -->
<!--X-Subject: Re: Commit fest? -->
<!--X-From-R13: Pehpr [bzwvna oehprNzbzwvna.hf -->
<!--X-Date: Sun, 16 Mar 2008 23:19:33 &#45;0300 (ADT) -->
<!--X-Message-Id: [EMAIL PROTECTED] -->
<!--X-Content-Type: text/plain -->
<!--X-Reference: [EMAIL PROTECTED] -->
<!--X-Head-End-->

I was thinking of writing something that scraped the archives, building a 
lookup table out of this information.  What would be nice is if the 
X-Message-Id and X-Reference were both put into the regular HTML for 
future archived messages so that it's more likely tools like Google could 
search based on them.  A brief glance at the MHonArc documentation 
suggests it could be re-run to re-convert any existing messages that are 
still available in order to add those as well.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[HACKERS] Remove hacks for old bad qsort() implementations?

2008-03-16 Thread Tom Lane
There are several places in tuplesort.c (and perhaps elsewhere) where
we explicitly work around limitations of various platforms' qsort()
functions.  Notably, there's this bit in comparetup_index_btree

/*
 * If key values are equal, we sort on ItemPointer.  This does not affect
 * validity of the finished index, but it offers cheap insurance against
 * performance problems with bad qsort implementations that have trouble
 * with large numbers of equal keys.
 */

which I unquestioningly copied into comparetup_index_hash yesterday.
However, oprofile is telling me that doing this is costing
*significantly* more than just returning zero would do:

  9081  0.3050 :	tuple1 = (IndexTuple) a->tuple;
  3759  0.1263 :	tuple2 = (IndexTuple) b->tuple;
               :
               :	{
130409  4.3800 :		BlockNumber blk1 = ItemPointerGetBlockNumber(&tuple1->t_tid);
 34539  1.1601 :		BlockNumber blk2 = ItemPointerGetBlockNumber(&tuple2->t_tid);
               :
  3281  0.1102 :		if (blk1 != blk2)
   812  0.0273 :			return (blk1 < blk2) ? -1 : 1;
               :	}
               :	{
    28 9.4e-04 :		OffsetNumber pos1 = ItemPointerGetOffsetNumber(&tuple1->t_tid);
     1 3.4e-05 :		OffsetNumber pos2 = ItemPointerGetOffsetNumber(&tuple2->t_tid);
               :
     1 3.4e-05 :		if (pos1 != pos2)
 48757  1.6376 :			return (pos1 < pos2) ? -1 : 1;
               :	}
               :
               :	return 0;
 56705  1.9045 :}

Looks to me like we're eating more than seven percent of the total
runtime to do this :-(

Now as far as I can see, the original motivation for this (as stated in
the comment) is entirely dead anymore, since we always use our own qsort
implementation in preference to whatever bogus version a given libc
might supply.  What do people think of removing this bit of code in
favor of just returning 0?

I can see a couple of possible objections:

1. Someday we might go back to using platform qsort.  (But surely we
could insist on qsort behaving sanely for equal keys.)

2. If you've got lots of equal keys, it's conceivable that having the
index entries sorted by TID offers some advantage in indexscan speed.
I'm dubious that that's useful, mainly because the planner should prefer
a bitmap scan in such a case; and anyway the ordering is unlikely to
be preserved for long.  But it's something to think about.

Comments?

regards, tom lane



[HACKERS] [4/4] Proposal of SE-PostgreSQL patches

2008-03-16 Thread Kohei KaiGai
[4/4] - sepostgresql-policy-8.4devel-3.patch

This patch gives us the default security policy for SE-PostgreSQL.
You can build it as a security policy module. It can be linked with
the existing distributor's policy, and reloaded.

-- 
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
diff -rpNU3 pgace/contrib/sepgsql-policy/Makefile sepgsql/contrib/sepgsql-policy/Makefile
--- pgace/contrib/sepgsql-policy/Makefile	1970-01-01 09:00:00.0 +0900
+++ sepgsql/contrib/sepgsql-policy/Makefile	2008-03-12 20:00:04.0 +0900
@@ -0,0 +1,20 @@
+# SE-PostgreSQL Security Policy
+#--
+
+SHAREDIR := /usr/share/selinux
+
+AWK ?= gawk
+NAME ?= $(strip $(shell $(AWK) -F= '/^SELINUXTYPE/{ print $$2 }' /etc/selinux/config))
+
+SELINUX_POLICY := /usr/share/selinux
+
+all: sepostgresql.pp
+
+install: all
+	install -m 0644 sepostgresql.pp $(SELINUX_POLICY)/$(NAME)
+
+sepostgresql.pp: sepostgresql.te sepostgresql.if sepostgresql.fc
+	make -f $(SELINUX_POLICY)/devel/Makefile NAME=$(NAME)
+
+clean:
+	make -f $(SELINUX_POLICY)/devel/Makefile NAME=$(NAME) clean
diff -rpNU3 pgace/contrib/sepgsql-policy/sepostgresql.fc sepgsql/contrib/sepgsql-policy/sepostgresql.fc
--- pgace/contrib/sepgsql-policy/sepostgresql.fc	1970-01-01 09:00:00.0 +0900
+++ sepgsql/contrib/sepgsql-policy/sepostgresql.fc	2008-03-13 10:21:48.0 +0900
@@ -0,0 +1,17 @@
+#
+# SE-PostgreSQL install path
+#
+/usr/bin/sepostgres		--	gen_context(system_u:object_r:postgresql_exec_t,s0)
+/usr/bin/initdb.sepgsql		--	gen_context(system_u:object_r:postgresql_exec_t,s0)
+/usr/bin/sepg_ctl		--	gen_context(system_u:object_r:initrc_exec_t,s0)
+
+/var/lib/sepgsql(/.*)?			gen_context(system_u:object_r:postgresql_db_t,s0)
+/var/lib/sepgsql/pgstartup\.log		gen_context(system_u:object_r:postgresql_log_t,s0)
+/var/log/sepostgresql\.log.*	--	gen_context(system_u:object_r:postgresql_log_t,s0)
+
+#
+# For source installation
+#
+/usr/local/pgsql/bin/postgres	--	gen_context(system_u:object_r:postgresql_exec_t,s0)
+/usr/local/pgsql/bin/initdb	--	gen_context(system_u:object_r:postgresql_exec_t,s0)
+/usr/local/pgsql/bin/pg_ctl	--	gen_context(system_u:object_r:initrc_exec_t,s0)
diff -rpNU3 pgace/contrib/sepgsql-policy/sepostgresql.if sepgsql/contrib/sepgsql-policy/sepostgresql.if
--- pgace/contrib/sepgsql-policy/sepostgresql.if	1970-01-01 09:00:00.0 +0900
+++ sepgsql/contrib/sepgsql-policy/sepostgresql.if	2008-03-12 20:00:04.0 +0900
@@ -0,0 +1,88 @@
+
+## <summary>
+##  Marks the specified domain as SE-PostgreSQL server process.
+## </summary>
+## <param name="domain">
+##  <summary>
+##  Domain to be marked
+##  </summary>
+## </param>
+#
+interface(`sepgsql_server_domain',`
+	gen_require(`
+		attribute sepgsql_server_type;
+	')
+	typeattribute $1 sepgsql_server_type;
+')
+
+
+## <summary>
+##  Allow the specified domain unconfined accesses to any database objects
+##  managed by SE-PostgreSQL,
+## </summary>
+## <param name="domain">
+##  <summary>
+##  Domain allowed access.
+##  </summary>
+## </param>
+#
+interface(`sepgsql_unconfined_domain',`
+	gen_require(`
+		attribute sepgsql_unconfined_type;
+		attribute sepgsql_client_type;
+	')
+	typeattribute $1 sepgsql_unconfined_type;
+	typeattribute $1 sepgsql_client_type;
+')
+
+
+## <summary>
+##  Allow the specified domain unprivileged accesses to any database objects
+##  managed by SE-PostgreSQL,
+## </summary>
+## <param name="domain">
+##  <summary>
+##  Domain allowed access.
+##  </summary>
+## </param>
+#
+interface(`sepgsql_client_domain',`
+	gen_require(`
+		attribute sepgsql_client_type;
+	')
+	typeattribute $1 sepgsql_client_type;
+')
+
+
+## <summary>
+##  Allow the specified role to invoke trusted procedures
+## </summary>
+## <param name="role">
+##  <summary>
+##  The role associated with the domain.
+##  </summary>
+## </param>
+#
+interface(`sepgsql_trusted_procedure_role',`
+	gen_require(`
+		type sepgsql_trusted_domain_t;
+	')
+	role $1 types sepgsql_trusted_domain_t;
+')
+
+
+## <summary>
+## Marks as a SE-PostgreSQL loadable shared library module
+## </summary>
+## <param name="type">
+## <summary>
+## Type marked as a database object type.
+## </summary>
+## </param>
+#
+interface(`sepgsql_loadable_module',`
+	gen_require(`
+		attribute sepgsql_module_type;
+	')
+	typeattribute $1 sepgsql_module_type;
+')
diff -rpNU3 pgace/contrib/sepgsql-policy/sepostgresql.te sepgsql/contrib/sepgsql-policy/sepostgresql.te
--- pgace/contrib/sepgsql-policy/sepostgresql.te	1970-01-01 09:00:00.0 +0900
+++ sepgsql/contrib/sepgsql-policy/sepostgresql.te	2008-03-12 20:00:04.0 +0900
@@ -0,0 +1,353 @@
+policy_module(sepostgresql, 3.01)
+
+gen_require(`
+class db_database all_db_database_perms;
+class db_table all_db_table_perms;
+ 

[HACKERS] [3/4] Proposal of SE-PostgreSQL patches

2008-03-16 Thread Kohei KaiGai
[3/4] - sepostgresql-pg_dump-8.4devel-3.patch

This patch adds a feature to dump a database with its security attributes.
It is turned on with the '--enable-selinux' option of pg_dump/pg_dumpall,
when the server runs as the SE- version.
Needless to say, users need sufficient capabilities to dump the whole
database. The same applies when they try to restore it.

-- 
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
diff -rpNU3 pgace/src/bin/pg_dump/pg_dump.c sepgsql/src/bin/pg_dump/pg_dump.c
--- pgace/src/bin/pg_dump/pg_dump.c	2008-02-03 01:18:48.0 +0900
+++ sepgsql/src/bin/pg_dump/pg_dump.c	2008-02-03 01:26:35.0 +0900
@@ -118,6 +118,9 @@ static int	g_numNamespaces;
 /* flag to turn on/off dollar quoting */
 static int	disable_dollar_quoting = 0;
 
+/* flag to turn on/off SE-PostgreSQL support */
+#define SELINUX_SYSATTR_NAME	"security_context"
+static int enable_selinux = 0;
 
 static void help(const char *progname);
 static void expand_schema_name_patterns(SimpleStringList *patterns,
@@ -267,6 +270,7 @@ main(int argc, char **argv)
 		{"disable-dollar-quoting", no_argument, &disable_dollar_quoting, 1},
 		{"disable-triggers", no_argument, &disable_triggers, 1},
 		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
+		{"enable-selinux", no_argument, &enable_selinux, 1},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -419,6 +423,8 @@ main(int argc, char **argv)
 	disable_triggers = 1;
 else if (strcmp(optarg, "use-set-session-authorization") == 0)
 	use_setsessauth = 1;
+else if (strcmp(optarg, "enable-selinux") == 0)
+	enable_selinux = 1;
 else
 {
 	fprintf(stderr,
@@ -549,6 +555,24 @@ main(int argc, char **argv)
 	std_strings = PQparameterStatus(g_conn, "standard_conforming_strings");
 	g_fout->std_strings = (std_strings && strcmp(std_strings, "on") == 0);
 
+	if (enable_selinux) {
+		/* confirm whether the server supports SELinux features */
+		const char *tmp = PQparameterStatus(g_conn, "security_sysattr_name");
+
+		if (!tmp) {
+			write_msg(NULL, "could not get security_sysattr_name from libpq\n");
+			exit(1);
+		}
+		if (strcmp(SELINUX_SYSATTR_NAME, tmp) != 0) {
+			write_msg(NULL, "server does not have SELinux feature\n");
+			exit(1);
+		}
+		if (g_fout->remoteVersion < 80204) {
+			write_msg(NULL, "server version is too old (%u)\n", g_fout->remoteVersion);
+			exit(1);
+		}
+	}
+
 	/* Set the datestyle to ISO to ensure the dump's portability */
 	do_sql_command(g_conn, "SET DATESTYLE = ISO");
 
@@ -771,6 +795,7 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "                           use SESSION AUTHORIZATION commands instead of\n"
 			 "                           ALTER OWNER commands to set ownership\n"));
+	printf(_("  --enable-selinux         enable to dump security context in SE-PostgreSQL\n"));
 
 	printf(_("\nConnection options:\n"));
 	printf(_("  -h, --host=HOSTNAME      database server host or socket directory\n"));
@@ -1160,7 +1185,8 @@ dumpTableData_insert(Archive *fout, void
 	if (fout->remoteVersion >= 70100)
 	{
 		appendPQExpBuffer(q, "DECLARE _pg_dump_cursor CURSOR FOR "
-		  "SELECT * FROM ONLY %s",
+		  "SELECT * %s FROM ONLY %s",
+		  (!enable_selinux ? "" : ", " SELINUX_SYSATTR_NAME),
 		  fmtQualifiedId(tbinfo->dobj.namespace->dobj.name,
 		 classname));
 	}
@@ -1774,11 +1800,32 @@ dumpBlobComments(Archive *AH, void *arg)
 			Oid			blobOid;
 			char	   *comment;
 
+			blobOid = atooid(PQgetvalue(res, i, 0));
+
+			/* dump security context of binary large object */
+			if (enable_selinux) {
+				PGresult	*__res;
+				char		query[512];
+
+				snprintf(query, sizeof(query),
+						 "SELECT lo_get_security(%u)", blobOid);
+				__res = PQexec(g_conn, query);
+				check_sql_result(__res, g_conn, query, PGRES_TUPLES_OK);
+
+				if (PQntuples(__res) != 1) {
+					write_msg(NULL, "lo_get_security(%u) returns %d tuples\n",
+							  blobOid, PQntuples(__res));
+					exit_nicely();
+				}
+				archprintf(AH, "SELECT lo_set_security(%u, '%s');\n",
+						   blobOid, PQgetvalue(__res, 0, 0));
+				PQclear(__res);
+			}
+
 			/* ignore blobs without comments */
 			if (PQgetisnull(res, i, 1))
 continue;
 
-			blobOid = atooid(PQgetvalue(res, i, 0));
 			comment = PQgetvalue(res, i, 1);
 
 			printfPQExpBuffer(commentcmd, "COMMENT ON LARGE OBJECT %u IS ",
@@ -2886,6 +2933,7 @@ getTables(int *numTables)
 	int			i_owning_col;
 	int			i_reltablespace;
 	int			i_reloptions;
+	int			i_selinux;
 
 	/* Make sure we are in proper schema */
 	selectSourceSchema("pg_catalog");
@@ -2926,6 +2974,7 @@ getTables(int *numTables)
 		  d.refobjsubid as owning_col, 
 		  (SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, 
 		  array_to_string(c.reloptions, ', ') as reloptions 
+		  %s 
 		  from pg_class c 
 		  left join pg_depend d on 
 		  (c.relkind = '%c' and 
@@ -2935,6 +2984,7 @@ getTables(int *numTables)
 		  where relkind in ('%c', '%c', '%c', '%c') 
 		  order by 

Re: [HACKERS] Single table forcing sequential scans on query plans

2008-03-16 Thread Tom Lane
Cristian Gafton [EMAIL PROTECTED] writes:
 On Sun, 16 Mar 2008, Tom Lane wrote:
 While thinking about your report I was considering having VACUUM and
 ANALYZE always set relpages to at least 1.  Then seeing relpages=0
 would indeed indicate a never-analyzed table, whereas relpages=1
 when physical table size is zero could be taken to indicate that
 we should trust the table to be really empty.  I'm not sure though
 whether this sort of convention would confuse any existing code.

 If having a discrepancy between relpages and table size is a concern,
 could relpages be a negative value to mark a non-analyzed table?

No, the value is really a uint32, though we don't declare it that way
for lack of having any such SQL type :-(.  (uint32)-1 is just as legal
a value as 1, though perhaps a lot less likely.  Anyway, client code
looking at the column is probably more likely to get confused by a
negative value for relpages than by a value that doesn't match
underlying reality (which it can't easily see anyway).

 Could you confirm that your problem cases are actually caused by this
 effect and not something else?

 Yes, confirmed. The runaway queries all are joining against an empty 
 temporary table.

Good, just wanted to be sure.  If there are not objections, I'll
put in the at-least-1 hack.

regards, tom lane



Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches

2008-03-16 Thread Kohei KaiGai
It seems to me that some of the SE-PostgreSQL patches have not been delivered
yet, although [3/4] and [4/4] already went through.

Did the anti-spam system catch my previous three messages?
If necessary, I will send them again.

Thanks,

Kohei KaiGai wrote:
 The series of patches are the proposal of Security-Enhanced PostgreSQL
 (SE-PostgreSQL) for the upstreamed PostgreSQL 8.4 development cycle.
 
  [1/4] sepostgresql-pgace-8.4devel-3.patch
  provides PGACE (PostgreSQL Access Control Extension) framework
  [2/4] sepostgresql-sepgsql-8.4devel-3.patch
  provides SE-PostgreSQL feature, based on PGACE framework.
  [3/4] sepostgresql-pg_dump-8.4devel-3.patch
  enables pg_dump to dump database with security attribute.
  [4/4] sepostgresql-policy-8.4devel-3.patch
  provides the default security policy for SE-PostgreSQL.
 - snip -

-- 
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]



[HACKERS] [0/4] Proposal of SE-PostgreSQL patches

2008-03-16 Thread Kohei KaiGai
The series of patches are the proposal of Security-Enhanced PostgreSQL
(SE-PostgreSQL) for the upstreamed PostgreSQL 8.4 development cycle.

 [1/4] sepostgresql-pgace-8.4devel-3.patch
 provides PGACE (PostgreSQL Access Control Extension) framework
 [2/4] sepostgresql-sepgsql-8.4devel-3.patch
 provides SE-PostgreSQL feature, based on PGACE framework.
 [3/4] sepostgresql-pg_dump-8.4devel-3.patch
 enables pg_dump to dump database with security attribute.
 [4/4] sepostgresql-policy-8.4devel-3.patch
 provides the default security policy for SE-PostgreSQL.

We can provide a quick overview of SE-PostgreSQL at:
http://code.google.com/p/sepgsql/wiki/WhatIsSEPostgreSQL

Any comment and suggestion are welcome.
Thanks,


ENVIRONMENT
---
Please confirm your environment.
The following are the requirements of SE-PostgreSQL.
 * Fedora 8 or later system
 * SELinux is enabled and working
 * kernel-2.6.24 or later
 * selinux-policy and selinux-policy-devel v3.0.8 or later
 * libselinux, policycoreutils

INSTALLATION

$ tar jxvf postgresql-snapshot.tar.bz2
$ cd postgresql-snapshot
$ patch -p1 < ../sepostgresql-pgace-8.4devel-3.patch
$ patch -p1 < ../sepostgresql-sepgsql-8.4devel-3.patch
$ patch -p1 < ../sepostgresql-pg_dump-8.4devel-3.patch
$ patch -p1 < ../sepostgresql-policy-8.4devel-3.patch

$ ./configure --enable-selinux
$ make
$ make -C contrib/sepgsql-policy
$ su
# make install

# /usr/sbin/semodule -i contrib/sepgsql-policy/sepostgresql.pp
  (NOTE: semodule is a utility to load/unload security policy modules.)

# /sbin/restorecon -R /usr/local/pgsql
  (NOTE: restorecon is a utility to initialize the security context of files.)

SETUP
-
# mkdir -p /opt/sepgsql
# chown foo_user:var_group /opt/sepgsql
# chcon -t postgresql_db_t /opt/sepgsql
  (NOTE: chcon is a utility to set up security context of files.)
# exit

$ /usr/sbin/run_init /usr/local/pgsql/bin/initdb -D /opt/sepgsql
  (NOTE: run_init is a utility to start a program as if it were launched from
an init script.)
$ /usr/local/pgsql/bin/pg_ctl -D /opt/sepgsql start

-- 
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
