Re: [HACKERS] [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory

2007-05-08 Thread Peter Eisentraut
Hiroshi Inoue wrote:
 I've not seen your reply yet.

You keep sending your emails to randomly invented addresses, so I don't 
get them.

I think your questions have been answered in the meantime.  If not, 
please address them to me.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Allow use of stable functions with constraint exclusion

2007-05-08 Thread ITAGAKI Takahiro

Tom Lane [EMAIL PROTECTED] wrote:

 Marshall, Steve [EMAIL PROTECTED] writes:
  I have developed a small patch to optimizer/util/plancat.c that 
  eliminates one of the caveats associated with constraint exclusions, 
  namely the inability to avoid searching tables based on the results of 
  stable functions.
 
 Do you not understand why this is completely unsafe?

I think the proposal itself is very useful, because time-based
partitioning is commonly used and functions like now() or
CURRENT_TIMESTAMP are marked as stable.

I'm not clear on why the optimization is unsafe. I'm confused by the
definition of stable functions in our documentation. Which is required
of stable functions: 'stable within a single table scan' or 'stable
within a single SQL statement'? If the latter is true, can we use them
in constraint exclusion?

| STABLE indicates that the function cannot modify the database, and
| that within a single table scan it will consistently return the same
| result for the same argument values, but that its result could change
| across SQL statements.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory

2007-05-08 Thread Peter Eisentraut
On Tuesday, 8 May 2007 01:45, Hiroshi Inoue wrote:
 Must I mail them directly to you in the first place ?

Yes.

 Please note that the question is to your recent sudden change to
 psqlodbc.

These are the same files that psqlodbc has been shipping for years.  Just now 
they are in the CVS repository, to address Tom Lane's concern about being 
able to reproduce the build.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas
Here's my roadmap for the scan-resistant buffer cache and 
synchronized scans patches.


1. Fix the current vacuum behavior of throwing dirty buffers to the 
freelist, forcing a lot of WAL flushes. Instead, use a backend-private 
ring of shared buffers that are recycled. This is what Simon's 
scan-resistant buffer manager did.


The theory here is that if a page is read in by vacuum, it's unlikely to 
be accessed in the near future, therefore it should be recycled. If 
vacuum doesn't dirty the page, it's best to reuse the buffer immediately 
for the next page. However, if the buffer is dirty (and not just because 
we set hint bits), we ought to delay writing it to disk until the 
corresponding WAL record has been flushed to disk.


Simon's patch used a fixed size ring of buffers that are recycled, but I 
think the ring should be dynamically sized. Start with a small ring, and 
whenever you need to do a WAL flush to write a dirty buffer, increase 
the ring size. On every full iteration through the ring, decrease its 
size to trim down an unnecessarily large ring.
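To make that concrete, here is a minimal standalone C sketch of such a 
self-tuning ring; the Buffer type and the needs_wal_flush callback are 
hypothetical stand-ins, not the real bufmgr API:

/* Hypothetical sketch of a dynamically sized buffer ring (not the real bufmgr API). */
#include <stdio.h>

#define RING_MIN 8
#define RING_MAX 256

typedef int Buffer;                   /* stand-in for a shared buffer handle */

typedef struct
{
    Buffer bufs[RING_MAX];
    int    size;                      /* current ring size */
    int    next;                      /* next slot to recycle */
} BufferRing;

/*
 * Pick the buffer to reuse for the next page.  If recycling the victim would
 * force a WAL flush (dirty, LSN beyond the flushed point), grow the ring so
 * that by the next pass the WAL is more likely to have been flushed already;
 * on every full pass, shrink it so an unnecessarily large ring decays again.
 */
static Buffer
ring_victim(BufferRing *ring, int (*needs_wal_flush) (Buffer))
{
    if (ring->next >= ring->size)     /* completed one full iteration */
    {
        if (ring->size > RING_MIN)
            ring->size--;
        ring->next = 0;
    }

    if (needs_wal_flush(ring->bufs[ring->next]) && ring->size < RING_MAX)
        ring->size++;

    return ring->bufs[ring->next++];
}

/* Toy driver: pretend every 10th victim would need a WAL flush. */
static int fake_needs_flush(Buffer b) { return b % 10 == 0; }

int
main(void)
{
    BufferRing ring = {.size = RING_MIN, .next = 0};

    for (int i = 0; i < RING_MAX; i++)
        ring.bufs[i] = i;             /* fake buffer handles */
    for (int i = 0; i < 1000; i++)
        (void) ring_victim(&ring, fake_needs_flush);

    printf("ring settled at %d buffers\n", ring.size);
    return 0;
}

The exact growth and decay rates are, of course, up for tuning.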


This only alters the behavior of vacuums, and it's pretty safe to say it 
won't get worse than what we have now. In the future, we can use the 
buffer ring for seqscans as well; more on that in step 3.


2. Implement the list/table of last/ongoing seq scan positions. This is 
Jeff's synchronized scans patch. When a seq scan starts on a table 
larger than some threshold, it starts from where the previous seq scan 
is currently, or where it ended. This will synchronize the scans so that 
for two concurrent scans the total I/O is halved in the best case. There 
should be no other effect on performance.


If you have a partitioned table, a union of multiple tables, or any 
other plan where multiple seq scans are performed in arbitrary order, 
this change won't change the order in which the partitions are scanned, 
and therefore won't ensure that they are synchronized.
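A hypothetical standalone sketch of the shared position table behind step 2 
(the real patch keeps it in shared memory under a lock; all names here are 
made up):

/* Sketch of a table of last/ongoing seq scan positions, keyed by relation id. */
#include <stdio.h>

#define SCAN_SLOTS   64
#define REPORT_EVERY 16              /* report position every N pages */

typedef unsigned int Oid;
typedef unsigned int BlockNumber;

typedef struct
{
    Oid         relid;               /* 0 = slot unused */
    BlockNumber position;            /* last reported block of an ongoing scan */
} ScanLocation;

static ScanLocation scan_locations[SCAN_SLOTS];

/* Where should a new seq scan on 'relid' start? */
static BlockNumber
ss_get_start(Oid relid)
{
    for (int i = 0; i < SCAN_SLOTS; i++)
        if (scan_locations[i].relid == relid)
            return scan_locations[i].position;   /* join the ongoing scan */
    return 0;                                    /* no hint: start at block 0 */
}

/* Called every REPORT_EVERY pages by an ongoing scan. */
static void
ss_report_location(Oid relid, BlockNumber block)
{
    int free_slot = -1;

    for (int i = 0; i < SCAN_SLOTS; i++)
    {
        if (scan_locations[i].relid == relid)
        {
            scan_locations[i].position = block;
            return;
        }
        if (scan_locations[i].relid == 0 && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0)
    {
        scan_locations[free_slot].relid = relid;
        scan_locations[free_slot].position = block;
    }
}

int
main(void)
{
    /* Simulate one scan of relation 16384 reporting as it goes ... */
    for (BlockNumber blk = 0; blk < 1000; blk++)
        if (blk % REPORT_EVERY == 0)
            ss_report_location(16384, blk);

    /* ... a second scan on the same relation picks up near its position. */
    printf("second scan starts at block %u\n", ss_get_start(16384));
    return 0;
}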



Now that we have both pieces of the puzzle in place, it's time to 
consider what more we can do with them:



3A. To take advantage of the cache trail of a previous seq scan, scan 
backwards from where the previous seq scan ended, until you hit a 
buffer that's not in cache.


This will allow taking advantage of the buffer cache even if the table 
doesn't fit completely in RAM. That can make a big difference if the 
table size is just slightly bigger than RAM, and can avoid the nasty 
surprise when a table grows beyond RAM size and queries start taking 
minutes instead of seconds.
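One way to read 3A as code, as a standalone sketch with hypothetical 
cache-probe and block-processing callbacks standing in for the buffer 
manager:

/* Consume the cache trail backwards from where the last scan ended,
 * then read the rest from disk.  The callbacks are made up. */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int BlockNumber;

static void
scan_with_cache_trail(BlockNumber prev_end, BlockNumber nblocks,
                      bool (*cached) (BlockNumber),
                      void (*process) (BlockNumber))
{
    BlockNumber blk = prev_end;

    /* Phase 1: walk backwards over still-cached blocks left by the last scan. */
    while (blk > 0 && cached(blk - 1))
        process(--blk);

    BlockNumber trail_start = blk;

    /* Phase 2: read the remaining blocks from disk, wrapping around. */
    for (blk = prev_end; blk < nblocks; blk++)
        process(blk);
    for (blk = 0; blk < trail_start; blk++)
        process(blk);
}

/* Toy driver: pretend blocks 900..999 are still cached from the last scan. */
static bool fake_cached(BlockNumber b) { return b >= 900; }
static BlockNumber first = (BlockNumber) -1;
static void fake_process(BlockNumber b) { if (first == (BlockNumber) -1) first = b; }

int
main(void)
{
    scan_with_cache_trail(1000, 1200, fake_cached, fake_process);
    printf("scan started at cached block %u\n", first);   /* prints 999 */
    return 0;
}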


This should be a non-controversial change on its own from a performance 
point of view. No query should get slower, and some will become faster. 
But see step 3B:


3B. Currently, a sequential scan on a large table spoils the buffer cache 
by evicting other pages from the cache. In CVS HEAD, as soon as the 
table is larger than shared_buffers, the pages in the buffer won't be 
used to speed up running the same query again, and there's no reason to 
believe the pages read in would be more useful than any other page in 
the database, and in particular the pages that were in the buffer cache 
before the huge seq scan. If the table being scanned is > 5 * 
shared_buffers, the scan will evict every other page from the cache if 
there's no other activity in the database (max usage_count is 5).


If the table is much larger than shared_buffers, say 10 times as large, 
even with the change 3A to read the pages that are in cache first, using 
all shared_buffers to cache the table will only speed up the query by 
10%. We should not spoil the cache for such a small gain, and use the 
local buffer ring strategy instead. It's better to make queries that are 
slow anyway a little bit slower than to make queries that are normally 
really fast slow.



As you may notice, 3A and 3B are at odds with each other. We can 
implement both, but you can't use both strategies in the same scan. 
Therefore we need to have decision logic of some kind to figure out 
which strategy is optimal.


A simple heuristic is to decide based on the table size:

 < 0.1 * shared_buffers - start from page 0, keep in cache (like we do now)
 < 5 * shared_buffers   - start from last read page, keep in cache
 > 5 * shared_buffers   - start from last read page, use buffer ring

I'm not sure about the constants; we might need to make them GUC 
variables, as Simon argued, but that would be the general approach.
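A minimal sketch of that decision logic, with the placeholder cutoffs 
hard-coded (they could become GUCs):

/* Pick a scan strategy from table size vs. shared_buffers. */
#include <stdio.h>

typedef enum
{
    SCAN_DEFAULT,        /* start from page 0, keep in cache (current behavior) */
    SCAN_SYNC_CACHED,    /* start from last read page, keep in cache */
    SCAN_SYNC_RING       /* start from last read page, use buffer ring */
} ScanStrategy;

static ScanStrategy
choose_scan_strategy(double table_blocks, double shared_buffers)
{
    if (table_blocks < 0.1 * shared_buffers)
        return SCAN_DEFAULT;
    if (table_blocks < 5.0 * shared_buffers)
        return SCAN_SYNC_CACHED;
    return SCAN_SYNC_RING;
}

int
main(void)
{
    /* e.g. a 1,000,000-block table against 16,384 shared buffers -> ring */
    printf("strategy = %d\n", choose_scan_strategy(1000000, 16384));
    return 0;
}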


In the future, I'm envisioning a smarter algorithm to size the local 
buffer ring. Take all buffers with usage_count=0, plus a few with 
usage_count=1 from the clock sweep. That way, if there are a lot of buffers 
in the buffer cache that are seldom used, we'll use more buffers to 
cache the large scan, and vice versa. And no matter how large the scan, 
it wouldn't blow all buffers from the cache. But if you execute the same 
query again, the buffers left in the cache from the last scan were 
apparently useful, so we use a bigger ring this time.
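A hypothetical sketch of that sizing rule over a snapshot of usage counts 
(the real clock sweep lives in the buffer manager; these structures are 
made up):

/* Size the ring from the cache population: every buffer at usage_count 0
 * plus a few at 1, clamped to sane bounds.  A cache full of cold buffers
 * yields a large ring, a busy cache a small one. */
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int usage_count;
} BufferDescLite;            /* stand-in for the real buffer descriptor */

static int
choose_ring_size(const BufferDescLite *buf, int nbuffers, int min_ring, int max_ring)
{
    int cold = 0;
    int lukewarm = 0;

    for (int i = 0; i < nbuffers; i++)
    {
        if (buf[i].usage_count == 0)
            cold++;
        else if (buf[i].usage_count == 1)
            lukewarm++;
    }

    int size = cold + lukewarm / 4;      /* "plus a few with usage_count=1" */

    if (size < min_ring)
        size = min_ring;
    if (size > max_ring)
        size = max_ring;
    return size;
}

int
main(void)
{
    BufferDescLite cache[1024];

    for (int i = 0; i < 1024; i++)
        cache[i].usage_count = rand() % 6;    /* random usage counts 0..5 */

    printf("ring size = %d buffers\n", choose_ring_size(cache, 1024, 32, 1024));
    return 0;
}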


I'm going to do this incrementally, and we'll see how far we get for 8.3. 
We might push 3A and/or 3B to 8.4. First, I'm going to finish up Simon's 
patch (step 1), run some performance tests with vacuum, and submit a patch 
for that. Then I'll move to Jeff's patch (step 2).

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Luke Lonergan
Heikki,

On 3A: In practice, the popular modern OS'es (BSD/Linux/Solaris/etc)
implement dynamic I/O caching.  The experiments have shown that benefit
of re-using PG buffer cache on large sequential scans is vanishingly
small when the buffer cache size is small compared to the system memory.
Since this is a normal and recommended situation (OS I/O cache is
auto-tuning and easy to administer, etc), IMO the effort to optimize
buffer cache reuse for seq scans > 1 x buffer cache size is not
worthwhile.

On 3B: The scenario described is multiple readers seq scanning large
table and sharing bufcache, but in practice this is not a common
situation.  The common situation is multiple queries joining several
small tables to one or more large tables that are > 1 x bufcache.  In
the common scenario, the dominant factor is the ability to keep the
small tables in bufcache (or I/O cache for that matter) while running
the I/O bound large table scans as fast as possible.

To that point - an important factor in achieving max I/O rate for large
tables (> 1 x bufcache) is avoiding the pollution of the CPU L2 cache.
This is commonly in the range of 512KB - 2MB, which is only important
when considering a bound on the size of the ring buffer.  The effect has
been demonstrated to be significant - in the 20%+ range.  Another thing
to consider is the use of readahead inside the heapscan, in which case
sizes >= 32KB are very effective.

We've implemented both ideas (ring buffer, readahead) and see very
significant improvements in I/O and query speeds on DSS workloads.  I
would expect benefits on OLTP as well.

The modifications you suggest here may not have the following
properties:
- don't pollute bufcache for seqscan of tables > 1 x bufcache
- for tables > 1 x bufcache use a ringbuffer for I/O that is ~ 32KB to
minimize L2 cache pollution

- Luke

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Heikki Linnakangas
 Sent: Tuesday, May 08, 2007 3:40 AM
 To: PostgreSQL-development
 Cc: Jeff Davis; Simon Riggs
 Subject: [HACKERS] Seq scans roadmap
 
 Here's my roadmap for the scan-resistant buffer cache and 
 synchronized scans patches.
 
 1. Fix the current vacuum behavior of throwing dirty buffers 
 to the freelist, forcing a lot of WAL flushes. Instead, use a 
 backend-private ring of shared buffers that are recycled. 
 This is what Simon's scan-resistant buffer manager did.
 
 The theory here is that if a page is read in by vacuum, it's 
 unlikely to be accessed in the near future, therefore it 
 should be recycled. If vacuum doesn't dirty the page, it's 
 best to reuse the buffer immediately for the next page. 
 However, if the buffer is dirty (and not just because we set 
 hint bits), we ought to delay writing it to disk until the 
 corresponding WAL record has been flushed to disk.
 
 Simon's patch used a fixed size ring of buffers that are 
 recycled, but I think the ring should be dynamically sized. 
 Start with a small ring, and whenever you need to do a WAL 
 flush to write a dirty buffer, increase the ring size. On 
 every full iteration through the ring, decrease its size to 
 trim down an unnecessarily large ring.
 
 This only alters the behavior of vacuums, and it's pretty 
 safe to say it won't get worse than what we have now. In the 
 future, we can use the buffer ring for seqscans as well; more 
 on that on step 3.
 
 2. Implement the list/table of last/ongoing seq scan 
 positions. This is Jeff's synchronized scans patch. When a 
 seq scan starts on a table larger than some threshold, it 
 starts from where the previous seq scan is currently, or 
 where it ended. This will synchronize the scans so that for 
 two concurrent scans the total I/O is halved in the best 
 case. There should be no other effect on performance.
 
 If you have a partitioned table, or union of multiple tables 
 or any other plan where multiple seq scans are performed in 
 arbitrary order, this change won't change the order the 
 partitions are scanned and won't therefore ensure they will 
 be synchronized.
 
 
 Now that we have both pieces of the puzzle in place, it's 
 time to consider what more we can do with them:
 
 
 3A. To take advantage of the cache trail of a previous seq 
 scan, scan 
 backwards from where the previous seq scan ended, until you hit a 
 buffer that's not in cache.
 
 This will allow taking advantage of the buffer cache even if 
 the table 
 doesn't fit completely in RAM. That can make a big difference if the 
 table size is just slightly bigger than RAM, and can avoid the nasty 
 surprise when a table grows beyond RAM size and queries start taking 
 minutes instead of seconds.
 
 This should be a non-controversial change on its own from performance 
 point of view. No query should get slower, and some will 
 become faster. 
 But see step 3B:
 
 3B. Currently, sequential scans on a large table spoils the 
 buffer cache 
 by evicting other pages from the cache. In CVS HEAD, as 

Re: [HACKERS] Allow use of immutable functions operating on constants with constraint exclusion

2007-05-08 Thread Marshall, Steve

ITAGAKI Takahiro wrote:

 Tom Lane [EMAIL PROTECTED] wrote:

  Marshall, Steve [EMAIL PROTECTED] writes:
   I have developed a small patch to optimizer/util/plancat.c that
   eliminates one of the caveats associated with constraint exclusions,
   namely the inability to avoid searching tables based on the results of
   stable functions.

  Do you not understand why this is completely unsafe?

 I think the proposal itself is very useful, because time-based
 partitioning is commonly used and functions like now() or
 CURRENT_TIMESTAMP are marked as stable.

 I'm not clear on why the optimization is unsafe. I'm confused by the
 definition of stable functions in our documentation. Which is required
 of stable functions: 'stable within a single table scan' or 'stable
 within a single SQL statement'? If the latter is true, can we use them
 in constraint exclusion?

 | STABLE indicates that the function cannot modify the database, and
 | that within a single table scan it will consistently return the same
 | result for the same argument values, but that its result could change
 | across SQL statements.
 

The lack of safety comes from prepared statements.  If the above 
optimization was used, the value of the stable function would be used 
when the statement was prepared, and the query plan would then be set 
using the stable function value as though it were a constant.  For 
partitioned tables, this could result in a failure to scan tables needed 
to meet the query constraints.


I think the optimization could work if you could exclude prepared 
statements.  However, I looked at the planning code and found no clear 
way to distinguish between a statement being prepared for later 
execution, and a statement being planned only for immediate execution.   
As a result, I don't think stable functions can (or should) be expanded 
to help optimize queries using constraint exclusion. 

However, I think it would be possible to expand immutable functions 
operating on constants to optimize constraint exclusion.  Immutable 
functions will always return the same result given the same inputs, so 
this would be safe.  Currently, immutable functions are not expanded 
during planning such that the first query would be optimized using 
constraint exclusion, while the second query would not:


SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 
05:00:00+00'::timestamptz;
SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 
05:00:00+00'::timestamptz + '0 days'::interval;


See the attached SQL file for table creation and other SQL examples.

The real question here is if the optimization is worth the effort.  
Personally, I commonly use queries of this sort, and so would be in 
favor of this expansion of immutable functions operating on constants as 
an optimization.  I find it convenient to use the database to do the 
time manipulation (e.g. adding intervals to timestamps).  However, the 
logic to manipulate times can be pushed into application code if need 
be.   I've found I have to do a lot of explaining to developers as to 
why two queries that look so similar perform very differently.




--
--  Create and connect to the test database
--
CREATE DATABASE test_ce_db;
ALTER DATABASE test_ce_db OWNER TO postgres;

\connect test_ce_db;

--
--  Make a parent table and three child tables partitioned by time using 
created_at column.
--
CREATE TABLE test_bulletins (
created_at timestamp with time zone NOT NULL,
data text NOT NULL
);

CREATE TABLE test_bulletins_20060908 (CONSTRAINT time_partition_limits CHECK 
(((created_at >= '2006-09-08 00:00:00+00'::timestamp with time zone) AND 
(created_at < '2006-09-09 00:00:00+00'::timestamp with time zone)))
)
INHERITS (test_bulletins);

CREATE TABLE test_bulletins_20060909 (CONSTRAINT time_partition_limits CHECK 
(((created_at >= '2006-09-09 00:00:00+00'::timestamp with time zone) AND 
(created_at < '2006-09-10 00:00:00+00'::timestamp with time zone)))
)
INHERITS (test_bulletins);

CREATE TABLE test_bulletins_20060910 (CONSTRAINT time_partition_limits CHECK 
(((created_at >= '2006-09-10 00:00:00+00'::timestamp with time zone) AND 
(created_at < '2006-09-11 00:00:00+00'::timestamp with time zone)))
)
INHERITS (test_bulletins);

--
--  Setup environment for queries. 
--
SET constraint_exclusion = on;
\pset footer off;

--
--  Do test case queries.
--
SELECT 'This query should avoid use of table test_bulletins_20060908, even with 
OLD CE code' as "Test case 1";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 
05:00:00+00'::timestamptz;

SELECT 'This query should avoid use of table test_bulletins_20060908, ONLY with 
NEW CE code' as "Test case 2";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 
05:00:00+00'::timestamptz + '0 days'::interval;

SELECT 'This query uses a stable function; it should NOT be optimized' as "Test case 3";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > now() - (now() - 
'2006-09-09 05:00:00+00'::timestamptz);


Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas

Luke Lonergan wrote:

On 3A: In practice, the popular modern OS'es (BSD/Linux/Solaris/etc)
implement dynamic I/O caching.  The experiments have shown that benefit
of re-using PG buffer cache on large sequential scans is vanishingly
small when the buffer cache size is small compared to the system memory.
Since this is a normal and recommended situation (OS I/O cache is
auto-tuning and easy to administer, etc), IMO the effort to optimize
buffer cache reuse for seq scans > 1 x buffer cache size is not
worthwhile.


That's interesting. Care to share the results of the experiments you 
ran? I was thinking of running tests of my own with varying table sizes.


The main motivation here is to avoid the sudden drop in performance when 
a table grows big enough not to fit in RAM. See attached diagram for 
what I mean. Maybe you're right and the effect isn't that bad in practice.


I'm thinking of attacking 3B first anyway, because it seems much simpler 
to implement.



On 3B: The scenario described is multiple readers seq scanning large
table and sharing bufcache, but in practice this is not a common
situation.  The common situation is multiple queries joining several
small tables to one or more large tables that are > 1 x bufcache.  In
the common scenario, the dominant factor is the ability to keep the
small tables in bufcache (or I/O cache for that matter) while running
the I/O bound large table scans as fast as possible.


How is that different from what I described?


To that point - an important factor in achieving max I/O rate for large
tables (> 1 x bufcache) is avoiding the pollution of the CPU L2 cache.
This is commonly in the range of 512KB - 2MB, which is only important
when considering a bound on the size of the ring buffer.  The effect has
been demonstrated to be significant - in the 20%+ range.  Another thing
to consider is the use of readahead inside the heapscan, in which case
sizes >= 32KB are very effective.


Yeah I remember the discussion on the L2 cache a while back.

What do you mean with using readahead inside the heapscan? Starting an 
async read request?



The modifications you suggest here may not have the following
properties:
- don't pollute bufcache for seqscan of tables > 1 x bufcache
- for tables > 1 x bufcache use a ringbuffer for I/O that is ~ 32KB to
minimize L2 cache pollution


So the difference is that you don't want 3A (the take advantage of pages 
already in buffer cache) strategy at all, and want the buffer ring 
strategy to kick in earlier instead. Am I reading you correctly?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
inline: seqscan-caching.png
---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Luke Lonergan
Heikki,

 That's interesting. Care to share the results of the 
 experiments you ran? I was thinking of running tests of my 
 own with varying table sizes.

Yah - it may take a while - you might get there faster.

There are some interesting effects to look at between I/O cache
performance and PG bufcache, and at those speeds the only tool I've
found that actually measures scan rate in PG is VACUUM.  SELECT
COUNT(*) measures CPU consumption in the aggregation node, not scan
rate.

Note that the copy from I/O cache to PG bufcache is where the L2 effect
is seen.
 
 The main motivation here is to avoid the sudden drop in 
 performance when a table grows big enough not to fit in RAM. 
 See attached diagram for what I mean. Maybe you're right and 
 the effect isn't that bad in practice.

There are going to be two performance drops, first when the table
doesn't fit into PG bufcache, the second when it doesn't fit in bufcache
+ I/O cache.  The second is severe, the first is almost insignificant
(for common queries).
 
 How is that different from what I described?

My impression of your descriptions is that they overvalue the case where
there are multiple scanners of a large (> 1x bufcache) table such that
they can share the first load of the bufcache, e.g. your 10% benefit
for table = 10x bufcache argument.  I think this is a non-common
workload, rather there are normally many small tables and several large
tables such that sharing the PG bufcache is irrelevant to the query
speed.

 Yeah I remember the discussion on the L2 cache a while back.
 
 What do you mean with using readahead inside the heapscan? 
 Starting an async read request?

Nope - just reading N buffers ahead for seqscans.  Subsequent calls use
previously read pages.  The objective is to issue contiguous reads to
the OS in sizes greater than the PG page size (which is much smaller
than what is needed for fast sequential I/O).
 
  The modifications you suggest here may not have the following
  properties:
  - don't pollute bufcache for seqscan of tables > 1 x bufcache
  - for tables > 1 x bufcache use a ringbuffer for I/O that 
 is ~ 32KB to 
  minimize L2 cache pollution
 
 So the difference is that you don't want 3A (the take 
 advantage of pages already in buffer cache) strategy at all, 
 and want the buffer ring strategy to kick in earlier instead. 
 Am I reading you correctly?

Yes, I think the ring buffer strategy should be used when the table size
is > 1 x bufcache and the ring buffer should be of a fixed size smaller
than L2 cache (32KB - 128KB seems to work well).

- Luke


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas

Luke Lonergan wrote:
What do you mean with using readahead inside the heapscan? 
Starting an async read request?


Nope - just reading N buffers ahead for seqscans.  Subsequent calls use
previously read pages.  The objective is to issue contiguous reads to
the OS in sizes greater than the PG page size (which is much smaller
than what is needed for fast sequential I/O).


Are you filling multiple buffers in the buffer cache with a single 
read-call? The OS should be doing readahead for us anyway, so I don't 
see how just issuing multiple ReadBuffers one after each other helps.



Yes, I think the ring buffer strategy should be used when the table size
is > 1 x bufcache and the ring buffer should be of a fixed size smaller
than L2 cache (32KB - 128KB seems to work well).


I think we want to let the ring grow larger than that for updating 
transactions and vacuums, though, to avoid the WAL flush problem.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Allow use of immutable functions operating on constants with constraint exclusion

2007-05-08 Thread Martijn van Oosterhout
On Tue, May 08, 2007 at 08:08:28AM -0400, Marshall, Steve wrote:
 However, I think it would be possible to expand immutable functions 
 operating on constants to optimize constraint exclusion.  Immutable 
 functions will always return the same result given the same inputs, so 
 this would be safe.  Currently, immutable functions are not expanded 
 during planning such that the first query would be optimized using 
 constraint exclusion, while the second query would not:

The problem with this is that at planning time you don't necessarily
have an active transaction snapshot. Prepared statements are the
obvious example, but I think even in one-off statements there's no
snapshot until after the planner has completed. This is also one of the
problems with type input/output functions looking up stuff in tables.

There was discussion about handling type input/output functions and
casts as a sort of InitExpr that is executed once and then inserted into
the tree. However, that would still be too late to affect the planning.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


[HACKERS] Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory

2007-05-08 Thread Hiroshi Inoue
Peter Eisentraut wrote:
 On Tuesday, 8 May 2007 01:45, Hiroshi Inoue wrote:
 Must I mail them directly to you in the first place ?
 
 Yes.

Oh I seem to have been apart from the community too long.
Could you please tell me where I can find the rule ?

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Allow use of immutable functions operating on constants with constraint exclusion

2007-05-08 Thread Tom Lane
Marshall, Steve [EMAIL PROTECTED] writes:
 Currently, immutable functions are not expanded 
 during planning

Incorrect, see eval_const_expressions().

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Feature lists for 8.3 and 8.4

2007-05-08 Thread Abraham, Danny
Hi,

I am migrating from Sybase to Postgres.

I am trying to get a hold of the feature lists for future releases.

1. Where can I find the feature list for 8.3? When is it expected?
2. And for 8.4?
3. Who is the guy to ask about a feature like startup migrate in ORACLE?

Thanks

Danny







---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Feature lists for 8.3 and 8.4

2007-05-08 Thread Lukas Kahwe Smith

Abraham, Danny wrote:

Hi,

I am migrating from Sybase to Postgres.

I am trying to get a hold of the feature lists for future releases.

1. Where can I find the feature list for 8.3? When is it expected?
2. And for 8.4?


There are various places to look for this kind of information, but 
probably the easiest to digest is looking at the todo lists on the wiki:

http://developer.postgresql.org/index.php/Todo:Contents

regards,
Lukas

PS: Since this is a fairly new effort, if you have any hints on how to 
improve things, let us know.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Managing the community information stream

2007-05-08 Thread Andrew Sullivan
On Mon, May 07, 2007 at 07:36:55AM -0500, Jim Nasby wrote:

 Instead, if all feature requests are tracked then users can vote on  
 what's most important to them.

I am sympathetic to the issues you and Andrew are describing (I
understand Bruce's stream analogy, but I think Andrew is right that
from the user's point of view, it's not usable).  But I am not
convinced that users voting on desired features will get us the
users' desired features.  The features we get are mostly the features
that have attracted developers.  The method by which that attraction
happens is interesting, but I don't think it's democratic.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Information security isn't a technological problem.  It's an economics
problem.
--Bruce Schneier

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory

2007-05-08 Thread Peter Eisentraut
On Tuesday, 8 May 2007 15:12, Hiroshi Inoue wrote:
 Oh I seem to have been apart from the community too long.
 Could you please tell me where I can find the rule ?

The only rule there is is that if you want to talk to person X, you write to 
person X.  That rule is as old as communications.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Zeugswetter Andreas ADI SD

 Nope - just reading N buffers ahead for seqscans.  Subsequent 
 calls use previously read pages.  The objective is to issue 
 contiguous reads to the OS in sizes greater than the PG page 
 size (which is much smaller than what is needed for fast 
 sequential I/O).

The problem here is that either you issue the large read into throwaway
private memory and hope that when you later read 8k you get the page
from the OS buffer cache, or you need scatter/gather I/O and a way to grab 32
buffers at once.
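As a rough illustration of the scatter/gather route, a standalone sketch
that fills 32 page-sized malloc'd buffers with a single 256 kB preadv()
per I/O; the plain pages stand in for pinned shared buffers, and none of
this is PostgreSQL's actual smgr code:

/* Fill 32 page-sized buffers with one 256 kB preadv() per I/O. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

#define BLCKSZ 8192
#define NPAGES 32                       /* 32 x 8 kB = 256 kB per read */

int
main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s datafile\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    struct iovec iov[NPAGES];
    for (int i = 0; i < NPAGES; i++)
    {
        iov[i].iov_base = malloc(BLCKSZ);   /* stand-ins for shared buffers */
        iov[i].iov_len = BLCKSZ;
    }

    off_t   offset = 0;
    ssize_t n;

    while ((n = preadv(fd, iov, NPAGES, offset)) > 0)
    {
        /* n / BLCKSZ pages are now filled and could be handed to the scan */
        offset += n;
    }
    if (n < 0)
        perror("preadv");

    for (int i = 0; i < NPAGES; i++)
        free(iov[i].iov_base);
    close(fd);

    printf("read %lld bytes in %d kB chunks\n", (long long) offset, NPAGES * BLCKSZ / 1024);
    return 0;
}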

 Yes, I think the ring buffer strategy should be used when the 
 table size is > 1 x bufcache and the ring buffer should be of 
 a fixed size smaller than L2 cache (32KB - 128KB seems to work well).

How do you merge those two objectives? It seems the ring buffer needs to
be at least as large as the contiguous read size, so you would need at
least a 256k ring buffer. Better yet, make the ring buffer twice the I/O
size, so two sessions can alternately take the lead while the other
session still holds a previous page. Modern L2 caches are 8 MB, so 512k
seems like no problem?

Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Zeugswetter Andreas ADI SD

  What do you mean with using readahead inside the heapscan? 
  Starting an async read request?
  
  Nope - just reading N buffers ahead for seqscans.  Subsequent calls 
  use previously read pages.  The objective is to issue 
 contiguous reads 
  to the OS in sizes greater than the PG page size (which is much 
  smaller than what is needed for fast sequential I/O).
 
 Are you filling multiple buffers in the buffer cache with a 
 single read-call?

yes, needs vector or ScatterGather IO.
 
 The OS should be doing readahead for us 
 anyway, so I don't see how just issuing multiple ReadBuffers 
 one after each other helps.

Last time I looked, OS readahead was only comparable to 32k blocked
reads. 256k blocked reads still perform way better. Also, when the OS is
confronted with an I/O storm, the 256k reads perform way better than OS
readahead.

Andreas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Managing the community information stream

2007-05-08 Thread Andrew Dunstan



Andrew Sullivan wrote:

On Mon, May 07, 2007 at 07:36:55AM -0500, Jim Nasby wrote:

  
Instead, if all feature requests are tracked then users can vote on  
what's most important to them.



I am sympathetic to the issues you and Andrew are describing (I
understand Bruce's stream analogy, but I think Andrew is right that
from the user's point of view, it's not usable).  But I am not
convinced that users voting on desired features will get us the
users' desired features.  The features we get are mostly the features
that have attracted developers.  The method by which that attraction
happens is interesting, but I don't think it's democratic.


  


Getting votes might provide a useful point of information, not a way of 
making decisions, though.


I certainly don't regard it as a must-have feature.

cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Managing the community information stream

2007-05-08 Thread Jim Nasby

On May 8, 2007, at 9:50 AM, Andrew Sullivan wrote:

On Mon, May 07, 2007 at 07:36:55AM -0500, Jim Nasby wrote:

Instead, if all feature requests are tracked then users can vote on
what's most important to them.


I am sympathetic to the issues you and Andrew are describing (I
understand Bruce's stream analogy, but I think Andrew is right that
from the user's point of view, it's not usable).  But I am not
convinced that users voting on desired features will get us the
users' desired features.  The features we get are mostly the features
that have attracted developers.  The method by which that attraction
happens is interesting, but I don't think it's democratic.


It may... it may not. If a high-demand feature sits around long  
enough it could well attract someone capable of working on it, but  
who isn't a current contributor. Or it could attract a bounty.


I'm also not sure if PostgreSQL is quite the same as other OSS  
projects. My impression is that we have quite a few developers who no  
longer do much if any database development (ie: they're not serious  
users); they continue to contribute because of other reasons. I  
suspect developers like that are not unlikely to scratch an itch that  
isn't their own.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Gregory Stark

Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:

 Are you filling multiple buffers in the buffer cache with a 
 single read-call?

 yes, needs vector or ScatterGather IO.

I would expect that to get only moderate improvement. To get the full benefit
I would think you would want to either fire off a separate thread to do the
read-ahead, use libaio, or funnel the read-ahead requests to a separate thread
like our bgwriter only it would be a bgreader or something like that.

 The OS should be doing readahead for us 
 anyway, so I don't see how just issuing multiple ReadBuffers 
 one after each other helps.

 Last time I looked OS readahead was only comparable to 32k blocked reads.
 256k blocked reads still perform way better. Also when the OS is confronted
 with an IO storm the 256k reads perform way better than OS readahead.

Well that's going to depend on the OS. Last I checked Linux's readahead logic
is pretty straightforward and doesn't try to do any better than 32k readahead
and is easily fooled. However I wouldn't be surprised if that's changed. 
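For what it's worth, a minimal Linux-only sketch of the libaio route
(link with -laio; the file name and sizes are arbitrary, and without
O_DIRECT the kernel may still execute the read synchronously):

/* Queue one 256 kB read-ahead asynchronously and reap it later, instead of
 * blocking in read() while the scan could be processing already-read pages. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <libaio.h>

#define READAHEAD_BYTES (256 * 1024)

int
main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s datafile\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    io_context_t ctx = 0;
    if (io_setup(8, &ctx) < 0)
    {
        fprintf(stderr, "io_setup failed\n");
        return 1;
    }

    void *buf;
    if (posix_memalign(&buf, 4096, READAHEAD_BYTES) != 0)
        return 1;

    struct iocb  cb;
    struct iocb *cbs[1] = {&cb};

    io_prep_pread(&cb, fd, buf, READAHEAD_BYTES, 0);    /* read ahead from offset 0 */
    if (io_submit(ctx, 1, cbs) != 1)
    {
        fprintf(stderr, "io_submit failed\n");
        return 1;
    }

    /* ... the scan would keep consuming previously read pages here ... */

    struct io_event ev;
    if (io_getevents(ctx, 1, 1, &ev, NULL) == 1)
        printf("read-ahead completed: %ld bytes\n", (long) ev.res);

    io_destroy(ctx);
    free(buf);
    close(fd);
    return 0;
}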

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Managing the community information stream

2007-05-08 Thread Lukas Kahwe Smith

Jim Nasby wrote:

On May 8, 2007, at 9:50 AM, Andrew Sullivan wrote:

On Mon, May 07, 2007 at 07:36:55AM -0500, Jim Nasby wrote:

Instead, if all feature requests are tracked then users can vote on
what's most important to them.


I am sympathetic to the issues you and Andrew are describing (I
understand Bruce's stream analogy, but I think Andrew is right that
from the user's point of view, it's not usable).  But I am not
convinced that users voting on desired features will get us the
users' desired features.  The features we get are mostly the features
that have attracted developers.  The method by which that attraction
happens is interesting, but I don't think it's democratic.


It may... it may not. If a high-demand feature sits around long enough 
it could well attract someone capable of working on it, but who isn't a 
current contributor. Or it could attract a bounty.


Also keep in mind that many of the developers are working for companies 
that ensure that resources get allocated according to what users need 
and not only by what developers are motivated to work on.


That being said, it seems obvious that so far PostgreSQL has been mainly 
driven by what developers feel like implementing. I think this is also 
what ensured the high level of standards compliance of PostgreSQL, since 
features were tailored for experienced DBA types, rather than end users 
that are less experienced in how to leverage these standards.


regards,
Lukas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Patch: Allocation of kind codes for spatial type.

2007-05-08 Thread Tom Lane
Ale Raza [EMAIL PROTECTED] writes:
 Attached is the modified pg_statistic.h file for this patch. As you
 mentioned 200 - 299 is reserved for ESRI st_geometry type. Out of these
 values I am using code 200 for st_geometry.

Done.

For future reference, what people normally mean by a patch is
diff -c output.  Sending the whole modified file is not convenient
because it can't be dropped-in if there have been any changes since the
version you worked from (as indeed there have been, in this case).

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [COMMITTERS] pgsql: Teach tuplesort.c about top N sorting, in which only the first

2007-05-08 Thread Magnus Hagander
Magnus Hagander wrote:
 Alvaro Herrera wrote:
 Jim Nasby wrote:

 If we really want to make the logfile the approved method for  
 monitoring performance, then why do we have the stats infrastructure  
 at all? It could all be replaced with logging output and pgfouine.
 First we'd have to fix the usability problem of our redirect_stderr
 stuff for pgfouine (multiline messages from different backends are
 mixed, according to Guillaume).
 
 I'd like to sign on to the list of people saying that logging isn't the
 best way to do performance measuring. Providing a way to get at the
 counters in realtime for monitoring or graphing or whatever is what
 AFAIK everybody else do, and it's for a good reason - it fits in to
 existing monitoring/management solutions. It may be that our current
 stats system isn't the best way to do this, but if it isn't that just
 means we have to come up with a better way.


Speaking of which, it might be interesting to actually show these values
in the stats collector. I was thinking three columns for each database
(probably the best level?), one for each of those three counters. If
you have a lot of sorts (percentage-wise) spilling to disk, it is often
something you want to investigate, so exposing it that way seems like a
good thing.

Comments on that? Oh, and is it too late to sneak this into 8.3 claiming
it's an extension to the addition of the new sort feature? ;-)

//Magnus

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Managing the community information stream

2007-05-08 Thread Lukas Kahwe Smith

Hi,

guess I missed hackers on my initial reply, so I am re-sending the reply 
I sent to Joshua in regards to a hackers@ posting.


Read below.

regards,
Lukas

Joshua D. Drake wrote:




That being said, it seems obvious that so far PostgreSQL has been 
mainly driven by what developers feel like implementing. I think this 
is also what ensured the high level of standards compliance of 
PostgreSQL, since features were tailored for experienced DBA types, 
rather than end users that are less experienced in how to leverage 
these standards.


PostgreSQL has *never* been developed with the DBA in mind. Keep in mind 
that most of the postgresql developers have *zero* real world 
experience. Nor do they run postgresql themselves in real world 
production environments.


Well, certainly more with a DBA in mind than a middle tier developer?

regards,
Lukas

PS: Did you mean to only reply to me?


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] problem using semaphores

2007-05-08 Thread Neil Conway
On Wed, 2007-09-05 at 01:07 +0530, Raja Agrawal wrote:
 We are using the following piece of code for updating a list
 synchronously i.e. no two threads would update the list at a time.

Do you mean threads or processes?

 Is the following way of using semaphores not correct in the postgres
 environment?

Why can't you use LWLocks?

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] savepoints and upgrading locks

2007-05-08 Thread Jeff Davis

This thread here became a TODO item:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg01011.php

During that discussion a patch was produced that nobody seemed to have
objections to. The main problem seemed to be that it wouldn't always
downgrade the lock on a ROLLBACK TO (as I understand it), which doesn't
seem like a problem to me.

Is there a reason this isn't a part of 8.3, or was it just forgotten?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] [HACKERS] Patch: Allocation of kind codes for spatial type.

2007-05-08 Thread Alvaro Herrera
Ale Raza wrote:
 OK, thanks. Which release will it be in, PostgreSQL 8.2.5? 

8.3, but you can actually use the numbers in whatever release you
please as it is unlikely that they would have been used by anyone else
(much less somebody with whom you would be sharing a database).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Jeff Davis
On Tue, 2007-05-08 at 11:40 +0100, Heikki Linnakangas wrote:
 I'm going to do this incrementally, and we'll see how far we get for 
 8.3. We might push 3A and/or 3B to 8.4. First, I'm going to finish up 
 Simon's patch (step 1), run some performance tests with vacuum, and 
 submit a patch for that. Then I'll move to Jeff's patch (step 2).

I think it's best to postpone 3A (one aspect of my patch). There are a
couple reasons:

1) The results didn't show enough benefit yet.
2) Complex interactions with Simon's patch.

I think it's an area of research for the future, but for now I just want
to concentrate on the most important aspect of my patch: the
synchronized scanning ( #2 in your list ).

Regards,
Jeff Davis


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Jeff Davis
On Tue, 2007-05-08 at 07:47 -0400, Luke Lonergan wrote:
 Heikki,
 
 On 3A: In practice, the popular modern OS'es (BSD/Linux/Solaris/etc)
 implement dynamic I/O caching.  The experiments have shown that benefit
 of re-using PG buffer cache on large sequential scans is vanishingly
 small when the buffer cache size is small compared to the system memory.
 Since this is a normal and recommended situation (OS I/O cache is
 auto-tuning and easy to administer, etc), IMO the effort to optimize
  buffer cache reuse for seq scans > 1 x buffer cache size is not
 worthwhile.
 

I think 3A is still an interesting idea, but I agree that it needs some
results to back it up. Let's save 3A for the next release so that we
have more time to see.

 To that point - an important factor in achieving max I/O rate for large
  tables (> 1 x bufcache) is avoiding the pollution of the CPU L2 cache.
 This is commonly in the range of 512KB - 2MB, which is only important
 when considering a bound on the size of the ring buffer.  The effect has
 been demonstrated to be significant - in the 20%+ range.  Another thing
 to consider is the use of readahead inside the heapscan, in which case
  sizes >= 32KB are very effective.

One thing I'd like us to keep in mind is to have a reasonable number of
buffers active for a sequential scan. If the number is too small, my
sync scans might not even work. Right now my patch only reports every 16
pages, so 32KB (=4 pages) is not going to work for sync scans (I suppose
only testing will tell).

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory

2007-05-08 Thread Hiroshi Inoue
Peter Eisentraut wrote:
 On Tuesday, 8 May 2007 15:12, Hiroshi Inoue wrote:
 Oh I seem to have been apart from the community too long.
 Could you please tell me where I can find the rule ?
 
 The only rule there is is that if you want to talk to person X, you write 
 to 
 person X.  That rule is as old as communications.

Hmmm, have I misunderstood mailing lists?
Is the rule right in the case of communication via mailing lists?
For example, I know a ML which doesn't use direct reply mails at all.
As for PostgreSQL, a few years ago AFAIR it was left to each member
 whether he sends direct reply mails as well.
I don't know if it is still so currently.

regards,
Hiroshi Inoue



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org