Re: [HACKERS] [COMMITTERS] pgsql: Make configuration parameters fall back to their default values

2007-03-13 Thread Magnus Hagander
On Mon, Mar 12, 2007 at 08:20:53PM -0500, Andrew Dunstan wrote:
 Gregory Stark wrote:
  Tom Lane [EMAIL PROTECTED] writes:
 
  [EMAIL PROTECTED] (Peter Eisentraut) writes:
  Make configuration parameters fall back to their default values when
  they
  are removed from the configuration file.
 
  It appears that this patch has broken custom GUC variables; at the very
  least it's broken plperl.
 
  Huh, it occurs to me that I haven't seen any plperl regression tests fly
  by when I've been running regression tests myself. What do I have to do
  to test if plperl, plpython, etc work with the packed varlena patch?
 
 
 
 cd src/pl; make installcheck

Is there any particular reason why we don't run these as part of a
general make check?

//Magnus



Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread Magnus Hagander
On Mon, Mar 12, 2007 at 10:05:58PM -0700, Sean Utt wrote:
 And then what? Make the search box on www.postgresql.org able to handle an 
 email address as search text without throwing a shoe?
 
 Search for [EMAIL PROTECTED] or any other 'email' address from the 
 postgres home page. Barfage every time.
 Easy for some isn't easy for all, apparently. Left that out as a test case 
 did we? Someone searching a mailing list for an email address? Who wudda 
 thunk it? It works without the . -- don't know why, but then I also don't 
 know why someone hasn't tried that before me.

Obviously not, since nobody has reported it before.
Fixed, insofar as we don't barf on it, but show you the "no hits" message.
Because all email addresses are obfuscated in the archives, they're not
actually searchable.

Thanks for reporting the issue. You're advised to report issues with the
website to pgsql-www instead for a faster response, since web people
don't always monitor all the other lists.

//Magnus



[HACKERS] czech national site is moved

2007-03-13 Thread Pavel Stehule

Hello

I moved the Czech wiki from http://postgresql.interweb.cz to 
http://www.pgsql.cz. Can somebody update the web link?


Thank You
Pavel Stehule

_
Emotikony a pozadi programu MSN Messenger ozivi vasi konverzaci. 
http://messenger.msn.cz/





Re: [HACKERS] czech national site is moved

2007-03-13 Thread Devrim GÜNDÜZ
Hi,

On Tue, 2007-03-13 at 10:06 +0100, Pavel Stehule wrote:

 I moved the Czech wiki from http://postgresql.interweb.cz to 
 http://www.pgsql.cz. Can somebody update the web link?

Done -- but you should send this to pgsql-www, not to this list.

Regards,
-- 
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/






Re: [HACKERS] CLUSTER and MVCC

2007-03-13 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
In each chain, there must be at least one non-dead tuple with xmin < 
OldestXmin.


Huh?  Typically *all* the tuples but the last are dead, for varying
values of dead.  Please be more specific what you mean.


I meant dead as in HeapTupleSatisfiesVacuum(...) == HEAPTUPLE_DEAD.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread Csaba Nagy
On Tue, 2007-03-13 at 00:43, Richard Huxton wrote:
 Josh Berkus wrote:
  I really don't see any way you could implement UDFs other than EAV that 
  wouldn't be immensely awkward, or result in executing DDL at runtime.
 
 What's so horrible about DDL at runtime? Obviously, you're only going to 
 allow specific additions to specific schemas/tables, but why not?

Well, exclusively locking the table for DDL is not always possible in
production systems. We also briefly had a solution where we added new
columns on the fly, and we had to ditch it... it was simply not working.
The users were usually adding columns in the peak business hours, and in
those same hours it was simply impossible to take an exclusive lock on
that table. I think DDL will actually also exclusively lock parent
tables of FK relationships (I might be mistaken here, but I think I have
seen such a lock), which is even worse.

After it caused extensive perceived downtime for hours, we simply
redesigned the feature so that users now need to announce beforehand
how many different types of new columns they will need, and we just
create a few extra of them and assign them to users' needs as they
arise... the unused columns stay null and hopefully don't have too big
an overhead, but it's the simplest solution we found which actually works.
When a user runs out of spare columns, the admin creates some new
spares in quiet hours.

Cheers,
Csaba.
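
A minimal sketch of the spare-columns scheme described above (all names
are invented for illustration; the real system surely differs):

  -- Pre-created spare columns, handed out to user-defined fields on demand.
  CREATE TABLE invoice (
      id           serial PRIMARY KEY,
      customer     text NOT NULL,
      spare_text_1 text,       -- stays NULL until assigned to a user field
      spare_text_2 text,
      spare_int_1  integer,
      spare_int_2  integer
  );

  -- Bookkeeping: which spare column currently backs which user field.
  CREATE TABLE field_assignment (
      field_name  text PRIMARY KEY,
      column_name text NOT NULL UNIQUE
  );

  INSERT INTO field_assignment VALUES ('invoice_colour', 'spare_text_1');

When the spares run out, the admin adds more in quiet hours, the only
time the exclusive lock is obtainable.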





Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread Richard Huxton

Sean Utt wrote:
And then what? Make the search box on www.postgresql.org able to handle 
an email address as search text without throwing a shoe?


Search for [EMAIL PROTECTED] or any other 'email' address from the 
postgres home page. Barfage every time.
Easy for some isn't easy for all, apparently. Left that out as a test 
case did we? Someone searching a mailing list for an email address? Who 
wudda thunk it? It works without the . -- don't know why, but then I 
also don't know why someone hasn't tried that before me.


Had a bad day?

And I'll be damned if you're not right - it doesn't return results for 
dev@archonet.com but does for [EMAIL PROTECTED]. Presumably something to do 
with (not?) splitting the email address on ".". Can't believe no-one 
has noticed this before (me, for example). I guess that even though I 
search a lot, it's not on email addrs.


Have you reported it to the www team?

"Sure, sounds like a simple solution to me..." Richard said 
sarcastically. Would be funnier if the search on the website wasn't 
broken in a completely stupid, almost ironic way. Ah, irony and sarcasm 
-- the ugly twins.


Actually, it was Greg who said that. And it was *me* the (really very 
gentle) sarcasm was directed at.


Yeah, we have to dynamically generate queries day in and day out. But 
then some of us actually work for a living.


Umm - like all of us?

Since we already have to do that, maybe someone could make that easier? 


Good idea!


Isn't that really the point here?


Not as I was reading the discussion.

 Someone asked if something would be
useful, and the people who use the database to do real work said YES, 
and here's how I might use it. Like full text search and recursive 
queries, user defined (fields|attributes|properties) and the ability to 
manage them would be BUTTER! Is it a difficult problem? YES, but if it 
wasn't, why should it be worth an advanced degree?


I think the main discussion has been around:
1. Whether Edward's final-year project is basically EAV (in which case 
he'll probably need to work hard to get good marks).
2. Whether dynamically generating DDL is safe/practical in a business 
setting. This seems to split some *very* experienced people about 50:50. 
Half of them agree with me and the other half are inexplicably wrong ;-)


If you read section 0.3 of Edward's project proposal he argues that 
dynamic DDL causes problems for the application because: "However, SQL 
does not provide an easy syntax for querying these properties." (meaning 
the changed structure of the database). I'm not saying this part is 
easy, but I'm not convinced it's harder than doing it "the wrong way". 
At least not if you do as Edward does and enforce types.


Now, in 0.3.1 he describes a normalised webpage=tags table pair and 
shows some cumbersome-looking SQL. However, I'd argue this is due to the 
verbose nature of the SQL rather than the underlying expressions.


He then looks at what I think of as the "system settings table"* 
problem, where you have a bunch of configuration settings you'd tend to 
store in a single table (setting_name, value), except you'd like to have 
different types for each setting (a further wrinkle is that you might 
like lists of settings - do you use arrays or a sub-table?). This is 
your classic EAV scenario.
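
For concreteness, the classic shape is something like this (a hedged
sketch; names invented):

  -- Every value is stored as text, whatever its real type:
  CREATE TABLE settings (
      setting_name text PRIMARY KEY,
      value        text    -- 'true', '42' and 'oops' are indistinguishable
  );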


Now, he proposes a set of tables - one per property, so you can enforce 
type constraints, but he will need to create syntax to make this easier 
to query. Presumably it'll need an automatically-generated view over the 
top. (Edward - tip: read up on Date's thoughts on automatically 
determining what table you can insert a row in based on its type).


This certainly looks like a variant on EAV to me, and I'm not convinced 
that it's gaining much since you'll have possibly huge numbers of joins 
going on in the background while not having any semantics to your table 
definitions.


The other variant (which I've used myself) is to have a "type" column 
and a trigger to enforce that the text value is valid for the type.
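
A hedged sketch of that variant (names invented; assumes plpgsql is
installed):

  CREATE TABLE eav (
      entity    integer NOT NULL,
      attribute text    NOT NULL,
      vtype     text    NOT NULL CHECK (vtype IN ('int', 'bool', 'text')),
      value     text,
      PRIMARY KEY (entity, attribute)
  );

  CREATE FUNCTION eav_check_value() RETURNS trigger AS $$
  BEGIN
      -- Reject values that do not parse as the declared type.
      IF NEW.vtype = 'int' AND NEW.value !~ '^-?[0-9]+$' THEN
          RAISE EXCEPTION 'value % is not a valid int', NEW.value;
      END IF;
      IF NEW.vtype = 'bool' AND NEW.value NOT IN ('t', 'f') THEN
          RAISE EXCEPTION 'value % is not a valid bool', NEW.value;
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER eav_value_valid BEFORE INSERT OR UPDATE ON eav
      FOR EACH ROW EXECUTE PROCEDURE eav_check_value();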


Now, if you do it properly, that is define tables as you would as a 
human designer, then you do have the problem of informing your 
application on the relevant structure. Now, the syntax issues of this 
are largely solved - plenty of ActiveRecord-style classes out there for 
Ruby, Python, Perl, PHP, ...


The interesting question then becomes where the practical limits in such 
flexibility are. Simple attribute/value pairs are easy enough. How do 
you feel about changes to existing tables though? How about new tables 
that add relationships between existing tables? At what point does the 
application just become pgAdmin?



* Another good example is the questionnaire. Users need to be able to 
define their own lists of questions and if "driving_licence = no" then 
don't bother to ask "type of car = ?". Edward - if you want to see a 
schema that implements this sort of thing, contact me off list and I'll 
see what I can do. The client for that project will probably be 

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-13 Thread Simon Riggs
On Mon, 2007-03-12 at 22:16 -0700, Luke Lonergan wrote:

 You may know we've built something similar and have seen similar gains.

Cool

 We're planning a modification that I think you should consider: when there
 is a sequential scan of a table larger than the size of shared_buffers, we
 are allowing the scan to write through the shared_buffers cache.

Write? For which operations?

I was thinking to do this for bulk writes also, but it would require
changes to bgwriter's cleaning sequence. Are you saying to write say ~32
buffers then fsync them, rather than letting bgwriter do that? Then
allow those buffers to be reused?

 The hypothesis is that if a relation is of a size equal to or less than the
 size of shared_buffers, it is cacheable and should use the standard LRU
 approach to provide for reuse.

Sounds reasonable. Please say more.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-13 Thread Simon Riggs
On Tue, 2007-03-13 at 13:40 +0900, ITAGAKI Takahiro wrote:
 Simon Riggs [EMAIL PROTECTED] wrote:
 
   With the default value of scan_recycle_buffers (=0), VACUUM seems to
   use all of the buffers in the pool, just like existing sequential
   scans. Is this intended?
   
  New test version enclosed, where scan_recycle_buffers = 0 doesn't change
  existing VACUUM behaviour.
 
 This is a result with scan_recycle_buffers.v3.patch. I used normal VACUUM
 with background load from a slowed-down pgbench in this instance. I believe
 the patch is useful in normal cases, not only for VACUUM FREEZE.
 
     N |  time  | WAL flush(*)
  -----+--------+--------------
     0 | 112.8s | 44.3%
     1 | 148.9s | 52.1%
     8 | 105.1s | 17.6%
    16 |  96.9s |  8.7%
    32 | 103.9s |  6.3%
    64 |  89.4s |  6.6%
   128 |  80.0s |  3.8%

Looks good.

Not sure what value of N to pick for normal use. The objectives are
i) don't let VACUUMs spoil the cache
ii) speed up standalone VACUUMs
iii) don't let VACUUM cause others to repeatedly WAL flush

I'm thinking N=16 meets all 3 objectives. We could make VACUUM go faster
still, but by knocking more blocks out of cache that someone doing real
work might need. That will slow them down almost as much as forcing them
to flush WAL, so I'd want to be conservative with VACUUM.

Does anybody think we need a new parameter for this, or are we happy at
16 buffers in the recycle loop for VACUUM?

At this point I should note something I haven't mentioned before.
VACUUMs force other backends to flush out WAL only when we have an I/O
bound workload. If the database already fits in memory then BufferAlloc
never needs to run and therefore we don't need to flush WAL. So I can
understand that the effect of WAL flushing may not have been noticed by
many testers. 

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





[HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Mario Weilguni
Hi,

I have a problem with a database: I can dump the database to a file, but 
restoration fails. This happens with 8.1.4.

Steps to reproduce:
create database testdb with encoding='UTF8';
\c testdb
create table test(x text);
insert into test values ('\244'); => accepted, even though it is not valid UTF8.

pg_dump testdb -f testdb.dump -Fc
pg_restore -f testdb.dump -d testdb => fails with an error: 
ERROR:  invalid byte sequence for encoding UTF8: 0xa4

The problem itself comes from a CSV file, which is imported with \copy without 
proper quoting (so I have to fix this anyway), but I still think this is an 
error, making restoration very complicated in such cases...

Or am I doing something completely wrong here?
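
Incidentally, a hedged sketch of the \copy difference (CSV mode takes
backslashes literally; the file names are invented):

  -- Text-format copy interprets \244 as an octal escape and stores the raw byte:
  \copy test from 'data.txt'
  -- CSV-format copy takes the four characters backslash-2-4-4 literally:
  \copy test from 'data.csv' with csv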

Best regards,
Mario Weilguni




Re: [HACKERS] [COMMITTERS] pgsql: Make configuration parameters fall back to their default values

2007-03-13 Thread Andrew Dunstan

Magnus Hagander wrote:

 On Mon, Mar 12, 2007 at 08:20:53PM -0500, Andrew Dunstan wrote:

  Gregory Stark wrote:

   Tom Lane [EMAIL PROTECTED] writes:

    [EMAIL PROTECTED] (Peter Eisentraut) writes:

     Make configuration parameters fall back to their default values when
     they are removed from the configuration file.

    It appears that this patch has broken custom GUC variables; at the very
    least it's broken plperl.

   Huh, it occurs to me that I haven't seen any plperl regression tests fly
   by when I've been running regression tests myself. What do I have to do
   to test if plperl, plpython, etc work with the packed varlena patch?

  cd src/pl; make installcheck

 Is there any particular reason why we don't run these as part of a
 general make check?

 //Magnus



Probably historical more than anything else.

The core tests all run regardless of configuration, though, and the PL 
tests use a different database (by design). When we standardised this we 
did just enough to enable the buildfarm clients to test PLs sanely. If 
you think we need more, have a go at it.


I should perhaps point out that the buildfarm client can be used to do a 
comprehensive build and test on your sources, including all the 
configured PLs, ECPG and the contrib tests, using either the 
--from-source or --from-source-clean flags. These were originally 
designed to help diagnose and fix problems disclosed during normal 
buildfarm runs, but I have found it quite useful when working on 
substantial projects. You don't need to be registered as a buildfarm 
member to use the client program in these modes - no results are 
uploaded to the server when these flags are used.


cheers

andrew



Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Albe Laurenz
Mario Weilguni wrote:

 Steps to reproduce:
 create database testdb with encoding='UTF8';
 \c testdb
 create table test(x text);
 insert into test values ('\244'); => accepted, even though it is not valid UTF8.

This is working as expected; see the remark in
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS

"It is your responsibility that the byte sequences you create
 are valid characters in the server character set encoding."

Yours,
Laurenz Albe



Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Mario Weilguni
On Tuesday, 13 March 2007 14:46, Albe Laurenz wrote:
 Mario Weilguni wrote:
  Steps to reproduce:
  create database testdb with encoding='UTF8';
  \c testdb
  create table test(x text);
  insert into test values ('\244'); => accepted, even though it is not valid UTF8.

 This is working as expected, see the remark in
 http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS

 It is your responsibility that the byte sequences you create
  are valid characters in the server character set encoding.

In that case, pg_dump is doing the wrong thing here and should quote the 
output. IMO it cannot be called working as expected when it makes database 
dumps worthless, without any warning at dump time.

pg_dump should output \244 itself in that case.

Best regards,
Mario Weilguni



Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread David Fetter
On Tue, Mar 13, 2007 at 09:31:45AM +, Richard Huxton wrote:
 Sean Utt wrote:
 And then what? Make the search box on www.postgresql.org able to
 handle an email address as search text without throwing a shoe?
 
 Search for [EMAIL PROTECTED] or any other 'email' address from the
 postgres home page. Barfage every time.  Easy for some isn't easy
 for all, apparently. Left that out as a test case did we? Someone
 searching a mailing list for an email address? Who wudda thunk it?
 It works without the . -- don't know why, but then I also don't
 know why someone hasn't tried that before me.
 
 Had a bad day?

Today was a good day ;)

 Since we already have to do that, maybe someone could make that easier? 
 
 Good idea!

Hear, hear!

  Someone asked if something would be
 useful, and the people who use the database to do real work said YES, 
 and here's how I might use it. Like full text search and recursive 
 queries, user defined (fields|attributes|properties) and the ability to 
 manage them would be BUTTER! Is it a difficult problem? YES, but if it 
 wasn't, why should it be worth an advanced degree?
 
 I think the main discussion has been around:
 1. Whether Edward's final-year project is basically EAV (in which case 
 he'll probably need to work hard to get good marks).

As Josh mentioned, I'm one of the people who say EAV is never
justified.  We do have a way of expressing wide ranges of constraints
not known in advance: it's called SQL, and people need to do some
design using it, however much they may fear that they've left
something out somehow. :)

 2. Whether dynamically generating DDL is safe/practical in a
 business setting. This seems to split some *very* experienced people
 about 50:50.  Half of them agree with me and the other half are
 inexplicably wrong ;-)

For the record, I'm in the ag'in column.

 * Another good example is the questionnaire.

With all due respect, this is a solved problem *without EAV or
run-time DDL*.  The URL below has one excellent approach to this.

http://www.varlena.com/GeneralBits/110.php

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate



Re: [HACKERS] Major Feature Interactions

2007-03-13 Thread Merlin Moncure

On 2/28/07, Simon Riggs [EMAIL PROTECTED] wrote:

There's a number of major features all coming together over the next
weeks, so I wanted to raise some awareness of possible areas of
interaction. My concern is more than just "will multiple patches apply
together?", but extends into whether the features interact with each
other with synergy, ignore each other or have negative impact.

The projects that concern me are the following internals projects

Plan Invalidation
Autovacuum changes
Bitmap indexes
GIT indexes
Dead Space Map (DSM)
HOT Updating
Updateable cursors
Restartable VACUUM


What about the interaction of plan invalidation and the security
problems with 'security definer'?  I am especially curious if pl
functions will no longer regenerate their plans with each session for
8.3.  Is that going to be the case?  It wasn't completely clear from
the related discussions AFAICT.

merlin



Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Andrew Dunstan

Mario Weilguni wrote:

 On Tuesday, 13 March 2007 14:46, Albe Laurenz wrote:

  Mario Weilguni wrote:

   Steps to reproduce:
   create database testdb with encoding='UTF8';
   \c testdb
   create table test(x text);
   insert into test values ('\244'); => accepted, even though it is not valid UTF8.

  This is working as expected; see the remark in
  http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS

  "It is your responsibility that the byte sequences you create
  are valid characters in the server character set encoding."

 In that case, pg_dump is doing the wrong thing here and should quote the
 output. IMO it cannot be called working as expected when it makes database
 dumps worthless, without any warning at dump time.

 pg_dump should output \244 itself in that case.


The sentence quoted from the docs is perhaps less than a model of 
clarity. I would take it to mean that no client-encoding to 
server-encoding translation will take place. Does it really mean that 
the server will happily accept any escaped byte sequence, whether or not 
it is valid for the server encoding? If so that seems ... odd.


cheers

andrew



Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread Richard Huxton

David Fetter wrote:

On Tue, Mar 13, 2007 at 09:31:45AM +, Richard Huxton wrote:

* Another good example is the questionnaire.


With all due respect, this is a solved problem *without EAV or
run-time DDL*.  The URL below has one excellent approach to this.

http://www.varlena.com/GeneralBits/110.php


Which broadly speaking was the solution I used for my questionnaire, 
except I had a restricted set of types so basically just coerced them to 
text and side-stepped the inheritance issue. To the extent that it's 
dynamic, it's still just EAV though.


It doesn't remove the need for run-time DDL if you allow users to add 
their own questions. If I add a National Insurance No. (Social 
security No) then either it's:

1. Plain text, and can take clearly invalid codes
2. A user-space construct with regexp matches etc (basically recreating DDL)
3. DDL.

And as the example says, you need to create the table types in advance. 
If you want to add e.g. Work History (employer, from_date, to_date) to 
a questionnaire then you'll need dynamic DDL (of form #2 or #3 above).
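
To make that concrete, a hedged sketch of the runtime DDL in question
(all names invented):

  -- Adding a "Work History" section means a genuinely new table:
  CREATE TABLE answer_work_history (
      response_id integer NOT NULL,   -- would reference the responses table
      employer    text    NOT NULL,
      from_date   date    NOT NULL,
      to_date     date,
      CHECK (to_date IS NULL OR to_date >= from_date)
  );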


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Mario Weilguni
On Tuesday, 13 March 2007 15:12, Andrew Dunstan wrote:
 The sentence quoted from the docs is perhaps less than a model of
 clarity. I would take it to mean that no client-encoding -
 server-encoding translation will take place. Does it really mean that
 the server will happily accept any escaped byte sequence, whether or not
 it is valid for the server encoding? If so that seems ... odd.

Yes, \octal sequences are accepted even if invalid. The problem is, pg_dump 
will happily dump those sequences as is, so in that case a raw 0xa4 byte is 
emitted, and so the dump cannot be restored with pg_restore. 

A dangerous feature IMO, and one that will make a lot of DB admins very 
unhappy if they have to verify every day whether their precious database 
dumps can actually be restored in case of an error. 

Best regards,
Mario Weilguni





Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread Andrew Dunstan

David Fetter wrote:

I think the main discussion has been around:
1. Whether Edward's final-year project is basically EAV (in which case 
he'll probably need to work hard to get good marks).



As Josh mentioned, I'm on of the people who says EAV is never
justified.  We do have a way of expressing wide ranges of constraints
not known in advance: it's called SQL, and people need to do some
design using it, however much they may fear that they've left
something out somehow. :)

  


ISTM that the biggest problem with EAV is that it is far too seductive, 
and allows for lazy design (or lack of design). There might be odd cases 
(pace David) that require it, but I strongly suspect in most cases it is 
not necessary.


Perhaps we need some sort of discriminant record type ;-)

cheers

andrew



Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-13 Thread Luke Lonergan
Simon,

On 3/13/07 2:37 AM, Simon Riggs [EMAIL PROTECTED] wrote:

 We're planning a modification that I think you should consider: when there
 is a sequential scan of a table larger than the size of shared_buffers, we
 are allowing the scan to write through the shared_buffers cache.
 
 Write? For which operations?

I'm actually just referring to the sequential scan writing into the shared
buffers cache, sorry for the "write through" :-)
  
 I was thinking to do this for bulk writes also, but it would require
 changes to bgwriter's cleaning sequence. Are you saying to write say ~32
 buffers then fsync them, rather than letting bgwriter do that? Then
 allow those buffers to be reused?

Off topic, but we think we just found the reason(s) for the abysmal heap
insert performance of pgsql and are working on a fix to that as well.  It
involves two main things: the ping-ponging seeks used to extend a relfile
and the bgwriter not flushing aggressively enough.  We're hoping to move the
net heap insert rate from 12MB/s for a single stream to something more like
100 MB/s per stream, but it may take a week to get some early results and
find out if we're on the right track.  We've been wrong on this before ;-)

- Luke   





Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread Teodor Sigaev

Hmm, hstore + (optionally) functional indexes. Is that an answer?
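
A hedged sketch of that suggestion (assumes contrib/hstore is installed;
table, column and key names are invented):

  CREATE TABLE item (
      id    serial PRIMARY KEY,
      props hstore               -- arbitrary key=>value properties
  );

  INSERT INTO item (props) VALUES ('colour=>red, size=>10');

  -- An expression index on one frequently queried property:
  CREATE INDEX item_colour_idx ON item ((props -> 'colour'));

  SELECT id FROM item WHERE props -> 'colour' = 'red';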

Edward Stanley wrote:

Hi,


Was wondering if people would mind having a read over what I plan to do for my 
undergraduate honours project - you can get the proposal here:


http://www.mcs.vuw.ac.nz/~eddie/489_Proposal.pdf

What I'd basically like to know is
a) Is this problem worth solving?
b) Is there already a good solution (particularly, within PostgreSQL)?
c) Any other feedback or comments you may have.


Regards
Eddie Stanley




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Albe Laurenz
Mario Weilguni wrote:
 Steps to reproduce:
 create database testdb with encoding='UTF8';
 \c testdb
 create table test(x text);
 insert into test values ('\244'); => accepted, even though it is not valid UTF8.

 This is working as expected, see the remark in
 
 http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS

 It is your responsibility that the byte sequences you create
  are valid characters in the server character set encoding.
 
 In that case, pg_dump is doing the wrong thing here and should quote the 
 output. IMO it cannot be called working as expected when it makes database 
 dumps worthless, without any warning at dump time.
 
 pg_dump should output \244 itself in that case.

True. Here is a test case on 8.2.3
(OS, database and client all use UTF8):

test=> CREATE TABLE test(x text);
CREATE TABLE
test=> INSERT INTO test VALUES ('correct: ä');
INSERT 0 1
test=> INSERT INTO test VALUES (E'incorrect: \244');
INSERT 0 1
test=> \q
laurenz:~> pg_dump -d -t test -f test.sql

Here is an excerpt from 'od -c test.sql':

0001040   e   n   z  \n   -   -  \n  \n   I   N   S   E   R   T   I
0001060   N   T   O   t   e   s   t   V   A   L   U   E   S
0001100   (   '   c   o   r   r   e   c   t   : 303 244   '   )   ;
0001120  \n   I   N   S   E   R   T   I   N   T   O   t   e   s
0001140   t   V   A   L   U   E   S   (   '   i   n   c   o   r
0001160   r   e   c   t   : 244   '   )   ;  \n  \n  \n   -   -  \n

The invalid character (octal 244) is in the INSERT statement!

This makes psql gag:

test=> DROP TABLE test;
DROP TABLE
test=> \i test.sql
SET
SET
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
INSERT 0 1
psql:test.sql:33: ERROR:  invalid byte sequence for encoding UTF8: 0xa4
HINT:  This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by client_encoding.

A fix could be either that the server checks escape sequences for validity
or that pg_dump outputs invalid bytes as escape sequences.
Or pg_dump could stop with an error.
I think that the cleanest way would be the first.

Yours,
Laurenz Albe



Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Andrew Dunstan

Albe Laurenz wrote:

A fix could be either that the server checks escape sequences for validity
  


This strikes me as essential. If the db has a certain encoding ISTM we 
are promising that all the text data is valid for that encoding.


The question in my mind is how we help people to recover from the fact 
that we haven't done that.


cheers

andrew






Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Joshua D. Drake
Andrew Dunstan wrote:
 Albe Laurenz wrote:
 A fix could be either that the server checks escape sequences for
 validity
   
 
 This strikes me as essential. If the db has a certain encoding ISTM we
 are promising that all the text data is valid for that encoding.
 
 The question in my mind is how we help people to recover from the fact
 that we haven't done that.

Is this any different from the issues of moving 8.0.x to 8.1 UTF8, where
we had to use iconv?

Joshua D. Drake

 
 cheers
 
 andrew
 
 
 
 
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-13 Thread Mario Weilguni
On Tuesday, 13 March 2007 16:38, Joshua D. Drake wrote:
 Andrew Dunstan wrote:
  Albe Laurenz wrote:
  A fix could be either that the server checks escape sequences for
  validity
 
  This strikes me as essential. If the db has a certain encoding ISTM we
  are promising that all the text data is valid for that encoding.
 
  The question in my mind is how we help people to recover from the fact
  that we haven't done that.

 Is this any different than the issues of moving 8.0.x to 8.1 UTF8? Where
 we had to use iconv?


What issues? I've upgraded several 8.0 databases to 8.1 without having to use 
iconv. Did I miss something?

Regards,
Mario Weilguni



Re: [HACKERS] Major Feature Interactions

2007-03-13 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 What about the interaction of plan invalidation and the security
 problems with 'security definer'?  I am especially curious if pl
 functions will no longer regenerate their plans with each session for
 8.3.  Is that going to be the case?  It wasn't completely clear from
 the related discussions AFAICT.

No, there's no intention of caching plans across sessions.

I'm intending to build some infrastructure in namespace.c to support
a stack of override search_path values, and then tweak plancache.c
to override search_path during a replan.  (It has to be a stack because
nested replan events are entirely possible.)  I think this
infrastructure will also work for attaching search paths to functions;
but we still lack an agreed design spec for how that should look to the
user.
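
For illustration only, one possible user-facing shape (a sketch, not an
agreed design):

  CREATE FUNCTION safe_fn() RETURNS integer
  LANGUAGE sql
  SECURITY DEFINER
  SET search_path = pg_catalog, public   -- pinned while the function runs
  AS $$ SELECT 1 $$;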

regards, tom lane



[HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Michael Ledford

It appears that we didn't do enough research in regards to the recent
DST switch. We poorly assumed that having our machine's timezone files
up to date would be sufficient, not knowing that our version of
postgres relied on its own timezone files.

The question is... can we symlink the share/postgresql/timezone/
directory to our OS X /usr/share/zoneinfo without fear of breaking
anything in the future?

I'm also curious about the rationale for maintaining separate timezone
data files on machines that supply them.

Sincerely,
Michael



Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Jeff Davis
On Mon, 2007-03-12 at 13:21 +, Simon Riggs wrote:
 So based on those thoughts, sync_scan_offset should be fixed at 16,
 rather than being variable. In addition, ss_report_loc() should only
 report its position every 16 blocks, rather than do this every time,
 which will reduce overhead of this call.

If we fix sync_scan_offset at 16, we might as well just get rid of it.
Sync scans are only useful on large tables, and getting a free 16 pages
over a scan isn't worth the trouble. However, even without
sync_scan_offset, sync scans are still a valuable feature.

I agree that ss_report_loc() doesn't need to report on every call. If
there's any significant overhead I agree that it should report less
often. Do you think that the overhead is significant on such a simple
function?

 
 To match that, scan_recycle_buffers should be fixed at 32. So GUCs for
 sync_scan_offset and scan_recycle_buffers would not be required at all.
 
 IMHO we can also remove sync_scan_threshold and just use NBuffers
 instead. That way we get the benefit of both patches or neither, making
 it easier to understand what's going on.

I like the idea of reducing tuning parameters, but we should, at a
minimum, still allow an on/off button for sync scans. My tests revealed
that the wrong combination of OS/FS/IO-Scheduler/Controller could result
in bad I/O behavior.

 If need be, the value of scan_recycle_buffers can be varied upwards
 should the scans drift apart, as a way of bringing them back together.

If the scans aren't being brought together, that means that one of the
scans is CPU bound or outside the combined cache trail (shared_buffers
+ OS buffer cache). 

 We aren't tracking whether they are together or apart, so I would like
 to see some debug output from synch scans to allow us to assess how far
 behind the second scan is as it progresses. e.g.
 LOG:  synch scan currently on block N, trailing pathfinder by M blocks
 issued every 128 blocks as we go through the scans. 
 
 Thoughts?
 

It's hard to track where all the scans are currently. One of the
advantages of my patch is its simplicity: the scans don't need to know
about other specific scans, and there is no concept in the code of a
head scan or a pack.

There is no easy way to tell which scan is ahead and which is behind.
There was a discussion when I submitted this proposal at the beginning
of 8.3, but I didn't see enough benefit to justify all of the costs and
risks associated with scans communicating with each other. I
certainly can't implement that kind of thing before feature freeze, and
I think there's a risk of lock contention for the communication
required. I'm also concerned that -- if the scans are too
interdependent -- it would make postgres less robust against the
disappearance of a single backend (i.e. what if the backend that is
leading a scan dies?).

Regards,
Jeff Davis






Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Josh Berkus
Michael,

 I'm also curious about the rationale for maintaining separate timezone
 data files on machines that supply them.

It's because we found that we couldn't ensure consistency between operating 
systems while relying on OS files.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 I agree that ss_report_loc() doesn't need to report on every call. If
 there's any significant overhead I agree that it should report less
 often. Do you think that the overhead is significant on such a simple
 function?

One extra LWLock cycle per page processed definitely *is* a significant
overhead ... can you say context swap storm?  I'd think about doing it
once every 100 or so pages.

regards, tom lane



Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Jeff Davis
On Tue, 2007-03-13 at 12:53 -0400, Tom Lane wrote:
 Jeff Davis [EMAIL PROTECTED] writes:
  I agree that ss_report_loc() doesn't need to report on every call. If
  there's any significant overhead I agree that it should report less
  often. Do you think that the overhead is significant on such a simple
  function?
 
 One extra LWLock cycle per page processed definitely *is* a significant
 overhead ... can you say context swap storm?  I'd think about doing it
 once every 100 or so pages.
 

No lock is needed to store the hint. If somehow the hint (which is
stored in a static table, no pointers) gets invalid data due to a race
condition, the new scan will simply consider the hint invalid and start
at 0.

I did this precisely to avoid causing a performance regression for usage
patterns that don't benefit from sync scans.

Regards,
Jeff Davis




Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Josh Berkus
Tom,

 You can try the symlink game if you want, but it'll be on your own head
 whether it works or not.  (For the record, I am hoping to do exactly
 that in future releases for Red Hat ... but in that context I know what
 the system's timezone code is.  I'm less sure that I know what Apple
 is using.)

Yeah, Solaris wants us to do the same thing ... use their files.  At some 
point, I'll have to hack it and see how it works.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Michael Ledford

You can try the symlink game if you want, but it'll be on your own head
whether it works or not.  (For the record, I am hoping to do exactly
that in future releases for Red Hat ... but in that context I know what
the system's timezone code is.  I'm less sure that I know what Apple
is using.)


Thank you all for your speedy and informative replies. I had a good
idea why you wouldn't symlink (for release consistency), but I wasn't
aware that there were format differences in the timezone files.

Currently Apple's format appears to work fine with postgresql. Given the
responses, and to make a quick job of it, I will be copying Apple's files
onto the affected machines only, instead of symlinking, until we can
coordinate a new version update. It seems that we are only being affected
in Canada...

Thanks again,
Michael



Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Simon Riggs
On Mon, 2007-03-12 at 17:46 -0700, Jeff Davis wrote:
 On Mon, 2007-03-12 at 13:21 +, Simon Riggs wrote:
  So based on those thoughts, sync_scan_offset should be fixed at 16,
  rather than being variable. In addition, ss_report_loc() should only
  report its position every 16 blocks, rather than do this every time,
  which will reduce overhead of this call.
 
 If we fix sync_scan_offset at 16, we might as well just get rid of it.
 Sync scans are only useful on large tables, and getting a free 16 pages
 over a scan isn't worth the trouble. However, even without
 sync_scan_offset, 

Not sure what you mean by "a free 16 pages". Please explain?

 sync scans are still a valuable feature.

I have always thought synch scans are a valuable feature too. 

 I agree that ss_report_loc() doesn't need to report on every call. If
 there's any significant overhead I agree that it should report less
 often. Do you think that the overhead is significant on such a simple
 function?

Let's try without it and see. There's no need to access shared memory so
often.

 I like the idea of reducing tuning parameters, but we should, at a
 minimum, still allow an on/off button for sync scans. My tests revealed
 that the wrong combination of OS/FS/IO-Scheduler/Controller could result
 in bad I/O behavior.

Agreed

  If need be, the value of scan_recycle_buffers can be varied upwards
  should the scans drift apart, as a way of bringing them back together.
 
 If the scans aren't being brought together, that means that one of the
 scans is CPU bound or outside the combined cache trail (shared_buffers
 + OS buffer cache). 
 
  We aren't tracking whether they are together or apart, so I would like
  to see some debug output from synch scans to allow us to assess how far
  behind the second scan is as it progresses. e.g.
  LOG:  synch scan currently on block N, trailing pathfinder by M blocks
  issued every 128 blocks as we go through the scans. 
  
  Thoughts?
  
 
 It's hard to track where all the scans are currently. One of the
 advantages of my patch is its simplicity: the scans don't need to know
 about other specific scans, and there is no concept in the code of a
 head scan or a pack.

I'd still like to be able to trace each scan to see how far ahead/behind
it is from the other scans on the same table, however we do that.

Any backend can read the position of other backends' scans, so it should
be easy enough to put in a regular LOG entry that shows how far
ahead/behind they are from other scans. We can trace just one backend
and have it report on where it is with respect to other backends, or you
could have them all calculate their position and have just the lead scan
report the position of all other scans.

 There is no easy way to tell which scan is ahead and which is behind.
 There was a discussion when I submitted this proposal at the beginning
 of 8.3, but I didn't see enough benefit to justify all of the costs and
 risks associated with scans communicating with each other. I
 certainly can't implement that kind of thing before feature freeze, and
 I think there's a risk of lock contention for the communication
 required. I'm also concerned that -- if the scans are too
 interdependent -- it would make postgres less robust against the
 disappearance of a single backend (i.e. what if the backend that is
 leading a scan dies?).

I've not mentioned that again.

I'd like to see the trace option to allow us to tell whether its working
as well as we'd like it to pre-release and in production. Also I want to
see whether various settings of scan_recycle_buffers help/hinder the
effectiveness of synch scans, as others have worried it might.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Jeff Davis
On Tue, 2007-03-13 at 10:08 -0700, Jeff Davis wrote:
  One extra LWLock cycle per page processed definitely *is* a significant
  overhead ... can you say context swap storm?  I'd think about doing it
  once every 100 or so pages.
  
 
 No lock is needed to store the hint. If somehow the hint (which is
 stored in a static table, no pointers) gets invalid data due to a race
 condition, the new scan will simply consider the hint invalid and start
 at 0.
 
 I did this precisely to avoid causing a performance regression for usage
 patterns that don't benefit from sync scans.
 

I'd also like to add that, if a lock were required, a constant offset
would also seem to be prone to a context swap storm; it would just happen
a 100th as often. We'd need to do something to spread the locks over time.

That being said, I'll adjust it to report once per hundred pages anyway,
because there's really no drawback.

Regards,
Jeff Davis




Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Simon Riggs
On Tue, 2007-03-13 at 10:08 -0700, Jeff Davis wrote:
 On Tue, 2007-03-13 at 12:53 -0400, Tom Lane wrote:
  Jeff Davis [EMAIL PROTECTED] writes:
   I agree that ss_report_loc() doesn't need to report on every call. If
   there's any significant overhead I agree that it should report less
   often. Do you think that the overhead is significant on such a simple
   function?
  
  One extra LWLock cycle per page processed definitely *is* a significant
  overhead ... can you say context swap storm?  I'd think about doing it
  once every 100 or so pages.
  
 
 No lock is needed to store the hint. If somehow the hint (which is
 stored in a static table, no pointers) gets invalid data due to a race
 condition, the new scan will simply consider the hint invalid and start
 at 0.
 
 I did this precisely to avoid causing a performance regression for usage
 patterns that don't benefit from sync scans.

Shared memory access is still a performance/scalability concern because
so many people want access to it at the same time. 

There really is no need to do this after each block. 8 CPUs ought to be
able to do 8 scans without tripping over each other. Especially if they
are on separate tables.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread David Fetter
On Tue, Mar 13, 2007 at 02:21:37PM +, Richard Huxton wrote:
 David Fetter wrote:
 On Tue, Mar 13, 2007 at 09:31:45AM +, Richard Huxton wrote:
 * Another good example is the questionnaire.
 
 With all due respect, this is a solved problem *without EAV or
 run-time DDL*.  The URL below has one excellent approach to this.
 
 http://www.varlena.com/GeneralBits/110.php
 
 Which broadly speaking was the solution I used for my questionnaire,
 except I had a restricted set of types so basically just coerced
 them to text and side-stepped the inheritance issue. To the extent
 that it's dynamic, it's still just EAV though.

That's precisely the difference between the above solution and yours,
and it's the difference between a good design and one that will come
up and bite you on the as^Hnkle.

 It doesn't remove the need for run-time DDL if you allow users to add 
 their own questions.

Sure it does.   When a user, who should be talking with you, wants to
ask a new kind of question, that's the start of a discussion about
what new kind(s) of questions would be generally applicable in the
questionnaire schema.  Then, when you come to an agreement, you roll
it into the new schema, and the whole system gets an improvement.

 If I add a National Insurance No. (Social 
 security No) then either it's:
 1. Plain text, and can take clearly invalid codes
 2. A user-space construct with regexp matches etc (basically recreating DDL)
 3. DDL.

DDL, yes.  Run-time, no.

 And as the example says, you need to create the table types in advance. 
 If you want to add e.g. Work History (employer, from_date, to_date) to 
 a questionnaire then you'll need dynamic DDL (of form #2 or #3 above).

Again see above for new types of questions and answers.  You can pay
for it once up front in a re-jigger of the schema, or you will pay a
much larger price when you discover you've got EAV goo all over the
place.  There is a third option, of course, which is, "I'm sorry,
Dave.  I can't do that." ;)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate



Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Jeff Davis
On Tue, 2007-03-13 at 17:17 +, Simon Riggs wrote:
 On Tue, 2007-03-13 at 10:08 -0700, Jeff Davis wrote:
  On Tue, 2007-03-13 at 12:53 -0400, Tom Lane wrote:
   Jeff Davis [EMAIL PROTECTED] writes:
I agree that ss_report_loc() doesn't need to report on every call. If
there's any significant overhead I agree that it should report less
often. Do you think that the overhead is significant on such a simple
function?
   
   One extra LWLock cycle per page processed definitely *is* a significant
   overhead ... can you say context swap storm?  I'd think about doing it
   once every 100 or so pages.
   
  
  No lock is needed to store the hint. If somehow the hint (which is
  stored in a static table, no pointers) gets invalid data due to a race
  condition, the new scan will simply consider the hint invalid and start
  at 0.
  
  I did this precisely to avoid causing a performance regression for usage
  patterns that don't benefit from sync scans.
 
 Shared memory access is still a performance/scalability concern because
 so many people want access to it at the same time. 
 
 There really is no need to do this after each block. 8 CPUs ought to be
 able to do 8 scans without tripping over each other. Especially if they
 are on separate tables.
 

Ok, I'll do it every 100 pages.

Regards,
Jeff Davis






Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Josh Berkus
Michael,

 Currently Apple's format appears to work fine with postgresql. And
 given the responses and to make a quick job of it I will be copying
 Apple's files only on the machines affected instead of symlinking
 until we can coordinate a new version update. It seems that we are
 only being affected in Canada...

Canada and AU changed their DST rules with < 4 months notice.  

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



[HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Joshua D. Drake
Hello,

When updating for DST we found this nifty problem. This does not happen
on 8.2.

postgres=# SET TIMEZONE TO 'PST8PDT';
SET
postgres=# select now();
              now
-------------------------------
 2007-03-13 10:41:07.034754-07
(1 row)

postgres=# SET TIMEZONE TO PST8PDT;
SET
postgres=# SELECT NOW();
              now
-------------------------------
 2007-03-13 09:41:07.036069-08
(1 row)

postgres=# set time zone PST8PDT;
SET
postgres=# select now();
              now
-------------------------------
 2007-03-13 09:41:07.038045-08
(1 row)

postgres=# SET TIME ZONE 'PST8PDT';
SET
postgres=# select now();
              now
-------------------------------
 2007-03-13 10:41:07.039145-07
(1 row)



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Hannu Krosing
On Tue, 2007-03-13 at 12:53, Tom Lane wrote:
 Jeff Davis [EMAIL PROTECTED] writes:
  I agree that ss_report_loc() doesn't need to report on every call. If
  there's any significant overhead I agree that it should report less
  often. Do you think that the overhead is significant on such a simple
  function?
 
 One extra LWLock cycle per page processed definitely *is* a significant
 overhead ... can you say context swap storm?  I'd think about doing it
 once every 100 or so pages.

Can't we do it in some lock-free way? Writing page numbers (4-byte
ints) into a predetermined location in shared mem should be atomic on
all platforms we support (it still may cause some cache ping-pong on
multiprocessor systems, but this should be much cheaper), and even an
occasional error in establishing the scan head should not be
catastrophic.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Joshua D. Drake
Joshua D. Drake wrote:
 Hello,
 
 When updating for DST we found this nifty problem. This does not happen
 on 8.2.
 

Updated to show timezone:

postgres=# SET TIMEZONE TO 'PST8PDT';
SET
postgres=# select now();
              now
-------------------------------
 2007-03-13 10:44:33.156381-07
(1 row)

postgres=# SHOW TIMEZONE;
 TimeZone
----------
 PST8PDT
(1 row)

postgres=# SET TIMEZONE TO PST8PDT;
SET
postgres=# SHOW TIMEZONE;
 TimeZone
----------
 pst8pdt
(1 row)

postgres=# SELECT NOW();
             now
------------------------------
 2007-03-13 09:44:33.16286-08
(1 row)

postgres=# set time zone PST8PDT;
SET
postgres=# SHOW time zone;
 TimeZone
----------
 pst8pdt
(1 row)

postgres=# select now();
              now
-------------------------------
 2007-03-13 09:44:33.166297-08
(1 row)

postgres=# SET TIME ZONE 'PST8PDT';
SET
postgres=# SHOW TIME ZONE;
 TimeZone
----------
 PST8PDT
(1 row)

postgres=# select now();
              now
-------------------------------
 2007-03-13 10:44:38.352985-07
(1 row)

postgres=#




 postgres=# SET TIMEZONE TO 'PST8PDT';
 SET
 postgres=# select now();
   now
 ---
  2007-03-13 10:41:07.034754-07
 (1 row)
 
 postgres=# SET TIMEZONE TO PST8PDT;
 SET
 postgres=# SELECT NOW();
   now
 ---
  2007-03-13 09:41:07.036069-08
 (1 row)
 
 postgres=# set time zone PST8PDT;
 SET
 postgres=# select now();
   now
 ---
  2007-03-13 09:41:07.038045-08
 (1 row)
 
 postgres=# SET TIME ZONE 'PST8PDT';
 SET
 postgres=# select now();
   now
 ---
  2007-03-13 10:41:07.039145-07
 (1 row)
 
 
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Zdenek Kotala

Tom Lane wrote:

Josh Berkus josh@agliodbs.com writes:

Michael,

I'm also curious about the rationale to maintain a separate timezone
data files for machines that supply them.


It's because we found that we couldn't ensure consistency between operating 
systems while relying on OS files.


Partly that, and partly that we needed operations that the standard C
library doesn't supply.  Hence we had to have direct access to the
timezone database, and since different systems have different
representations of timezone data, we couldn't really rely on the
system's data.

You can try the symlink game if you want, but it'll be on your own head
whether it works or not.  (For the record, I am hoping to do exactly
that in future releases for Red Hat ... but in that context I know what
the system's timezone code is.  I'm less sure that I know what Apple
is using.)



I have the following idea:

1) add a GUC variable which enables usage of OS time zone files

2) add extra parameters to the ./configure script which enable OS TZ
support in the code and set the path to the OS TZ files.


It will probably require some OS-specific code for reading and
translating the TZ file format.


   What do you think about it?

Thanks Zdenek



Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread Richard Huxton

David Fetter wrote:

On Tue, Mar 13, 2007 at 02:21:37PM +, Richard Huxton wrote:

David Fetter wrote:

On Tue, Mar 13, 2007 at 09:31:45AM +, Richard Huxton wrote:

* Another good example is the questionnaire.

With all due respect, this is a solved problem *without EAV or
run-time DDL*.  The URL below has one excellent approach to this.

http://www.varlena.com/GeneralBits/110.php

Which broadly speaking was the solution I used for my questionnaire,
except I had a restricted set of types so basically just coerced
them to text and side-stepped the inheritance issue. To the extent
that it's dynamic, it's still just EAV though.


That's precisely the difference between the above solution and yours,
and it's the difference between a good design and one that will come
up and bit you on the as^Hnkle.


It's still basically EAV (either approach). The key fault with EAV is 
that the tables have no semantic meaning - answer_int contains number of 
oranges, days since birth and the price of a tube ticket in pennies. 
Now, with a questionnaire that might not matter because everything is an 
answer and you're not necessarily going to do much more than 
count/aggregate it.


It doesn't remove the need for run-time DDL if you allow users to add 
their own questions.


Sure it does.   When a user, who should be talking with you, wants to
ask a new kind of question, that's the start of a discussion about
what new kind(s) of questions would be generally applicable in the
questionnaire schema.  Then, when you come to an agreement, you roll
it into the new schema, and the whole system gets an improvement.


Fine, but if you're not letting the user extend the system, then it's 
not really addressing Edward's original posting, is it? If the user's 
talking to me, I might as well just write the DDL myself - it's the talk 
that'll take the time, not writing a dozen lines of SQL.


The interesting part of the problem (from a Comp-Sci point of view) is 
precisely in automating part of that discussion. It's providing an 
abstraction so that you don't end up with a mass of attributes while 
still providing freedom to the user.


--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Josh Berkus
Zdenek,

 I have following idea:

 1) add guc varibale which enable usage of OS time zone files

 2) add extra parameters into ./configure script which enable OS TZ
 support in the code and get path to OS TZ files.

If we're adding it as a configure-time variable, there's no reason to have 
a GUC.

 It probably will require some OS specific code for reading and
 translation TZ file format.

This would require each OS to make a commitment to support their 
compatibility code.  Like, we would have to maintain the Solaris code, if 
any was required.  Ideally, no glue code would be required.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Andrew Dunstan

Josh Berkus wrote:

Tom,

  

You can try the symlink game if you want, but it'll be on your own head
whether it works or not.  (For the record, I am hoping to do exactly
that in future releases for Red Hat ... but in that context I know what
the system's timezone code is.  I'm less sure that I know what Apple
is using.)



Yeah, Solaris wants us to do the same thing ... use their files.  At some 
point, I'll have to hack it and see how it works.


  


perhaps we should have a configure flag like 
--with-timezone-base=/usr/share/zoneinfo


cheers

andrew



[HACKERS] HOT WIP Patch - Version 4.4

2007-03-13 Thread Pavan Deolasee


Please see version 4.4 of the HOT WIP patch posted on pgsql-patches.
I have fixed a couple of bugs in the earlier version posted. Other than
that, there are no significant changes in the patch.

The row-level fragmentation had a bug where we were
unintentionally sorting the line pointers array more than
once. Also, the defragmented lengths were computed wrongly
and were a source of many errors.

Another bug fix was in the heap_hot_fetch() code path where
we were asserting on finding a LP_DELETEd tuple in the hot
chain. I later realized that this is not required and we
should rather just assume that the chain is broken, something
very similar to the xmax/xmin checks.
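
For clarity, the xmax/xmin check I mean is the usual chain-validation
pattern, sketched here (this is the standard test, not the patch code
itself):

/*
 * The chain is considered broken unless the next tuple's xmin matches
 * the xmax we followed to reach it.
 */
if (!TransactionIdEquals(priorXmax,
						 HeapTupleHeaderGetXmin(nextTdata)))
	break;				/* broken chain; stop following it */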


Thanks,
Pavan

--


EnterpriseDB	http://www.enterprisedb.com




Re: [HACKERS] Daylight Saving Time question PostgreSQL 8.1.4

2007-03-13 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 Zdenek,
 I have following idea:
 1) add guc varibale which enable usage of OS time zone files
 2) add extra parameters into ./configure script which enable OS TZ
 support in the code and get path to OS TZ files.

 If we're adding it as a configure-time variable, there's no reason to have 
 a GUC.

I see zero reason to have either.  It would only make sense to do this
in the context of a platform-specific distribution such as an RPM, and
in that context the simplest solution is to let the RPM specfile make
the substitution (ie, after make install and before packaging,
rm -rf PG's timezone tree and insert a symlink).  Then it's on the RPM
packager's head whether it's the right thing to do or not.  A configure
switch strikes me as mostly a foot-gun, because the average user of
Postgres won't have any way to know whether the files are compatible.

regards, tom lane



Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-13 Thread David Fetter
On Tue, Mar 13, 2007 at 05:54:34PM +, Richard Huxton wrote:
 David Fetter wrote:
 On Tue, Mar 13, 2007 at 02:21:37PM +, Richard Huxton wrote:
 David Fetter wrote:
 On Tue, Mar 13, 2007 at 09:31:45AM +, Richard Huxton wrote:
 * Another good example is the questionnaire.
 With all due respect, this is a solved problem *without EAV or
 run-time DDL*.  The URL below has one excellent approach to this.
 
 http://www.varlena.com/GeneralBits/110.php
 Which broadly speaking was the solution I used for my
 questionnaire, except I had a restricted set of types so basically
 just coerced them to text and side-stepped the inheritance issue.
 To the extent that it's dynamic, it's still just EAV though.
 
 That's precisely the difference between the above solution and
 yours, and it's the difference between a good design and one that
 will come up and bit you on the as^Hnkle.
 
 It's still basically EAV (either approach).  The key fault with EAV
 is that the tables have no semantic meaning - answer_int contains
 number of oranges, days since birth and the price of a tube ticket
 in pennies.

Stuffing all of those into an answer_int is *precisely* what the end
user must not do.  That's pilot error.

 Now, with a questionnaire that might not matter because everything
 is an answer and you're not necessarily going to do much more than
 count/aggregate it.

See above.

 It doesn't remove the need for run-time DDL if you allow users to
 add their own questions.
 
 Sure it does.   When a user, who should be talking with you, wants
 to ask a new kind of question, that's the start of a discussion
 about what new kind(s) of questions would be generally applicable
 in the questionnaire schema.  Then, when you come to an agreement,
 you roll it into the new schema, and the whole system gets an
 improvement.
 
 Fine, but if you're not letting the user extend the system, then
 it's not really addressing Edward's original posting, is it?

It's my contention that Edward's original idea is ill-posed.  SQL is
just fine for doing this kind of thing, and it's *not that hard*.

 If the user's talking to me, I might as well just write the DDL
 myself - it's the talk that'll take the time, not writing a dozen
 lines of SQL.

It's the talk that's the important part.  Machines are really bad at
seeing the broader picture.  In the attempt to save a few minutes'
discussion, he's trying to borrow that time from a system asked to do
things that computers are inherently bad at doing, and every end user
will pay that time back at a very high rate of interest.  This is
precisely the kind of false economy that so plagues software
development and maintenance these days.

 The interesting part of the problem (from a Comp-Sci point of view)
 is precisely in automating part of that discussion.  It's providing
 an abstraction so that you don't end up with a mass of attributes
 while still providing freedom to the user.

This freedom and efficiency you're talking about is better supplied,
IMHO, by putting a standard DDL for questionnaires up on a pgfoundry
or an SF.net.  That way, improvements to the DDL get spread all over
the world, and a very large amount of wheel reinvention gets avoided.
Reusable components are a big chunk of both freedom and efficiency. :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate



Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Jeff Davis
On Tue, 2007-03-13 at 17:11 +, Simon Riggs wrote:
 On Mon, 2007-03-12 at 17:46 -0700, Jeff Davis wrote:
  On Mon, 2007-03-12 at 13:21 +, Simon Riggs wrote:
   So based on those thoughts, sync_scan_offset should be fixed at 16,
   rather than being variable. In addition, ss_report_loc() should only
   report its position every 16 blocks, rather than do this every time,
   which will reduce overhead of this call.
  
  If we fix sync_scan_offset at 16, we might as well just get rid of it.
  Sync scans are only useful on large tables, and getting a free 16 pages
  over a scan isn't worth the trouble. However, even without
  sync_scan_offset, 
 
 Not sure what you mean by a free 16 pages. Please explain?
 

By free I mean already in cache, and therefore don't have to do I/O to
get it. I used the term loosely above, so let me re-explain:

My only point was that 16 is essentially 0 when it comes to
sync_scan_offset, because it's a small number of blocks over the course
of the scan of a large table.

If sync_scan_offset is 0, my patch will cause scans on a big table to
start where other scans are, and those scans should tend to stay
together and use newly-cached pages efficiently (and achieve the primary
goal of the patch).

The advantage of sync_scan_offset is that, in some situations, a second
scan can actually finish faster than if it were the only query
executing, because a previous scan has already caused some blocks to be
cached. However, 16 is a small number because that benefit would only be
once per scan, and sync scans are only helpful on large tables.

  I like the idea of reducing tuning parameters, but we should, at a
  minimum, still allow an on/off button for sync scans. My tests revealed
  that the wrong combination of OS/FS/IO-Scheduler/Controller could result
  in bad I/O behavior.
 
 Agreed
 

Do you have an opinion about sync_scan_threshold versus a simple
sync_scan_enable?

 I'd still like to be able to trace each scan to see how far ahead/behind
 it is from the other scans on the same table, however we do that.
 
 Any backend can read the position of other backends' scans, so it should

Where is that information stored? Right now my patch will overwrite the
hints of other backends, because I'm using a static data structure
(rather than one that grows). I do this to avoid the need for locking.
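
Concretely, the structure is roughly like this (a sketch only; the names
and the table size are illustrative, not necessarily what's in the patch):

typedef struct
{
	Oid			relid;		/* table being scanned */
	BlockNumber	location;	/* last block reported for that table */
} ScanLocEntry;

#define SS_NELEMS 32		/* fixed size, allocated in shared memory */
static ScanLocEntry *scanLocTable;

static void
ss_report_loc_sketch(Oid relid, BlockNumber blocknum)
{
	ScanLocEntry *entry = &scanLocTable[relid % SS_NELEMS];

	/*
	 * Two aligned 4-byte stores, no lock: a concurrent scan of another
	 * table hashing to the same slot simply overwrites the hint.
	 */
	entry->relid = relid;
	entry->location = blocknum;
}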

 be easy enough to put in a regular LOG entry that shows how far
 ahead/behind they are from other scans. We can trace just one backend
 and have it report on where it is with respect to other backends, or you
 could have them all calculate their position and have just the lead scan
 report the position of all other scans.
 

I already have each backend log its progression through the tablescan
every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
currently use this information to see whether scans are staying together
or not. I think this gives us the information we need without backends
needing to communicate the information during execution.

I think I will increase the resolution of the scan progress so that we
can track every 5k or even 1k blocks read per pid per scan. That might
tell us more about the shared memory usage versus OS cache.

Is there any other information you need reported?

 I'd like to see the trace option to allow us to tell whether its working
 as well as we'd like it to pre-release and in production. Also I want to
 see whether various settings of scan_recycle_buffers help/hinder the
 effectiveness of synch scans, as others have worried it might.
 

Can you tell me what you mean by trace option, if you mean something
different than tracking the relative positions of the scans?

I will update my patch and send it along so that we can see how they
work together. 

Regards,
Jeff Davis




Re: [HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 When updating for DST we found this nifty problem. This does not happen
 on 8.2.

And your point is?

regards, tom lane



Re: [HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Joshua D. Drake
Tom Lane wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:
 When updating for DST we found this nifty problem. This does not happen
 on 8.2.
 
 And your point is?

Was the test case not clear? Is there something else I should have done?
Does this not seem a problem to you?

Sincerely,

Joshua D. Drake


 
   regards, tom lane
 
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Does this not seem a problem to you?

No.  8.2 is case-insensitive on timezone names, but prior releases were not.
I believe the difference you're seeing in 8.1 is that 'PST8PDT' binds to
the zic database entry by that name, while 'pst8pdt' falls back on the
POSIX-default rules, which are presumably still the old DST law.

regards, tom lane



Re: [HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Joshua D. Drake
Tom Lane wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:
 Does this not seem a problem to you?
 
 No.  8.2 is case-insensitive on timezone names, but prior releases were not.
 I believe the difference you're seeing in 8.1 is that 'PST8PDT' binds to
 the zic database entry by that name, while 'pst8pdt' falls back on the
 POSIX-default rules, which are presumably still the old DST law.

Well, in my test case, I explicitly note that PST8PDT != 'PST8PDT'.

That is my actual concern. Before the DST change it wasn't an issue; now
it is. Is that PostgreSQL's problem? Probably not, but it certainly seems
inconsistent.

Sincerely,

Joshua D. Drake


 
   regards, tom lane
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 That is my actual concern. Before the DST change it wasn't an issue; now
 it is. Is that PostgreSQL's problem? Probably not, but it certainly seems
 inconsistent.

[ digs around... ]  Here we are (in localtime.c):

/*
 * The DST rules to use if TZ has no rules and we can't load TZDEFRULES.
 * We default to US rules as of 1999-08-17.
 * POSIX 1003.1 section 8.1.1 says that the default DST rules are
 * implementation dependent; for historical reasons, US rules are a
 * common default.
 */
#define TZDEFRULESTRING ",M4.1.0,M10.5.0"

AFAICS there is not any provision in the code that reads that string to
have different rules in different years, so you're more or less stuck
with being wrong before 2007 or being wrong after.  Realistically this
means that POSIX-style rule strings are no longer useful, and you'd
better start using one of the zic timezone names.  Which, prior to 8.2,
means you'd better spell PST8PDT in upper case ... or maybe insert a
symlink in the timezone/ tree to make the lower-case name valid ...
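
(For reference, a POSIX zone string can carry its own explicit transition
rules; under the new US law that would look something like

	PST8PDT,M3.2.0,M11.1.0

i.e. DST from the second Sunday of March to the first Sunday of November,
but of course every client setting would have to be updated to spell it
that way.)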

regards, tom lane



Re: [HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Joshua D. Drake

 AFAICS there is not any provision in the code that reads that string to
 have different rules in different years, so you're more or less stuck
 with being wrong before 2007 or being wrong after.  Realistically this
 means that POSIX-style rule strings are no longer useful, and you'd
 better start using one of the zic timezone names.  Which, prior to 8.2,
 means you'd better spell PST8PDT in upper case ... or maybe insert a
 symlink in the timezone/ tree to make the lower-case name valid ...

O.k., really trying not to be difficult here, but if we modified 8.1.9
(to be released) to not fold the case of the unquoted timezone, doesn't
that resolve the issue entirely? (Well, it's a band-aid, but still.)

Joshua D. Drake


 
   regards, tom lane
 
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Simon Riggs
On Tue, 2007-03-13 at 11:28 -0700, Jeff Davis wrote:
 On Tue, 2007-03-13 at 17:11 +, Simon Riggs wrote:
  On Mon, 2007-03-12 at 17:46 -0700, Jeff Davis wrote:
   On Mon, 2007-03-12 at 13:21 +, Simon Riggs wrote:
So based on those thoughts, sync_scan_offset should be fixed at 16,
rather than being variable. In addition, ss_report_loc() should only
report its position every 16 blocks, rather than do this every time,
which will reduce overhead of this call.
   
   If we fix sync_scan_offset at 16, we might as well just get rid of it.
   Sync scans are only useful on large tables, and getting a free 16 pages
   over a scan isn't worth the trouble. However, even without
   sync_scan_offset, 
  
  Not sure what you mean by a free 16 pages. Please explain?
  
 
 By free I mean already in cache, and therefore don't have to do I/O to
 get it. I used the term loosely above, so let me re-explain:
 
 My only point was that 16 is essentially 0 when it comes to
 sync_scan_offset, because it's a small number of blocks over the course
 of the scan of a large table.
 
 If sync_scan_offset is 0, my patch will cause scans on a big table to
 start where other scans are, and those scans should tend to stay
 together and use newly-cached pages efficiently (and achieve the primary
 goal of the patch).

OK

 The advantage of sync_scan_offset is that, in some situations, a second
 scan can actually finish faster than if it were the only query
 executing, because a previous scan has already caused some blocks to be
 cached. However, 16 is a small number because that benefit would only be
 once per scan, and sync scans are only helpful on large tables.

Alright, understood. That last part is actually something I now want to
avoid because it's using the current cache-spoiling behaviour of
seqscans to advantage. I'd like to remove that behaviour, but it sounds
like we can have both
- SeqScans that don't spoil cache
- Synch scans
by setting sync_scan_offset to zero.

   I like the idea of reducing tuning parameters, but we should, at a
   minimum, still allow an on/off button for sync scans. My tests revealed
   that the wrong combination of OS/FS/IO-Scheduler/Controller could result
   in bad I/O behavior.
  
  Agreed
  
 
 Do you have an opinion about sync_scan_threshold versus a simple
 sync_scan_enable?

enable_sync_scan?

  I'd still like to be able to trace each scan to see how far ahead/behind
  it is from the other scans on the same table, however we do that.
  
  Any backend can read the position of other backends' scans, so it should
 
 Where is that information stored? Right now my patch will overwrite the
 hints of other backends, because I'm using a static data structure
 (rather than one that grows). I do this to avoid the need for locking.

OK, well, we can still read it before we overwrite it to calc the
difference. That will at least allow us to get a difference between
points as we go along. That seems like it's worth having, even if it
isn't accurate for 3+ concurrent scans.

  be easy enough to put in a regular LOG entry that shows how far
  ahead/behind they are from other scans. We can trace just one backend
  and have it report on where it is with respect to other backends, or you
  could have them all calculate their position and have just the lead scan
  report the position of all other scans.
  
 
 I already have each backend log its progression through the tablescan
 every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
 currently use this information to see whether scans are staying together
 or not. I think this gives us the information we need without backends
 needing to communicate the information during execution.

Well, that is good, thank you for adding that after initial discussions.

Does it have the time at which a particular numbered block is reached?
(i.e. Block #117 is not the same thing as the 117th block scanned). We
can use that to compare the time difference of each scan.

 I think I will increase the resolution of the scan progress so that we
 can track every 5k or even 1k blocks read per pid per scan. That might
 tell us more about the shared memory usage versus OS cache.
 
 Is there any other information you need reported?

Not sure yet! I just want to look one level deeper, to see if everything
is working like we think it should.

  I'd like to see the trace option to allow us to tell whether its working
  as well as we'd like it to pre-release and in production. Also I want to
  see whether various settings of scan_recycle_buffers help/hinder the
  effectiveness of synch scans, as others have worried it might.
  
 
 Can you tell me what you mean by trace option, if you mean something
 different than tracking the relative positions of the scans?
 
 I will update my patch and send it along so that we can see how they
 work together. 

Great

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] 8.1.x (tested 8.1.8) timezone bugs

2007-03-13 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 O.k., really trying to not be difficult here but... if we modified 8.1.9
 (to be released) to not fold the case of the unquoted timezone, doesn't
 that resolve the issue entirely? (well it is a band-aid but still).

No, it doesn't; the issue is what happens when there's no match in the
zic database for your TZ name.

Not to mention that we can't realistically put the case-folding change
into a minor release, both for compatibility reasons and because it was
part of a fairly major overhaul of the timezone code.  I no longer
recall what-all it was connected to, but there were a number of
interrelated changes there.

Digging around some more in the zic code, I note that zic has an option
-p to select a specific known timezone as "posixrules", and that the
fallback code seems to prefer that to the hardwired string.  That might
work for us, if what it takes from the zone is just the DST rules and
not the specific offset.  Why don't you try
ln -s EST5EDT /usr/share/postgresql/timezone/posixrules
(adjust path as needed) and then see how it behaves with 'pst8pdt'?

regards, tom lane



Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Jeff Davis
On Tue, 2007-03-13 at 19:24 +, Simon Riggs wrote:
  The advantage of sync_scan_offset is that, in some situations, a second
  scan can actually finish faster than if it were the only query
  executing, because a previous scan has already caused some blocks to be
  cached. However, 16 is a small number because that benefit would only be
  once per scan, and sync scans are only helpful on large tables.
 
 Alright, understood. That last part is actually something I now want to
 avoid because it's using the current cache-spoiling behaviour of
 seqscans to advantage. I'd like to remove that behaviour, but it sounds
 like we can have both
 - SeqScans that don't spoil cache
 - Synch scans
 by setting sync_scan_offset to zero.
 

Precisely.

If there is a cache-spoiling effect of the OS buffer cache that we want
to take advantage of, we could still set it to a non-zero value. But the
utility of sync_scan_offset does decrease with your patch, so removing
it altogether is a possibility (hopefully the numbers will tell us what
to do).


  Do you have an opinion about sync_scan_threshold versus a simple
  sync_scan_enable?
 
 enable_sync_scan?
 

After looking at other GUC names, I suggest that it's either
"sync_scan" (for on/off) or "sync_scan_threshold" (if we do want to
allow a numerical threshold). All the GUCs beginning with "enable_" are
planner settings.

If we only allow on/off, we could probably just sync scan every table
because of your recycle_buffers patch.

   I'd still like to be able to trace each scan to see how far ahead/behind
   it is from the other scans on the same table, however we do that.
   
   Any backend can read the position of other backends' scans, so it should
  
  Where is that information stored? Right now my patch will overwrite the
  hints of other backends, because I'm using a static data structure
  (rather than one that grows). I do this to avoid the need for locking.
 
 OK, well, we can still read it before we overwrite it to calc the
 difference. That will at least allow us to get a difference between
 points as we go along. That seems like its worth having, even if it
 isn't accurate for 3+ concurrent scans.

Let me know if the things I list below don't cover what the information
you're looking for here. It would be easy for me to emit a log message
at the time it's overwriting the hint, but that would be a lot of noise:
every time ss_report_loc() is called, which we discussed would be once
per 100 pages read per scan. 

   be easy enough to put in a regular LOG entry that shows how far
   ahead/behind they are from other scans. We can trace just one backend
   and have it report on where it is with respect to other backends, or you
   could have them all calculate their position and have just the lead scan
   report the position of all other scans.
   
  
  I already have each backend log its progression through the tablescan
  every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
  currently use this information to see whether scans are staying together
  or not. I think this gives us the information we need without backends
  needing to communicate the information during execution.
 
 Well, that is good, thank you for adding that after initial discussions.
 
 Does it have the time at which a particular numbered block is reached?
 (i.e. Block #117 is not the same thing as the 117th block scanned). We
 can use that to compare the time difference of each scan.

Right now it logs when a scan starts, what block number of the table it
starts on, and also prints out the current block it's scanning every N
blocks (100k or 10k depending on debug level). The time and the pid are,
of course, available from log_line_prefix.

I'll add the table OID to each log message in case we test, for example,
a single backend scanning multiple tables at once.

I'll also clean it up a bit, so that the information is a little easier
to grep out of the postgres logfiles and easier to analyze/graph.

Regards,
Jeff Davis




Re: [HACKERS] Synchronized Scan update

2007-03-13 Thread Simon Riggs
On Tue, 2007-03-13 at 13:39 -0700, Jeff Davis wrote:

   Do you have an opinion about sync_scan_threshold versus a simple
   sync_scan_enable?
  
  enable_sync_scan?
  
 
 After looking at other GUC names, I suggest that it's either
 sync_scan (for on/off) or sync_scan_threshold (if we do want to
 allow a numerical threshold). All the GUCs beginning with enable_ are
 planner settings.

How about: "sync_seqscans"

so the phrase matches the equivalent "enable_" parameter

 If we only allow on/off, we could probably just sync scan every table
 because of your recycle_buffers patch.

The buffer recycling only makes sense for large scans, so there's an
exact match for when both techniques need to kick in.

I think I'd just lose this parameter and have it kick in at either
NBuffers or NBuffers/2. We don't need another parameter...

I'm not planning to have scan_recycle_buffers continue into the
production version.

I'd still like to be able to trace each scan to see how far ahead/behind
it is from the other scans on the same table, however we do that.

Any backend can read the position of other backends' scans, so it should
   
   Where is that information stored? Right now my patch will overwrite the
   hints of other backends, because I'm using a static data structure
   (rather than one that grows). I do this to avoid the need for locking.
  
  OK, well, we can still read it before we overwrite it to calc the
  difference. That will at least allow us to get a difference between
  points as we go along. That seems like it's worth having, even if it
  isn't accurate for 3+ concurrent scans.
 
 Let me know if the things I list below don't cover what the information
 you're looking for here. It would be easy for me to emit a log message
 at the time it's overwriting the hint, but that would be a lot of noise:
 every time ss_report_loc() is called, which we discussed would be once
 per 100 pages read per scan. 
 
be easy enough to put in a regular LOG entry that shows how far
ahead/behind they are from other scans. We can trace just one backend
and have it report on where it is with respect to other backends, or you
could have them all calculate their position and have just the lead scan
report the position of all other scans.

   
   I already have each backend log its progression through the tablescan
   every 100k blocks to DEBUG (higher DEBUG gives every 10k blocks). I
   currently use this information to see whether scans are staying together
   or not. I think this gives us the information we need without backends
   needing to communicate the information during execution.
  
  Well, that is good, thank you for adding that after initial discussions.
  
  Does it have the time at which a particular numbered block is reached?
  (i.e. Block #117 is not the same thing as the 117th block scanned). We
  can use that to compare the time difference of each scan.
 
 Right now it logs when a scan starts, what block number of the table it
 starts on, and also prints out the current block it's scanning every N
 blocks (100k or 10k depending on debug level). The time and the pid are,
 of course, available from log_line_prefix.

Can you make it log every block whose id is divisible by 100k or 10k?

Otherwise one scan will log blocks 100,000, 200,000, etc., and the
next scan will log 17,357, 117,357, etc., which will be much harder to
work out. That will give us lap times for every 100,000 blocks.
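
Something like this is what I have in mind: a hypothetical trace hook
(the name and call site are assumptions), keyed on the absolute block id
rather than on a per-scan counter, so concurrent scans log at the same
boundaries:

#define SS_TRACE_INTERVAL 100000

static void
ss_trace_block(Relation rel, BlockNumber blocknum)
{
	/* log only when the absolute block id crosses an interval boundary */
	if (blocknum % SS_TRACE_INTERVAL == 0)
		elog(DEBUG2, "sync scan: rel %u at block %u",
			 RelationGetRelid(rel), blocknum);
}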

I'm particularly interested in the turning point where the scan starts
again at the beginning of the file. It would be good to know what
blockid it turned at and when that was. We may get out of step at that
point. Maybe. We'll find out.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Bug in VACUUM FULL ?

2007-03-13 Thread Tom Lane
 Pavan Deolasee [EMAIL PROTECTED] writes:
 The problem mentioned before is hard to reproduce with the
 suggested change, but it's not completely gone away. I have
 seen it again on CVS HEAD with the patch applied.
 I am facing another issue with VACUUM FULL. This
 problem is reproduced with HOT very easily, and takes a
 few attempts to reproduce with CVS HEAD, but it
 certainly exists.

I've developed the attached patch against HEAD, and no longer see any
funny behavior.  Would appreciate it if you'd test some more, though.

regards, tom lane

Index: vacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.348
diff -c -r1.348 vacuum.c
*** vacuum.c	13 Mar 2007 00:33:40 -0000	1.348
--- vacuum.c	13 Mar 2007 18:00:14 -0000
***************
*** 1880,1885 ****
--- 1880,1894 ----
  			 * To be on the safe side, we abandon the repair_frag process if
  			 * we cannot find the parent tuple in vtlinks.  This may be overly
  			 * conservative; AFAICS it would be safe to move the chain.
+ 			 *
+ 			 * Also, because we distinguish DEAD and RECENTLY_DEAD tuples
+ 			 * using OldestXmin, which is a rather coarse test, it is quite
+ 			 * possible to have an update chain in which a tuple we think is
+ 			 * RECENTLY_DEAD links forward to one that is definitely DEAD.
+ 			 * In such a case the RECENTLY_DEAD tuple must actually be dead,
+ 			 * but it seems too complicated to try to make VACUUM remove it.
+ 			 * We treat each contiguous set of RECENTLY_DEAD tuples as a
+ 			 * separately movable chain, ignoring any intervening DEAD ones.
  			 */
  			if (((tuple.t_data->t_infomask & HEAP_UPDATED) &&
  				 !TransactionIdPrecedes(HeapTupleHeaderGetXmin(tuple.t_data),
***************
*** 1892,1897 ****
--- 1901,1907 ----
  				Buffer		Cbuf = buf;
  				bool		freeCbuf = false;
  				bool		chain_move_failed = false;
+ 				bool		moved_target = false;
  				ItemPointerData Ctid;
  				HeapTupleData tp = tuple;
  				Size		tlen = tuple_len;
***************
*** 1919,1925 ****
  				 * If this tuple is in the begin/middle of the chain then we
  				 * have to move to the end of chain.  As with any t_ctid
  				 * chase, we have to verify that each new tuple is really the
! 				 * descendant of the tuple we came from.
  				 */
  				while (!(tp.t_data->t_infomask & (HEAP_XMAX_INVALID |
  												  HEAP_IS_LOCKED)) &&
--- 1929,1941 ----
  				 * If this tuple is in the begin/middle of the chain then we
  				 * have to move to the end of chain.  As with any t_ctid
  				 * chase, we have to verify that each new tuple is really the
! 				 * descendant of the tuple we came from; however, here we
! 				 * need even more than the normal amount of paranoia.
! 				 * If t_ctid links forward to a tuple determined to be DEAD,
! 				 * then depending on where that tuple is, it might already
! 				 * have been removed, and perhaps even replaced by a MOVED_IN
! 				 * tuple.  We don't want to include any DEAD tuples in the
! 				 * chain, so we have to recheck HeapTupleSatisfiesVacuum.
  				 */
  				while (!(tp.t_data->t_infomask & (HEAP_XMAX_INVALID |
  												  HEAP_IS_LOCKED)) &&
***************
*** 1933,1938 ****
--- 1949,1955 ----
  					OffsetNumber nextOffnum;
  					ItemId		nextItemid;
  					HeapTupleHeader nextTdata;
+ 					HTSV_Result nextTstatus;
  
  					nextTid = tp.t_data->t_ctid;
  					priorXmax = HeapTupleHeaderGetXmax(tp.t_data);
***************
*** 1963,1968 ****
--- 1980,1998 ----
  

Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring

2007-03-13 Thread Greg Smith

On Mon, 12 Mar 2007, Tom Lane wrote:


It might also be interesting to know exactly how many buffers were
pinned at the time the scan passed over them.  In theory it should be a
small fraction, but maybe it isn't ...


It is; the theory holds for all the tests I tried today.  The actual
pinned buffers were so few (typically a fraction of the number of clients)
that I reverted to just lumping them in with the recently used ones.  To
better reflect the vast majority of what it's interacting with, in my
patch I renamed the SyncOneBuffer skip_pinned parameter to
skip_recently_used.  It seems natural that something currently pinned
would also be considered recently used; the current naming I didn't find
so obvious.
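
For illustration only (the actual patch may differ in detail), the
renamed test reads something like:

if (skip_recently_used &&
	(bufHdr->refcount > 0 || bufHdr->usage_count > 0))
{
	UnlockBufHdr(bufHdr);
	return false;		/* pinned or recently used: leave it alone */
}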


I'm also now collecting clean vs. dirty usage histogram counts, since you
suggested it.  Nothing exciting to report there so far; I may note
something interesting after I collect more data.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
