Re: [HACKERS] Need -fwrapv or -fno-strict-overflow for gcc-4.3

2008-03-10 Thread Tom Lane
Kris Jurka [EMAIL PROTECTED] writes:
 Gcc 4.3 has started to perform optimizations based on the denial of the 
 existence of signed overflow. Building CVS HEAD with gcc 4.3rc2 I get the 
 following warnings:

Hmm, I suspect that it's not so much that they're performing new
optimizations as that they're being polite enough to warn us about it.
Which is a step forward, because one of the main gripes I had about this
before was that there was no way to tell where it broke your code.  Now
you can tell; not that they've provided any useful alternative to
choosing -fwrapv, but at least you can tell what they broke.

This list is actually pretty scary, because what it tells us in so many
words is that gcc 4.3 is diking out a lot of security-critical overflow
checks.  My old gripe you cited was about gcc 4.1.  I am wondering if
these checks have been no-ops in Postgres builds done with gcc 4.1 and
up, and we're only just now being told about it.
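To make this concrete, the kind of check at risk looks like this (a
minimal standalone sketch, not code from our tree):

#include <limits.h>
#include <stdio.h>

int
main(void)
{
    int     x = INT_MAX;

    /*
     * Classic overflow check: since signed overflow is undefined
     * behavior, gcc 4.3 at -O2 may decide this test is always false
     * and delete it entirely, unless -fwrapv is given.
     */
    if (x + 1 < x)
        printf("overflow detected\n");
    else
        printf("check folded away (or never tripped)\n");

    return 0;
}

Under -fwrapv the comparison keeps its twos-complement meaning and the
check survives.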

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Oleg Bartunov

On Sun, 9 Mar 2008, Tom Lane wrote:


Simon Riggs [EMAIL PROTECTED] writes:

I've coded a small patch to allow CaseSensitive synonyms.


Applied with corrections (it'd be good if you at least pretended to test
stuff before submitting it).

Would a similar parameter be useful for any of the other dictionary
types?


There are many things desirable to do with dictionaries. For example,
a dictionary could return the original word plus its normal form. Another
feature is a dictionary that doesn't recognize-and-stop, but acts as a
filtering dictionary. We have a feeling that a little middleware would help
implement this, and CaseSensitive too.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need -fwrapv or -fno-strict-overflow for gcc-4.3

2008-03-10 Thread Kris Jurka



On Mon, 10 Mar 2008, Tom Lane wrote:

I am wondering if these checks have been no-ops in Postgres builds done 
with gcc 4.1 and up, and we're only just now being told about it.




Since gcc 4.2 supports -Wstrict-overflow, I rebuilt pg with that to see 
what it's doing currently.  I'm not sure what -Wstrict-overflow level 
-Wall implies with gcc 4.3, but I cranked it up to 5 on 4.2 to get the 
most details out of it.  I don't see any of the warnings I saw on 4.3, and 
I get the attached list, which looks less dangerous from spot-checking a 
couple of items, but I haven't gone through the whole list.


Additionally, in the comments on the blog posting[1] I linked to previously, 
a user asks "how come I don't see any warnings with -Wstrict-overflow on 
gcc 4.2.3" and is answered "I think the full effects only come in on gcc 
mainline.  At least, gcc 4.2 does not eliminate the loop, but gcc 4.3 
will."  So clearly 4.3 is doing something new here, but that doesn't prove 
we're safe on previous versions.


[1] http://www.airs.com/blog/archives/120
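For reference, my reading of the loop discussed there (paraphrased from
memory, not quoted verbatim from the post):

int
f(void)
{
    int     i;
    int     j = 0;

    /*
     * If signed overflow is undefined, "i > 0" can be assumed to stay
     * true forever once true (i only grows), so the compiler may
     * rewrite or eliminate the loop entirely.
     */
    for (i = 1; i > 0; i += i)
        ++j;
    return j;
}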

Kris Jurka
exec.c:245: warning: assuming signed overflow does not occur when changing X +- 
C1 cmp C2 to X cmp C1 +- C2
exec.c:245: warning: assuming signed overflow does not occur when changing X +- 
C1 cmp C2 to X cmp C1 +- C2
localtime.c:322: warning: assuming signed overflow does not occur when changing 
X +- C1 cmp C2 to X cmp C1 +- C2
localtime.c:814: warning: assuming signed overflow does not occur when changing 
X +- C1 cmp C2 to X cmp C1 +- C2
pgtz.c:480: warning: assuming signed overflow does not occur when distributing 
negation across division
pgtz.c:492: warning: assuming signed overflow does not occur when distributing 
negation across division
pgtz.c:503: warning: assuming signed overflow does not occur when distributing 
negation across division
zic.c:862: warning: assuming signed overflow does not occur when simplifying 
conditional to constant
heaptuple.c:1018: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
heaptuple.c:1845: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
heaptuple.c:907: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
gram.y:9471: warning: assuming signed overflow does not occur when simplifying 
conditional to constant
gistutil.c:185: warning: assuming signed overflow does not occur when changing 
X +- C1 cmp C2 to X cmp C1 +- C2
gistutil.c:307: warning: assuming signed overflow does not occur when changing 
X +- C1 cmp C2 to X cmp C1 +- C2
gistutil.c:417: warning: assuming signed overflow does not occur when changing 
X +- C1 cmp C2 to X cmp C1 +- C2
parse_node.c:70: warning: assuming signed overflow does not occur when changing 
X +- C1 cmp C2 to X cmp C1 +- C2
copy.c:2613: warning: assuming signed overflow does not occur when changing X 
+- C1 cmp C2 to X cmp C1 +- C2
gistsplit.c:494: warning: assuming signed overflow does not occur when changing 
X +- C1 cmp C2 to X cmp C1 +- C2
tuptoaster.c:1017: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
tuptoaster.c:590: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
nbtutils.c:74: warning: assuming signed overflow does not occur when changing X 
+- C1 cmp C2 to X cmp C1 +- C2
freespace.c:1567: warning: assuming signed overflow does not occur when 
simplifying division
freespace.c:1568: warning: assuming signed overflow does not occur when 
simplifying division
freespace.c:1637: warning: assuming signed overflow does not occur when 
simplifying division
freespace.c:1638: warning: assuming signed overflow does not occur when 
simplifying division
arrayfuncs.c:2837: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
arrayfuncs.c:2701: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
dict.c:63: warning: assuming signed overflow does not occur when simplifying 
multiplication
arrayfuncs.c:314: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
cash.c:312: warning: assuming signed overflow does not occur when negating a 
division
spell.c:1181: warning: assuming signed overflow does not occur when simplifying 
multiplication
deadlock.c:790: warning: assuming signed overflow does not occur when 
simplifying conditional to constant
spell.c:1119: warning: assuming signed overflow does not occur when simplifying 
conditional to constant
s_lock.c:150: warning: assuming signed overflow does not occur when changing X 
+- C1 cmp C2 to X cmp C1 +- C2
s_lock.c:155: warning: assuming signed overflow does not occur when changing X 
+- C1 cmp C2 to X cmp C1 +- C2
datetime.c:2072: warning: assuming signed overflow does not occur when 
simplifying conditional to constant
format_type.c:408: warning: assuming signed overflow does not occur when 
simplifying / or % to >> or &
int.c:1130: warning: assuming signed overflow does not occur when simplifying 

Re: [HACKERS] Lazy constraints / defaults

2008-03-10 Thread Martijn van Oosterhout
On Sun, Mar 09, 2008 at 10:45:59PM +0100, Dawid Kuroczko wrote:
   alter table tab add column col integer not null default 42 check (col > 0);
 
 I think this will not solve the OP's problem.  He wants to minimize the time
 a table is under exclusive lock, and this ALTER command will effectively
 rewrite the whole table (to add new not null column).

ISTM a while back someone added a patch that made the above almost
instantaneous.  Probably remembering wrong though.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] Maximum statistics target

2008-03-10 Thread Peter Eisentraut
On Friday, 7 March 2008, Tom Lane wrote:
 I'm not wedded to the number 1000 in particular --- obviously that's
 just a round number.  But it would be good to see some performance tests
 with larger settings before deciding that we don't need a limit.

Well, I'm not saying we should raise the default statistics target.  But 
setting an arbitrary limit on the grounds that larger values might slow the 
system is like limiting the size of tables because larger tables will cause 
slower queries.  Users should have the option of finding out the best balance 
for themselves.  If there are concerns with larger statistics targets, we 
should document them.  I find nothing about this in the documentation at the 
moment.

 IIRC, egjoinsel is one of the weak spots, so tests involving planning of
 joins between two tables with large MCV lists would be a good place to
 start.

I have run tests with joining two and three tables with 10 million rows each, 
and the planning times seem to be virtually unaffected by the statistics 
target, for values between 10 and 800000.  They all look more or less like 
this:

test=# explain select * from test1, test2 where test1.a = test2.b;
 QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=308311.00..819748.00 rows=10000000 width=16)
   Hash Cond: (test1.a = test2.b)
   ->  Seq Scan on test1  (cost=0.00..144248.00 rows=10000000 width=8)
   ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=8)
         ->  Seq Scan on test2  (cost=0.00..144248.00 rows=10000000 width=8)
(5 rows)

Time: 132,350 ms

and with indexes

test=# explain select * from test1, test2 where test1.a = test2.b;
 QUERY PLAN
-----------------------------------------------------------------------
 Merge Join  (cost=210416.65..714072.26 rows=10000000 width=16)
   Merge Cond: (test1.a = test2.b)
   ->  Index Scan using test1_index1 on test1  (cost=0.00..282036.13 
rows=10000000 width=8)
   ->  Index Scan using test2_index1 on test2  (cost=0.00..282036.13 
rows=10000000 width=8)
(4 rows)

Time: 168,455 ms

The time to analyze is also quite constant, at least until you run out of 
memory. :)  The MaxAllocSize is the limiting factor in all this.  In my 
example, statistics targets larger than about 800000 created pg_statistic 
rows that would have been larger than 1GB, so they couldn't be stored.

I suggest that we get rid of the limit of 1000, adequately document whatever 
issues might exist with large values (possibly not many, see above), and add 
an error message more user-friendly than "invalid memory alloc request size" 
for the cases where the value is too large to be storable.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Simon Riggs
On Sun, 2008-03-09 at 23:03 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  I've coded a small patch to allow CaseSensitive synonyms.
 
 Applied with corrections (it'd be good if you at least pretended to test
 stuff before submitting it).

It is a frequent accusation of yours that I don't test things, which is
incorrect. Defending against that makes me a liar twice in your eyes. If
you look more closely at what happens you'll understand that your own
rigid expectations are what causes these problems. 

If you thought at all you'd realise that nobody would be stupid enough
to try to sneak untested code into Postgres; all bugs would point
directly back to anybody attempting that. That isn't true just of
Postgres, it's true of any group of people working together on any task,
not just software or open source software.

As Greg mentions on another thread, not all patches are *intended* to be
production quality by their authors. Many patches are shared for the
purpose of eliciting general feedback. You yourself encourage a group
development approach and specifically punish those people dropping
completely finished code into the queue and expecting it to be
committed as-is. So people produce patches in various states of
readiness, knowing that they may have to produce many versions before it
is finally accepted. Grabbing at a piece of code, then shouting
"unclean, unclean" just destroys the feedback process and leaves
teamwork in tatters.

My arse doesn't need wiping, thanks, nor does my bottom need smacking,
nor are you ever likely to catch me telling fibs. If you think so,
you're wrong and you should reset.

What you will find from me and others, in the past and realistically in
the future too, are patches that vary according to how near to
completion they are. Not the same thing as completed, yet varying in
quality. If they are incomplete it is because of the idea to receive
feedback at various points. Some patches need almost none e.g. truncate
triggers (1-2 versions), some patches need almost constant feedback e.g.
async commit (24+ versions before commit). The existence of an
intermediate patch in no way signals laziness, lack of intention to
complete or any other failure to appreciate the software development
process.

If you want people to work on Postgres alongside you, I'd appreciate a
software development process that didn't roughly equate to charging at a
machine gun trench across a minefield. If you insist on following that
you should at least stop wondering why it is that the few people to have
made more than a few steps are determined and grim individuals and start
thinking about the many skilled people who have chosen non-combatant
status, and why.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum statistics target

2008-03-10 Thread Cédric Villemain
On Monday, 10 March 2008, Peter Eisentraut wrote:
 On Friday, 7 March 2008, Tom Lane wrote:
  I'm not wedded to the number 1000 in particular --- obviously that's
  just a round number.  But it would be good to see some performance tests
  with larger settings before deciding that we don't need a limit.

 Well, I'm not saying we should raise the default statistics target.  But
 setting an arbitrary limit on the grounds that larger values might slow the
 system is like limiting the size of tables because larger tables will cause
 slower queries.  Users should have the option of finding out the best
 balance for themselves.  If there are concerns with larger statistics
 targets, we should document them.  I find nothing about this in the
 documentation at the moment.

I find 2 things:
«Increasing the target causes a proportional increase in the time and space 
needed to do ANALYZE. »
in http://www.postgresql.org/docs/current/static/sql-analyze.html
and
« ... at the price of consuming more space in pg_statistic and slightly more 
time to compute the estimates»
in http://www.postgresql.org/docs/current/static/planner-stats.html

But they are probably not clear enough about the time impact on query planning.



  IIRC, egjoinsel is one of the weak spots, so tests involving planning of
  joins between two tables with large MCV lists would be a good place to
  start.

 I have run tests with joining two and three tables with 10 million rows
 each, and the planning times seem to be virtually unaffected by the
 statistics target, for values between 10 and 800000.  They all look more or
 less like this:

 test=# explain select * from test1, test2 where test1.a = test2.b;
  QUERY PLAN
 -----------------------------------------------------------------------
  Hash Join  (cost=308311.00..819748.00 rows=10000000 width=16)
    Hash Cond: (test1.a = test2.b)
    ->  Seq Scan on test1  (cost=0.00..144248.00 rows=10000000 width=8)
    ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=8)
          ->  Seq Scan on test2  (cost=0.00..144248.00 rows=10000000 width=8)
 (5 rows)

 Time: 132,350 ms

 and with indexes

 test=# explain select * from test1, test2 where test1.a = test2.b;
  QUERY PLAN
 -----------------------------------------------------------------------
  Merge Join  (cost=210416.65..714072.26 rows=10000000 width=16)
    Merge Cond: (test1.a = test2.b)
    ->  Index Scan using test1_index1 on test1  (cost=0.00..282036.13
 rows=10000000 width=8)
    ->  Index Scan using test2_index1 on test2  (cost=0.00..282036.13
 rows=10000000 width=8)
 (4 rows)

 Time: 168,455 ms

 The time to analyze is also quite constant, at least until you run out of
 memory. :)  The MaxAllocSize is the limiting factor in all this.  In my
 example, statistics targets larger than about 800000 created pg_statistic
 rows that would have been larger than 1GB, so they couldn't be stored.

 I suggest that we get rid of the limit of 1000, adequately document
 whatever issues might exist with large values (possibly not many, see
 above), and add an error message more user-friendly than "invalid memory
 alloc request size" for the cases where the value is too large to be
 storable.



-- 
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org


signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Andrew Dunstan



Simon Riggs wrote:

As Greg mentions on another thread, not all patches are *intended* to be
production quality by their authors. Many patches are shared for the
purpose of eliciting general feedback. You yourself encourage a group
development approach and specifically punish those people dropping
completely finished code into the queue and expecting it to be
committed as-is. 
  


If you post a patch that is not intended to be of production quality, it 
is best to mark it so explicitly. Then nobody can point fingers at you. 
Also, Bruce would then know not to put it in the queue of patches 
waiting for application.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum statistics target

2008-03-10 Thread Guillaume Smet
On Mon, Mar 10, 2008 at 11:36 AM, Peter Eisentraut [EMAIL PROTECTED] wrote:
  The time to analyze is also quite constant, at least until you run out of
  memory. :)  The MaxAllocSize is the limiting factor in all this.  In my
  example, statistics targets larger than about 800000 created pg_statistic
  rows that would have been larger than 1GB, so they couldn't be stored.

From my experience on real-life examples, the time to analyze is far
from constant when you raise the statistics target, but that may be
related to the schema of our tables.

cityvox=# \timing
Timing is on.
cityvox=# show default_statistics_target ;
 default_statistics_target
---
 10
(1 row)

Time: 0.101 ms
cityvox=# ANALYZE evenement;
ANALYZE
Time: 406.069 ms
cityvox=# ANALYZE evenement;
ANALYZE
Time: 412.355 ms
cityvox=# set default_statistics_target = 30;
SET
Time: 0.165 ms
cityvox=# ANALYZE evenement;
ANALYZE
Time: 1419.161 ms
cityvox=# ANALYZE evenement;
ANALYZE
Time: 1381.754 ms
cityvox=# set default_statistics_target = 100;
SET
Time: 1.853 ms
cityvox=# ANALYZE evenement;
ANALYZE
Time: 5211.785 ms
cityvox=# ANALYZE evenement;
ANALYZE
Time: 5178.764 ms

That said, I totally agree that it's not a good idea to have a strict
maximum value if we don't have technical reasons for one.

--
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lazy constraints / defaults

2008-03-10 Thread Michał Zaborowski
Hello,
  Let me try again...

Here is a simple example.
To do:
alter table users add column col integer not null default 0;
The table is big, updated, referenced, etc. (big means that the alter
locks the table long enough to kill the system). Note that it is not my
design - I have to alter the table... but.

Solution:
1. alter table users add column col integer; -- short lock
2. alter table users alter column col set default 0;
3. update users set col = 0 where users.id between a and b; --
preparing for the constraint - in small chunks
4. update users set col = 0 where col is null;
5. alter table users alter column col set not null;

Works, but I hate it.
I would like to do:
alter table users add column col integer not null default 0;
- with something like "concurrently" or "no check" - and let PG do the
job. In that case I expect PG to update the metadata, and to set the
default for updated rows - otherwise they cannot satisfy the check.
It would be great if step 3 were done too, but I understand it can be a
problem. I see that breaking operation integrity is needed. I have a
script with some parameters that does it almost automatically.
What I want to point out is that PG is becoming more and more popular.
People use it for bigger and bigger databases. In that case a typical
alter can be a PITA. If something can be done by the DB, I would like it
to be done that way - as the safer and faster way. In this particular
case I expect the DB to take care of new and updated data. Correcting
older rows is nice to have. That parameter can be stored to inform
everybody that some data may not satisfy the check, or that a null may be
found instead of the default.

Look at commit_delay / commit_siblings. The system is faster, but if
something goes wrong, something (else) will be lost. It is the DBA's
decision what to choose. If the DB keeps all the information in pg_class
and pg_attribute, everyone can see how the changes were made.

-- 
Regards,
  Michał Zaborowski (TeXXaS)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum statistics target

2008-03-10 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Friday, 7 March 2008, Tom Lane wrote:
 IIRC, egjoinsel is one of the weak spots, so tests involving planning of
 joins between two tables with large MCV lists would be a good place to
 start.

 I have run tests with joining two and three tables with 10 million rows each,
 and the planning times seem to be virtually unaffected by the statistics 
 target, for values between 10 and 800000.

It's not possible to believe that you'd not notice O(N^2) behavior for N
approaching 800000 ;-).  Perhaps your join columns were unique keys, and
thus didn't have any most-common-values?
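The quadratic term comes from cross-matching the two MCV lists; in rough
outline (a sketch of the shape of the computation, not the actual
eqjoinsel code):

/*
 * With MCV lists of lengths n1 and n2, every pair of entries gets
 * compared, so planning work grows as O(n1 * n2).
 */
static double
mcv_match_freq(const int *mcv1, const double *freq1, int n1,
               const int *mcv2, const double *freq2, int n2)
{
    double      matchfreq = 0.0;
    int         i,
                j;

    for (i = 0; i < n1; i++)
        for (j = 0; j < n2; j++)
            if (mcv1[i] == mcv2[j])
                matchfreq += freq1[i] * freq2[j];

    return matchfreq;
}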

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Simon Riggs
On Mon, 2008-03-10 at 08:24 -0400, Andrew Dunstan wrote:
 
 Simon Riggs wrote:
  As Greg mentions on another thread, not all patches are *intended* to be
  production quality by their authors. Many patches are shared for the
  purpose of eliciting general feedback. You yourself encourage a group
  development approach and specifically punish those people dropping
  completely finished code into the queue and expecting it to be
  committed as-is. 

 If you post a patch that is not intended to be of production quality, it 
 is best to mark it so explicitly. Then nobody can point fingers at you. 
 Also, Bruce would then know not to put it in the queue of patches 
 waiting for application.

So it can be forgotten about entirely? Hmmm. 

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum statistics target

2008-03-10 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Peter Eisentraut [EMAIL PROTECTED] writes:
 On Friday, 7 March 2008, Tom Lane wrote:
 IIRC, egjoinsel is one of the weak spots, so tests involving planning of
 joins between two tables with large MCV lists would be a good place to
 start.

 I have run tests with joining two and three tables with 10 million rows each,
 and the planning times seem to be virtually unaffected by the statistics 
 target, for values between 10 and 800000.

 It's not possible to believe that you'd not notice O(N^2) behavior for N
 approaching 800000 ;-).  Perhaps your join columns were unique keys, and
 thus didn't have any most-common-values?

We could remove the hard limit on statistics target and impose the limit
instead on the actual size of the arrays. Ie, allow people to specify larger
sample sizes and discard unreasonably large excess data (possibly warning them
when that happens).

That would remove the screw case the original poster had where he needed to
scan a large portion of the table to see at least one of every value even
though there were only 169 distinct values.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 I think if you post something marked "Work In Progress", there is an 
 implied commitment on your part to post something complete at a later stage.

It *wasn't* marked "Work In Progress", and Simon went out of his way to
cross-post it to -patches, where the thread previously had not been:

http://archives.postgresql.org/pgsql-patches/2007-09/msg00150.php

I don't think either Bruce or I can be faulted for assuming that it was
meant to be applied.  In future perhaps I should take it as a given that
Simon doesn't expect his patches to be applied?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Simon Riggs
On Mon, 2008-03-10 at 09:42 -0400, Andrew Dunstan wrote:

 I think if you post something marked "Work In Progress", there is an 
 implied commitment on your part to post something complete at a later stage.
 
 So if it's forgotten you would be the one doing the forgetting. ;-)

But if they aren't on a review list, they won't get reviewed, no matter
what their status. So everybody has to maintain their own status list
and re-submit patches for review monthly until reviewed?

I like the idea of marking things WIP, but I think we need a clear
system where we agree that multiple statuses exist and that they are
described in particular ways.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Andrew Dunstan



Simon Riggs wrote:

On Mon, 2008-03-10 at 08:24 -0400, Andrew Dunstan wrote:
  

Simon Riggs wrote:


As Greg mentions on another thread, not all patches are *intended* to be
production quality by their authors. Many patches are shared for the
purpose of eliciting general feedback. You yourself encourage a group
development approach and specifically punish those people dropping
completely finished code into the queue and expecting it to be
committed as-is. 
  


  
If you post a patch that is not intended to be of production quality, it 
is best to mark it so explicitly. Then nobody can point fingers at you. 
Also, Bruce would then know not to put it in the queue of patches 
waiting for application.



So it can be forgotten about entirely? Hmmm. 

  


I think if you post something marked "Work In Progress", there is an 
implied commitment on your part to post something complete at a later stage.


So if it's forgotten you would be the one doing the forgetting. ;-)

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Tom Lane
Oleg Bartunov [EMAIL PROTECTED] writes:
 On Sun, 9 Mar 2008, Tom Lane wrote:
 Would a similar parameter be useful for any of the other dictionary
 types?

 There are many things desirable to do with dictionaries. For example,
 a dictionary could return the original word plus its normal form. Another
 feature is a dictionary that doesn't recognize-and-stop, but acts as a
 filtering dictionary. We have a feeling that a little middleware would help
 implement this, and CaseSensitive too.

Hmm, I can see how some middleware would help with folding or not
folding the input token, but what about the words coming from the
dictionary file (particularly the *output* lexeme)?  It's not apparent
to me that it's sensible to try to control that from outside the
dictionary.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Simon Riggs
On Mon, 2008-03-10 at 10:01 -0400, Tom Lane wrote:

 In future perhaps I should take it as a given that
 Simon doesn't expect his patches to be applied?

I think you should take it as a given that Simon would like to try to
work together, sharing ideas and code, without insults and public
derision when things don't fit. 

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 Simon Riggs wrote:
  As Greg mentions on another thread, not all patches are *intended* to be
  production quality by their authors. Many patches are shared for the
  purpose of eliciting general feedback. You yourself encourage a group
  development approach and specifically punish those people dropping
  completely finished code into the queue and expecting it to be
  committed as-is. 

 
 If you post a patch that is not intended to be of production quality, it 
 is best to mark it so explicitly. Then nobody can point fingers at you. 
 Also, Bruce would then know not to put it in the queue of patches 
 waiting for application.

It would still be in that queue because we might just mark it as a TODO.

FYI, during this first release cycle, we need to apply patches and
decide on TODOs.  We skipped TODO discussion during feature freeze, so
we need to do it now for held ideas.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Teodor Sigaev

Hmm, I can see how some middleware would help with folding or not
folding the input token, but what about the words coming from the
dictionary file (particularly the *output* lexeme)?  It's not apparent
to me that it's sensible to try to control that from outside the
dictionary.


Right now I see a significant advantage of such a layer: the two possible
extensions of dictionaries (filtering and storing the original form) are
independent of the nature of the dictionary. So, instead of modifying every
dictionary, we can add a layer common to all dictionaries, with syntax like:


CREATE/ALTER TEXT SEARCH DICTIONARY foo  (...) WITH ( filtering=on|off, 
store_original=on|off );


Or per token-type/dictionary pair.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Teodor Sigaev

Well, if you think this can/should be done somewhere outside the
dictionary, should I revert the applied patch?


No, that patch is about case sensitivity of the synonym dictionary. I suppose
Simon wants to replace 'bill' with 'account', but doesn't want to get
'account Clinton'.


For other dictionaries (a dictionary of numbers, snowball) that option is
meaningless.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need -fwrapv or -fno-strict-overflow for gcc-4.3

2008-03-10 Thread Tom Lane
Kris Jurka [EMAIL PROTECTED] writes:
 Gcc 4.3 has started to perform optimizations based on the denial of the 
 existence of signed overflow.
 ...
 I don't understand the difference between -fwrapv and 
 -fno-strict-overflow, but it seems we need at least one of them.

I don't see -fno-strict-overflow listed at all in the manual for gcc 4.1.
So I think we should go for -fwrapv, which is defined thus:

`-fwrapv'
 This option instructs the compiler to assume that signed arithmetic
 overflow of addition, subtraction and multiplication wraps around
 using twos-complement representation.  This flag enables some
 optimizations and disables others.  This option is enabled by
 default for the Java front-end, as required by the Java language
 specification.

and so doesn't sound nearly as bad as Jakub painted it ;-).  If we use
the other, we are assuming that there are no problems in 4.1, which
feels to me like a dangerous assumption.  4.1 *did* break mysql,
remember; and we have no regression tests checking most of these
security-related overflow checks, so we have no direct proof that we
are not broken.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Include Lists for Text Search

2008-03-10 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 Hmm, I can see how some middleware would help with folding or not
 folding the input token, but what about the words coming from the
 dictionary file (particularly the *output* lexeme)?  It's not apparent
 to me that it's sensible to try to control that from outside the
 dictionary.

  Right now I see a significant advantage of such a layer: the two possible
  extensions of dictionaries (filtering and storing the original form) are
  independent of the nature of the dictionary. So, instead of modifying every
  dictionary, we can add a layer common to all dictionaries, with syntax like:

 CREATE/ALTER TEXT SEARCH DICTIONARY foo  (...) WITH ( filtering=on|off, 
 store_original=on|off );

 Or per token-type/dictionary pair.

Well, if you think this can/should be done somewhere outside the
dictionary, should I revert the applied patch?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Include Lists for Text Search

2008-03-10 Thread Teodor Sigaev
Right now I see a significant advantage of such a layer: the two possible 
extensions of dictionaries (filtering and storing the original form) are 


One more extension: dropping over-long words. For example, decreasing the 
limit on maximum word length to prevent very long words from being indexed - 
a word of 100 characters is suspiciously long for human input.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [Fwd: Re: [PATCHES] 64-bit CommandIds]

2008-03-10 Thread Zoltan Boszormenyi

Hi,

what's your opinion on this?
I saw response only from Alvaro on the -patches list.

Thanks in advance,
Zoltán Böszörményi

-------- Original Message --------
Subject: Re: [PATCHES] 64-bit CommandIds
Date:    Tue, 04 Mar 2008 21:52:25 +0100
From:    Zoltan Boszormenyi [EMAIL PROTECTED]
To:      pgsql-patches [EMAIL PROTECTED]
CC:      Alvaro Herrera [EMAIL PROTECTED], Hans-Juergen Schoenig 
[EMAIL PROTECTED]
References: [EMAIL PROTECTED] 
[EMAIL PROTECTED]




Alvaro Herrera wrote:

Zoltan Boszormenyi wrote:

  

attached is our patch against HEAD which enables extending CommandIds
to 64-bit. This is for enabling long transactions that really do that much
non-read-only work in one transaction.



I think you should add a pg_control field and corresponding check, to
avoid a 64bit-Cid postmaster to start on a 32bit-Cid data area and vice
versa.
  


I added the check, but I needed to add it BEFORE the check on
toast_max_chunk_size, otherwise it complained about that more
cryptic problem. I think it's cleaner to report this failure than to leave
the user wondering why toast_max_chunk_size != TOAST_MAX_CHUNK_SIZE.
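
The check itself is modeled on the existing pg_control sanity checks,
roughly like this (a sketch; commandid_size is a field the patch adds,
not one that exists today):

    if (ControlFile->commandid_size != sizeof(CommandId))
        ereport(FATAL,
                (errmsg("database files are incompatible with server"),
                 errdetail("The database cluster was initialized with CommandId size %d,"
                           " but the server was compiled with CommandId size %d.",
                           ControlFile->commandid_size, (int) sizeof(CommandId))));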

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig  Schönig GmbH
http://www.postgresql.at/




--
--
Zoltán Böszörményi
Cybertec Schönig  Schönig GmbH
http://www.postgresql.at/

diff -dcrpN pgsql.orig/configure pgsql-cid64/configure
*** pgsql.orig/configure	2008-03-02 13:44:42.0 +0100
--- pgsql-cid64/configure	2008-03-04 16:53:46.0 +0100
*** if test -n $ac_init_help; then
*** 1349,1354 
--- 1349,1355 
  Optional Features:
--disable-FEATURE   do not include FEATURE (same as --enable-FEATURE=no)
--enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
+   --enable-huge-commandid enable 64-bit CommandId support
--enable-integer-datetimes  enable 64-bit integer date/time support
--enable-nls[=LANGUAGES]  enable Native Language Support
--disable-shareddo not build shared libraries
*** fi
*** 2175,2180 
--- 2176,2219 
  
  
  #
+ # 64-bit CommandId
+ #
+ echo "$as_me:$LINENO: checking whether to build with 64-bit CommandId support" >&5
+ echo $ECHO_N "checking whether to build with 64-bit CommandId support... $ECHO_C" >&6
+ 
+ pgac_args="$pgac_args enable_huge_commandid"
+ 
+ # Check whether --enable-huge-commandid or --disable-huge-commandid was given.
+ if test "${enable_huge_commandid+set}" = set; then
+   enableval="$enable_huge_commandid"
+ 
+   case $enableval in
+ yes)
+ 
+ cat >>confdefs.h <<\_ACEOF
+ #define USE_64BIT_COMMANDID 1
+ _ACEOF
+ 
+   ;;
+ no)
+   :
+   ;;
+ *)
+   { { echo "$as_me:$LINENO: error: no argument expected for --enable-huge-commandid option" >&5
+ echo "$as_me: error: no argument expected for --enable-huge-commandid option" >&2;}
+    { (exit 1); exit 1; }; }
+   ;;
+   esac
+ 
+ else
+   enable_huge_commandid=no
+ 
+ fi;
+ 
+ echo "$as_me:$LINENO: result: $enable_huge_commandid" >&5
+ echo "${ECHO_T}$enable_huge_commandid" >&6
+ 
+ #
  # 64-bit integer date/time storage (--enable-integer-datetimes)
  #
  { echo "$as_me:$LINENO: checking whether to build with 64-bit integer date/time support" >&5
diff -dcrpN pgsql.orig/configure.in pgsql-cid64/configure.in
*** pgsql.orig/configure.in	2008-03-02 13:44:43.0 +0100
--- pgsql-cid64/configure.in	2008-03-04 16:53:46.0 +0100
*** PGAC_ARG_REQ(with, libs,  [  --with-
*** 128,133 
--- 128,142 
  
  
  #
+ # 64-bit CommandId
+ #
+ AC_MSG_CHECKING([whether to build with 64-bit CommandId support])
+ PGAC_ARG_BOOL(enable, huge-commandid, no, [  --enable-huge-commandid enable 64-bit CommandId support],
+ 		[AC_DEFINE([USE_64BIT_COMMANDID], 1,
+ 			[Define to 1 if you want 64-bit CommandId support. (--enable-huge-commandid)])])
+ AC_MSG_RESULT([$enable_huge_commandid])
+ 
+ #
  # 64-bit integer date/time storage (--enable-integer-datetimes)
  #
  AC_MSG_CHECKING([whether to build with 64-bit integer date/time support])
diff -dcrpN pgsql.orig/doc/src/sgml/installation.sgml pgsql-cid64/doc/src/sgml/installation.sgml
*** pgsql.orig/doc/src/sgml/installation.sgml	2008-02-18 13:49:58.0 +0100
--- pgsql-cid64/doc/src/sgml/installation.sgml	2008-03-04 17:16:14.0 +0100
*** su - postgres
*** 1011,1016 
--- 1011,1027 
/varlistentry
  
varlistentry
+termoption--enable-huge-commandid/option/term
+listitem
+ para
+  Use 64-bit CommandIds if you are planning to run transactions
+  consisting of more than 4 billion commands.  This is off by default
+  to save disk space.
+ /para
+/listitem
+   /varlistentry
+ 
+   varlistentry
 termoption--enable-integer-datetimes/option/term
 listitem
  para
diff -dcrpN pgsql.orig/src/backend/access/transam/xact.c pgsql-cid64/src/backend/access/transam/xact.c
*** 

Re: [HACKERS] [Fwd: Re: [PATCHES] 64-bit CommandIds]

2008-03-10 Thread Gregory Stark

Zoltan Boszormenyi [EMAIL PROTECTED] writes:

 Hi,

 what's your opinion on this?
 I saw response only from Alvaro on the -patches list.

I don't understand. The patch only affects configuration and SQL data type
code. It doesn't actually store the 64-bit commandid anywhere which would be
the actual hard part.

Do phantom command ids mean this all just works magically? Ie, the limit of
2^32 cmin,cmax pairs is still there but as long as you don't have to store
more than that many you get to have 2^64 raw ephemeral commandids?
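
For what it's worth, my mental model of the phantom/combo cid scheme is
this (a sketch from memory, not the actual code):

/*
 * Real (cmin, cmax) pairs live in a backend-local array; the tuple
 * header stores a single CommandId that indexes into it.  The 32-bit
 * on-disk field then limits the number of distinct pairs used in one
 * transaction, not the raw count of commands.
 */
typedef struct
{
    CommandId   cmin;
    CommandId   cmax;
} ComboCidKeyData;

static ComboCidKeyData *comboCids;      /* backend-local, grows as needed */
static int      usedComboCids;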

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Detecting large-file support in configure

2008-03-10 Thread Tom Lane
I'm reviewing this patch
http://archives.postgresql.org/pgsql-patches/2007-04/msg00531.php
which contains this configure.in code to decide if it's safe to allow
use of non-segmented files:

  if test "$ac_cv_func_fseeko" = yes; then
  AC_SYS_LARGEFILE
  fi
  
+ if test "$ac_cv_sys_large_files" = no -o "$segmented_files" = yes; then 
+ AC_DEFINE([USE_SEGMENTED_FILES], 1, 
+[Define to split data file in 1GB segments.]) 
+ fi

AFAICT this is simply wrong.  ac_cv_sys_large_files isn't set to reflect
whether the platform has largefile support or not; it's set to reflect
whether you need to #define _LARGE_FILES to get that.  On a platform
where largefile support is native, or is instead enabled by setting
_FILE_OFFSET_BITS to 64, this test would refuse to allow
USE_SEGMENTED_FILES to be disabled.

Another problem is that it'll do the wrong thing if fseeko isn't
present, because then AC_SYS_LARGEFILE isn't executed at all.

A quick look at the configure script suggests that the available macros
don't really set anything that specifically tells you if they found
working largefile support.  I'm considering doing AC_CHECK_SIZEOF(off_t)
and then looking at the result, but I wonder if anyone knows the
approved way to accomplish this.
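
Concretely, the probe I have in mind amounts to compiling and running
something like this (a sketch of the configure-time test, not committed
code):

#include <sys/types.h>

int
main(void)
{
    /* exit 0 if off_t is at least 64 bits wide */
    return (sizeof(off_t) >= 8) ? 0 : 1;
}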

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Fwd: Re: [PATCHES] 64-bit CommandIds]

2008-03-10 Thread Heikki Linnakangas

Gregory Stark wrote:

I don't understand. The patch only affects configuration and SQL data type
code. It doesn't actually store the 64-bit commandid anywhere which would be
the actual hard part.


Sure it does, this is the significant part of the patch:

*** pgsql.orig/src/include/c.h  2008-03-02 13:44:45.0 +0100
--- pgsql-cid64/src/include/c.h 2008-03-04 21:05:23.0 +0100
*** typedef TransactionId MultiXactId;
*** 382,388 
--- 382,392 

  typedef uint32 MultiXactOffset;

+ #ifdef USE_64BIT_COMMANDID
+ typedef uint64 CommandId;
+ #else
  typedef uint32 CommandId;
+ #endif

  #define FirstCommandId  ((CommandId) 0)

CommandId type is used in htup.h and elsewhere, which changes the 
on-disk format.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminating a backend

2008-03-10 Thread Bruce Momjian
Bruce Momjian wrote:
 I have an idea for this TODO item:
 
   * Allow administrators to safely terminate individual sessions either
 via an SQL function or SIGTERM
   
 Lock table corruption following SIGTERM of an individual backend
 has been reported in 8.0.  A possible cause was fixed in 8.1, but
 it is unknown whether other problems exist.  This item mostly
 requires additional testing rather than of writing any new code.
   
 http://archives.postgresql.org/pgsql-hackers/2006-08/msg00174.php
 
 When we get the termination signal, why can't we just set a global
 boolean, do a query cancel, and in the setjmp() code block check the
 global and exit --- at that stage we know we have released all locks and
 can exit cleanly.

Should I add this as a TODO?  Seems so.  Tom commented in the patches
queue that it will not work but I don't understand why.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminating a backend

2008-03-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Bruce Momjian wrote:
 When we get the termination signal, why can't we just set a global
 boolean, do a query cancel, and in the setjmp() code block check the
 global and exit --- at that stage we know we have released all locks and
 can exit cleanly.

 Should I add this as a TODO?  Seems so.  Tom commented in the patches
 queue that it will not work but I don't understand why.

The problem with treating it like elog(ERROR) is that you're at the
mercy of user-defined code as to whether you'll actually exit or not.
UDFs can trap elog(ERROR).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminating a backend

2008-03-10 Thread Alvaro Herrera
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Bruce Momjian wrote:
  When we get the termination signal, why can't we just set a global
  boolean, do a query cancel, and in the setjmp() code block check the
  global and exit --- at that stage we know we have released all locks and
  can exit cleanly.
 
  Should I add this as a TODO?  Seems so.  Tom commented in the patches
  queue that it will not work but I don't understand why.
 
 The problem with treating it like elog(ERROR) is that you're at the
 mercy of user-defined code as to whether you'll actually exit or not.
 UDFs can trap elog(ERROR).

Well, we can punt and blame the writer of the UDF if the signal is not
honored in a timely fashion.  Having something that works in 98% of cases,
can be fixed in 1% of the rest, and only fails in the remaining 1%
(proprietary code that cannot be fixed) is better than having nothing at all.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminating a backend

2008-03-10 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Bruce Momjian wrote:
  When we get the termination signal, why can't we just set a global
  boolean, do a query cancel, and in the setjmp() code block check the
  global and exit --- at that stage we know we have released all locks and
  can exit cleanly.
 
  Should I add this as a TODO?  Seems so.  Tom commented in the patches
  queue that it will not work but I don't understand why.
 
 The problem with treating it like elog(ERROR) is that you're at the
 mercy of user-defined code as to whether you'll actually exit or not.
 UDFs can trap elog(ERROR).

I don't understand.  I was never considering elog(ERROR).

Right now for cancel we have:

pqsignal(SIGINT, StatementCancelHandler);

I am suggesting we add a new function pg_terminate_backend() that does
everything just like cancel, but also sets a global variable that we
check in the loop where we look for the next command; if it is set,
we exit the backend.
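
In outline, something like this (a self-contained sketch of the idea
using plain signal(); a real implementation would share the existing
cancel path):

#include <signal.h>
#include <stdlib.h>

static volatile sig_atomic_t terminate_pending = 0;

static void
terminate_handler(int signo)
{
    /* in the real thing, also do everything query cancel does */
    (void) signo;
    terminate_pending = 1;
}

int
main(void)
{
    signal(SIGTERM, terminate_handler);

    for (;;)
    {
        /* top of the command loop: no locks are held here */
        if (terminate_pending)
            exit(0);
        /* ... read and execute the next command ... */
    }
}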

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum statistics target

2008-03-10 Thread Peter Eisentraut
On Monday, 10 March 2008, Gregory Stark wrote:
  It's not possible to believe that you'd not notice O(N^2) behavior for N
  approaching 800000 ;-).  Perhaps your join columns were unique keys, and
  thus didn't have any most-common-values?

 We could remove the hard limit on statistics target and impose the limit
 instead on the actual size of the arrays. Ie, allow people to specify
 larger sample sizes and discard unreasonably large excess data (possibly
 warning them when that happens).

I have run some more useful tests now with more distinct values.  The planning 
times do increase, but this is not the primary worry.  If you want to spend 
20 seconds of planning to speed up your query by 40 seconds, this could 
surely be a win in some scenarios, and not a catastrophic loss if not.  The 
practical problems lie with memory usage in ANALYZE, in two ways.  First, at 
some point it will try to construct pg_statistic rows that don't fit into the 
1GB limit, as mentioned upthread.  You get a funny error message and it 
aborts.  This is fixable with some cosmetics.  Second, ANALYZE appears to 
temporarily leak memory (it probably doesn't bother to free things along the 
way, as most of the code does), and so some not so large statistics targets 
(say, 40000) can get your system swapping like crazy.  A crafty user could 
probably kill the system that way, perhaps even with the restricted settings 
we have now.  I haven't inspected the code in detail yet, but I imagine a few 
pfree() calls and/or a counter that checks the current memory usage against 
maintenance_work_mem could provide additional safety.  If we could get 
ANALYZE under control, then I imagine this would provide a more natural upper 
bound for the statistics targets, and it would be controllable by the 
administrator.
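
The guard could be as simple as this (a sketch; analyze_mem_used() and
attr_target are hypothetical names, not existing code):

    if (analyze_mem_used() > (Size) maintenance_work_mem * 1024L)
        ereport(ERROR,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("statistics target %d requires more memory than maintenance_work_mem",
                        attr_target)));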

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Detecting large-file support in configure

2008-03-10 Thread Peter Eisentraut
On Monday, 10 March 2008, Tom Lane wrote:
 A quick look at the configure script suggests that the available macros
 don't really set anything that specifically tells you if they found
 working largefile support.  I'm considering doing AC_CHECK_SIZEOF(off_t)
 and then looking at the result

That was my spontaneous thought while reading through your message.  It seems 
safest and clearest.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminating a backend

2008-03-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 I am suggesting we add a new function pg_terminate_backend() that does
 everything just like cancel, but also sets a global variable that we
 check in the loop where we look for the next command and if it is set,
 we exit the backend.

And if you never *get* to that loop, what have you accomplished?

Keep in mind that 99% of the excuse for people to want to use SIGTERM is
that the backend isn't responding to SIGINT.  If you've fixed things so
that SIGTERM cannot get them out of any situation that SIGINT doesn't
get them out of, I don't think it's a step forward.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminating a backend

2008-03-10 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  I am suggesting we add a new function pg_terminate_backend() that does
  everything just like cancel, but also sets a global variable that we
  check in the loop where we look for the next command and if it is set,
  we exit the backend.
 
 And if you never *get* to that loop, what have you accomplished?
 
 Keep in mind that 99% of the excuse for people to want to use SIGTERM is
 that the backend isn't responding to SIGINT.  If you've fixed things so
 that SIGTERM cannot get them out of any situation that SIGINT doesn't
 get them out of, I don't think it's a step forward.

What I hear people ask is that they don't want the backend to read the
next command but to exit.  That seems like a reasonable request.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminating a backend

2008-03-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Keep in mind that 99% of the excuse for people to want to use SIGTERM is
 that the backend isn't responding to SIGINT.  If you've fixed things so
 that SIGTERM cannot get them out of any situation that SIGINT doesn't
 get them out of, I don't think it's a step forward.

 What I hear people ask is that they don't want the backend to read the
 next command but to exit.  That seems like a reasonable request.

[shrug...]  They can do that now, most of the time.  What this is about
is dealing with corner cases, and in that respect what your proposal
will do is replace soluble problems with insoluble ones.  But I suppose
I can't stop you if you're insistent.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] Fix for large file support (nonsegment mode support)

2008-03-10 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 There is the latest version of the nonsegment support patch. I changed 
 LET_OS_MANAGE_FILESIZE to USE_SEGMENTED_FILES and I added a 
 --disable-segmented-files switch to configure. I kept the tuplestore behavior 
 and it still splits files in both modes.

Applied with minor corrections.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum statistics target

2008-03-10 Thread Stephen Denne
 We could remove the hard limit on statistics target and impose the limit
 instead on the actual size of the arrays. Ie, allow people to specify
 larger sample sizes and discard unreasonably large excess data (possibly
 warning them when that happens).
 
 That would remove the screw case the original poster had where he needed
 to scan a large portion of the table to see at least one of every value
 even though there were only 169 distinct values.
 
 -- 
   Gregory Stark


That was my use case, but I wasn't the OP.

Your suggestion would satisfy what I was trying to do. However, a higher stats 
target wouldn't solve my root problem (how the planner uses the gathered 
stats), and the statistics gathered at 1000 (and indeed at 200) are quite a 
good representation of what is in the table.

I don't like the idea of changing one limit into two limits. Or are you 
suggesting changing the algorithm that determines how many, and which pages to 
analyze, perhaps so that it is adaptive to the results of the analysis as it 
progresses? That doesn't sound easy.

Regards,
Stephen Denne.






Re: [HACKERS] [PATCHES] Fix for large file support (nonsegment mode support)

2008-03-10 Thread Peter Eisentraut
Tom Lane wrote:
 Zdenek Kotala [EMAIL PROTECTED] writes:
  Here is the latest version of the nonsegment support patch. I changed
  LET_OS_MANAGE_FILESIZE to USE_SEGMENTED_FILES and added a
  --disable-segmented-files switch to configure. I kept the tuplestore
  behavior, so it still splits the file in both modes.

 Applied with minor corrections.

Why is this not the default when supported?  I am wondering both from the 
point of view of the user, and in terms of development direction.



Re: [HACKERS] [PATCHES] Fix for large file support (nonsegment mode support)

2008-03-10 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Applied with minor corrections.

 Why is this not the default when supported?

Fear.

Maybe eventually, but right now I think it's too risky.

One point that I already found out the hard way is that sizeof(off_t) >= 8
does not guarantee the availability of largefile support; there can also
be filesystem-level constraints, and perhaps other things we know not of
at this point.

I think this needs to be treated as experimental until it's got a few
more than zero miles under its belt.  I wouldn't be too surprised to
find that we have to implement it as a run-time switch instead of
compile-time, in order to not fail miserably when somebody sticks a
tablespace on an archaic filesystem.

regards, tom lane



Re: [HACKERS] [PATCHES] Fix for large file support (nonsegment mode support)

2008-03-10 Thread Alvaro Herrera
Peter Eisentraut wrote:
 Tom Lane wrote:
  Zdenek Kotala [EMAIL PROTECTED] writes:
   Here is the latest version of the nonsegment support patch. I changed
   LET_OS_MANAGE_FILESIZE to USE_SEGMENTED_FILES and added a
   --disable-segmented-files switch to configure. I kept the tuplestore
   behavior, so it still splits the file in both modes.
 
  Applied with minor corrections.
 
 Why is this not the default when supported?  I am wondering both from the 
 point of view of the user, and in terms of development direction.

Also it would get more buildfarm coverage if it were the default.  If it
breaks something we'll notice earlier.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] LISTEN vs. two-phase commit

2008-03-10 Thread Tom Lane
Does it make any sense to allow LISTEN or UNLISTEN in a prepared
transaction?

It's certainly not sensible for these actions to affect the backend that
actually executes the COMMIT PREPARED, in the sense of creating or
destroying pg_listener entries for it.  But how can we say that they
should affect the originating backend either?  It might not even be
around anymore.

In the current implementation, LISTEN/UNLISTEN create or delete
pg_listener rows that are then held as uncommitted until COMMIT
PREPARED.  This is bad enough for the LISTEN case, as a pg_listener row
becomes active for a backend PID that might not exist any longer, or
might now refer to a different session.  In the UNLISTEN case it'd
result in blocking any other backend that is unlucky enough to try to
send a notify to the pending-dead tuple.  (Well, actually, there's some
ugly coding in async.c that avoids that, but surely that's a crock.)
And weird as that behavior would be, there would be no way at all to
duplicate it after the long-planned rewrite to get rid of pg_listener
and handle LISTEN/NOTIFY all in memory.
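
To make the mechanics concrete, a minimal illustration against the
8.3-era catalog; the channel name is hypothetical:

    LISTEN my_channel;
    -- LISTEN inserts one pg_listener row per (channel, backend) pair:
    SELECT relname, listenerpid, notification FROM pg_listener;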

So I'm thinking that PREPARE TRANSACTION should throw an error if any
LISTEN or UNLISTEN is pending in the current transaction.  This is
relatively difficult to enforce correctly in the existing code, but
it will be easy in the rewrite that I'm working on in response to
Laurent Birtz's bug report.
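
A minimal sketch of the sequence the proposed error would forbid, assuming
two sessions and max_prepared_transactions > 0:

    -- Session 1: leave a LISTEN pending inside a prepared transaction.
    BEGIN;
    LISTEN my_channel;
    PREPARE TRANSACTION 'pending_listen';

    -- Session 2, possibly long after session 1 has exited:
    COMMIT PREPARED 'pending_listen';
    -- The pg_listener row commits under session 1's old PID, which may
    -- no longer exist or may now belong to a different session.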

BTW, another little issue I just noticed is that while 2PC can cope
with NOTIFY actions, the eventual notify is sent with the PID of the
backend that executes COMMIT PREPARED, not the one that originally
created the prepared transaction.  It's not clear if this is good,
bad, or indifferent; but at the least it probably deserves a sentence
someplace in the documentation.
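
The wart is easy to state as a sequence, assuming three sessions:

    -- Session 1: prepare a transaction containing a NOTIFY, then exit.
    BEGIN;
    NOTIFY my_channel;
    PREPARE TRANSACTION 'pending_notify';

    -- Session 2: an interested listener.
    LISTEN my_channel;

    -- Session 3: finish the prepared transaction.
    COMMIT PREPARED 'pending_notify';
    -- Session 2 receives the notification attributed to session 3's PID,
    -- not session 1's.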

Comments?

regards, tom lane



Re: [HACKERS] [PATCHES] Fix for large file support (nonsegment mode support)

2008-03-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Also it would get more buildfarm coverage if it were default.  If it
 breaks something we'll notice earlier.

Since nothing the regression tests do even approaches 1GB, the odds that
the buildfarm will notice problems are approximately zero.

regards, tom lane



Re: [HACKERS] LISTEN vs. two-phase commit

2008-03-10 Thread Mark Mielke

Tom Lane wrote:
 Does it make any sense to allow LISTEN or UNLISTEN in a prepared
 transaction?
 ...
 Comments?

Assuming I understand your question - I don't think of LISTEN or 
UNLISTEN as being valuable from a transaction perspective. It's possible 
I'm missing something - but I find the transaction overhead, and the 
attempt to re-use PostgreSQL tables to implement LISTEN/NOTIFY, to be 
clever but misguided. To be practical, LISTEN/NOTIFY should be as fast 
as possible, and should never create performance problems or incur 
performance overhead related to transactions.


I had thought of using LISTEN/NOTIFY recently, and upon reading the 
threads leading up to this, I was disappointed to hear, and then to see 
for myself, that asynchronous notify is not immediate within psql, and 
that under some circumstances it may take a rather lengthy time before 
the notify reaches the target. I expect such notification to be nearly 
instantaneous, and given this knowledge I would choose to use a 
LISTEN/NOTIFY mechanism outside PostgreSQL for my next project. Now, 
does LISTEN/NOTIFY belong outside PostgreSQL in the first place? I'm 
not sure...
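
A two-session sketch of the delivery timing being described; the channel
name is hypothetical:

    -- Session A:
    LISTEN my_channel;

    -- Session B (the notify is sent when this commits):
    NOTIFY my_channel;

    -- Session A does not see the notification instantly; psql reports it
    -- the next time it exchanges traffic with the server, e.g. after the
    -- next command is issued.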


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]




Re: [HACKERS] LISTEN vs. two-phase commit

2008-03-10 Thread Tom Lane
Mark Mielke [EMAIL PROTECTED] writes:
 ... I find the transaction overhead, and the attempt to re-use
 PostgreSQL tables to implement LISTEN/NOTIFY, to be clever but
 misguided.

Oh, I don't disagree with you.  As I already mentioned, they desperately
need to be rewritten.  However, given that that's not a sanely
back-patchable answer, we have to consider what are the appropriate
semantics for the existing infrastructure.

(Also, if they *were* memory-based then the question of their relation
to 2PC semantics becomes even more urgent.)

regards, tom lane



Re: [HACKERS] LISTEN vs. two-phase commit

2008-03-10 Thread Mark Mielke

Tom Lane wrote:
 Mark Mielke [EMAIL PROTECTED] writes:
  ... I find the transaction overhead, and the attempt to re-use
  PostgreSQL tables to implement LISTEN/NOTIFY, to be clever but
  misguided.

 Oh, I don't disagree with you.  As I already mentioned, they desperately
 need to be rewritten.  However, given that that's not a sanely
 back-patchable answer, we have to consider what are the appropriate
 semantics for the existing infrastructure.

 (Also, if they *were* memory-based then the question of their relation
 to 2PC semantics becomes even more urgent.)


Ah, OK - so count my vote as: I don't think LISTEN should be affected by 
what sort of COMMIT I use. But I don't believe I'll be using LISTEN as it 
is today, and I definitely won't be using it with two-phase commit today. 
For me, the answer is that it should be usable in a two-phase commit, but 
it's not usable today. Sorry this isn't a clear answer to your question.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] Terminating a backend

2008-03-10 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Keep in mind that 99% of the excuse for people to want to use SIGTERM is
  that the backend isn't responding to SIGINT.  If you've fixed things so
  that SIGTERM cannot get them out of any situation that SIGINT doesn't
  get them out of, I don't think it's a step forward.
 
  What I hear people ask is that they don't want the backend to read the
  next command but to exit.  That seems like a reasonable request.
 
 [shrug...]  They can do that now, most of the time.  What this is about
 is dealing with corner cases, and in that respect what your proposal
 will do is replace soluble problems with insoluble ones.  But I suppose
 I can't stop you if you're insistent.

I am kind of confused by your reaction to my idea.  I thought we agreed
that there was going to be no way to cleanly terminate a backend at an
arbitrary time, and I thought we were getting better at having query
cancel work in most cases.  Combining these two ideas, query cancel plus
an immediate exit from the query loop, seemed like a perfect solution to
a feature request we get regularly.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
