Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Andrew Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> But we've already had a couple of cases of interesting failures going
>> unnoticed because of the noise level. Between duplicate reports about
>> busted patches and transient problems on particular build machines
>> (out of disk space, misconfiguration, etc) it's pretty hard to not
>> miss the once-in-a-while failures. Is there some other way we could
>> attack that problem?
> The real issue is the one you identify of stuff getting lost in the
> noise. But I'm not sure there's any realistic cure for that.

Maybe we should think about filtering the noise. Like, say, discarding
every report from mongoose that involves an icc core dump ...
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01

That's only semi-serious, but I do think that it's getting harder to
pluck the wheat from the chaff. My investigations over the weekend
showed that we have got basically three categories of reports:

1. genuine code breakage from unportable patches: normally multiple
   reports over a short period until we fix or revert the cause.

2. failures on a single buildfarm member due to misconfiguration,
   hardware flakiness, etc. These are sometimes repeatable and
   sometimes not.

3. all the rest, of which some fraction represents bugs we need to
   fix, only we don't know they're there.

In category 1 the buildfarm certainly pays for itself, but we'd hoped
that it would help us spot less-reproducible errors too. The problem
I'm seeing is that category 2 is overwhelming our ability to recognize
patterns within category 3. How can we dial down the noise level?

			regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can
get through to the mailing list cleanly
Re: [HACKERS] Stats for multi-column indexes
On Tue, 2007-03-20 at 14:14 +1200, Mark Kirkwood wrote:
> Jeff Davis wrote:
>> I know we can't keep stats on every combination of columns. My
>> initial idea would be to only keep stats about a multi-column index
>> (and probably optional for those, too).
> Maybe you would want to keep single column indexes too, so that
> (more) accurate estimates for bitmap-and type plans are possible.

We should allow the DBA to specify which groups of cols to keep
statistics on, if there is no index on that group. That solves the
combinatorial explosion problem.

-- 
 Simon Riggs
 EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Stats for multi-column indexes
Simon Riggs wrote:
> On Tue, 2007-03-20 at 14:14 +1200, Mark Kirkwood wrote:
>> Jeff Davis wrote:
>>> I know we can't keep stats on every combination of columns. My
>>> initial idea would be to only keep stats about a multi-column index
>>> (and probably optional for those, too).
>> Maybe you would want to keep single column indexes too, so that
>> (more) accurate estimates for bitmap-and type plans are possible.
> We should allow the DBA to specify which groups of cols to keep
> statistics on, if there is no index on that group. That solves the
> combinatorial explosion problem.

This is one hint I think everyone can agree on. Being able to say that
values in different columns are related just gives the planner more
information to work with.

-- 
  Richard Huxton
  Archonet Ltd
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
On Tue, Mar 20, 2007 at 02:57:13AM -0400, Tom Lane wrote:
> Maybe we should think about filtering the noise. Like, say, discarding
> every report from mongoose that involves an icc core dump ...
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01

Maybe a simple compromise would be being able to set up a set of
regexes that search the output and set a flag if that string is found.
If you find the string, it gets marked with a flag, which means that
when you look at mongoose, any failures that don't have the flag
become easier to spot.

It also means that once you've found a common failure, you can create
the regex and then any other failures with the same string get tagged
also, making unexplained ones easier to spot.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
From each according to his ability. To each according to his ability
to litigate.
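Martijn's scheme can be sketched in a few lines of Python. This is only
an illustration of the idea, not buildfarm code: the rule regexes and
tag names below are invented examples.

```python
import re

# Each rule pairs a regex with a tag name; both are hypothetical here.
RULES = [
    (re.compile(r"internal error.*icc.*core dumped", re.I), "icc-crash"),
    (re.compile(r"No space left on device"), "disk-full"),
]

def tag_report(log_text):
    """Return the set of tags whose regex matches this failure log."""
    return {tag for rx, tag in RULES if rx.search(log_text)}

def untagged(reports):
    """Keep only reports no rule explains -- the ones worth a human look."""
    return [r for r in reports if not tag_report(r)]
```

Once a rule exists, every later failure with the same signature is
flagged automatically, and the `untagged` view is where the
once-in-a-while failures stand out.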
[HACKERS] Stats processor not restarting
I've noticed that if for example the autovacuum process dies (such as
with a kill -9 when testing my new shared mem implementation), only
autovac and bgwriter are restarted. The stats collector is terminated,
but not restarted. (Same goes for a regular backend, and not just
autovac.)

Is there a reason for this, or is it a bug?

//Magnus
Re: [HACKERS] Stats for multi-column indexes
Richard Huxton wrote:
> Simon Riggs wrote:
>> On Tue, 2007-03-20 at 14:14 +1200, Mark Kirkwood wrote:
>>> Jeff Davis wrote:
>>>> I know we can't keep stats on every combination of columns. My
>>>> initial idea would be to only keep stats about a multi-column
>>>> index (and probably optional for those, too).
>>> Maybe you would want to keep single column indexes too, so that
>>> (more) accurate estimates for bitmap-and type plans are possible.
>> We should allow the DBA to specify which groups of cols to keep
>> statistics on, if there is no index on that group. That solves the
>> combinatorial explosion problem.
> This is one hint I think everyone can agree on. Being able to say
> that values in different columns are related just gives the planner
> more information to work with.

It was also suggested that column pairs in FK relationship could be
automatically enabled. So you don't need to specify those manually.

Now, the hard question is deciding what to keep track of. I don't
think MCV makes much sense, because what's the MCV of two columns?
Some sort of correlation index would seem to make certain sense.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Stats processor not restarting
Magnus Hagander wrote:
> I've noticed that if for example the autovacuum process dies (such as
> with a kill -9 when testing my new shared mem implementation), only
> autovac and bgwriter are restarted. The stats collector is
> terminated, but not restarted. (Same goes for a regular backend, and
> not just autovac.) Is there a reason for this, or is it a bug?

I would say it is a bug, because the comments and code in ServerLoop()
and reaper() say different.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Martijn van Oosterhout wrote:
> On Tue, Mar 20, 2007 at 02:57:13AM -0400, Tom Lane wrote:
>> Maybe we should think about filtering the noise. Like, say,
>> discarding every report from mongoose that involves an icc core
>> dump ...
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01
> Maybe a simple compromise would be being able to set up a set of
> regexes that search the output and set a flag if that string is
> found. If you find the string, it gets marked with a flag, which
> means that when you look at mongoose, any failures that don't have
> the flag become easier to spot. It also means that once you've found
> a common failure, you can create the regex and then any other
> failures with the same string get tagged also, making unexplained
> ones easier to spot.

You need to show first that this is an adequate tagging mechanism,
both in tagging things adequately and in not picking up false
positives, which would make things worse, not better. And even then
you need someone to do the analysis to create the regex.

The buildfarm works because it leverages our strength, namely
automating things. But all the tagging suggestions I've seen will
involve regular, repetitive and possibly boring work, precisely the
thing we are not good at as a group. If we had some staff they could
be given this task (among others), assuming we show that it actually
works. We don't, so they can't.

cheers

andrew
Re: [HACKERS] how to add seconds to a TimestampTz
Alvaro Herrera wrote:
> Is there a better way than going to time_t and back? I am currently
> using this:
>
> db->next_worker =
>     time_t_to_timestamptz(timestamptz_to_time_t(current_time) +
>                           autovacuum_naptime);
>
> (db->next_worker is a TimestampTz, as is current_time.
> autovacuum_naptime is integer for a number of seconds)

For the archives, I just discovered on timestamp.h this macro to help:

avdb->ad_next_worker =
    TimestampTzPlusMilliseconds(now, naptime_secs * 1000);

This is a lot simpler and faster ...

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] modifying the table function
Florian G. Pflug wrote:
>> Are we really sure that this isn't a solution in search of a
>> problem?
> I think this really depends on how you define problem. Some people
> might think that select * from myfunc(...) limit 1 should stop and
> return a result after myfunc(...) has returned one row. Others will
> say well, just use a different software design that doesn't depend on
> this optimization.

I think that's a very thin use case to justify all the scaffolding you
propose, so put me in with the others I guess. If we can provide a
per-call mechanism for C functions, that should be adequate, I think.
If you need heavy optimization then you need C functions anyway.

cheers

andrew
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Andrew Dunstan wrote:
> Martijn van Oosterhout wrote:
>> On Tue, Mar 20, 2007 at 02:57:13AM -0400, Tom Lane wrote:
>>> Maybe we should think about filtering the noise. Like, say,
>>> discarding every report from mongoose that involves an icc core
>>> dump ...
>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2007-03-20%2006:30:01
>> Maybe a simple compromise would be being able to set up a set of
>> regexes that search the output and set a flag if that string is
>> found. If you find the string, it gets marked with a flag, which
>> means that when you look at mongoose, any failures that don't have
>> the flag become easier to spot. It also means that once you've found
>> a common failure, you can create the regex and then any other
>> failures with the same string get tagged also, making unexplained
>> ones easier to spot.
> You need to show first that this is an adequate tagging mechanism,
> both in tagging things adequately and in not picking up false
> positives, which would make things worse, not better. And even then
> you need someone to do the analysis to create the regex. The
> buildfarm works because it leverages our strength, namely automating
> things. But all the tagging suggestions I've seen will involve
> regular, repetitive and possibly boring work, precisely the thing we
> are not good at as a group.

this is probably true - however as a buildfarm admin I occasionally
wished I had a way to invalidate reports generated from my boxes to
prevent someone wasting time investigating them (like errors caused by
system upgrades, configuration problems or other local issues). But I
agree that it might be difficult to make that manual tagging process
scalable and reliable enough so that it really is an improvement over
what we have now.

Stefan
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Stefan Kaltenbrunner wrote:
> however as a buildfarm admin I occasionally wished I had a way to
> invalidate reports generated from my boxes to prevent someone wasting
> time investigating them (like errors caused by system upgrades,
> configuration problems or other local issues).

It would be extremely simple to provide a 'revoke report' API and
client. Good idea. But that's quite different from what we have been
discussing.

cheers

andrew
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Andrew Dunstan wrote: The buildfarm works because it leverages our strength, namely automating things. But all the tagging suggestions I've seen will involve regular, repetitive and possibly boring work, precisely the thing we are not good at as a group. You may be forgetting that Martijn and others tagged the scan.coverity.com database. Now, there are some untagged errors, but I'd say that that's because we don't control the tool, so we cannot fix it if there are false positives. We do control the buildfarm however, so we can develop systematic solutions for widespread problems (instead of forcing us to checking and tagging every single occurance of widespread problems). -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>> The buildfarm works because it leverages our strength, namely
>> automating things. But all the tagging suggestions I've seen will
>> involve regular, repetitive and possibly boring work, precisely the
>> thing we are not good at as a group.
> You may be forgetting that Martijn and others tagged the
> scan.coverity.com database. Now, there are some untagged errors, but
> I'd say that that's because we don't control the tool, so we cannot
> fix it if there are false positives. We do control the buildfarm
> however, so we can develop systematic solutions for widespread
> problems (instead of forcing us to check and tag every single
> occurrence of widespread problems).

Well, I'm sure we can provide appropriate access or data for anyone
who wants to do research in this area and prove me wrong.

cheers

andrew
Re: [HACKERS] Stats for multi-column indexes
Alvaro Herrera wrote:
> Now, the hard question is deciding what to keep track of. I don't
> think MCV makes much sense, because what's the MCV of two columns?

The combination that occurs most often.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
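As a toy sketch of Peter's point (the column names and values here are
invented for illustration): a multi-column MCV would track the
frequency of value *pairs*, which is exactly what independent
per-column stats cannot capture when the columns are correlated.

```python
from collections import Counter

# Toy rows for two hypothetical correlated columns (city, zip).
# A multi-column MCV list would store the most common *pairs*;
# per-column stats would instead estimate P(city) * P(zip)
# independently, underestimating correlated combinations.
rows = [
    ("Oslo", "0150"),
    ("Oslo", "0150"),
    ("Oslo", "0151"),
    ("Bergen", "5003"),
]

pair_mcv = Counter(rows).most_common(1)
```

Here the top entry is ("Oslo", "0150") with frequency 2/4, while the
independence assumption would give (3/4) * (2/4) = 3/8 for the same
pair.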
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Andrew Dunstan [EMAIL PROTECTED] writes:
> Martijn van Oosterhout wrote:
>> Maybe a simple compromise would be being able to set up a set of
>> regexes that search the output and set a flag if that string is
>> found. If you find the string, it gets marked with a flag, which
>> means that when you look at mongoose, any failures that don't have
>> the flag become easier to spot. It also means that once you've found
>> a common failure, you can create the regex and then any other
>> failures with the same string get tagged also, making unexplained
>> ones easier to spot.
> You need to show first that this is an adequate tagging mechanism,
> both in tagging things adequately and in not picking up false
> positives, which would make things worse, not better. And even then
> you need someone to do the analysis to create the regex.

Well, my experiment over the weekend with doing exactly that convinced
me that regexes could be used successfully to identify common-mode
failures. So I think Martijn has a fine idea here. And I don't see a
problem with lack of motivation, at least for those of us who try to
pay attention to buildfarm results --- once you've looked at a couple
of reports of the same issue, you really don't want to have to repeat
the analysis over and over. But just assuming that every report on a
particular day reflects the same breakage is exactly the risk I wish
we didn't have to take.

For a lot of cases there is not a need for an ongoing filter: we break
something, we get a pile of reports, we fix it, and then we want to
tag all the reports of that something so that we can see if anything
else happened in the same interval. So for this, something based on an
interactive search API would work fine. You could even use that for
repetitive problems such as buildfarm misconfigurations, though having
to repeat the search every so often would get old in the end. The main
thing though is for the database to remember the tags once made.
> The buildfarm works because it leverages our strength, namely
> automating things. But all the tagging suggestions I've seen will
> involve regular, repetitive and possibly boring work, precisely the
> thing we are not good at as a group.

Well, responding to bug reports could be called regular and repetitive
work --- in reality I don't find it so, because every bug is
different. The point I think you are missing is that having something
like this will *eliminate* repetitive, boring work, namely recognizing
multiple reports of the same problem. The buildfarm has gotten big
enough that some way of dealing with that is desperately needed, else
our ability to spot infrequently-reported issues will disappear
entirely.

			regards, tom lane
Re: [HACKERS] Stats processor not restarting
On Tue, Mar 20, 2007 at 08:48:30AM -0400, Alvaro Herrera wrote:
> Magnus Hagander wrote:
>> I've noticed that if for example the autovacuum process dies (such
>> as with a kill -9 when testing my new shared mem implementation),
>> only autovac and bgwriter are restarted. The stats collector is
>> terminated, but not restarted. (Same goes for a regular backend, and
>> not just autovac.) Is there a reason for this, or is it a bug?
> I would say it is a bug, because the comments and code in
> ServerLoop() and reaper() say different.

Bah, sorry about the noise. It was the effect of
PGSTAT_RESTART_INTERVAL. Do we want to add some logging when we don't
restart it due to repeated failures?

//Magnus
Re: [HACKERS] Stats processor not restarting
Alvaro Herrera [EMAIL PROTECTED] writes:
> Magnus Hagander wrote:
>> I've noticed that if for example the autovacuum process dies (such
>> as with a kill -9 when testing my new shared mem implementation),
>> only autovac and bgwriter are restarted. The stats collector is
>> terminated, but not restarted. (Same goes for a regular backend, and
>> not just autovac.) Is there a reason for this, or is it a bug?
> I would say it is a bug, because the comments and code in
> ServerLoop() and reaper() say different.

There is code in pgstat_start that limits the frequency with which new
stats collectors can be spawned --- maybe your test case is hitting
that?

			regards, tom lane
Re: [HACKERS] Stats for multi-column indexes
Alvaro Herrera [EMAIL PROTECTED] writes:
> It was also suggested that column pairs in FK relationship could be
> automatically enabled. So you don't need to specify those manually.

Actually, I think you don't particularly need stats for that in most
cases --- if the planner simply took note that the FK relationship
exists, it would know that each row of the FK side joins to exactly
one row of the PK side, which in typical cases is sufficient.

			regards, tom lane
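A toy arithmetic sketch of that estimate (all row counts invented):
with a declared FK from child to parent, every child row is guaranteed
exactly one parent match, so the join-size estimate is simply the
child's row count, with no column statistics required.

```python
# Toy cardinality sketch with invented numbers.
parent_rows = 1_000
child_rows = 50_000

# FK-aware estimate: one parent match per child row.
fk_join_estimate = child_rows

# A stats-based estimate is roughly
#   child_rows * parent_rows * selectivity(join clause);
# under a uniform-distribution assumption selectivity is about
# 1/parent_rows, which happens to agree here but can be far off
# when the key distribution is skewed.
stats_estimate = child_rows * parent_rows * (1 / parent_rows)
```

The two agree on uniform data; the FK-based number stays exact even
when the per-column stats would mislead, which is Josh's point in the
follow-up about skewed distributions.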
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Tom Lane wrote:
> The point I think you are missing is that having something like this
> will *eliminate* repetitive, boring work, namely recognizing multiple
> reports of the same problem. The buildfarm has gotten big enough that
> some way of dealing with that is desperately needed, else our ability
> to spot infrequently-reported issues will disappear entirely.

OK. How about if we have a table of (branch, failure_stage, regex,
tag, description, start_date) plus some webby transactions for
approved users to edit this?

The wrinkle is that applying the tags on the fly is probably not a
great idea - the status page query is already in desperate need of
overhauling because it's too slow. So we'd need a daemon to set up the
tags in the background. But that's an implementation detail.

Screen real estate on the dashboard page is also in very short supply.
Maybe we could play with the background colour, so that a tagged
failure had, say, a blue background, as opposed to the red/pink/yellow
we use for failures now. Again - an implementation detail.

My biggest worry apart from maintenance (which doesn't matter that
much - if people don't enter the regexes they don't get the tags they
want) is that the regexes will not be specific enough, and so give
false positives on the tags. Then if you're looking for things that
aren't tagged you'd be even more likely than today to miss the
outliers. Lord knows that regexes are hard to get right - I've been
using them for a couple of decades and they've earned me lots of
money, and I still get them wrong regularly (including several cases
on the buildfarm). But maybe we need to take the plunge and see how it
works.

This would be a fine SOC project - I at least won't have time to
develop it for quite some time.

cheers

andrew
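Andrew's rule table and background daemon can be sketched roughly as
follows. The column names follow his suggestion; everything else (the
reports table, the sqlite3 backing store, the sample rule) is invented
for illustration, not actual buildfarm schema.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# Rule table, columns per Andrew's proposal.
conn.execute("""CREATE TABLE failure_rules (
    branch TEXT, failure_stage TEXT, regex TEXT,
    tag TEXT, description TEXT, start_date TEXT)""")

# Hypothetical stand-in for the buildfarm's report storage.
conn.execute("""CREATE TABLE build_reports (
    id INTEGER PRIMARY KEY, branch TEXT, stage TEXT,
    log TEXT, tag TEXT)""")

def tag_pass(conn):
    """One run of the background daemon: stamp untagged reports whose
    branch, stage, and log match a rule."""
    rules = conn.execute(
        "SELECT branch, failure_stage, regex, tag FROM failure_rules"
    ).fetchall()
    pending = conn.execute(
        "SELECT id, branch, stage, log FROM build_reports WHERE tag IS NULL"
    ).fetchall()
    for rid, branch, stage, log in pending:
        for r_branch, r_stage, rx, tag in rules:
            if branch == r_branch and stage == r_stage and re.search(rx, log):
                conn.execute(
                    "UPDATE build_reports SET tag = ? WHERE id = ?", (tag, rid))
                break  # first matching rule wins
```

Running the pass out-of-band like this keeps the tagging cost off the
status-page query; the dashboard only ever reads the precomputed tag
column.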
Re: [HACKERS] Stats processor not restarting
Magnus Hagander [EMAIL PROTECTED] writes:
> Bah, sorry about the noise. It was the effect of
> PGSTAT_RESTART_INTERVAL. Do we want to add some logging when we don't
> restart it due to repeated failures?

Not really, but maybe it would be sensible to reset
last_pgstat_start_time when doing a database-wide restart? The
motivation for the timeout was to reduce cycle wastage if pgstat
crashed by itself, but when you've deliberately SIGQUITed it, that
hardly seems to apply ...

			regards, tom lane
Re: [HACKERS] modifying the table function
On Tuesday, 2007-03-20 at 09:35, Andrew Dunstan wrote:
> Florian G. Pflug wrote:
>>> Are we really sure that this isn't a solution in search of a
>>> problem?
>> I think this really depends on how you define problem. Some people
>> might think that select * from myfunc(...) limit 1 should stop and
>> return a result after myfunc(...) has returned one row. Others will
>> say well, just use a different software design that doesn't depend
>> on this optimization.

The same argument could be used for justifying leaving out any
optimisations.

> I think that's a very thin use case to justify all the scaffolding
> you propose, so put me in with the others I guess. If we can provide
> a per-call mechanism for C functions, that should be adequate, I
> think.

Sure, having it at least at C level would make the job of pl
implementors wanting to do per-call returns a little easier.

> If you need heavy optimization then you need C functions anyway.

Check out pl/proxy ( http://pgfoundry.org/projects/plproxy ), which is
a dblink replacement and database partitioning tool in one. A short
summary is here:
https://developer.skype.com/SkypeGarage/DbProjects/PlProxy (may be a
bit outdated).

I'm sure there would be use-cases (like moving huge tables) where
per-call (or per-kilocall :) ) mechanisms would come in handy. One
example would be doing a huge sort split between N partitions, so that
each partition sorts its part of the data and then the results are
merged in-order from all partitions in-line, without re-sorting on the
master. With a function returning the full recordset the sorting can't
be distributed very effectively, as the whole query result needs to be
saved on the master node before returning.

-- 
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Andrew Dunstan [EMAIL PROTECTED] writes:
> The wrinkle is that applying the tags on the fly is probably not a
> great idea - the status page query is already in desperate need of
> overhauling because it's too slow. So we'd need a daemon to set up
> the tags in the background. But that's an implementation detail.
> Screen real estate on the dashboard page is also in very short
> supply. Maybe we could play with the background colour, so that a
> tagged failure had, say, a blue background, as opposed to the
> red/pink/yellow we use for failures now. Again - an implementation
> detail.

I'm not sure that the current status dashboard needs to pay any
attention to the tags. The view that I would like to have of recent
failures across all machines in a branch is the one that needs to be
tag-aware, and perhaps also the existing display of a given machine's
branch history.

> My biggest worry apart from maintenance (which doesn't matter that
> much - if people don't enter the regexes they don't get the tags they
> want) is that the regexes will not be specific enough, and so give
> false positives on the tags.

True. I strongly suggest that we want an interactive search-and-tag
capability *before* worrying about automatic tagging --- one of the
reasons for that is to provide a way to test a regex that you might
then consider adding to the automatic filter for future reports.

> This would be a fine SOC project - I at least won't have time to
> develop it for quite some time.

Agreed. Who's maintaining the SOC project list page?

			regards, tom lane
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
I don't know if this has come up yet, but in terms of tagging errors
we might be able to use some machine learning techniques. There are
NLP/learning systems that interpret logs. They learn over time what is
normal and what isn't and can flag things that are abnormal. For
example, people are using support vector machine (SVM) analysis on log
files to do intrusion detection.

Here's a link for intrusion detection, called Robust Anomaly Detection
Using Support Vector Machines:
http://wwwcsif.cs.ucdavis.edu/~liaoy/research/RSVM_Anomaly_journal.pdf

This paper from IBM gives some more background information on how such
a thing might work:
http://www.research.ibm.com/journal/sj/413/johnson.html

I have previously used an open source toolkit from CMU called rainbow
to do these types of analysis.

-arturo
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Arturo Perez wrote:
> I don't know if this has come up yet, but in terms of tagging errors
> we might be able to use some machine learning techniques. There are
> NLP/learning systems that interpret logs. They learn over time what
> is normal and what isn't and can flag things that are abnormal.

We can make extracts of the database (including the log data)
available to anyone who wants to do research using any learning
technique that appeals to them.

cheers

andrew
[HACKERS] Question about tuplestore clients
I tried to make tuplestore free up tuples that would no longer be
needed because they're older than the mark and neither random access
nor rewind capability was needed. This is important for three
different purposes: optimizing merge join to not need to materialize
the entire data set, recursive queries, and window functions.

However I've run into something I didn't expect. It seems merge joins
keep a reference to a tuple *after* they set the mark beyond it. I'm
trying to figure out why this is necessary but I haven't absorbed all
of nodeMergejoin yet. Is it possible I've misdiagnosed this?

I think my logic is correct, because if I ifdef out the pfree it
passes all regression tests. That doesn't really prove anything of
course, but it seems hard to believe I would have an off-by-one bug in
setting the mark that wouldn't show up in the results. But in my
reading of nodeMergejoin so far it seems it keeps a reference to the
first tuple in a set, ie, the tuple it's going to mark. Not any tuple
before that.

Anyways, I just wanted to know if I was missing some other reason
references have to be valid for older tuples. Maybe I'm looking in the
wrong place?

-- 
 Gregory Stark
 EnterpriseDB   http://www.enterprisedb.com
[HACKERS] Patch for pg_dump
Here is a (small) patch to give pg_dump the ability to export only the
functions (or only one), very useful when you often develop with psql
(postgresql 8.2.3).

Usage:
  pg_dump -Q function_name DATABASE    exports function_name
  pg_dump -Q DATABASE                  exports all the functions

This patch is distributed under the BSD licence.

Regards,
D.

PS: I hope this is the correct ml; otherwise excuse me, it is the
first time I propose a patch for postgresql.
[HACKERS] Reminder: only 5 days left to submit SoC applications
Students & Professors,

There are only 5 days left to submit your PostgreSQL Google Summer of
Code project:
http://www.postgresql.org/developer/summerofcode.html

If you aren't a student, but know a CS student interested in
databases, testing, GUIs, or any other OSS coding, please point them
to our SoC page and encourage them to apply right away!

If you are a student, and you've been trying to perfect your
application, please go ahead and submit it ... we can't help you if
you miss the deadline, but we can help you fix an incomplete
application.

--Josh Berkus
[HACKERS] Effects of GUC settings on automatic replans
Now that there's a mechanism in the backend that will automatically replan queries whenever anything changes about the referenced tables, we have to worry about whether an automatic replan might cause surprising changes in the behavior of a query. I looked through the available GUC settings to see what would affect a replan, and came up with just four that would potentially affect the semantics of the query: search_path add_missing_from transform_null_equals sql_inheritance As I've already mentioned, I think we must address search_path by saving the path at time of first plan and using that same path during any replan. However, I'm not excited about adding mechanism to similarly save and restore the others. They're all for legacy-app compatibility and so seem unlikely to be changed on-the-fly within a session. Also, add_missing_from and transform_null_equals aren't going to affect sanely written queries in the first place. sql_inheritance is a little bit bigger deal, but I wonder whether we shouldn't just remove that variable altogether --- it's been default ON since 7.1 and I've not heard anyone complain about that in a long time. There are a boatload of other GUCs that could potentially result in changes of planner choices: enable_bitmapscan enable_hashagg enable_hashjoin enable_indexscan enable_mergejoin enable_nestloop enable_seqscan enable_sort enable_tidscan constraint_exclusion from_collapse_limit join_collapse_limit geqo geqo_effort geqo_generations geqo_pool_size geqo_selection_bias geqo_threshold seq_page_cost random_page_cost cpu_tuple_cost cpu_index_tuple_cost cpu_operator_cost effective_cache_size work_mem I'm inclined not to worry about these, since changing them can't affect the semantics of the query, at worst its performance. One other question is exactly what saving and restoring search_path should mean. We could do it textually and thus need to re-interpret the string on each replan, or we could save the actual list of schema OIDs. 
The main disadvantage of the textual way is that without some special hack, it's possible that a replan would see the temp-table schema as being frontmost when it had not been active at all originally; that seems bad. OTOH if we save the OID list then it would not work to drop a schema and rename another into its place, which is a bit inconsistent with the fact that that does work for an individual table. Comments? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
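The textual-vs-OID tradeoff Tom describes can be sketched in a few lines. This is a toy model, not PostgreSQL code; the catalog, OID values, and function names are all made up for illustration:

```python
# Toy model of the two ways to save search_path for replans:
# re-interpret the saved text each time, vs. pin the resolved schema OIDs.

catalog = {"public": 2200, "app": 16384}   # schema name -> "OID"
tables = {2200: {"t"}, 16384: {"t"}}       # schema OID -> table names

def resolve_textual(path, table):
    """Re-interpret the path text on every (re)plan."""
    for schema in path:
        oid = catalog.get(schema)
        if oid is not None and table in tables.get(oid, set()):
            return oid
    raise LookupError(table)

def resolve_by_oid(oids, table):
    """Use the schema OID list captured at first plan time."""
    for oid in oids:
        if table in tables.get(oid, set()):
            return oid
    raise LookupError(table)

path = ["app", "public"]
saved_oids = [catalog[s] for s in path]      # snapshot at first plan

assert resolve_textual(path, "t") == 16384
assert resolve_by_oid(saved_oids, "t") == 16384

# Simulate: DROP SCHEMA app; rename another schema (new OID) into its place.
del tables[16384]
catalog["app"] = 17000
tables[17000] = {"t"}

print(resolve_textual(path, "t"))    # 17000 -- text re-lookup follows the rename
print(resolve_by_oid(saved_oids, "t"))  # 2200 -- stale OID silently falls through
```

This mirrors the point above: saving the text keeps drop-and-rename working, while saving OIDs pins the replan to schemas that may no longer be the ones the path names.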
Re: [HACKERS] Stats for multi-column indexes
Tom, Actually, I think you don't particularly need stats for that in most cases --- if the planner simply took note that the FK relationship exists, it would know that each row of the FK side joins to exactly one row of the PK side, which in typical cases is sufficient. Is it? What about the other direction? Currently, doesn't the planner assume that the rowcount relationship is 1 to ( child total rows / parent total rows) ? That's ok for tables with relatively even distribution, but not for skewed ones. --Josh Berkus ---(end of broadcast)--- TIP 6: explain analyze is your friend
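Josh's skew concern can be shown with toy numbers (made up for illustration): with parent.pk unique, the FK-side estimate "each child row joins to exactly one parent row" is exact regardless of skew, but the reverse-direction per-value estimate child_rows/parent_rows can be far off:

```python
from collections import Counter

parents = set(range(100))                  # 100 rows, pk unique
children = [0] * 900 + list(range(100))    # 1000 rows, fk heavily skewed toward 0

# Exact join cardinality: each child fk matches exactly one parent pk.
actual_join_rows = sum(1 for fk in children if fk in parents)
assert actual_join_rows == len(children) == 1000   # FK-side estimate is exact

# Reverse direction: rows joining to ONE parent value.
avg_children_per_pk = len(children) / len(parents)   # flat estimate: 10.0
actual_for_pk0 = Counter(children)[0]                # 901 under this skew
print(avg_children_per_pk, actual_for_pk0)
```

The flat per-parent estimate (10) is two orders of magnitude below the true count for the hot value (901), which is exactly the skewed case Josh is asking about.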
Re: [HACKERS] Stats for multi-column indexes
On Tue, 2007-03-20 at 09:03 +, Simon Riggs wrote: We should allow the DBA to specify which groups of cols to keep statistics on, if there is no index on that group. That solves the combinatorial explosion problem. I think it would be a good first step if we could just keep stats on multiple columns in the same table. If we can do more than that, great. We could probably keep stats on multiple columns across different tables, but I don't know how those statistics should be used. Using statistics to estimate joins seems like a tricky problem. Maybe it's already solved with known algorithms? Regards, Jeff Davis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
On Tue, Mar 20, 2007 at 11:36:09AM -0400, Andrew Dunstan wrote: My biggest worry apart from maintenance (which doesn't matter that much - if people don't enter the regexes they don't get the tags they want) is that the regexes will not be specific enough, and so give false positives on the tags. Then if you're looking for things that aren't tagged you'd be even more likely than today to miss the outliers. I think you could solve that by displaying the text that matched the regex. If it starts matching odd things it'd be visible. But I'm just sprouting ideas here, the proof is in the pudding. If the logs are easily available (or a subset of, say the last month) then people could play with that and see what happens... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
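Martijn's suggestion (keep the matched text so overly-broad regexes are visible) could look like this sketch; the tag names and patterns are invented, not actual buildfarm code:

```python
import re

# Tag failure logs with per-failure-mode regexes, retaining the matched
# text so reviewers can spot patterns that start matching odd things.
TAGS = {
    "icc-core-dump": re.compile(r"icc: error.*core dumped|Segmentation fault.*icc"),
    "out-of-disk":   re.compile(r"No space left on device"),
}

def tag_log(log_text):
    hits = []
    for tag, rx in TAGS.items():
        m = rx.search(log_text)
        if m:
            hits.append((tag, m.group(0)))   # keep the matched text for review
    return hits

log = "make: ... write failed: No space left on device"
print(tag_log(log))  # [('out-of-disk', 'No space left on device')]
```

Failures matching no tag would then be the small residue worth a human look, addressing the signal-vs-noise problem upthread.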
Re: [HACKERS] Question about tuplestore clients
Gregory Stark [EMAIL PROTECTED] writes: However I've run into something I didn't expect. It seems merge joins keep a reference to a tuple *after* they set the mark beyond it. I'm trying to figure out why this is necessary but I haven't absorbed all of nodeMergejoin yet. I think at the instant that ExecMarkPos is called, there are likely to still be tuple slots holding references to the previously marked tuple. It might work if you swap the two lines ExecMarkPos(innerPlan); MarkInnerTuple(node->mj_InnerTupleSlot, node); However, the whole thing sounds a bit fragile. If tuplestore_gettuple returns a tuple with shouldfree = false, I think you had better assume that that tuple can be referenced until after the next tuplestore_gettuple call, independently of mark/restore calls. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Patch for pg_dump
Dany DeBontridder [EMAIL PROTECTED] writes: Usage: pg_dump -Q function_name DATABASE export function_name pg_dump -Q DATABASE export all the functions What of overloading? And your switch syntax seems ambiguous anyway. btw, I see no patch here... regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Martijn van Oosterhout kleptog@svana.org writes: But I'm just sprouting ideas here, the proof is in the pudding. If the logs are easily available (or a subset of, say the last month) then people could play with that and see what happens... Anyone who wants to play around can replicate what I did, which was to download the table that Andrew made available upthread, and then pull the log files matching interesting rows. I used the attached functions to generate URLs for the failing stage logs, and then a shell script looping over lwp-download ...

CREATE FUNCTION lastfile(mfailures) RETURNS text AS $$
select replace(
	'show_stage_log.pl?nm=' || $1.sysname ||
	'&dt=' || $1.snapshot ||
	'&stg=' || replace($1.log_archive_filenames[array_upper($1.log_archive_filenames, 1)], '.log', ''),
	' ', '%20')
$$ LANGUAGE sql;

CREATE FUNCTION lastlog(mfailures) RETURNS text AS $$
select 'http://www.pgbuildfarm.org/cgi-bin/' || lastfile($1)
$$ LANGUAGE sql;

regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Stats for multi-column indexes
On Tue, 2007-03-20 at 18:12 +0100, Josh Berkus wrote: Tom, Actually, I think you don't particularly need stats for that in most cases --- if the planner simply took note that the FK relationship exists, it would know that each row of the FK side joins to exactly one row of the PK side, which in typical cases is sufficient. Is it? What about the other direction? Currently, doesn't the planner assume that the rowcount relationship is 1 to ( child total rows / parent total rows) ? That's ok for tables with relatively even distribution, but not for skewed ones. In theory, the PK constrains the available values of the FK, but doesn't provide any additional information about the relationship between the columns. However, in practice there is limited space to store MCVs and limited accuracy to n_distinct. So there may be a reason to store more information, but I don't know what we'd store. Do we have reports of bad estimates by the planner in this situation? Regards, Jeff Davis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Stats for multi-column indexes
Josh Berkus josh@agliodbs.com writes: Actually, I think you don't particularly need stats for that in most cases --- if the planner simply took note that the FK relationship exists, it would know that each row of the FK side joins to exactly one row of the PK side, which in typical cases is sufficient. Is it? What about the other direction? I recall that we had decided at the Greenplum meeting last year that we could use a better heuristic if we noted that a join was being done on an FK-and-PK combination, but I don't recall the details right at the moment. Did anyone take notes? regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Interaction of PITR backups and Bulkoperationsavoiding WAL
On Fri, Mar 09, 2007 at 04:57:18PM +, Simon Riggs wrote: On Fri, 2007-03-09 at 11:47 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2007-03-09 at 11:15 -0500, Tom Lane wrote: It strikes me that allowing archive_command to be changed on the fly might not be such a good idea though, or at least it shouldn't be possible to flip it from empty to nonempty during live operation. I'd rather fix it the proposed way than force a restart. ISTM wrong to have an availability feature cause downtime. I don't think that people are very likely to need to turn archiving on and off on-the-fly. Your proposed solution introduces a great deal of complexity (and risk of future bugs-of-omission, to say nothing of race conditions) to solve a non-problem. We have better things to be doing with our development time. It's certainly a quicker fix. Unless others object, I'll set archive_command to only be changeable at server startup. I think the docs should also explain why it's server-start only, since if someone wanted to they could circumvent the behavior by having archive_command call a shell script that changes its behavior. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Money type todos?
Hello, The money type is considered deprecated. I was also under the impression it would be eventually removed. Why are we accumulating TODOs for it? # -Make 64-bit version of the MONEY data type # Add locale-aware MONEY type, and support multiple currencies http://archives.postgresql.org/pgsql-general/2005-08/msg01432.php -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Interaction of PITR backups and Bulkoperationsavoiding WAL
Jim C. Nasby [EMAIL PROTECTED] writes: On Fri, Mar 09, 2007 at 04:57:18PM +, Simon Riggs wrote: It's certainly a quicker fix. Unless others object, I'll set archive_command to only be changeable at server startup. I think the docs should also explain why it's server-start only, since if someone wanted to they could circumvent the behavior by having archive_command call a shell script that changes its behavior. Um, what's the problem with that? The concern was about whether PG would produce consistent WAL output, not whether the archive_command actually needed to do anything. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Indexam interface proposal
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Martijn van Oosterhout wrote: IIRC indexes can already ask to have the system recheck conditions on returned tuples. For example GiST can return more tuples than actually match. That's what the amopreqcheck column is for in pg_amop. Right, except that flag is per operator in operator class, and what I'm proposing is that the index could pass a flag per tuple in the scan. The reason for attaching the flag to operators is so that the system (particularly the planner) can tell *which* conditions need to be rechecked, and can prepare the necessary expression infrastructure. I dislike the idea of having to be prepared to do that every time for every indexscan. I don't see any big down-side in preparing for that. We'd need to always store the original index quals in the executor node, like we do now with recheck-flagged operators, but that doesn't seem too bad to me. I suppose we would want to keep the existing per-operator recheck-flag and quals as it is, and add another field like indexqualorig to be used to recheck tuples that amgetnext flags as candidates. The notion of having to be prepared to sort (according to what?) is even worse. That we wouldn't need for clustered indexes, if we change the current design a bit. Either:

* store a sorted list of offsetnumbers for each group, instead of a bitmap,
* or store a bitmap like now, but require that heap tuples in a grouped index tuple are in cluster order within the heap page.

The first option eats away some of the space savings, the second option makes clustered indexes become declustered more quickly if there's out-of-order updates or inserts. Choosing either option would also reduce the CPU overhead of index scans, because we could use binary search within a grouped index tuple. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
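Heikki's first option (sorted offset lists enabling binary search within a grouped index tuple) amounts to this small sketch; the data and function names are illustrative only:

```python
import bisect

# A grouped index tuple's heap offsets kept as a sorted list, so a
# membership probe is a binary search rather than a bitmap scan.
group_offsets = [3, 7, 12, 40, 41, 57]   # sorted heap offset numbers

def contains(offsets, off):
    i = bisect.bisect_left(offsets, off)
    return i < len(offsets) and offsets[i] == off

assert contains(group_offsets, 12)
assert not contains(group_offsets, 13)
```

The cost is O(log n) per probe versus O(1) for a direct bitmap bit test, but the sorted list stays compact and keeps the per-group entries in heap order, which is the locality property the second option tries to preserve by other means.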
[HACKERS] SoC Ideas for people looking for projects
Hi, If you are looking for a SoC idea, I have listed a couple below. I am not sure how good of an idea they are but I have ran into the following limitations and probably other people have as well in the past. 1. Can user based priorities be implemented as a summer project? To some extent it has already been implemented in research (http:// www.cs.cmu.edu/~bianca/icde04.pdf), so it is definitely possible and scalable. 2. Distributed full-text indexing. This one I am really not sure how possible it is but (TSearch2) very scalable (cannot do multi terabyte fulltext indexes). Maybe some sort system could be devised to perform fulltext searches over multiple systems and merge the ranked results at some root node. Benjamin On Mar 20, 2007, at 10:07 AM, Josh Berkus wrote: Students Professors, There are only 5 days left to submit your PostgreSQL Google Summer of Code Project: http://www.postgresql.org/developer/summerofcode.html If you aren't a student, but know a CS student interested in databases, testing, GUIs, or any other OSS coding, please point them to our SoC page and encourage them to apply right away! If you are a student, and you've been trying to perfect your application, please go ahead and submit it ... we can't help you if you miss the deadline, but we can help you fix an incomplete application. --Josh Berkus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Bitmapscan changes - Requesting further feedback
Hackers et al... I was wondering if there are any outstanding issues that need to be resolved in terms of the clustered index/bitmap changes? From the testing that I have done, plus a couple of others it is a net win (at least from DBA space). Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: But I'm just sprouting ideas here, the proof is in the pudding. If the logs are easily available (or a subset of, say the last month) then people could play with that and see what happens... Anyone who wants to play around can replicate what I did, which was to download the table that Andrew made available upthread, and then pull the log files matching interesting rows. [snip] To save people this trouble, I have made an extract for the last 3 months, augmented by log field, which is pretty much the last stage log. The dump is 27Mb and can be got at http://www.pgbuildfarm.org/tfailures.dmp cheers andrew ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Andrew Dunstan wrote: Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: But I'm just sprouting ideas here, the proof is in the pudding. If the logs are easily available (or a subset of, say the last month) then people could play with that and see what happens... Anyone who wants to play around can replicate what I did, which was to download the table that Andrew made available upthread, and then pull the log files matching interesting rows. [snip] To save people this trouble, I have made an extract for the last 3 months, augmented by log field, which is pretty much the last stage log. The dump is 27Mb and can be got at http://www.pgbuildfarm.org/tfailures.dmp Should we just automate this and make it a weekly? cheers andrew -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Stats for multi-column indexes
On Mon, Mar 19, 2007 at 06:55:56PM -0700, Jeff Davis wrote: On Mon, 2007-03-19 at 21:24 -0400, Tom Lane wrote: Jeff Davis [EMAIL PROTECTED] writes: We can already keep stats for a functional index. Is there a reason we can't keep stats for a multi-column index? The questions that need to be answered are (1) what stats are you gonna collect, and (2) exactly what are you going to do with them when you have 'em? All the previous discussions have stalled on the question of how to avoid trying to collect stats about an exponentially large number of column combinations; we've never even reached the question of what stats we'd actually want given that a particular combination has been determined to be interesting. Perhaps that's a trivial question, but it's been a mighty long time since I took statistics ... I know we can't keep stats on every combination of columns. My initial idea would be to only keep stats about a multi-column index (and probably optional for those, too). My thinking was that we could keep a histogram (and MCVs, etc.) of the non-scalar key in the multi-column index. That would provide the data the planner needs to answer a query like WHERE a = 1 and b < 1000 if a and b are dependent and you have an index on (a,b). snip AndrewSN pointed out on IRC that keeping a histogram of non-scalar values is not as easy as I thought, because PostgreSQL doesn't allow arrays of composite types, among other problems. I don't think the array problem is that big a deal, since PostgreSQL doesn't enforce array dimensions at all. You can just make the arrays for multi-column stats 2-dimensional, though handling indexes with different data types among the columns would be a bit tricky... right now the only choice I can think of would be to require that values could be cast to and from text and just store text in the array. Though obviously it'd be better to just allow arrays of composite types...
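The dependent-columns problem behind the WHERE a = 1 AND b < 1000 example can be demonstrated with toy data (invented for illustration, not planner code): when b is determined by a, multiplying per-column selectivities badly undercounts:

```python
# 2000 rows where b's range depends entirely on a.
rows = [(1, b) for b in range(1, 1001)] + [(2, b) for b in range(5000, 6000)]

sel_a = sum(1 for a, b in rows if a == 1) / len(rows)       # 0.5
sel_b = sum(1 for a, b in rows if b < 1000) / len(rows)     # 0.4995
independent_est = sel_a * sel_b * len(rows)                 # ~499.5 rows

actual = sum(1 for a, b in rows if a == 1 and b < 1000)     # 999 rows
print(independent_est, actual)
```

With only per-column stats, the estimator here is off by a factor of two; a joint histogram over (a, b) is what would let it see that b < 1000 is almost certain once a = 1 holds.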
The other challenge is that you can't make all the same assumptions with a multi-field histogram that you can with a single-field one. For example, if this is our index:

 a  b
 -  -
 1  1
 1  2
 ...
 1  1000
 2  500
 2  501
 ...
 3  5000

The histogram would likely position the buckets such that 1,1000 and 2,500 would fall within one bucket, which means the planner has no idea that b doesn't exceed 1000 when a is 1. I'm not sure how big of an issue that is in reality, though, because the planner does know that the bucket can only represent so many rows. It might be worth coming up with a different means to store the histogram for the multi-column case. Is this a worthwhile area of exploration? ISTM it trips people up often enough to make it worth at least exploring... -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Buildfarm feature request: some way to track/classify failures
Joshua D. Drake wrote: Andrew Dunstan wrote: Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: But I'm just sprouting ideas here, the proof is in the pudding. If the logs are easily available (or a subset of, say the last month) then people could play with that and see what happens... Anyone who wants to play around can replicate what I did, which was to download the table that Andrew made available upthread, and then pull the log files matching interesting rows. [snip] To save people this trouble, I have made an extract for the last 3 months, augmented by log field, which is pretty much the last stage log. The dump is 27Mb and can be got at http://www.pgbuildfarm.org/tfailures.dmp Should we just automate this and make it a weekly? Sure. Talk to me offline about it - very simple to do. cheers andrew ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Stats for multi-column indexes
Jim C. Nasby [EMAIL PROTECTED] writes: It might be worth coming up with a different means to store the histogram for the multi-column case. A separate array for each column involved seems a whole lot less fragile than pretending we can handle mixed-type arrays. We probably need a different catalog anyway, or at least a reimagining of pg_statistic, since it doesn't hold more than one value of staattnum per row. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Money type todos?
Joshua D. Drake [EMAIL PROTECTED] writes: The money type is considered deprecated. I was also under the impression it would be eventually removed. Why are we accumulating TODOs for it? Because doing the TODOs would remove the reasons for deprecating it. Whether it is actually ever going to disappear is not agreed upon. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Fixing hash index build time
There are several reasons why Postgres' hash indexes currently suck, but one of the bigger ones is that the time to build an index on a large existing table is excessive, eg http://archives.postgresql.org/pgsql-novice/2007-03/msg00064.php I'm not sure if this has been discussed before, but I suddenly realized while responding to the above message that the reason for the awful performance is pretty obvious: hashbuild starts with a minimum-size index (two buckets) and repeatedly splits buckets as insertions are done, exactly the same as ordinary dynamic growth of the index would do. This means that for an N-row table, approximately N/entries-per-bucket splits need to occur during index build, which results in roughly O(N^2) performance because we have to reprocess already-inserted entries over and over. This explains the empiric observation I made a long time ago: http://archives.postgresql.org/pgsql-hackers/2002-04/msg01379.php This could be fixed with a relatively small amount of new code: when beginning hashbuild, estimate the parent table's rowcount (the same method used by the planner will do fine, viz RelationGetNumberOfBlocks times an estimated tuple density) and construct the appropriate number of buckets immediately. No splits, just write out empty pages as fast as we can. *Then* do the insertions. Comments? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
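The split-count arithmetic in Tom's analysis can be simulated in a few lines (a toy model with a made-up fill factor, not hashbuild itself; note his follow-up message later revises the O(N^2) conclusion while keeping the split count):

```python
FILL = 10   # hypothetical entries a bucket holds before forcing a split

def count_splits(n_rows):
    """Grow from two buckets, splitting one bucket per overflow."""
    buckets, entries, splits = 2, 0, 0
    for _ in range(n_rows):
        entries += 1
        if entries > buckets * FILL:   # average fill exceeded -> split
            buckets += 1
            splits += 1
    return splits

print(count_splits(10_000))   # 998, i.e. roughly N / FILL splits
```

So an N-row build performs on the order of N/entries-per-bucket splits, versus zero splits if the final bucket count were pre-created from an estimate of N.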
Re: [HACKERS] Money type todos?
On Tue, 20 Mar 2007 11:24:00 -0700 Joshua D. Drake [EMAIL PROTECTED] wrote: The money type is considered deprecated. I was also under the impression it would be eventually removed. Why are we accumulating TODOs for it? # -Make 64-bit version of the MONEY data type Actually, this TODO is DONE. It's in HEAD now. -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Money type todos?
D'Arcy J.M. Cain wrote: On Tue, 20 Mar 2007 11:24:00 -0700 Joshua D. Drake [EMAIL PROTECTED] wrote: # -Make 64-bit version of the MONEY data type Actually, this TODO is DONE. It's in HEAD now. That is what the - prefix denotes. -Neil ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] modifying the table function
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Here is a paper about portable userspace threads that I just googled. http://www.gnu.org/software/pth/rse-pmt.ps I'm not impressed --- the list of unsolved problems is a bit too long. One that seems a showstopper for our purposes is lack of ability to deal reliably with stack overflow on alternate stacks. If we're going to be doing anything as loosely defined as running a third-party language interpreter in these stacks, that just won't do. Another little problem is it's LGPL. I don't think using that GnuPT library the paper mentions is a viable approach for postgres. I just posted that link because it shows that this is not impossible to do. What would actually be interesting is to find out if the ucontext/getcontext/makecontext/swapcontext/setcontext functions mentioned in the paper are available on the platforms where postgres is used in production for larger datasets. I'd assume that people who need this PL optimization will run a quite up-to-date version of their particular OS, so maybe the portability problems of those functions wouldn't be a problem for postgres - the PL optimization would just be disabled at configure time if they are not available. The main question is: IF the stability problems like stack overflow can be addressed, would this in principle be considered a feature that people would like to have? Or is it considered not worth the effort? greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Question about the TODO, numerics, and division
Hi all; I have been looking at the TODO and have found something that I find sort of odd and we should probably reconsider: One of the items under data types is: * Add NUMERIC division operator that doesn't round? Currently NUMERIC _rounds_ the result to the specified precision. This means division can return a result that multiplied by the divisor is greater than the dividend, e.g. this returns a value > 10: SELECT (10::numeric(2,0) / 6::numeric(2,0))::numeric(2,0) * 6; This does not seem to me to be a division op issue but rather a simple casting mistake. Note that the result of 10/6 is cast as numeric(2,0) and then multiplied by 6. The following example shows that the problem is with the query and casting, not with the division op: SELECT ((10::numeric(2,0) / 6::numeric(2,0)) * 6)::numeric(2,0); numeric - 10 (1 row) Am I missing something? Best Wishes, Chris Travers ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
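Chris's cast-ordering point can be replayed with Python's Decimal (an analogy for numeric(2,0), not PostgreSQL's numeric code): rounding the quotient to zero decimal places *before* multiplying is what produces the over-10 result, not division itself.

```python
from decimal import Decimal, ROUND_HALF_UP

ten, six = Decimal(10), Decimal(6)

# Cast-then-multiply: quantize to 0 decimal places first, like ::numeric(2,0).
q = (ten / six).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
print(q, q * six)         # 2 12  -- "greater than the dividend"

# Multiply-then-cast: keep the precise quotient until the end.
r = ((ten / six) * six).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
print(r)                  # 10
```

Same arithmetic, different cast placement, which is exactly the distinction between the two SELECTs above.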
Re: [HACKERS] Bitmapscan changes - Requesting further feedback
On Tue, 20 Mar 2007, Joshua D. Drake wrote: Hackers et al... I was wondering if there are any outstanding issues that need to be resolved in terms of the clustered index/bitmap changes? From the testing that I have done, plus a couple of others it is a net win (at least from DBA space). Not sure if you're talking about bitmap indexes here. If so, I'm working on VACUUM support. Gavin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Make TIMESTAMP + TIME in the source code
On Mon, Mar 19, 2007 at 06:16:57PM -0400, Luis D. García wrote: Hello, I'm writing again because I need to make some validation for the incoming data (VALUES... on the INSERT-Stmt). This validation is based on adding a TIMESTAMP and a TIME kind of data. From the Shell I can make: template1=# select timestamp '02/03/85 7:00pm'+'00:00:02.3450' as TIMESTAMP+TIME; You're confused. You're not adding TIME, you're adding an INTERVAL, which is something quite different. That's exactly what I need to do, obtain the result from adding a TIMESTAMP and a TIME, but I must do it from the Postgres source code, since I'm working on modifying Postgres for a University Project (here in Venezuela we call it Tesis or Graduation Special Work). Well, any function you can call from SQL you can call from C, see the OidFunctionCall functions. PD: in addition, both data (TIMESTAMP and TIME) are stored in strings (char *). Err, why, when there are perfectly good datatypes to store that type of data. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
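Martijn's distinction (timestamp plus *interval*, not time) is the same operation as Python's datetime plus timedelta; this sketch assumes the literal '02/03/85' is read MDY-style as February 3, 1985:

```python
from datetime import datetime, timedelta

ts = datetime(1985, 2, 3, 19, 0, 0)              # '02/03/85 7:00pm' (MDY assumed)
iv = timedelta(seconds=2, microseconds=345000)   # the interval '00:00:02.3450'

print(ts + iv)   # 1985-02-03 19:00:02.345000
```

A time-of-day value, by contrast, has no well-defined "+" with a timestamp, which is why the SQL literal above is parsed as an interval.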
Re: [HACKERS] Bitmapscan changes - Requesting further feedback
Gavin Sherry wrote: On Tue, 20 Mar 2007, Joshua D. Drake wrote: Hackers et al... I was wondering if there are any outstanding issues that need to be resolved in terms of the clustered index/bitmap changes? From the testing that I have done, plus a couple of others it is a net win (at least from DBA space). Not sure if you're talking about bitmap indexes here. If so, I'm working on VACUUM support. I was talking about the patch for Clustered indexes and I realize now I might have used the wrong thread. ;-) Joshua D. Drake Gavin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Fixing hash index build time
I wrote: I'm not sure if this has been discussed before, but I suddenly realized while responding to the above message that the reason for the awful performance is pretty obvious: hashbuild starts with a minimum-size index (two buckets) and repeatedly splits buckets as insertions are done, exactly the same as ordinary dynamic growth of the index would do. This means that for an N-row table, approximately N/entries-per-bucket splits need to occur during index build, which results in roughly O(N^2) performance because we have to reprocess already-inserted entries over and over. Well, unfortunately this theory seems to be all wet. Given that the bucket loading is reasonably even, the time to split a bucket is about constant and so there's no O(N^2) effect. (The multiplier hidden inside O(N) is pretty awful, but it doesn't change with N.) The real reason why performance falls off a cliff for building large hash indexes seems to be much harder to fix: basically, once the size of your index exceeds working memory, it's nap time. Given that the incoming data has randomly distributed hash values, each bucket is about as likely to be touched next as any other; there is no locality of access and so the working set is the same size as the index. Once it doesn't fit in RAM anymore you're into swap hell. The only way I can see to fix that is to try to impose some locality of access during the index build. This is not impossible: for example, given a choice for the number of buckets, we could sort all the index tuples by hashed bucket number and then start inserting. btree does a preliminary sort, and its index build times are way more reasonable than hash's currently are, so the cost of the sort isn't outrageous. (I note this is mainly because we know how to do sorting with locality of access...) 
Before we start inserting we will know exactly how many tuples there are, so we can pre-create the right number of buckets and be sure that no on-the-fly splits will be needed for the rest of the build. If we guessed wrong about the number of buckets there will be some places in the process where we concurrently insert into several buckets not just one, or perhaps come back to a bucket that we touched earlier, but that's still maintaining plenty of locality of access. This is looking like more work than I want to do in the near future, but I thought I'd put it into the archives for someone to tackle. Bruce, would you add a TODO item linking to this: * Improve hash index build time by sorting regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
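The sort-before-insert idea above can be sketched in Python. This is a toy model, not the hashbuild code: hash_key and the entries-per-bucket heuristic are stand-ins, and the point is only that choosing the bucket count up front and sorting index entries by destination bucket turns random access into sequential, cache-friendly insertion with no on-the-fly splits.

```python
import hashlib

def hash_key(key: str) -> int:
    # stand-in for PostgreSQL's hash function
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def build_hash_index(keys, entries_per_bucket=4):
    # pre-create the final number of buckets: no splits needed during the build
    nbuckets = max(2, len(keys) // entries_per_bucket)
    buckets = [[] for _ in range(nbuckets)]
    # sort by destination bucket, so insertion proceeds bucket by bucket
    # and the working set at any moment is one bucket, not the whole index
    for key in sorted(keys, key=lambda k: hash_key(k) % nbuckets):
        buckets[hash_key(key) % nbuckets].append(key)
    return buckets

index = build_hash_index([f"row{i}" for i in range(1000)])
```

The naive approach touches buckets in random order, so once the index outgrows RAM every insertion is a potential page fault; the sorted build touches each bucket exactly once.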
Re: [HACKERS] Question about the TODO, numerics, and division
Chris Travers [EMAIL PROTECTED] writes: I have been looking at the TODO and have found something that I find sort of odd and we should probably reconsider: One of the items under data types is: * Add NUMERIC division operator that doesn't round? Currently NUMERIC _rounds_ the result to the specified precision. This means division can return a result that, multiplied by the divisor, is greater than the dividend, e.g. this returns a value > 10: SELECT (10::numeric(2,0) / 6::numeric(2,0))::numeric(2,0) * 6; I agree that the TODO item is pretty bogus as worded. A closer look at what's going on is:

regression=# SELECT (10::numeric(2,0) / 6::numeric(2,0));
      ?column?
--------------------
 1.6666666666666667
(1 row)

and of course if you multiply that by 6 you get

regression=# SELECT (10::numeric(2,0) / 6::numeric(2,0)) * 6;
      ?column?
---------------------
 10.0000000000000002
(1 row)

However this seems basically insoluble. The TODO item seems to imagine that it would be better if the division returned 1.6666666666666666, but AFAICS that answer is actually *less* accurate:

regression=# select 1.6666666666666666 * 6;
      ?column?
--------------------
 9.9999999999999996
(1 row)

The only way to make it more accurate is to return more decimal places, but you'll never get an exact result, because this is a nonterminating fraction. There may be a use for a division operator that rounds the last returned digit towards minus infinity instead of to nearest, but the TODO entry is utterly unconvincing as an argument for that. Does anyone recall what the original argument for it was? Perhaps the TODO entry is just mis-summarizing the discussion. A separate question is whether the division operator chooses a good default for the number of digits to return. You can make it compute more digits by increasing the scale values of the inputs:

regression=# SELECT (10::numeric(32,30) / 6::numeric(2,0));
             ?column?
----------------------------------
 1.666666666666666666666666666667
(1 row)

but I wouldn't want to defend the details of the rule about how many fractional digits out given so many fractional digits in.
regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
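The accuracy argument can be checked with Python's decimal module. This is a sketch of the arithmetic only, not of PostgreSQL's numeric code; the 16-fractional-digit quantum is an assumption chosen to mirror numeric division's default result scale for these inputs.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_FLOOR

# Compute 10/6 at high precision, then cut to 16 fractional digits two ways:
# round-to-nearest (what numeric division does) vs. truncation toward minus
# infinity (what the TODO item seems to ask for).
q = Decimal(10) / Decimal(6)          # default context: 28 significant digits
quantum = Decimal("1e-16")
rounded = q.quantize(quantum, rounding=ROUND_HALF_UP)   # 1.6666666666666667
truncated = q.quantize(quantum, rounding=ROUND_FLOOR)   # 1.6666666666666666

# Multiplying back: rounding overshoots 10 by less than truncation
# undershoots it, so the rounded quotient is the *more* accurate answer.
err_rounded = abs(rounded * 6 - 10)      # 2E-16
err_truncated = abs(truncated * 6 - 10)  # 4E-16
assert err_rounded < err_truncated
```

The error of round-to-nearest is bounded by half a unit in the last place, while truncation can be off by almost a full unit, which is exactly Tom's point.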
Re: [HACKERS] Patch for pg_dump
And the patch is so small, it is invisible (missing). ;-) --- Dany DeBontridder wrote: Here is a (small) patch to give pg_dump the ability to export only the functions (or only one), very useful when you often develop with psql (postgresql 8.2.3). Usage: pg_dump -Q function_name DATABASE exports function_name; pg_dump -Q DATABASE exports all the functions. This patch is distributed under the BSD licence. Regards, D. PS: I hope this is the correct ml, otherwise excuse me; it is the first time I propose a patch for postgresql -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Caught up on email
I have read all the email from when I was on vacation. I am now fully back at work. --- bruce wrote: I am back from vacation and reading email again. I should be caught up in less than a week (2.6k emails), and easily ready for feature freeze April 1. For the curious, I took a 10-day family vacation in Costa Rica. A private tour guide took us to three areas of Costa Rica. We saw an active volcano, waterfalls, animals, plants, and the ocean. We went horseback riding, hiking, white-water rafting, and swimming. One interesting thing was that our tour guide was named Alvaro Herrera, not our Alvaro Herrera from Chile, but another one from Costa Rica. I am sure their middle names differed. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] GSoC's possible project
I'm a student and I'm planning to submit a project for Google Summer of Code. I would like to receive feedback about implementing vacuum scheduling in order to allow maintenance windows. I have read the whole discussion about autovacuum improvements[1] that took place in January. Alvaro Herrera suggested two improvements (probably following an idea of Matthew T. O'Connor), which were: - Scheduling (a maintenance window) - Process handling. As far as I know, the process handling is a work in progress[2], while the scheduling is defined (almost completely) but not implemented yet. What is not clear to me (yet) is whether it would require collecting enough information through implementing a sort of 'VACUUM SUMMARY'[3]. I would like to receive opinions or suggestions about picking this task. I haven't explained in more detail what this scheduling is because I think it was made clear enough in the thread, as far as I understood. Thanks in advance, [1] http://archives.postgresql.org/pgsql-hackers/2007-01/msg00684.php [2] http://archives.postgresql.org/pgsql-hackers/2007-03/msg00639.php [3] http://archives.postgresql.org/pgsql-hackers/2005-07/msg00409.php -- Germán Poó Caamaño Concepción - Chile ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] CREATE INDEX and HOT - revised design
I have read the HOT discussion and wanted to give my input. The major issue is that CREATE INDEX might require a HOT chain to be split apart if one of the newly indexed columns changed within the HOT chain. As for the outline below, there is no way we are going to add new ALTER TABLE and CHILL commands to make this work. Can you imagine what kind of warts we would have in the system if we added such things every time we got stuck? And once the warts are in, it is hard to re-educate people once they are removed. We need to keep going until we have a solution that is as user-invisible as possible. While I understand the frustration that we have not discussed this enough, I don't want us rushing to a solution either until it has been thought through completely. HOT is not a feature only a few people are going to want to use --- everyone will want it, and if the user process is cumbersome, we will get never-ending questions about how to make it work. Let's all think about this for the next few days. --- Pavan Deolasee wrote: There are a few things I realized over the weekend while going through the code: 1. It looks like a bad idea to use ALTER TABLE .. to chill a table because ALTER TABLE takes an AccessExclusive lock on the table. But it would still be a good idea to have ALTER TABLE .. to turn HOT-updates ON/OFF. 2. Maybe I was too paranoid about deadlocks. CREATE INDEX upgrades its lock anyway and is prone to deadlock. So as long as we don't create new deadlock scenarios, we should be fine. 3. CREATE INDEX CONCURRENTLY must be run as a separate transaction, so it should be acceptable if we run CHILL as a separate transaction. 4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and VACUUM FULL. We can do the same for CHILL to avoid any unnecessary race conditions between all of these. So here is my plan. Let me know your suggestions/comments/objections.
Changes to pg_class and new DDLs: We add two boolean attributes to pg_class: hot_update and hot_fetch. We introduce two DDLs to turn HOT on/off. Note that the DDL itself does not CHILL the table, but only affects subsequent UPDATEs.

postgres=# ALTER TABLE tblname ENABLE HOT;
postgres=# ALTER TABLE tblname DISABLE HOT;

These DDLs would acquire an AccessExclusive lock on the table and set hot_update to true/false using simple_heap_update().

CREATE INDEX [CONCURRENTLY]: If a HEAP_ONLY tuple is found, error out with a HINT to run CHILL on the table and then retry. If a CHILL_IN_PROGRESS tuple is found, VACUUM is required on the table.

CHILL utility: We introduce a new command to chill a table. The syntax could be:

postgres=# CHILL [VERBOSE] tblname;

UPDATE/INSERT/SELECT would work while the table is being chilled, but CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL would be locked out. As a side effect, HOT-updates are turned off on the table and an explicit ALTER TABLE ENABLE HOT is required to turn HOT-updates on again. Here is the algorithm to CHILL a table:

1. Check if CHILL is running inside a transaction block; error out if so.
2. Start a new transaction.
3. Acquire ShareUpdateExclusiveLock on the relation. This would allow concurrent SELECT/INSERT/UPDATE, but lock out concurrent CHILL, VACUUM [FULL] and CREATE INDEX [CONCURRENTLY].
4. Set hot_update to false and update pg_class using simple_heap_update().
5. Acquire ShareUpdateExclusiveLock for the entire session.
6. Commit the transaction.
7. Start a new transaction.
8. Wait for all transactions in the current snapshot to finish. This would ensure that no further HOT-updates are possible.
9. Start the first heap scan; CHILL HEAP_ONLY tuples by inserting appropriate index entries and setting the CHILL_IN_PROGRESS flag. WAL-log the operation.
10. If CHILL_IN_PROGRESS is already set on the tuple, we can't be sure whether the corresponding index entry already exists or not. One option is to error out and force VACUUM on the table. Alternatively, index_insert can be enhanced to check whether the same entry already exists.
11. When the entire heap is chilled, set hot_fetch to false and update pg_class using simple_heap_update().
12. Commit the transaction.
13. Start a new transaction.
14. Wait for all transactions in the current snapshot to finish. This would ensure that all subsequent index scans use only the direct path from the index.
15. Start the second heap scan. Reset the CHILL_IN_PROGRESS, HEAP_ONLY and HOT_UPDATED flags. WAL-log the operations. We may not need this, but we can revisit it later to optimize WAL logging.
16. When the second scan is complete, set hot_fetch to true and update pg_class using simple_heap_update().
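The two heap scans at the heart of the proposal (insert index entries and flag tuples, then clear the HOT flags) can be modeled with a toy sketch. The dict-based heap, index, and flag values here are invented for illustration and bear no relation to PostgreSQL's on-disk structures or locking; the transactional waits are only noted in comments.

```python
# Flag bits for the toy model (names follow the proposal, values invented)
HEAP_ONLY, HOT_UPDATED, CHILL_IN_PROGRESS = 1, 2, 4

def chill(heap, index):
    # First heap scan: give each heap-only tuple its own index entry and
    # mark it CHILL_IN_PROGRESS (steps 9-10 of the proposal).
    for tid, tup in enumerate(heap):
        if tup["flags"] & HEAP_ONLY and not tup["flags"] & CHILL_IN_PROGRESS:
            index.setdefault(tup["key"], set()).add(tid)
            tup["flags"] |= CHILL_IN_PROGRESS
    # (real algorithm: commit, wait out old snapshots, flip hot_fetch)
    # Second heap scan: clear the HOT bookkeeping flags (step 15).
    for tup in heap:
        tup["flags"] &= ~(HEAP_ONLY | HOT_UPDATED | CHILL_IN_PROGRESS)

# A two-tuple HOT chain: only the root (tid 0) has an index entry.
heap = [
    {"key": "a", "flags": 0},
    {"key": "a", "flags": HEAP_ONLY | HOT_UPDATED},
]
index = {"a": {0}}
chill(heap, index)
```

After the call, every version of the row is reachable directly from the index and no tuple carries HOT flags, which is exactly the state CREATE INDEX needs.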
Re: [HACKERS] Money type todos?
Tom Lane wrote: The money type is considered deprecated. I was also under the impression it would be eventually removed. Why are we accumulating TODOs for it? Because doing the TODOs would remove the reasons for deprecating it. Whether it is actually ever going to disappear is not agreed upon. What is the reason to keep it? The argument I've seen is that numeric is too slow to use when you have a lot of money calculations to perform. But with that argument we should instead make a general arithmetic type that is fast and useful for more things than just money. Just drop the currency from money and we have one such type. Would we accept other money-like types, with other units? Like kilogram, liter, yards, square meters, and so on? And what use is the unit in money? It's not like it will do currency conversion or anything like that. I think money should go away and the database should provide more general types. /Dennis ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] HOT WIP Patch - Version 5.0
Version 5.0 of the HOT WIP patch is posted on pgsql-patches. This fixes the VACUUM FULL issue with HOT; in all the earlier versions, I'd disabled VACUUM FULL. When we move a HOT chain, we move the tuples but don't carry over the HOT_UPDATED or HEAP_ONLY flags, and we insert as many index entries as there are tuples in the chain. IOW the HOT-updated chain is actually turned into a COLD chain. Apart from this, I had to make some changes to the VACUUM FULL code so that the number of indexed tuples is counted correctly. With HOT, whenever a HEAP_ONLY tuple is moved, an additional index entry is generated and this needs to be taken into account. Please let me know comments/suggestions. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
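The chain-move rule described above can be sketched with a toy model (an invented data representation, not the VACUUM FULL code): each tuple moved out of a HOT chain drops its HOT flags and gets its own index entry, so the moved chain is COLD.

```python
def move_chain(chain, new_tids, index):
    """Toy version of moving a HOT chain: one index entry per moved tuple,
    HOT flags not carried over, so the result is a COLD chain."""
    moved = []
    for tup, tid in zip(chain, new_tids):
        cold = {"key": tup["key"], "flags": 0}          # flags dropped
        index.setdefault(cold["key"], set()).add(tid)   # entry per tuple
        moved.append(cold)
    return moved

# A two-tuple chain where only the root (old tid 10) had an index entry;
# flag values 2 (HOT_UPDATED) and 1 (HEAP_ONLY) are invented for the sketch.
index = {"k": {10}}
chain = [{"key": "k", "flags": 2}, {"key": "k", "flags": 1}]
moved = move_chain(chain, [20, 21], index)
```

This also shows why the indexed-tuple count changes: the heap-only tuple gains an index entry it did not have before the move, which is the accounting fix the message mentions.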
Re: [HACKERS] Money type todos?
Dennis Bjorklund [EMAIL PROTECTED] writes: Tom Lane wrote: Whether it is actually ever going to disappear is not agreed upon. What is the reason to keep it? The words-of-one-syllable answer is that D'Arcy Cain is still willing to put work into supporting the money type, and if it still gets the job done for him then it probably gets the job done for some other people too. Personally, as a former currency trader I've not seen any proposals on this list for a money type that I'd consider 100% feature complete. The unit-identification part of it is interesting, but pales into insignificance compared to the problem that the unit values vary constantly; what's more, that variance is not to be swept under the rug but is exactly the data that you are interested in. Next, the units themselves change from time to time (euro? what's that?); next, the interconversion rates aren't all exactly equivalent, and that's not noise either but rather very interesting data (see arbitrage). So I'm not feeling inclined to try to prescribe that datatype X is good while datatype Y is bad. It's more about whether there's an audience for any particular datatype definition. The present money code gets the job done for D'Arcy and probably some other people, and we see some straightforward ways to improve it to serve some more cases, so what's wrong with pursuing that path? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq