Re: [HACKERS] IN vs EXISTS equivalence

2008-09-03 Thread daveg
On Thu, Aug 14, 2008 at 06:50:09PM +0100, Simon Riggs wrote:
 
 On Fri, 2008-08-08 at 16:23 -0400, Tom Lane wrote:
 
  NOT IN is a lot trickier,
  condition: you must also assume that the comparison operator involved
  never yields NULL for non-null inputs.  That might be okay for btree
  comparison functions but it's not a very comfy assumption in general;
  we certainly haven't got any explicit knowledge that any functions are
  guaranteed to act that way.  So this case might be worth doing later
... 
 Just found this comment, after reading what you said on other thread
 about NOT IN.
 
 NOT IN is a serious performance issue for most people. We simply can't
 say to people you were told not to.
 
 If we can fix it easily for the majority of cases, we should. We can't
 let the it won't work in certain cases reason prevent various

A suggestion: what about adding an attribute to functions to declare that
they never return null? 

   declare foo(int, int) returns int immutable not null as ...


-dg


-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-03 Thread Heikki Linnakangas

Simon Riggs wrote:

On Tue, 2008-09-02 at 15:51 +0300, Heikki Linnakangas wrote:


The needs of access to the rows are so different that it seems best to
me to delegate the buffering to the window function.


That seems sensible in some ways, not others.


In the API I proposed later in that mail, the buffering is actually done 
by the executor node, not by the window function. Instead, the window 
function can request abitrary rows of the frame from the executor, and 
can signal that some rows are no longer required, allowing them to be 
discarded.



Some of the window functions, like lead and lag merely specify window
size and shape for other functions to act upon.


I don't  understand that. LEAD/LAG return a value. They don't affect the 
size or shape of the window in any way. It doesn't affect other functions.



For those types of
request I don't see any need for custom functions, whereas for the
comparison/calculation functions there might be a need.

We don't need to implement all the things the SQL Standard calls window
functions with a 1:1 mapping to Postgres functions.


Sure, we have special hacks for things like MIN/MAX already. But using 
PostgreSQL functions does seem like the simplest solution to me, as the 
backend code can get quite complex if we have to add special handling 
for different window functions. LEAD/LAG fall quite nicely into the 
framework I proposed, but if something comes along that doesn't, then 
we'll have to extend the framework or add a special case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-03 Thread Heikki Linnakangas

Hitoshi Harada wrote:

2008/9/2 Heikki Linnakangas [EMAIL PROTECTED]:

Hitoshi Harada wrote:

2008/9/2 Heikki Linnakangas [EMAIL PROTECTED]:
In my understanding, the Window Frame is defined
by clauses such like ROWS BETWEEN ... , RANGE BETWEEN ...  or so,
contrast to Window Partition defined by PARTITION BY clause. A
frame slides within a partition or there's only one frame if those
clauses are not specified. The current patch has not implemented this
yet. I'll update the docs.

Yes, that's how I read it as well. Another way to think of it is that
there's always a window frame, but if no ROWS BETWEEN or similar clause is
given, the window frame spans the whole partition (or from the 1st row of
the partition, up to the current row, if ORDER BY is given).


I don't like to call the second type ranking aggregates because it
may refer to only ranking functions though there are more types of
function like ntile(), lead() and lag(). But window functions
doesn't seem appropriate either since it's ambiguous with the general
name of window expressions.

Yep, confusing :-(. The SQL standard says that a window function is one of:
a rank function, a distribution function, a row number function, a window
aggregate function, the ntile function, the lead function, the lag function,
the first-value function, the last-value function, the nth-value function.
So, window aggregate functions are a special class of window functions, and
there's no term to refer to all the rest of the window functions excluding
window aggregate functions.

Your docSQL spec
Window expression   Window function
Window function Any window function other than a window aggregate
function
Window aggregateWindow aggregate function

I tried to coin the term ranking aggregate for the SQL2008 term Any
window function other than a window aggregate function, but you're right
that that's still confusing, because the SQL2008 term rank function
includes only RANK() and DENSE_RANK().

The spec calls them group aggregate functions, when they're used with
GROUP BY, rather than as a window function. I think we could use that term.


Agree. So from now on, we use window functions for all kinds of
functions including window aggregates. Window expression is
discarded. Window functions also means the mechanism to support
these functions to process and this project.


Your proposal is smarter than the current implementation. But it
doesn't seem complete in some way. From logical point of view, the
window functions should be allowed to access whole of rows in the
frame the current row belongs to (c.f. inverse distribution
functions).

By the whole of rows, do you mean
a) the chosen value or expression of all rows, or
b) all columns of the input rows?


a). I mean all input rows in a window frame. But later I found
inverse distribution function is not one of window functions. That
is actually one of aggregate functions. Forget about it.


Different window functions have different needs. RANK() for example does
need to see all columns, to compare them, but it only needs access to the
previous and current row. CUME_DIST on the other hand needs access to all
columns of all rows, and LAG needs access to a specific column of a fixed
number of rows. And row_number needs nothing.

The needs of access to the rows are so different that it seems best to me to
delegate the buffering to the window function.


Delegating optimization to them depending on functions' needs is a
good idea. So executor can concentrate on the window function process
flow. Let's unify it in executor and let trivial optimizations get
into individual functions.


Actually, looking closer to the ranking functions, they don't really need
access to all columns. They just need to be able to compare them, according
to the window ordering, so we should probably only provide access to the
arguments of the aggregate, evaluated for any row in the frame, and a
comparator function, that can determine if any given rows in the frame are
, = or .


That is kind of problem. If your task is only to define that window
node executor simply stores window frame rows and pass them to window
functions as they need, the rank functions' needs don't come. As you
point out, rank functions need ordering key columns and its
comparators. So you have to imagine what comes next? What will be
wanted other than ordering key columns, if we think about universe
window functions much more than SQL spec says?


It might be a good idea to google around what window functions other 
DBMSs support, and see if this scheme could support all of them. I 
couldn't find any that it couldn't, but I didn't look very hard.



Let's look at the trivial, generic, and slow implementation first, and then
figure out what tricks we can do to make it faster. I gather that that
generic algorithm, following how the SQL spec defines the window frame,
looks like this:


So as to satisfy all of window functions' needs, Window object 

Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-03 Thread Simon Riggs

On Wed, 2008-09-03 at 09:51 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Tue, 2008-09-02 at 15:51 +0300, Heikki Linnakangas wrote:
  
  The needs of access to the rows are so different that it seems best to
  me to delegate the buffering to the window function.
  
  That seems sensible in some ways, not others.
 
 In the API I proposed later in that mail, the buffering is actually done 
 by the executor node, not by the window function. Instead, the window 
 function can request abitrary rows of the frame from the executor, and 
 can signal that some rows are no longer required, allowing them to be 
 discarded.

I'm happy with that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Heikki Linnakangas

Marko Kreen wrote:

On 9/2/08, Peter Eisentraut [EMAIL PROTECTED] wrote:

Marko Kreen wrote:

In the meantime, here is simple patch for case-insensivity.

 You might be able to talk me into accepting various unambiguous, common
alternative spellings of various units.  But for instance allowing MB and Mb
to mean the same thing is insane.


How would the docs for that look like?  And anyway, what is wrong with
Mb for megabytes?


I doesn't seem completely unreasonable to me that we'd want to express 
something in megabits/second in the future. For example, instead of 
vacuum_cost_delay, it would be cool to specify a bandwidth allowance. 
Megabits/second is a completely reasonable unit for that. Or a limit on 
network bandwidth.


FWIW, I don't feel very strongly either way. I'm more than happy with 
the status quo. The hint in the error message very clearly spells out 
what the valid values are, so it's immediately clear what you need to 
fix if you get that wrong.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IN vs EXISTS equivalence

2008-09-03 Thread Asko Oja
On Wed, Sep 3, 2008 at 9:17 AM, daveg [EMAIL PROTECTED] wrote:

 On Thu, Aug 14, 2008 at 06:50:09PM +0100, Simon Riggs wrote:
 
  On Fri, 2008-08-08 at 16:23 -0400, Tom Lane wrote:
 
   NOT IN is a lot trickier,
   condition: you must also assume that the comparison operator involved
   never yields NULL for non-null inputs.  That might be okay for btree
   comparison functions but it's not a very comfy assumption in general;
   we certainly haven't got any explicit knowledge that any functions are
   guaranteed to act that way.  So this case might be worth doing later
 ...
  Just found this comment, after reading what you said on other thread
  about NOT IN.
 
  NOT IN is a serious performance issue for most people. We simply can't
  say to people you were told not to.
 
  If we can fix it easily for the majority of cases, we should. We can't
  let the it won't work in certain cases reason prevent various

 A suggestion: what about adding an attribute to functions to declare that
 they never return null?

And if function still returns null then error will be raised?
Then you will end up adding NOT NULL also to IN and OUT parameters.
IIRC it was possible in Oracle to declare local variables NOT NULL.


   declare foo(int, int) returns int immutable not null as ...


 -dg


 --
 David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
 If simplicity worked, the world would be overrun with insects.

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Asko Oja
On Wed, Sep 3, 2008 at 11:20 AM, Heikki Linnakangas 
[EMAIL PROTECTED] wrote:

 Marko Kreen wrote:

 On 9/2/08, Peter Eisentraut [EMAIL PROTECTED] wrote:

 Marko Kreen wrote:

 In the meantime, here is simple patch for case-insensivity.

  You might be able to talk me into accepting various unambiguous, common
 alternative spellings of various units.  But for instance allowing MB and
 Mb
 to mean the same thing is insane.


 How would the docs for that look like?  And anyway, what is wrong with
 Mb for megabytes?

 From infamous wikipedia: A *megabit* is a unit of 
 informationhttp://en.wikipedia.org/wiki/Informationor computer storage, 
 abbreviated
*Mbit* (or *Mb*).
To me playing with case of acronyms and even depending on it seems more
insane. It would make much more sense to have case insensitive set of
acronyms and (thanks Tom for pointing out) some sanity checks when
configuration is loaded to notify user when wrong ones are used for some
context.


 I doesn't seem completely unreasonable to me that we'd want to express
 something in megabits/second in the future. For example, instead of
 vacuum_cost_delay, it would be cool to specify a bandwidth allowance.
 Megabits/second is a completely reasonable unit for that. Or a limit on
 network bandwidth.

There are less confusing (better) acronyms kbit/s and mbit/s available for
that.


 FWIW, I don't feel very strongly either way. I'm more than happy with the
 status quo. The hint in the error message very clearly spells out what the
 valid values are, so it's immediately clear what you need to fix if you get
 that wrong.

Is the database down during that time?


 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Marko Kreen
On 9/3/08, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 Marko Kreen wrote:
  On 9/2/08, Peter Eisentraut [EMAIL PROTECTED] wrote:
   Marko Kreen wrote:
In the meantime, here is simple patch for case-insensivity.
   
You might be able to talk me into accepting various unambiguous, common
   alternative spellings of various units.  But for instance allowing MB
 and Mb
   to mean the same thing is insane.
  
 
  How would the docs for that look like?  And anyway, what is wrong with
  Mb for megabytes?
 

  I doesn't seem completely unreasonable to me that we'd want to express
 something in megabits/second in the future. For example, instead of
 vacuum_cost_delay, it would be cool to specify a bandwidth allowance.
 Megabits/second is a completely reasonable unit for that. Or a limit on
 network bandwidth.

While it sounds theoretically useful, as a UI it's combines worst
from both worlds.  You now can confuse user with both case-sensitivity
and unit-mixup.  Even if we keep the current case-sensitivity, we should
set policy that units that differ only with case will never be accepted.

And the best way to set the policy in stone would be to make
units case-insensitive.

I like Asko's proposal of 'kbit/s' and 'mbit/s' for those - clear
and no chance of confusion.

  FWIW, I don't feel very strongly either way. I'm more than happy with the
 status quo. The hint in the error message very clearly spells out what the
 valid values are, so it's immediately clear what you need to fix if you get
 that wrong.

Well, the problem is that while database may come up with wrong values,
it may be unusable for actual loads, until admin browses the logs,
re-edits the config file and finally restarts in again.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Hannu Krosing
On Tue, 2008-09-02 at 16:50 +0300, Peter Eisentraut wrote:
 Gregory Stark wrote:
  Peter Eisentraut [EMAIL PROTECTED] writes:
  
  Marko Kreen wrote:
  In the meantime, here is simple patch for case-insensivity.
  You might be able to talk me into accepting various unambiguous, common
  alternative spellings of various units.  But for instance allowing MB and 
  Mb to
  mean the same thing is insane.
  
  Because you think some user will be trying to specify their shared_buffers 
  in
  bits?

If somebody wants then somebody can still claim that the units are
wrong, as SI specifies 1k = 1000 and 1M=100, etc

The binary should use IEC multipliers for kibi-, mebi-, tebi-, gibi-,
etc.

http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci825099,00.html

So we could beat correctness into DBA-s by forcing them to write KiB
MiB and so on for things that they have always written kb and mb (unless
they are communication engineers, in which case they may be used to
writing these as kB and MB)


 My concern is that this information does not stay in the configuration 
 files.  It will invariably leak out into whitepapers, presentations, 
 product documentation, and before long there will be confusion about why 
 you can't stuff N Mb over an N Mb connection.  

You can't even put N Mbits of data in one sec over N Mbits/sec
connection as there is some overhead from all protocol levels.

Clueless people will also be confused about why they can't fit 1Mbyte of
CVS data into a 1Mbyte database file or why you can't store 1024 MB of
data on a 1GB disk.

 I am not making this up.

Is there anything in conf files that can be reasonably specified in
bits ?

 Mb does not add any typing ease (as KB might) or readability (as sec 
might), and there is no respectable source that will claim it is an 
 acceptable alias for MB.

Are you really afraid that someone would want to use mb to mean
millibits ?

As SQL is generally case insensitive, it is quite surprising to most
people that GUC units are not.

-
Hannu




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question regarding the database page layout.

2008-09-03 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 BTW, there are at least two copies of that code to be changed.  I'd
 suggest grepping for assignments to t_hoff to be sure there aren't more.

 I did send in a patch a while ago to get rid of the old HeapFormTuple() and
 friends.

 I remember discussing that idea, but I don't recall seeing an actual
 patch?  It would have to be quite large because of the number of places
 using the old way.  I'd also be a bit worried about breaking add-on
 modules to little purpose ...

Huh, apparently I did it but didn't actually send in the patch:

http://archives.postgresql.org/pgsql-hackers/2007-10/msg00851.php

I looked around and I don't seem to have it lying around any more. (Kind of
mystifying since I have tons of old source trees and patches, just not that
one.)

I could do the janitorial work again if we're interested.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window functions patch v04 for the September commit fest

2008-09-03 Thread Hitoshi Harada
2008/9/3 Heikki Linnakangas [EMAIL PROTECTED]:
 Hitoshi Harada wrote:

 I'd suggest:

 1. Implement Window node, with the capability to invoke an aggregate
 function, using the above API. Implement required parser/planner changes.
 Implement a few simple ranking aggregates using the API.
 2. Implement glue to call normal aggregates through the new API
 3. Implement the optimization to drop rows that are no longer needed
 (signal_cutoff can be a no-op until this phase)
 4. Implement window framing (the frame can always be all rows in the
 partition, or all rows until the current row, until this phase)
 5. Expose the new API to user-defined aggregates. It can be an internal
 API
 only used by built-in functions until this phase

 I believe you already have phase 1 in your patch, except for the API
 changes.

 I am willing to challenge to implement the API above, after maintain
 the current patch adding docs and tests. Since the API includes
 changes much more like Aggregate syntax than current patch, I'm not
 sure if I can finish it by next commit fest, which is said to be
 feature freeze. For safety, remain the current patch to review
 excluding API and executor then if I fail to finish use it for next
 release. Git helps it by cutting a branch, does it? How do you think?

 We do allow changes to the user manual after the feature freeze, so I'd
 suggest concentrating on the code and tests first. Code comments and
 internal docs are important, though, for easy review.

 I'm sure we won't get all the way to phase 5 for 8.4, but if we can even get
 1-3, plus some of the most important window functions, this this will be a
 great release!

OK, so first tests and internal docs/comments, then comes trying to
catch API , finally docs.

BTW, I think it is better to put together the discussion points we
have done as general roadmap to complete window functions. It is not
about the features for the next release but is the complete tasks.
Where to go? Wiki, or my design docs?

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Function call order dependency

2008-09-03 Thread pgsql
Is there a knowable order in which functions are called within a query in
PostgreSQL?

For example I'll use the Oracle contains function, though this is not
exactly what I'm doing, it just illustrates the issue clearly.

select *, score(1) from mytable where contains(mytable.title, 'Winding
Road', 1) order by score(1);

The contains function does a match against mytable.title for the term
'Winding Road' and both returns and saves an integer score which may be
retrieved later using the score(...) function. The integer used as a
parameter in score(...) and contains(...) is an index to reference which
score you need as more than one contains(...) call may be used in single
query.

This sets up an interesting issue, how can one ensure that contains() is
called prior to any score() function on each row? Is this possible? Is
there a specific order on which you can count?

Would it be something like: where clause first, left to right, followed
by select terms, left to right, and lastly the order by clause?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq object hooks (libpq events)

2008-09-03 Thread Andrew Chernow

Alvaro Herrera wrote:


There's one thing that seems a bit baroque, which is the
PG_COPYRES_USE_ATTRS stuff in PQcopyResult.  I think that flag
introduces different enough behavior that it should be a routine of its
own, say PQcopyResultAttrs.  That way you would leave out the two extra
params in PQcopyResult.

Oh -- one last thing.  I am not really sure about the flags to
PQcopyResult.  Should there really be flags to _remove_ behavior,
instead of flags that add?  i.e. instead of having 0 copy everything,
and have to pass flags for things not to copy, wouldn't it be cleaner to
have 0 copy only base stuff, and require flags to copy extra things?

a name is attached to every event proc, so that it can be 
reported in error messages




Can someone confirm that an event 'name' should be re-introduced, as 
suggested by Alvaro?


Can I get a happy or sad face in regards to below?

New options which add instead of remove.

#define PG_COPYRES_ATTRS  0x01
#define PG_COPYRES_TUPLES 0x02 /* Implies PG_COPYRES_ATTRS */
#define PG_COPYRES_EVENTS 0x04
#define PG_COPYRES_NOTICEHOOKS0x08

// tuples implies attrs, you need the attrs to copy the tuples.
if(options  PG_COPYRES_TUPLES)
  options |= PG_COPYRES_ATTRS; // auto set option

In regards to copying the attrs, the PQcopyResult still needs the 
ability to copy the source result's attrs.  Although, it doesn't need 
the ability to provide custom attrs (which I removed).  New prototype 
for copyresult:


PGresult *
PQcopyResult(const PGresult *src, int options);

I then added a PQsetResultAttrs.  copyresultattrs didn't seem like the 
correct name because you are no longer copying attrs from a source 
result.  You are providing the attrs to 'set'.


int
PQsetResultAttrs(PGresult *res, int numAttributes,
  PGresAttDesc *attDescs);

If the result provided to setattrs already contains attributes, I have 
the function failing (can't overwrite existing attrs).  I think this is 
good behavior


When PQcopyResult needs to copy the source result's attrs, it calls 
PQsetResultAttrs.


/* Wants attrs */
if((options  PG_COPYRES_ATTRS) 
   !PQsetResultAttrs(dest, src-numAttributes, src-attDescs))
{
  PQclear(dest);
  return NULL;
}

So, there is some nice code reuse which indicates to me the code is 
segmented well (copyres  setattrs).


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Joshua D. Drake

Hannu Krosing wrote:

On Tue, 2008-09-02 at 16:50 +0300, Peter Eisentraut wrote:

Gregory Stark wrote:

Peter Eisentraut [EMAIL PROTECTED] writes:



Are you really afraid that someone would want to use mb to mean
millibits ?

As SQL is generally case insensitive, it is quite surprising to most
people that GUC units are not.


We have had this discussion before, I even submitted a patch to make 
them case insensitive. In retrospect I was wrong to submit that patch. 
SQL may be case insensitive but units are not. MB != Mb != mb , I don't 
think we should encourage in any way for users to do the wrong thing.


Sincerely,

Joshua D. Drake




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function call order dependency

2008-09-03 Thread Tom Lane
[EMAIL PROTECTED] writes:
 For example I'll use the Oracle contains function, though this is not
 exactly what I'm doing, it just illustrates the issue clearly.

 select *, score(1) from mytable where contains(mytable.title, 'Winding
 Road', 1) order by score(1);

 The contains function does a match against mytable.title for the term
 'Winding Road' and both returns and saves an integer score which may be
 retrieved later using the score(...) function.

This is just a bad, bad idea.  Side-effects in a WHERE-clause function
are guaranteed to cause headaches.  When (not if) it breaks, you get
to keep both pieces.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function call order dependency

2008-09-03 Thread Stephen R. van den Berg
[EMAIL PROTECTED] wrote:
Would it be something like: where clause first, left to right, followed
by select terms, left to right, and lastly the order by clause?

I don't know what ANSI says, nor do I know what PostgreSQL exactly does
at the moment, but, the only thing you can reasonably count on is that
the WHERE clause is evaluated before the SELECT-result-rows and the
ORDER BY clause (in any SQL database).

You cannot depend on any left to right order, and you cannot depend on
ORDER BY being evaluated after the SELECT-result-rows.
-- 
Sincerely,
   Stephen R. van den Berg.

Clarions sounding *No one* expects the Spanish inquisition!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Joshua D. Drake

Greg Stark wrote:
I don't think worrying about the message we send to users is reasonable. 
We can take responsibilty for the messages we output but punishing our 
users to teach them a lesson is being actively user-hostile


There is no arguing that MB != Mb; nor is there anything user-hostile 
behind the idea of doing it the right way.


Sincerely,

Joshua D. Drake


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Question regarding the database page layout.

2008-09-03 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I could do the janitorial work again if we're interested.

I think it'd make more sense to do it incrementally rather than in one
big-bang patch ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Window functions doc patch

2008-09-03 Thread Erikj
Dear Hitoshi,

I noticed the folowing typo in the doc sgml:

  'rownumber()', instead of 'row_number()' ( 2x )

hth

Erik Rijkers



*** doc/src/sgml/func.sgml.orig 2008-09-03 17:20:28.130229027 +0200
--- doc/src/sgml/func.sgml  2008-09-03 17:21:01.331907454 +0200
***
*** 10092,10100 
   row
entry
 indexterm
! primaryrownumber()/primary
 /indexterm
!functionrownumber() OVER (ORDER BY replaceable
class=parameterexpression/replaceable)/function
/entry
entry
 typebigint/type
--- 10092,10100 
   row
entry
 indexterm
! primaryrow_number()/primary
 /indexterm
!functionrow_number() OVER (ORDER BY replaceable
class=parameterexpression/replaceable)/function
/entry
entry
 typebigint/type



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Hannu Krosing
On Wed, 2008-09-03 at 07:52 -0700, Joshua D. Drake wrote:
 Hannu Krosing wrote:
  On Tue, 2008-09-02 at 16:50 +0300, Peter Eisentraut wrote:
  Gregory Stark wrote:
  Peter Eisentraut [EMAIL PROTECTED] writes:
 
  Are you really afraid that someone would want to use mb to mean
  millibits ?
  
  As SQL is generally case insensitive, it is quite surprising to most
  people that GUC units are not.
 
 We have had this discussion before, I even submitted a patch to make 
 them case insensitive. In retrospect I was wrong to submit that patch. 
 SQL may be case insensitive but units are not. MB != Mb != mb , 

For most people they are equal, and all mean MEGABYTE(S) though
http://en.wikipedia.org/wiki/MB has lots of other possible meanings for
each.

 I don't think we should encourage in any way for users to do the wrong thing.

Can you see any scenario where accepting case insensitive units does
more damage than just ignoring the conf line with incorrect casing ?

Or do you mean we should discourage people from editing postgresql.conf
manually and have them use some tool which prevents them entering kb ?


Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Hannu Krosing
On Wed, 2008-09-03 at 08:20 -0700, Joshua D. Drake wrote:
 Greg Stark wrote:
  I don't think worrying about the message we send to users is reasonable. 
  We can take responsibilty for the messages we output but punishing our 
  users to teach them a lesson is being actively user-hostile
 
 There is no arguing that MB != Mb; 

The whole point of this discussion is, that mostly people expect 
MB == Mb = mb == mB, especially if they see weird constructs like kB
used (k for Kilo, or actually Kibi).

 nor is there anything user-hostile behind the idea of doing it the right way.

I was not trying to expose some sinister plan, just pointing out that
users seldom expect that kind of surprise.

--
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Window functions doc patch

2008-09-03 Thread Hitoshi Harada
2008/9/4 Erikj [EMAIL PROTECTED]:
 Dear Hitoshi,

 I noticed the folowing typo in the doc sgml:

  'rownumber()', instead of 'row_number()' ( 2x )

 hth

 Erik Rijkers



 *** doc/src/sgml/func.sgml.orig 2008-09-03 17:20:28.130229027 +0200
 --- doc/src/sgml/func.sgml  2008-09-03 17:21:01.331907454 +0200
 ***
 *** 10092,10100 
   row
entry
 indexterm
 ! primaryrownumber()/primary
 /indexterm
 !functionrownumber() OVER (ORDER BY replaceable
 class=parameterexpression/replaceable)/function
/entry
entry
 typebigint/type
 --- 10092,10100 
   row
entry
 indexterm
 ! primaryrow_number()/primary
 /indexterm
 !functionrow_number() OVER (ORDER BY replaceable
 class=parameterexpression/replaceable)/function
/entry
entry
 typebigint/type


Ah, thanks. It's my mistake. Both of SQL spec and my implementation in
pg_proc say it is row_number, not rownumber.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Joshua Drake
On Wed, 03 Sep 2008 18:32:16 +0300
Hannu Krosing [EMAIL PROTECTED] wrote:

  We have had this discussion before, I even submitted a patch to
  make them case insensitive. In retrospect I was wrong to submit
  that patch. SQL may be case insensitive but units are not. MB !=
  Mb != mb , 
 
 For most people they are equal, and all mean MEGABYTE(S) though
 http://en.wikipedia.org/wiki/MB has lots of other possible meanings
 for each.

O.k. there is an interesting point being made here, For most people.

Which people exactly? Are we talking about the general populous? If so
I would argue that most people don't have a clue what MB, Mb, or mb
is except to say I think that means some kind of speed or capacity.

The above is not our target.

If our definition of most people is, those who are reasonably
technically adept and will be deploying PostgreSQL in production on
some level.

If someone doesn't know the difference between Mb and MB on a
production system, I would not want them anywhere near any instance of
a production system.

If we are going to make sweeping statements (anyone on this thread)
about user-hostile and most people, then we better define what those
mean. This whole argument about making something easier (and incorrect)
for someone who doesn't exist and has not been defined.

I would be hung on this list if I made a similar argument about any
other feature.

 
  I don't think we should encourage in any way for users to do the
  wrong thing.
 
 Can you see any scenario where accepting case insensitive units does
 more damage than just ignoring the conf line with incorrect casing ?

Generally speaking, no I can't think of any damage that could be done
from mixed casing. Especially since we would only accept certain
possibilities, e.g; Mb would equal MB.

It just strikes me as really bad that a project that prides itself on
doing it right is willing to make this type of sacrifice. This isn't
about usability. This is about doing it wrong and actively encouraging
our users that wrong is ok. It could also misinform the user about
what the meaning of the value means.

 
 Or do you mean we should discourage people from editing
 postgresql.conf manually and have them use some tool which prevents
 them entering kb ?

Well that is a whole other argument :P. I would be happy to have that
one on another thread.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function call order dependency

2008-09-03 Thread pgsql
 [EMAIL PROTECTED] writes:
 For example I'll use the Oracle contains function, though this is not
 exactly what I'm doing, it just illustrates the issue clearly.

 select *, score(1) from mytable where contains(mytable.title, 'Winding
 Road', 1) order by score(1);

 The contains function does a match against mytable.title for the term
 'Winding Road' and both returns and saves an integer score which may be
 retrieved later using the score(...) function.

 This is just a bad, bad idea.  Side-effects in a WHERE-clause function
 are guaranteed to cause headaches.  When (not if) it breaks, you get
 to keep both pieces.

I was kind of afraid of that. So, how could one implement such a function
set?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function call order dependency

2008-09-03 Thread Robert Haas
 I was kind of afraid of that. So, how could one implement such a function
 set?

Write a function (say, score_contains) that returns NULL whenever
contains would return false, and the score otherwise.

SELECT * FROM (
SELECT *, score_contains(mytable.title, 'Winding Road', 1) AS
score FROM mytable
) x WHERE x.score IS NOT NULL
ORDER BY x.score

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq object hooks (libpq events)

2008-09-03 Thread Alvaro Herrera
Andrew Chernow escribió:
 Alvaro Herrera wrote:

 (I also removed PQresultAlloc.)

 Nooo ... removing PQresultAlloc breaks libpqtypes!  It also removes 
 some of the use cases provided by PQsetvalue, which allows one to add to 
 a result (in our case from scratch).

I don't really see the point -- it's the same as pqResultAlloc, except
that you have to pass an extra argument.  There's no actual
functionality loss.

  However, the only change of any significance that I introduced was that
  a name is attached to every event proc, so that it can be reported in
  error messages, as reporting only %p seems very useless.  (I also
  removed PQresultAlloc.)

 I don't mind re-introducing the name, but Tom seemed very against this 
 due to conflicts.  If 2 different libraries register the same name, 
 debugging would be painful.

Hmm, is that really a good argument?  I don't see how providing a
pointer is a better answer than a name -- it's far less user friendly.
Let's start assuming that no duplicate names would be used, and if there
are any conflicts in the real world, we can just ask the user to fire up
GDB to figure out where each name is pointing to.  My guess is that
conflicting names will be a rarity, if we ever get to see them.  (Would
two different libraries call themselves pqtypes, for example?)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function call order dependency

2008-09-03 Thread pgsql
 [EMAIL PROTECTED] writes:
 For example I'll use the Oracle contains function, though this is not
 exactly what I'm doing, it just illustrates the issue clearly.

 select *, score(1) from mytable where contains(mytable.title, 'Winding
 Road', 1) order by score(1);

 The contains function does a match against mytable.title for the term
 'Winding Road' and both returns and saves an integer score which may be
 retrieved later using the score(...) function.

 This is just a bad, bad idea.  Side-effects in a WHERE-clause function
 are guaranteed to cause headaches.  When (not if) it breaks, you get
 to keep both pieces.

Well, I guess I need to alter the question a bit.

I need to perform an operation during query time and there are multiple
results based on the outcome. For instance: (Lets try this)

select myrank(t1.column1, t2.column2, 1) as rank,
myscore(t1.column1,t2.column2, 1) as score from t1, t2 where
myrank(t1.column1,t2.column2)  10 order by myscore(t1.column1,
t2.column2, 1) desc;

This is a bit messier, and I wanted to resist this approach as it is ugly.
The underlying code will check the values of the first and second
parameters and only perform the operation if a previous call did not
already act on the current parameters.

Now, can I assume that in the above select statement, that each
permutation of t1.column1 and t2.column2 will only be evaluated once and
that myscore(...) and myrank(...) will all be called before the next
permutation is evaluated?

So, basically, I don't want to recalculate the values for each and every
function call as that would make the system VERY slow.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Andrew Sullivan
On Wed, Sep 03, 2008 at 06:37:29PM +0300, Hannu Krosing wrote:
 On Wed, 2008-09-03 at 08:20 -0700, Joshua D. Drake wrote:
  There is no arguing that MB != Mb; 
 
 The whole point of this discussion is, that mostly people expect 
 MB == Mb = mb == mB, especially if they see weird constructs like kB
 used (k for Kilo, or actually Kibi).

Note that in the networked computer world, MB and Mb are
importantly different.  The latter is relevant for the speed of your
network interface, for instance.  People often get this wrong when
speaking carelessly, but a mistake of this sort is a serious one,
given the orders of magnitude difference.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq object hooks (libpq events)

2008-09-03 Thread Andrew Chernow

Alvaro Herrera wrote:

Andrew Chernow escribió:

Alvaro Herrera wrote:



(I also removed PQresultAlloc.)
Nooo ... removing PQresultAlloc breaks libpqtypes!  It also removes 
some of the use cases provided by PQsetvalue, which allows one to add to 
a result (in our case from scratch).


I don't really see the point -- it's the same as pqResultAlloc, except
that you have to pass an extra argument.  There's no actual
functionality loss.



libpqtypes uses it.  libpqtypes doesn't have access to any internals of 
libpq, including pqResultAlloc.  So, I made a public wrapper to the 
internal version.  The point is to provide public access to the result 
allocator.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function call order dependency

2008-09-03 Thread Tom Lane
[EMAIL PROTECTED] writes:
 I need to perform an operation during query time and there are multiple
 results based on the outcome. For instance: (Lets try this)

 select myrank(t1.column1, t2.column2, 1) as rank,
 myscore(t1.column1,t2.column2, 1) as score from t1, t2 where
 myrank(t1.column1,t2.column2)  10 order by myscore(t1.column1,
 t2.column2, 1) desc;

Why not have one function that produces multiple output columns?

 Now, can I assume that in the above select statement, that each
 permutation of t1.column1 and t2.column2 will only be evaluated once and
 that myscore(...) and myrank(...) will all be called before the next
 permutation is evaluated?

You can assume that functions in the SELECT target list are evaluated
exactly once per output row (at least as long as no
SRFs-in-the-targetlist are involved).  I don't think it'd be wise to
assume anything about order of evaluation, though it's probably true
that it's left-to-right at the moment.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function call order dependency

2008-09-03 Thread pgsql
 I was kind of afraid of that. So, how could one implement such a
 function
 set?

 Write a function (say, score_contains) that returns NULL whenever
 contains would return false, and the score otherwise.

 SELECT * FROM (
 SELECT *, score_contains(mytable.title, 'Winding Road', 1) AS
 score FROM mytable
 ) x WHERE x.score IS NOT NULL
 ORDER BY x.score

That could work, and while it fits my example, my actual need is a bit
more complex. For instance, say I have two variables (I actually have a
few that I need)

select myvar1(1), myvar2(1), myvar3(1) from mytable where
myfunction(mytable.column, 'some text to search for', 1)  2;

How could I ensure that (1) myfunction is called prior to myvar1(),
myvar2(), and myvar3()? I think the answer is that I can't. So, the
obvious solution is to pass all the variables to all the functions and
have it first come first served.

The next issue is something like this:

select *, myvar1(t1.col1,t2.col2,1), myvar2(t1.col1.t2.col2,1) from t1,t2
where myfunction(t1.col1,t2.col2,1)  10 order by
myvar3(t1.col1,t2.col2,1) desc;

Using a first come first served strategy, is there any discontinuity
between the function calls for t1.col1 and t2.col2. Will they all be
called for a particular combination of t1.col1 and t2.col2, in some
unpredictable order before the next row(s) combination is evaluated or
will I have to execute the underlying algorithm for each and every call?



 ...Robert



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Hannu Krosing
On Wed, 2008-09-03 at 09:10 -0700, Joshua Drake wrote:

 If someone doesn't know the difference between Mb and MB on a
 production system, I would not want them anywhere near any instance of
 a production system.

I for one can make the difference, once I can zen that we are in a
domain, where Mbit makes sense. For me it does not make any sense to
define shared_buffer in Mbit's.

 If we are going to make sweeping statements (anyone on this thread)
 about user-hostile and most people, then we better define what those
 mean. 

Not user-hostile but rather hostile to an overworked DBA, who tries to
change some .conf param from 1MB to 512KB at 3AM to save an overloaded
server and then suddenly the server won't start anymore.

 This whole argument about making something easier (and incorrect)
 for someone who doesn't exist and has not been defined.

I still don't get in what non-academic way it would be incorrect to
spell kilobytes as kb or KB or megabytes as mb ?

Do you have any standard source where it specifies one use and forbids
the other ?

We are not correct according to SI (where k = 1000) nor IEC which
specifies Ki for 1024.

 I would be hung on this list if I made a similar argument about any
 other feature.

Its all about making it easier, not incorrect. 

It may be perhaps incorrect in some strictly academic sense, maybe
some article submission guidelines or such.

   I don't think we should encourage in any way for users to do the
   wrong thing.
  
  Can you see any scenario where accepting case insensitive units does
  more damage than just ignoring the conf line with incorrect casing ?
 
 Generally speaking, no I can't think of any damage that could be done
 from mixed casing. Especially since we would only accept certain
 possibilities, e.g; Mb would equal MB.
 
 It just strikes me as really bad that a project that prides itself on
 doing it right is willing to make this type of sacrifice. This isn't
 about usability. 

It is all about usability. Not accepting popular spellings of common
units seems to me as unnecessary nit-picking , similar to if we
required _exactly_ one space on either side on = in SET statements.

And we are already incorrect by allowing both '' and  around the
value, the latter would usually imply an identifier :P

hannu=# set effective_cache_size = '132MB';
SET
hannu=# set effective_cache_size = 132MB;
SET

Maybe we should change this too if we want to champion the right way?

 This is about doing it wrong and actively encouraging
 our users that wrong is ok. It could also misinform the user about
 what the meaning of the value means.

You could issue a warning, something like 

Warning: KB is not a spelling endorsed by PostgeSQL Global Development
Group, in future please use kB, a spelling that many of us feel to be
the only valid one and others don't think important enough to argue with
them.

---
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Commitfest has started, y'all

2008-09-03 Thread Josh Berkus

Folks,

If you hadn't noticed, the CommitFest started this week.  I'm currently 
assigning patches to reviewers, but if there's a patch you especially 
want to tackle, please put your name down right away.


If you can't actually review the patch *this week*, please don't put 
your name down.


Thanks!

--Josh

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function call order dependency

2008-09-03 Thread pgsql
 [EMAIL PROTECTED] writes:
 I need to perform an operation during query time and there are multiple
 results based on the outcome. For instance: (Lets try this)

 select myrank(t1.column1, t2.column2, 1) as rank,
 myscore(t1.column1,t2.column2, 1) as score from t1, t2 where
 myrank(t1.column1,t2.column2)  10 order by myscore(t1.column1,
 t2.column2, 1) desc;

 Why not have one function that produces multiple output columns?

I was sort of trying to make this a fairly generic SQL extension who's
methodology could be moved to other databases if needed. I guess multiple
columns could work. I've got some code in another extension that does
that.


 Now, can I assume that in the above select statement, that each
 permutation of t1.column1 and t2.column2 will only be evaluated once and
 that myscore(...) and myrank(...) will all be called before the next
 permutation is evaluated?

 You can assume that functions in the SELECT target list are evaluated
 exactly once per output row (at least as long as no
 SRFs-in-the-targetlist are involved).  I don't think it'd be wise to
 assume anything about order of evaluation, though it's probably true
 that it's left-to-right at the moment.

But are all the items targeted in close proximity to each other BEFORE
moving on to the next row? What about the where clause? would that be
called out of order of the select target list? I'm doing a fairly large
amount of processing  and doing it once is important.
/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Alvaro Herrera
Hannu Krosing escribió:
 On Wed, 2008-09-03 at 09:10 -0700, Joshua Drake wrote:

  If we are going to make sweeping statements (anyone on this thread)
  about user-hostile and most people, then we better define what those
  mean. 
 
 Not user-hostile but rather hostile to an overworked DBA, who tries to
 change some .conf param from 1MB to 512KB at 3AM to save an overloaded
 server and then suddenly the server won't start anymore.

I think the energy wasted in this discussion would be better spent in
working a the check-the-config-file feature.  That would equally solve
this problem, as well as many others.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Andrew Sullivan
On Wed, Sep 03, 2008 at 01:48:18PM -0400, Alvaro Herrera wrote:

 I think the energy wasted in this discussion would be better spent in
 working a the check-the-config-file feature.  That would equally solve
 this problem, as well as many others.

This seems like a good idea to me.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]

2008-09-03 Thread Greg Smith

On Tue, 2 Sep 2008, Tom Lane wrote:


How about having two new columns reset value and boot value?


Like it better than default value ...


It's being a bit pedantic at the expense of the user, but I don't really 
care that much here.  I exposed the boot_val and described it in the 
documentation as:


Default value if the parameter is not explicitly set

That's the value that people care about--if they comment out a setting 
altogether and restart the server, what will it go back to.  New admins 
and people playing with the config files in a tuning content aren't often 
using sighup in my experience, they just restart the server after changes.


I'm not aware of any specific use case for exposing the reset value other 
than for completeness sake.  Having both exposed with names that don't 
mean anything to new admins is making the user experience more difficult 
than it needs to be.  That was why I just picked the more important one 
and named it default; that makes the case for the average user so easy 
they don't even need to look at the documentation.


I note the ongoing GUC units debate as a reminder that a technically 
correct UI is usually preferred in this project to an easier to use but 
slightly ambiguous one, and I'm not going to argue for default further 
if everyone else is happy with a cryptic naming instead.  The important 
thing is that the boot_val gets exposed somehow so tool writers can 
trivially present it as an option.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Testers: 8.4 snapshot RPMs are available

2008-09-03 Thread Devrim GÜNDÜZ
Hi,

If you are a Fedora-9 or RHEL/CentOS 5 user and want to test new
features of PostgreSQL 8.4 and help development team, you may use the
packages that I have just released, based on today's CVS snapshot. I am
planning to push new packages each weekend during commitfest.

Please note that these packages are not built using the official
tarball. I built it with the help of some PostgreSQL hackers. 

These are not stable packages. Use them at your own risk. They are *not*
production ready.

As usual, the RPMs are available at http://yum.pgsqlrpms.org . 

Please install repository RPMs from here first:

http://yum.pgsqlrpms.org/reporpms/repoview/letter_p.group.html

Remove existing RPMs after taking your backups, and then install 8.4
packages. Current set is labeled as 8.4devel_03092008-1PGDG . You may
want to install -debuginfo RPM, which may help PostgreSQL developers to
get some data in case you find a bug.

Source RPMs are also available:

http://yum.pgsqlrpms.org/srpms/8.4

(and they are installable using yum).

Please report any packaging related errors to me. If you find any
PostgreSQL 8.4 bugs, please post them to [EMAIL PROTECTED] or
fill this form:

http://www.postgresql.org/support/submitbug

Regards,
 
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Greg Stark
I don't think worrying about the message we send to users is  
reasonable. We can take responsibilty for the messages we output but  
punishing our users to teach them a lesson is being actively user- 
hostile


greg

On 3 Sep 2008, at 15:52, Joshua D. Drake [EMAIL PROTECTED] wrote:


Hannu Krosing wrote:

On Tue, 2008-09-02 at 16:50 +0300, Peter Eisentraut wrote:

Gregory Stark wrote:

Peter Eisentraut [EMAIL PROTECTED] writes:



Are you really afraid that someone would want to use mb to mean
millibits ?
As SQL is generally case insensitive, it is quite surprising to most
people that GUC units are not.


We have had this discussion before, I even submitted a patch to make  
them case insensitive. In retrospect I was wrong to submit that  
patch. SQL may be case insensitive but units are not. MB != Mb !=  
mb , I don't think we should encourage in any way for users to do  
the wrong thing.


Sincerely,

Joshua D. Drake





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Greg Stark
Sure if people want to do it the right way more power to them. What  
you're talking about is punishing people when they don't live up to  
your standards.


greg

On 3 Sep 2008, at 16:20, Joshua D. Drake [EMAIL PROTECTED] wrote:


Greg Stark wrote:
I don't think worrying about the message we send to users is  
reasonable. We can take responsibilty for the messages we output  
but punishing our users to teach them a lesson is being actively  
user-hostile


There is no arguing that MB != Mb; nor is there anything user- 
hostile behind the idea of doing it the right way.


Sincerely,

Joshua D. Drake



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Joshua Drake
On Wed, 3 Sep 2008 19:36:19 +0100
Greg Stark [EMAIL PROTECTED] wrote:

 Sure if people want to do it the right way more power to them. What  
 you're talking about is punishing people when they don't live up to  
 your standards.

I think I will defer to Andrew and Alvaro's opinion on the matter.

Sincerely,

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]

2008-09-03 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 On Tue, 2 Sep 2008, Tom Lane wrote:
 How about having two new columns reset value and boot value?
 
 Like it better than default value ...

 It's being a bit pedantic at the expense of the user, but I don't really 
 care that much here.  I exposed the boot_val and described it in the 
 documentation as:

 Default value if the parameter is not explicitly set

If that statement were the truth, the whole truth, and nothing but the
truth, and if it didn't ignore the point about explicitly set WHERE?,
I'd be fine with it.

 That was why I just picked the more important one 
 and named it default;

More important to whom?  You are adopting a very narrow mindset,
which seems to be that only DBAs look at this view.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 You do realize that misspelling unit name can cause downtime of several
 minutes instead of couple seconds?  We can easily do restart in couple of
 seconds but the restart, look logs, launch editor, find value, change,
 save, restart cycle will take quite a lot more.  Why should we increase
 the chance that any config edit causes problems?

Minutes? It's like any other changes made to postgresql.conf: make the change,
ctrl-z out of your editor, reload Postgres. If there's an error, foreground
to the editor, revert the change, and reload Postgres. Not ideal (the
config file checker is what we really want), but certainly quick.

 Secondly, humans don't have byte-exact memory, instead they generalize
 and deduce (eg. from nearby parameters).  Thus remembering KB, MB, GB
 or kb, mb, gb is easier than remembering kB, MB, GB.

They are all listed at the top of the file, in case there is any confusion.
No need to worry about remembering things.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200809031501
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAki+39gACgkQvJuQZxSWSsiTfACgiNPP77YKGKgyBm36ckkKZhGM
b9sAn2JmrpbMkJ8dm0Wbz3TYdLo83h/S
=PQvy
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump roles support

2008-09-03 Thread Tom Lane
=?ISO-8859-1?Q?Benedek_L=E1szl=F3?= [EMAIL PROTECTED] writes:
 pg_dumpall now just passes the --role option to pg_dump. What do you 
 think, is it enough
 or it should issue the SET ROLE TO ... command in its own session too?

I think it would have to, in the general case.  Consider the possibility
that someone has restricted access to the system catalogs, for instance.

You have missed an important component of Stephen's original proposal,
which was the point that something similar is needed on the restore
side.  This is a little bit tricky since the context at restore time
is not necessarily the same as the context at dump time.  When using
an archive file it's not a problem: the behavior can be driven off a
--role switch to pg_restore, and this is independent of what pg_dump
did.  In a dump to plain text, though, I'm not sure what to do.  The
simplest design would have pg_dump's --role switch control both
what it does in its own connection to the source database, and what it
puts into the output script.  I'm not sure that's adequate though.
Is it worth having two different switches for the two cases?  If we
think it's a corner case to need different role IDs, we could just
leave it like that and tell anyone who needs different behaviors that
they have to go through an archive file and pg_restore.  Stephen,
you were the one who wanted this in the first place, what's your
use-cases look like?

Some other review nitpicking:

The documentation part of the patch is well short of acceptable IMHO,
since it gives no hint of what this switch might be good for, and
indeed encourages the user to confuse it with the -U switch by injecting
a mention of it into the middle of a discussion about -U.

It is not normally considered appropriate for individual patches to
edit the release notes; and it's DEFINITELY not appropriate to put
a mention of a feature addition into the wrong section of the release
notes.

 + {role, required_argument, NULL, 'r' + 0x80},

This is not a good choice of option code IMHO ... what if the value is
stored in a signed char on some machines?  If you can't find a free
letter you like, use a small integer code, as you can find being done
elsewhere.

BTW, the patch fails to compile on a strict ANSI C compiler, because
you are using a C++-ism of declaring a variable mid-block.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] GUC source file and line number]

2008-09-03 Thread Greg Smith

On Wed, 3 Sep 2008, Tom Lane wrote:


Default value if the parameter is not explicitly set

If that statement were the truth, the whole truth, and nothing but the
truth, and if it didn't ignore the point about explicitly set WHERE?,
I'd be fine with it.


First question--how about if I changed that description to read:

Default value used at server startup if the parameter is not explicitly 
set?


I could then expose reset-val, named like that and with a description that 
explained the context it applies in.  And then we've give people a way to 
experiment and understand the FAQ of why didn't the value go back to the 
default when I commented it out of the postgresql.conf and HUP'd the 
server?.


Section question:  with those changes, would it then be reasonable to you 
to keep that column named default instead of giving it a less common 
name?


You are adopting a very narrow mindset, which seems to be that only DBAs 
look at this view.


DBAs are the only group I am always getting questions in this area from. 
Everybody else seemed happy with the status quo, where the value wasn't 
exposed at all and you just looked in guc.c to see what it was.


About once a month, somebody asks me how can I tell what the default is 
for *X*?  I want to be able to answer this question with look in 
pg_settings, which is easy enough to remember, and not have to say 
anything else.  That's the source of my mindset here, and I'm sure I'm not 
alone in fielding that so often.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] GUC source file and line number]

2008-09-03 Thread Joshua Drake
On Wed, 3 Sep 2008 16:04:12 -0400 (EDT)
Greg Smith [EMAIL PROTECTED] wrote:


 Section question:  with those changes, would it then be reasonable to
 you to keep that column named default instead of giving it a less
 common name?
 
  You are adopting a very narrow mindset, which seems to be that only
  DBAs look at this view.
 
 DBAs are the only group I am always getting questions in this area
 from. Everybody else seemed happy with the status quo, where the
 value wasn't exposed at all and you just looked in guc.c to see what
 it was.

I guess I would ask, Who else would we be targeting this for?. DBAs
seem to be the only logical choice.

Joshua D. Drake
-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Hannu Krosing
On Wed, 2008-09-03 at 13:48 -0400, Alvaro Herrera wrote:
 Hannu Krosing escribió:
  On Wed, 2008-09-03 at 09:10 -0700, Joshua Drake wrote:
 
   If we are going to make sweeping statements (anyone on this thread)
   about user-hostile and most people, then we better define what those
   mean. 
  
  Not user-hostile but rather hostile to an overworked DBA, who tries to
  change some .conf param from 1MB to 512KB at 3AM to save an overloaded
  server and then suddenly the server won't start anymore.
 
 I think the energy wasted in this discussion would be better spent in
 working a the check-the-config-file feature.

What kind of checks do you have in mind. Would this be something that
works at restart, does the check and continues with old settings if new
file would not load ?

 That would equally solve
 this problem, as well as many others.

AFAIK the config file is checked now, and if the check fails, the
database won't start.

-
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Joshua Drake
On Wed, 03 Sep 2008 23:10:24 +0300
Hannu Krosing [EMAIL PROTECTED] wrote:

  That would equally solve
  this problem, as well as many others.
 
 AFAIK the config file is checked now, and if the check fails, the
 database won't start.

Like apachectl configcheck ... E.g; we have the ability to check if the
config file is valid before we restart and have an extended outage.

Joshua D. Drake

 
 -
 Hannu
 
 


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Alvaro Herrera
Hannu Krosing escribió:
 On Wed, 2008-09-03 at 13:48 -0400, Alvaro Herrera wrote:

  I think the energy wasted in this discussion would be better spent in
  working a the check-the-config-file feature.
 
 What kind of checks do you have in mind. Would this be something that
 works at restart, does the check and continues with old settings if new
 file would not load ?

This was discussed in some other thread.  No, the idea is that before
you do a reload, you run some function or invoke postmaster with certain
arguments, and it checks the config file and it says this is OK or
there are errors here and here.  The check can be run even if the
server is already running.  That way you can just run it just before a
reload or restart.


  That would equally solve
  this problem, as well as many others.
 
 AFAIK the config file is checked now, and if the check fails, the
 database won't start.

... which is not ideal.  Obviously it doesn't make much sense to run the
check if the server is already down, because you'll immediately know
that it won't come up just by trying to start it up.

(However, maybe it would be better for the init script to run the check
anyway, and report the error to stderr where the user running the script
can read it directly instead of having to go check the postmaster log
which could be sitting somewhere else.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IN vs EXISTS equivalence

2008-09-03 Thread Kevin Grittner
 Tom Lane [EMAIL PROTECTED] wrote: 
 If you're still interested in testing CVS HEAD's handling of EXISTS,
 I've about finished what I wanted to do with it.
 
It's been hectic here, but I've managed to let some stuff run in the
background using an old test case from here:
 
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01408.php
 
explain analyze
SELECT A.adjustmentNo, A.tranNo, A.countyNo, H.date,
H.userId, H.time
  FROM Adjustment A
  JOIN TranHeader H ON (H.tranId = A.adjustmentNo AND
H.countyNo = A.countyNo AND H.tranNo = A.tranNo)
  WHERE H.tranType = 'A'
AND A.date  DATE '2006-01-01'
AND H.countyNo = 66
AND A.countyNo = 66
AND EXISTS
(
  SELECT 1 FROM TranDetail D
WHERE D.tranNo = H.tranNo
  AND D.countyNo = H.countyNo
  AND D.caseNo LIKE '2006TR%'
)
;
 
On development machine using 8.3.3:
 
 Nested Loop  (cost=0.00..399190.49 rows=1 width=37) (actual
time=7184.068..3249391.592 rows=12372 loops=1)
   Join Filter: ((A.adjustmentNo)::text = (H.tranId)::text)
   -  Seq Scan on Adjustment A  (cost=0.00..5218.87 rows=247869
width=17) (actual time=9.804..1695.691 rows=248674 loops=1)
 Filter: (((date)::date  '2006-01-01'::date) AND
((countyNo)::smallint = 66))
   -  Index Scan using TranHeader_pkey on TranHeader H 
(cost=0.00..1.57 rows=1 width=37) (actual time=13.056..13.056 rows=0
loops=248674)
 Index Cond: (((H.tranNo)::integer =
(A.tranNo)::integer) AND ((H.countyNo)::smallint = 66))
 Filter: (((H.tranType)::text = 'A'::text) AND (subplan))
 SubPlan
   -  Index Scan using TranDetail_TranDetCaseNo on
TranDetail D  (cost=0.00..1.29 rows=1 width=0) (actual
time=13.017..13.017 rows=0 loops=248674)
 Index Cond: (((caseNo)::text = '2006TR'::text) AND
((caseNo)::text  '2006TS'::text) AND ((tranNo)::integer =
($0)::integer) AND ((countyNo)::smallint = ($1)::smallint))
 Filter: ((caseNo)::text ~~ '2006TR%'::text)
 Total runtime: 3249404.662 ms
 
On the same machine, using the snapshot from this morning:
 
 Nested Loop  (cost=1963.24..38483.54 rows=1 width=37) (actual
time=372.964..986.994 rows=12372 loops=1)
   Join Filter: ((H.tranNo)::integer = (A.tranNo)::integer)
   -  Merge Semi Join  (cost=1963.24..31012.28 rows=21317 width=37)
(actual time=372.926..839.298 rows=12372 loops=1)
 Merge Cond: ((H.tranNo)::integer =
(D.tranNo)::integer)
 Join Filter: ((D.countyNo)::smallint =
(H.countyNo)::smallint)
 -  Index Scan using TranHeader_pkey on TranHeader H 
(cost=0.00..27848.57 rows=322517 width=37) (actual time=3.722..526.124
rows=311963 loops=1
)
   Index Cond: ((countyNo)::smallint = 66)
   Filter: ((tranType)::text = 'A'::text)
 -  Sort  (cost=1963.17..2027.08 rows=25565 width=6) (actual
time=171.512..191.688 rows=76597 loops=1)
   Sort Key: D.tranNo
   Sort Method:  quicksort  Memory: 6663kB
   -  Index Scan using TranDetail_TranDetCaseNo on
TranDetail D  (cost=0.00..91.57 rows=25565 width=6) (actual
time=0.031..100.688 rows=7659
7 loops=1)
 Index Cond: (((caseNo)::text = '2006TR'::text)
AND ((caseNo)::text  '2006TS'::text) AND ((countyNo)::smallint =
66))
 Filter: ((caseNo)::text ~~ '2006TR%'::text)
   -  Index Scan using Adjustment_pkey on Adjustment A 
(cost=0.00..0.34 rows=1 width=17) (actual time=0.009..0.010 rows=1
loops=12372)
 Index Cond: (((A.adjustmentNo)::text =
(H.tranId)::text) AND ((A.countyNo)::smallint = 66))
 Filter: ((A.date)::date  '2006-01-01'::date)
 Total runtime: 991.097 ms
 
The chosen plan looks very reasonable, and performs very well.  Nice!
 
After converting the database I originally forgot to run VACUUM
ANALYZE.  Even planning blind and doing hint-bit rewrites it picked
a plan which ran in under 10 seconds.
 
I'll be running other tests as I get the chance.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Hannu Krosing
On Wed, 2008-09-03 at 11:45 -0700, Joshua Drake wrote:
 On Wed, 3 Sep 2008 19:36:19 +0100
 Greg Stark [EMAIL PROTECTED] wrote:
 
  Sure if people want to do it the right way more power to them. What  
  you're talking about is punishing people when they don't live up to  
  your standards.
 
 I think I will defer to Andrew and Alvaro's opinion on the matter.

So Andrews opinion was that Mb (meaning Mbit) is different from MB (for
megabyte) and that if someone thinks that we define shared buffers in
megabits can get confused and order wrong kind of network card ?

I can understand Alvaros stance more readily - if we have irrational
constraints on what can go into conf file, and people wont listen to
reason, then build better tools for helping people to compli to these
irrational demands. It has the added benefit of helping to catch reall
conf file errors.

I did not realize earlier that KB vs kb vs kB vs Kb is a religious
issue .

I mean, there is no known written standard, which says that Mb is
megabit, not megabyte or that you can (or can't) write kilo as K, but
some people just believe that kB is the Way and allowing people to
write kilobytes as KB or kb is evil and should be punished.

To me this sounds stupid, but I understand that this is a thing that
can't be argued logically and I have better things to do than changing
peoples irrational beliefs.

Sorry.

---
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] In case module has wrong magic, report exact problem

2008-09-03 Thread Tom Lane
Marko Kreen [EMAIL PROTECTED] writes:
 In case module and server magic blocks do not match
 report exact parameters that differ.

Applied with revisions --- your patch produced a message that wasn't
localizable and didn't follow the style guidelines.  Also it assumed
that all the fields would be plain integer values, which isn't true
even today.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Joshua Drake
On Thu, 04 Sep 2008 01:26:44 +0300
Hannu Krosing [EMAIL PROTECTED] wrote:

 So Andrews opinion was that Mb (meaning Mbit) is different from MB
 (for megabyte) and that if someone thinks that we define shared
 buffers in megabits can get confused and order wrong kind of network
 card ?

I was actually referring to:

http://archives.postgresql.org/pgsql-hackers/2008-09/msg00206.php

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] GUC source file and line number]

2008-09-03 Thread Tom Lane
Joshua Drake [EMAIL PROTECTED] writes:
 I guess I would ask, Who else would we be targeting this for?. DBAs
 seem to be the only logical choice.

Regular users look at pg_settings too, you know.  Maybe *you* only
get questions from DBAs...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] GUC source file and line number]

2008-09-03 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 First question--how about if I changed that description to read:

 Default value used at server startup if the parameter is not explicitly 
 set?

... not otherwise set would probably be an accurate phrasing.
(I'm thinking of corner cases like stuff absorbed from environment
variables, which aren't really explicitly set by any normal usage
of that term.)

 I could then expose reset-val, named like that and with a description that 
 explained the context it applies in.  And then we've give people a way to 
 experiment and understand the FAQ of why didn't the value go back to the 
 default when I commented it out of the postgresql.conf and HUP'd the 
 server?.

You do know that's an ex-FAQ as of 8.3?  If we're designing this feature
to respond to that, we are wasting a lot of effort.

 About once a month, somebody asks me how can I tell what the default is 
 for *X*?

I wonder how certain you can be of which meaning of default they have
in mind.  I don't think it means the same thing to everybody that it
means to you.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our CLUSTER implementation is pessimal

2008-09-03 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 There are a couple problems with this:

 a) We need some way to decide *when* to do a sort and when to do an index
 scan. The planner has all this machinery but we don't really have all the
 pieces handy to use it in a utility statement.

Why not?  You don't even need any quals when trying to cost a full-index
scan.

 b) tuplesort no longer has the pieces needed to sort whole tuples including
 visibility info. And actually even the old pieces that were removed had not
 quite the right interface and behaviour. We need to preserve t_self for the
 heap rewrite tools and we need to be able to use _bt_mkscankey_nodata() to
 generate the scan keys that match the index.

So you just broke it irredeemably for non-btree indexes, no?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Conflict resolution in Multimaster replication(Postgres-R)

2008-09-03 Thread M2Y
Hello,

My basic question is: in multimaster replication, if each site goes
ahead and does the modifications issued by the transaction and then
sends the writeset to others in the group, how the ACID properties be
maintained?

Details:
Suppose there are two sites in the group, lets say, A and B and are
managing a database D. Two transactions TA and TB started in sites A
and B respectively, at nearly same time, wanted to update same row of
a table in the database. As, no locking structures and other
concurrency handling structures are replicated each will go ahead and
do the modifications in their corresponding databases and sends the
writeset. Since, both writesets contain update to the same row, will
the two transactions be rolled back or anything other than this
happens?

A more general question is: for Transactional isolation level
4(serializable level), the information such as locking of rows be
transmitted across sites? If not, what is the mechanism to address
concurrency with serializibility.

Thanks,
Srinivas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Conflict resolution in Multimaster replication(Postgres-R)

2008-09-03 Thread Robert Hodges
Hi Srinivas,

Multi-master replication in Postgres-R is handled using a process called 
certification that ensures there are no serializability violations.  Look at 
the paper by Kemme and Alonzo entitled Don't be Lazy, Be Consistent... 
(http://www.cs.mcgill.ca/~kemme/papers/vldb00.html).  In the first case you 
describe one transaction must abort if applying them would break 
serializability.

In the second case you describe, you must transmit read sets as well as write 
sets.  The same sort of algorithm is applied as for writes.

Please email me directly if you want more information.

Thanks, Robert

On 9/3/08 4:02 PM, M2Y [EMAIL PROTECTED] wrote:

Hello,

My basic question is: in multimaster replication, if each site goes
ahead and does the modifications issued by the transaction and then
sends the writeset to others in the group, how the ACID properties be
maintained?

Details:
Suppose there are two sites in the group, lets say, A and B and are
managing a database D. Two transactions TA and TB started in sites A
and B respectively, at nearly same time, wanted to update same row of
a table in the database. As, no locking structures and other
concurrency handling structures are replicated each will go ahead and
do the modifications in their corresponding databases and sends the
writeset. Since, both writesets contain update to the same row, will
the two transactions be rolled back or anything other than this
happens?

A more general question is: for Transactional isolation level
4(serializable level), the information such as locking of rows be
transmitted across sites? If not, what is the mechanism to address
concurrency with serializibility.

Thanks,
Srinivas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Alvaro Herrera
Hannu Krosing escribió:

 I mean, there is no known written standard, which says that Mb is
 megabit, not megabyte or that you can (or can't) write kilo as K, but
 some people just believe that kB is the Way and allowing people to
 write kilobytes as KB or kb is evil and should be punished.

Yes there is --- it's the SI.

http://en.wikipedia.org/wiki/SI#SI_writing_style

I don't know about it being evil and punishment, but it's wrong.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extending grant insert on tables to sequences

2008-09-03 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 * Jaime Casanova ([EMAIL PROTECTED]) wrote:
 updating the patch with one that only extends inserts. though, i
 haven't look at the col level privs patch yet.

 At least initially I wasn't planning to support column-level privileges
 for sequences, so I don't think it will affect you much.  Do people
 think it makes sense to try and support that?

USAGE certainly wouldn't be column-level in any case --- it'd be a
privilege on the sequence as such.  That end of it isn't the problem;
the problem is that column-level privileges on the table make it hard to
decide when to grant rights on the sequence, as I pointed out last time
round:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00624.php

 As your patch appears more ready-for-commit than the column-level
 privileges patch, I wouldn't worry about what code might have to move
 around, that'll be for me to deal with in a re-sync with HEAD once your
 patch is committed.

I think that's backwards.  The above message raises serious concerns
about whether the USAGE-granting patch can be implemented at all in the
presence of column-level privileges.  I think the right thing is to get
column privileges in and then see if it's possible to implement
USAGE-granting compatibly.  I don't want to commit a patch that is
clearly going to be broken when (not if) column privileges arrive.

I note also that no response was given to my worries about pg_dump
behavior.

In short, this patch isn't much more ready to commit than it was
in the last fest.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump roles support

2008-09-03 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 =?ISO-8859-1?Q?Benedek_L=E1szl=F3?= [EMAIL PROTECTED] writes:
  pg_dumpall now just passes the --role option to pg_dump. What do you 
  think, is it enough
  or it should issue the SET ROLE TO ... command in its own session too?
 
 I think it would have to, in the general case.  Consider the possibility
 that someone has restricted access to the system catalogs, for instance.

I would agree with this.  pg_dumpall should do the 'set role' in its
session as well.

 You have missed an important component of Stephen's original proposal,
 which was the point that something similar is needed on the restore
 side.  This is a little bit tricky since the context at restore time
 is not necessarily the same as the context at dump time.  When using
 an archive file it's not a problem: the behavior can be driven off a
 --role switch to pg_restore, and this is independent of what pg_dump
 did.  In a dump to plain text, though, I'm not sure what to do.  The
 simplest design would have pg_dump's --role switch control both
 what it does in its own connection to the source database, and what it
 puts into the output script.  I'm not sure that's adequate though.

This makes sense to me and I feel it's adequate.  If necessary, people
can post-process their .sql files using sed or something similar.
That's made reasonably easy by having a 'set role' in the .sql file.  I
actively dislike the idea that pg_restore would modify the input stream
from a text file, even if it was passed a --role switch.

 Is it worth having two different switches for the two cases?  If we
 think it's a corner case to need different role IDs, we could just
 leave it like that and tell anyone who needs different behaviors that
 they have to go through an archive file and pg_restore.  Stephen,
 you were the one who wanted this in the first place, what's your
 use-cases look like?

My primary use cases are performing a pg_dump when logging in as one
user but needing the permissions of another role, followed by loading
the data into another system when logging in as one user and needing to
set role first to another.  In at least 90% of those cases, that role is
postgres, and in the other 10% most, if not all, are the same role on
both sides.  There are a few cases where we might change the restore-as
role away from the dumped-as role, but we're happy to use pg_restore to
handle that, or take care of changing the role in the .sql file (which
is what we tend to use, honestly) using sed or similar.

Alot of this is driven from the fact that we don't allow admins to
remotely connect directly as postgres (akin to disabling remote root
logins in sshd_config via PermitRootLogin, and for the same reasons).
They must authenticate and connect as their own user first and then use
'set role postgres;' to gain superuser rights.  Not being able to have
pg_dump do that set role has been quite frustrating as we use it
extensively for transferring data between systems.

 Some other review nitpicking:

I agree with the other comments.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Extending grant insert on tables to sequences

2008-09-03 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  * Jaime Casanova ([EMAIL PROTECTED]) wrote:
  updating the patch with one that only extends inserts. though, i
  haven't look at the col level privs patch yet.
 
  At least initially I wasn't planning to support column-level privileges
  for sequences, so I don't think it will affect you much.  Do people
  think it makes sense to try and support that?
 
 USAGE certainly wouldn't be column-level in any case --- it'd be a
 privilege on the sequence as such.  That end of it isn't the problem;
 the problem is that column-level privileges on the table make it hard to
 decide when to grant rights on the sequence, as I pointed out last time
 round:
 http://archives.postgresql.org/pgsql-hackers/2008-07/msg00624.php

Ah, obviously I hadn't read far enough back about this patch.  I agree
that sequence USAGE should be granted when insert is granted on any
column.  One suggestion is that as the SQL spec indicates that a
table-level revoke implies a revoke on all columns, we could have the
revokation of the sequence permissisons done only on table-level
revokation of insert and not on any individual column-level insert, even
if that was the last column which insert rights were granted on.

I have to admit that I'm not a big fan of that though because a given
state on the table wouldn't imply a particular state for the sequence-
it would depend on how you got there.  The way the code is currently
laid out for the column-level privileges, it wouldn't be that difficult
to go through all of the other columns and check if this was the last
insert being revoked, but I don't particularly like that either, and
it strikes me as 99% of the time being wasted effort.  I guess if we
could check for and only go through that effort when there is a sequence
in place with implicit grants it might not be too bad.

  As your patch appears more ready-for-commit than the column-level
  privileges patch, I wouldn't worry about what code might have to move
  around, that'll be for me to deal with in a re-sync with HEAD once your
  patch is committed.
 
 I think that's backwards.  The above message raises serious concerns
 about whether the USAGE-granting patch can be implemented at all in the
 presence of column-level privileges.  I think the right thing is to get
 column privileges in and then see if it's possible to implement
 USAGE-granting compatibly.  I don't want to commit a patch that is
 clearly going to be broken when (not if) column privileges arrive.

Now that I understand the situation better, I agree with you on this.  I
hadn't realized this patch was about implicit grants on sequnces.  Sorry
for the noise.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [patch] GUC source file and line number]

2008-09-03 Thread Greg Smith
Before I respond to Tom's comments, let me step back a second and add the 
intro the deadline didn't leave me time for.  There are two specific 
things the bit I added to this GUC patch is aimed at:


1) Somebody has a postgresql.conf from a random source (saw it on the 
Internet and pasted dubious stuff in/previous person working on the 
server/etc.) and wants to know the default value they'd get if they just 
commented a specific line or lines out.


2) A GUC tuning tool author wants to provide a UI for modifying a GUC 
parameter that shows the default as input to the person deciding what to 
set a parameter to.  The interface I've always wanted to make available 
would be...wait a minute, I can provide a working example now.  Picture 
this:


name | Recommended | Current |  Min  | Default |   Max
-+-+-+---+-+-
 wal_buffers | 1024kB  | 64kB| 32 kB | 64 kB   | 2048 MB

With your cursor lighting up either the Recommended or Current field, 
depending on whether you're a default approve or deny kind of tool 
designer.  Pretty simple interface to decide what to do, right?  I find 
that much less useful without the default value being included, but right 
now someone who is writing a tuning tool has to maintain their own 
database with that information if they want to do that.  I will actually 
do that for earlier versions the minute I know what the 8.4 solution that 
makes the problem go away looks like.


The above is the output from:

select name,
  '1024kB' as Recommended,
  current_setting(name) as Current,
  case when unit='8kB' then pg_size_pretty(min_val::int8*8192) else
min_val end as Min,
  case when unit='8kB' then pg_size_pretty(default_val::int8*8192) else
default_val end as Default,
  case when unit='8kB' then pg_size_pretty(max_val::int8) else max_val end
as Max
  from pg_settings where name='wal_buffers';

on my system with the patch installed.

That's what I wanted but was unable to get until now.  Combine that with 
being able to figure out what source file and line the setting was 
actually taken from, and the top 3 obstacles to writing a simple and easy 
to use read/modify/write tuning tool are all cleared.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] GUC source file and line number]

2008-09-03 Thread Greg Smith

On Wed, 3 Sep 2008, Tom Lane wrote:


Greg Smith [EMAIL PROTECTED] writes:

First question--how about if I changed that description to read:



Default value used at server startup if the parameter is not explicitly
set?


... not otherwise set would probably be an accurate phrasing.
(I'm thinking of corner cases like stuff absorbed from environment
variables, which aren't really explicitly set by any normal usage
of that term.)


My opinion is that setting something in an environment variable certainly 
is explicitly setting it, but it doesn't matter; if the parameter is not 
otherwise set works just as well as far as I'm concerned.



I could then expose reset-val, named like that and with a description that
explained the context it applies in.  And then we've give people a way to
experiment and understand the FAQ of why didn't the value go back to the
default...


You do know that's an ex-FAQ as of 8.3?  If we're designing this feature
to respond to that, we are wasting a lot of effort.


Sure, but there are a lot of pre-8.3 installs out there.  I don't really 
care about the reset-val at all, so I'm not going to justify whether or 
not it should be included.



I wonder how certain you can be of which meaning of default they have
in mind.  I don't think it means the same thing to everybody that it
means to you.


When most people say the default talking about a value in a 
configuration file, they mean the value the software will assume if that 
setting isn't there at all.  In the postgresql.conf context, that means 
what they'll get if they start the server with that line missing or 
commented out (and no environment variables, etc.) which is why I mapped 
that to the boot_val.  While I'm aware there are other uses of default 
that apply in this context, I think they are extremely rare compared to 
the common usage.  The subtle distictions that require both a boot_val and 
a reset_val internally are only important to people who are also capable 
of understanding that default is a mass-consumption oriented label 
that's a touch fuzzy IMHO.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-03 Thread Simon Riggs

On Fri, 2008-08-29 at 10:47 +0300, Heikki Linnakangas wrote:
 Here's an updated FSM patch. 

Can I check some aspects of this related to Hot Standby? Some of them
sound obvious, but worth double checking.

* There will be no need to read FSM by any normal operation of a
read-only transaction, so locking correctness considerations can
possibly be ignored during recovery. pg_freespacemap exists though:
would we need to prevent that from executing during recovery, or will
the FSM be fully readable? i.e. does redo take appropriate locks already
(I don't see any Cleanup locks being required).

* FSM will be continuously maintained during recovery, so FSM will now
be correct and immediately available when recovery completes?

* There are no cases where a screwed-up FSM will crash either recovery
(FATAL+) or halt normal operation (PANIC)?

* incomplete action cleanup is fairly cheap and doesn't rely on the FSM
being searchable to correct the error? This last is a hard one...

Do we have the concept of a invalid/corrupt FSM? What happens if the
logic goes wrong and we have a corrupt page? Will that mean we can't
complete actions against the heap? 

Are there really any changes to these files?
src/include/storage/bufmgr.h
src/include/postmaster/bgwriter.h

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our CLUSTER implementation is pessimal

2008-09-03 Thread Simon Riggs

On Mon, 2008-09-01 at 00:25 +0100, Gregory Stark wrote:

 One thing that's been annoying me for a while is that our CLUSTER
 implementation is really very slow. When I say very slow I mean it's really
 very very very slow.

Does this implementation work towards being able to do
  CREATE INDEX ... CLUSTER TABLE
So that we can do both actions with just one sort of the data?

I think there needs to be an option to force this to do either sorts or
indexscans. On a large table you may not have the space to perform a
full table sort, plus on a multi-column index we may not accurately
predict the cost of an indexscan.

(What is the change to elog.c about?)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Robert Treat
On Wednesday 03 September 2008 16:12:29 Joshua Drake wrote:
 On Wed, 03 Sep 2008 23:10:24 +0300

 Hannu Krosing [EMAIL PROTECTED] wrote:
   That would equally solve
   this problem, as well as many others.
 
  AFAIK the config file is checked now, and if the check fails, the
  database won't start.

 Like apachectl configcheck ... E.g; we have the ability to check if the
 config file is valid before we restart and have an extended outage.


To paraphrase, if you can't write a config file correctly before restarting, 
I do not want you anywhere near any instance of a production system 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Cleanup of GUC units code

2008-09-03 Thread Hannu Krosing
On Wed, 2008-09-03 at 20:01 -0400, Alvaro Herrera wrote:
 Hannu Krosing escribió:
 
  I mean, there is no known written standard, which says that Mb is
  megabit, not megabyte or that you can (or can't) write kilo as K, but
  some people just believe that kB is the Way and allowing people to
  write kilobytes as KB or kb is evil and should be punished.
 
 Yes there is --- it's the SI.
 
 http://en.wikipedia.org/wiki/SI#SI_writing_style
 
 I don't know about it being evil and punishment, but it's wrong.

SI defines decimal-based prefixes, where k = kilo = 1000, so our current
conf use is also wrong.

http://en.wikipedia.org/wiki/Kilobyte
...
1024 bytes (2^10): This unit is used when expressing quantities which
are based on powers of two, such as memory chip capacities. Most
software also expresses storage capacity in units of 1024 bytes.
Although the use of kilobyte for this unit is common, this usage has
been expressly forbidden by the SI standard and other standards
organisations. To indicate a quantity of 1024 bytes, the term kibibyte
(KiB) has been recommended instead.


And no, I am not proposing switching conf units to KiB, nor am I
proposing to use MB to mean 1 000 000. I'm quite happy with us being
wrong in 1000 v. 1024 area.

---
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers