Re: [HACKERS] Oracle Style packages on postgres

2005-05-13 Thread Dave Held
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, May 11, 2005 2:22 PM
 To: Dave Held
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Oracle Style packages on postgres
 
 
 Dave Held [EMAIL PROTECTED] writes:
  /*
   * We check the catalog name and then ignore it.
   */
  if (!isValidNamespace(name[0]))
  {
  if (strcmp(name[0], 
 get_database_name(MyDatabaseId)) != 0)
  ereport(ERROR,
 
 Which more or less proves my point: the syntax is fundamentally
 ambiguous. 

Not at all.  Ambiguity means that there are two equally valid
parses.  Under the semantics I proposed, schema names take 
precedence.  That is, given:

db: foo
schema: bar
schema: foo.bar

The expression foo.bar.rel.col refers to schema foo.bar, and not
to db foo, schema bar.  If by "fundamentally ambiguous" you mean
there is no a priori reason to choose one set of semantics over
another, I would tend to disagree, but the syntax as I proposed
it is not ambiguous.  We use precedence to eliminate otherwise
valid parses all the time.

 I suppose people would learn not to use schema names that
 match the database they are in, but that doesn't make it a 
 good idea to have sensible behavior depend on non-overlap of
 those names.

There's nothing wrong with using a schema name that matches the
db.  The only confusion comes when you put nested elements at
both the db level and schema level having the same names.  Since
I presume most people don't specify db names in their queries,
having schemas take precedence makes the most sense to me.

 [ thinks for awhile ... ]
 
 OTOH, what if we pretended that two-level-nested schemas ARE
 catalogs in the sense that the SQL spec expects?  Then we could
 get rid of the pro-forma special case here, which isn't ever
 likely to do anything more useful than throw an error anyway.
 Thus, we'd go back to the pre-7.3 notion that the current
 Postgres DB's name isn't part of the SQL naming scheme at all,
 and instead handle the spec's syntax requirements by setting up
 some conventions that make a schema act like what the spec says
 is a catalog.
 [...]

I think this would be worse than not having nested schemas at all.
It looks, feels, and smells like a hack.  I think there should be 
a reasonable depth to schema nesting, but I think it should be 
much larger than 2.  I think 8 is much more reasonable.  One can
argue that nested schemas are nothing more than syntactic sugar,
and this is most definitely true.  But as programming language
design teaches us, syntactic sugar is everything.  The better our
tools can model our problem spaces, the better they can help us
solve our problems.

A way in which nested schemas are more than syntactic sugar is in
the fact that they can provide a convenient means of additional
security management.  Rather than twiddling with the privileges on
groups of objects within a schema, objects that should have similar
privileges can be put in the same subschema.

However, returning to the original topic of the thread, nested
schemas are not nearly as interesting to me as the encapsulation
provided by a package-like feature.  To be honest, though, what
tantalizes me is not the prospect of a package feature but an
expansion of the Type system.

As a reasonably popular production system, Postgres must necessarily
be conservative.  But its roots lie in experimentation, and vestiges
of those roots can still be seen in its structure.  Because of its
maturity, Postgres is well positioned to implement some rather
advanced concepts, but perhaps the most radical of them should be
implemented in a fork rather than the main system.

Traditionally, a database is seen as a warehouse of raw data.
ODBMSes position themselves as the next generation by viewing a
database as a collection of persistent, richly structured objects.
Both views have strengths and weaknesses.  Postgres takes an
interesting middle ground position within the ORDBMS space.  It
is heavily relational with strong support for standard SQL and
numerous query tuning options.  But it also features a number of
interesting, rather non-relational concepts, like custom operator
definitions, operator classes, and user-defined conversions and types.
However, it seems to me that these features are probably very
underutilized.

This is probably due to two reasons: 1) most programmers aren't used
to being able to define custom operators in their favorite programming
language, so the concept isn't familiar enough for them to try it in
their DBMS; and 2) the supporting features aren't designed or
presented in a cohesive manner that persuades the programmer that
this is a compelling and superior way to go about things.

The fact is, operator overloading is a *very* powerful way to
program.  In particular, it is one of the key factors in supporting
generic programming in a natural way.  People who are unsure

Re: [HACKERS] Oracle Style packages on postgres

2005-05-11 Thread Dave Held
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 10, 2005 11:42 PM
 To: Bruce Momjian
 Cc: Dave Held; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Oracle Style packages on postgres
 
 [...]
 There's been a lot of handwaving about nested schemas in this thread,
 but no one has explained how they could actually *work* given the SQL
 syntax rules.  In general, "a" is a column from the current table
 set, "a.b" is a column b in table/alias a from the current query,
 "a.b.c" is a column c from table b in schema a, "a.b.c.d" is a column
 d from table c in schema b in catalog a, and any more than that is
 a syntax error.  I do not see how to add nested schemas 
 without creating unworkable ambiguities, not to say outright violations
 of the spec.

Clearly nested schemas would violate the SQL spec, as do the numerous
missing features in Postgres.  Obviously, they would have to be a sort
of non-conforming extension.  It's an opportunity for Postgres to take
the lead and influence the next standard, I guess.  Unless the community
decides that it's not worth the hassle, which seems much more likely.  I
am curious to know what the unworkable ambiguities are.  I propose that
if there is any ambiguity at all, just fail the parse and leave it to
the user to write something sensible.  Otherwise, it's just a matter of
defining a precise precedence for resolving name scopes, which doesn't
seem very tricky at all.

That is, if a.b is the name of a schema b nested within a schema a, then
a.b.c.d refers to a column d of table c in schema b in schema a.  If a is
not the name of a schema, then check to see if it's the name of a database.
If it is, then a.b.c.d has the meaning you define above.  If it's not,
then it's an error.  The rule is simple: when the identifier has more than
two parts, search for the first part among the schemas first, and then
the catalogs.  For the parts after the first and before the last two,
just search the appropriate schemas.  As far as I can tell, this syntax 
is completely backwards-compatible with existing SQL syntax.
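To make the lookup order concrete, here is a toy sketch of that rule (Python pseudocode; `resolve`, `schemas`, and `catalogs` are invented stand-ins for the real catalog searches, not Postgres APIs):

```python
# Hypothetical sketch only: `schemas` and `catalogs` are plain lookup
# sets standing in for the real catalog searches; none of these names
# are actual Postgres APIs.

def resolve(parts, schemas, catalogs):
    """Resolve a dotted name per the proposed rule.

    For names of more than two parts, the first identifier is searched
    among the schemas first, then among the catalogs.
    """
    if len(parts) <= 2:
        return ("local", parts)          # a or a.b: existing rules apply
    head, rest = parts[0], parts[1:]
    if head in schemas:                  # schemas take precedence...
        return ("schema", head, rest)
    if head in catalogs:                 # ...then fall back to catalogs
        return ("catalog", head, rest)
    raise ValueError("unknown prefix: %s" % head)

# db "foo" and a schema "foo" both exist: the schema wins
print(resolve(["foo", "bar", "rel", "col"], {"foo", "bar"}, {"foo"}))
# -> ('schema', 'foo', ['bar', 'rel', 'col'])
```

The point is only that the precedence is a total order, so the parse is deterministic even when a schema and a catalog share a name.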

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Oracle Style packages on postgres

2005-05-11 Thread Dave Held
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, May 11, 2005 10:55 AM
 To: Dave Held
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Oracle Style packages on postgres
 
 
 Dave Held [EMAIL PROTECTED] writes:
  The rule is simple: when the identifier has
  more than two parts, search for the first part among the schemas 
^^^
  first, and then the catalogs.
 
 This doesn't actually work, because there is already ambiguity as to
 which level the first name is.  See for instance the comments in
 transformColumnRef().

I don't follow.  The case 3 branch of switch (numnames) is unambiguous
under either syntax.  Cases 1 and 2 are unchanged under my proposed
rules.  It's really only cases 4+ that are affected.  And the change is
as follows:

if (numnames > MAX_SCHEMA_DEPTH + 3)
{
    ereport(ERROR,
            (errcode(ERRCODE_SYNTAX_ERROR),
             errmsg("improper qualified name (too many dotted names): %s",
                    NameListToString(cref->fields))));
    return NULL;
}
switch (numnames)
{
    case 1: ...
    case 2: ...
    case 3: ...
    default:
    {
        char*  name[MAX_SCHEMA_DEPTH + 3];
        char** i;
        char** end = name + numnames;
        char** colname = end - 1;
        for (i = name; i != end; ++i)
        {
            /* definition of lnth() should be easy enough to infer */
            *i = strVal(lnth(cref->fields, i - name));
        }

        /*
         * We check the catalog name and then ignore it.
         */
        if (!isValidNamespace(name[0]))
        {
            if (strcmp(name[0], get_database_name(MyDatabaseId)) != 0)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("cross-database references are not "
                                "implemented: %s",
                                NameListToString(cref->fields))));
            i = name + 1;
            numnames -= 3;
        }
        else
        {
            i = name;
            numnames -= 2;
        }
        /*
         * isValidNamespace() should work like LookupExplicitNamespace()
         * except that it should return false on failure instead of
         * raising an error
         */

        /* Whole-row reference? */
        if (strcmp(end[-1], "*") == 0)
        {
            node = transformWholeRowRef(pstate, i, numnames, end[-2]);
            break;
        }
        /*
         * Here I've changed the signature of transformWholeRowRef() to
         * accept a char** and an int for the schema names
         */

        /* Try to identify as a twice-qualified column */
        node = qualifiedNameToVar(pstate, i, numnames, end[-1], true);
        /*
         * And obviously we have to hack qualifiedNameToVar() similarly
         */
        if (node == NULL)
        {
            /* Try it as a function call */
            node = transformWholeRowRef(pstate, i, numnames, end[-2]);
            node = ParseFuncOrColumn(pstate,
                                     list_make1(makeString(end[-1])),
                                     list_make1(node),
                                     false, false, true);
        }
        break;
    }
}

What am I missing?


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Oracle Style packages on postgres

2005-05-10 Thread Dave Held
 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 10, 2005 8:43 AM
 To: Thomas Hallgren
 Cc: Tom Lane; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Oracle Style packages on postgres
 
 [...]
 I suppose.  I think we should focus on the use cases for Oracle
 packages, rather than the specific functionality it provides. 
 What things do people need PostgreSQL to do that it already
 doesn't do?

Is that really the best way to go about things?  Already RDBMSes
are patchwork quilts of functionality.  Is merely adding another
patch the most elegant way to evolve the database?  The problem is
that Oracle et al are trying to be ORDBMSes and aren't exactly sure
what the best way to go is.  Instead of trying to formulate a 
rational plan for what an ORDBMS should even look like, they simply
look at what would work with their existing infrastructure and tack
on features.  Then Postgres plays the copycat game.  Instead of
trying to play catch-up with Oracle, why not beat them at their own
game?

What packages provide is encapsulation.  Hiding the data from the
user and forcing him/her to use the public interface (methods).
That is an important and admirable OO feature.  Some people think
that using the DB's security model can achieve the same thing.  It
can't, exactly, but there's an important lesson to be learned from
the suggestion.  The problem is that OOP is a *programming* paradigm,
and a database is not a *programming language*.  In a programming
language, there really is no such thing as security.  There is 
only visibility and accessibility.  Private methods in an OOP
language do not provide *security*; they only limit *accessibility*.
Like so many other differences between the relational model and the
OOP model, there is an impedance mismatch here.  However, there is
also opportunity.

In an OOPL, you can say: "Users can call this method from here, but
not from there."  What you *can't* say is: "User X can call this
method, but User Y cannot."  As you can see, these are orthogonal
concepts.  You could call the first "accessibility by location" and
the second "accessibility by authentication".  An ORDBMS should
support both.  "Private" does not respect your identity, only your
calling location.  An ACL does not respect your calling scope, only
your identity.  A system that has both is clearly more flexible than
one that has only one or the other.

Now what you need to keep in mind is that each visibility model 
serves a different purpose.  The purpose of a security model is to 
limit *who* can see/touch certain data because the data has intrinsic 
value.  The purpose of an accessibility model is to limit *where* and 
*how* data can be seen/touched in order to preserve *program 
invariants*.  So if you have an object (or tuple!) that records the 
start and stop time of some process, it is probably a logical 
invariant that the stop time is greater than or equal to the start 
time.  For this reason, in a PL, you would encapsulate these fields 
(attributes) and only provide controlled access to update them that 
checks and preserves the invariant, *no matter who you are*.  You 
don't want a superuser violating this invariant any more than Sue 
User.
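As a toy illustration of that point (Python, with invented names; nothing here is a real Postgres or Oracle interface), an encapsulated update path enforces the invariant no matter who the caller is:

```python
class ProcessRecord:
    """Start/stop times whose invariant (stop >= start) is preserved
    by the only update path; there is no privileged identity that can
    bypass the check."""

    def __init__(self, start, stop):
        self._set(start, stop)

    def _set(self, start, stop):
        # The invariant check lives in one place, behind the interface.
        if stop < start:
            raise ValueError("invariant violated: stop < start")
        self._start, self._stop = start, stop

    def reschedule(self, start, stop):
        # Public interface: checked for every caller, superuser or not.
        self._set(start, stop)

    @property
    def duration(self):
        return self._stop - self._start

r = ProcessRecord(100, 150)
r.reschedule(200, 260)
print(r.duration)   # 60
```

This is accessibility by location, not by identity: the field is unreachable except through a method that re-establishes the invariant.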

Now you might object that constraints allow you to preserve 
invariants as well, and indeed they do.  But constraints do not
respect calling scope.  Suppose there is a process that needs to
update the timestamps in a way that temporarily breaks the invariant
but restores it afterwards.  The only way to effect this in a
constraint environment is to drop the constraint, perform the
operation, and restore it.  However, dropping a constraint is not an
ideal solution because there may be other unprivileged processes 
operating on the relation that still need the constraint to be 
enforced.  There is no way to say: "There is a privileged class of
methods that is allowed to violate this constraint because they are
trusted to restore it upon completion."  Note that this is different
from saying "There is a privileged class of users that is allowed
to violate this constraint."  If you try to do something like give
read-only access to everybody and only write access to one user and
define that user to be the owner of the methods that update the data,
you have to follow the convention that that user only operates 
through the defined interface, and doesn't hack the data directly.
That's because user-level accessibility is not the same as scope-
level accessibility.  Whereas, if you define something like a
package, and say: "Package X is allowed full and complete access
to relation Y," and stick the interface methods in X, you still have
all the user-level security you want while preserving the invariants
in the most elegant way.

So you can think of a package as a scope in a programming language.
It's like a user, but it is not a user.  A user has privileges that
cut across scopes.  Now, whether packages should be 

Re: [HACKERS] 'infinity' in GiST index

2005-05-05 Thread Dave Held
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, May 04, 2005 11:17 PM
 To: Oleg Bartunov
 Cc: Pgsql Hackers
 Subject: Re: [HACKERS] 'infinity' in GiST index
 
 [...]
 Seems like it's not really GiST's fault but a definitional problem
 for the timestamp datatype.  Specifically, what does it mean to
 subtract two infinite timestamps?  I find it hard to assign a
 value to any of these combinations:
   +infinity minus +infinity
   -infinity minus -infinity
   +infinity minus -infinity
   -infinity minus +infinity

That's because you're talking about transfinite arithmetic, and
subtraction is not defined therein.  AKA the arithmetic of
infinite cardinals.  I've actually seen a few different 
formulations, some of which say that adding a finite number to
an infinity results in a different number than the infinity, and
some that say it is the original infinity.  However, it seems
that the most common formulation is the latter:

  w + 1 = w

Where w is lower-case omega, or aleph_0.  If we allowed subtraction,
then we could subtract w from both sides and end up with 1 = 0,
which would be an inconsistency.  Also, -w makes sense when talking
about the reals, but not when talking about transfinite arithmetic.
There are no additive inverses, because that is the same as allowing
subtraction, with the same result.  Note that within real arithmetic,
you can't do any math with infinity anyway.
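Spelled out, the inconsistency that admitting subtraction of w would produce is just:

```latex
\omega + 1 = \omega
\;\Longrightarrow\;
(\omega + 1) - \omega = \omega - \omega
\;\Longrightarrow\;
1 = 0
```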

 The first two can't really be identified with zero, and the 
 last two are surely not representable are they?

Not unless you change to another math system, which, of course,
wouldn't be appropriate for this application.

 What's worse, a subtraction involving one infinite and one finite
 timestamp *is* well defined from a mathematical point of view, eg
   +infinity minus 'yesterday' = +infinity

Actually not.  When doing transfinite arithmetic, you can only
add naturals to infinities.  Otherwise, you're getting a form of
subtraction, which will eventually lead to inconsistency.

 but I doubt GiST will be happy if we make the datatype behave
 that way...

I guess it depends on why you want to take the difference.  If
you want to take some measure of distance, it might be useful
to say that all infinite values of the same sign are at 0 distance 
from each other, in which case you would say that +w - +w = 0.
Probably infinities of opposite signs should just be w apart
(which is also mathematically consistent).
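That distance convention can be sketched as follows (Python, using a float infinity as a stand-in sentinel; this is only an illustration of the proposed semantics, not the actual timestamp code):

```python
INF = float("inf")

def ts_distance(a, b):
    """Distance between timestamps where +/-inf values are allowed.

    Same-signed infinities are 0 apart; opposite-signed infinities are
    infinitely far apart; otherwise it is ordinary subtraction.
    """
    if a in (INF, -INF) and b in (INF, -INF):
        return 0.0 if a == b else INF
    return abs(a - b)

print(ts_distance(INF, INF))    # 0.0
print(ts_distance(INF, -INF))   # inf
print(ts_distance(5.0, 2.0))    # 3.0
```

Note this also covers the finite/infinite mixed case: the distance from any finite timestamp to an infinity comes out infinite, consistent with the quoted example.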




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Feature freeze date for 8.1

2005-05-03 Thread Dave Held
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 03, 2005 9:31 AM
 To: Hannu Krosing
 Cc: Heikki Linnakangas; Neil Conway; Oliver Jowett;
 [EMAIL PROTECTED]; Peter Eisentraut; Alvaro Herrera;
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Feature freeze date for 8.1
 
 [...]
 I am a tad worried about the possibility that if the client 
 does nothing for long enough, the TCP output buffer will fill
 causing the backend to block at send().  A permanently blocked
 backend is bad news from a performance point of view (it
 degrades the sinval protocol for everyone else).

So use MSG_DONTWAIT or O_NONBLOCK on the keepalive packets.
That won't stop the buffer from getting filled up, but if you
get an EAGAIN while sending a keepalive packet, you know the
client is either dead or really busy.
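A minimal demonstration of that EAGAIN behavior (Python on a Unix socketpair; MSG_DONTWAIT is Linux-specific, and this is only a sketch of the idea, not backend code):

```python
import errno
import socket

def try_keepalive(sock, payload=b"ping"):
    """Attempt a non-blocking send; report whether the peer is draining."""
    try:
        sock.send(payload, socket.MSG_DONTWAIT)
        return True                      # buffer had room
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return False                 # client is dead or really busy
        raise

a, b = socket.socketpair()
# The peer never reads, so repeated sends eventually fill the kernel
# buffers and the non-blocking send fails with EAGAIN instead of
# blocking the "backend".
while try_keepalive(a, b"x" * 4096):
    pass
print("send buffer full; client unresponsive")
```

The backend never blocks in send(); it just observes the failure and can decide the client is gone.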




Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement

2005-05-03 Thread Dave Held
 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
 Sent: Monday, May 02, 2005 7:05 PM
 To: [EMAIL PROTECTED]
 Cc: Dave Held; [EMAIL PROTECTED];
 pgsql-hackers@postgresql.org
 Subject: Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: 
 Increased
 company involvement
 
 [...]
 I'm not happy about that last point - personally, I value most
 highly the views of those who contribute code or similar and
 least highly the views of those whose principal contribution
 is opinions.

Maybe so, but if you were a new contributor, why would you write
a bunch of code with no assurance that it would go anywhere?
It seems wiser to invest your time familiarizing yourself with
the ins and outs of the codebase and the coding style of patches
by looking at other people's work.  It also seems smarter to
lurk and see what kinds of changes are likely to be considered.
I doubt you would think highly of a newcomer that contributed
code that was not in the style of the codebase and was for a
feature not on the TODO list and that didn't get community buy-in
first.  But then, how do you get community buy-in if you don't
contribute code, according to you?


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Feature freeze date for 8.1

2005-05-03 Thread Dave Held
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 03, 2005 12:39 PM
 To: Heikki Linnakangas
 Cc: Hannu Krosing; Neil Conway; Oliver Jowett;
 [EMAIL PROTECTED]; Peter Eisentraut; Alvaro Herrera;
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Feature freeze date for 8.1
 
 [...]
 BTW, the upthread proposal of just dropping the message (which is what
 O_NONBLOCK would do) doesn't work; it will lose encryption sync on SSL
 connections.

How about an optional second connection to send keepalive pings?
It could be unencrypted and non-blocking.  If authentication is
needed on the ping port (which it doesn't seem like it would need
to be), it could be very simple, like this:

* client connects to main port
* server authenticates client normally
* server sends nonce token for keepalive authentication
* client connects to keepalive port
* client sends nonce token on keepalive port
* server associates matching keepalive connection with main 
connection
* if server does not receive matching token within a small
timeout, no keepalive support enabled for this session
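The server side of that handshake might look like this (Python sketch; the class and method names are invented, and hmac.compare_digest is used only to avoid a timing side channel when matching tokens):

```python
import hmac
import os

class KeepaliveRegistry:
    """Server-side pairing of main connections with keepalive sockets."""

    def __init__(self):
        self._pending = {}               # nonce -> session id

    def issue_nonce(self, session_id):
        # Sent to the client over the already-authenticated main
        # connection.
        nonce = os.urandom(16)
        self._pending[nonce] = session_id
        return nonce

    def claim(self, presented):
        # Called when a client connects on the keepalive port and
        # presents its token.
        for nonce, session in list(self._pending.items()):
            if hmac.compare_digest(nonce, presented):
                del self._pending[nonce]
                return session           # keepalive is now paired
        return None                      # unknown token: ignore it

reg = KeepaliveRegistry()
token = reg.issue_nonce(session_id=42)
print(reg.claim(token))                  # 42
print(reg.claim(os.urandom(16)))         # None
```

A real implementation would also expire unclaimed nonces after the small timeout mentioned above; that bookkeeping is omitted here.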


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Feature freeze date for 8.1

2005-05-03 Thread Dave Held
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 03, 2005 3:36 PM
 To: Dave Held; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Feature freeze date for 8.1
 
 [...]
 Yes, this looks good.  But:
 
   1. Do client interfaces (ODBC,JDBC OLEDB etc) need to
 be changed ?

Only if they want to support the keepalive mechanism.  It should
be purely optional.

   2. If a firewall is used, people need to know the second
 port number, which means two parameters should be added to
 postgres: the first is the timeout value and the second is the
 port number used for keepalives.

Sounds fine to me.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Feature freeze date for 8.1

2005-05-03 Thread Dave Held
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 03, 2005 4:20 PM
 To: Dave Held
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Feature freeze date for 8.1
 
 
 Dave Held [EMAIL PROTECTED] writes:
  How about an optional second connection to send keepalive pings?
  It could be unencrypted and non-blocking.  If authentication is
  needed on the ping port (which it doesn't seem like it would need
  to be), it could be very simple, like this:
 
  * client connects to main port
  * server authenticates client normally
  * server sends nonce token for keepalive authentication
  * client connects to keepalive port
  * client sends nonce token on keepalive port
  * server associates matching keepalive connection with main 
  connection
  * if server does not receive matching token within a small
  timeout, no keepalive support enabled for this session
 
 
 This seems to have nothing whatever to do with the stated problem?

I thought the problem was a server process that loses a 
connection to a client sticking around and consuming resources.
And then I thought a possible solution was to try to see if
the client is still alive by sending it an occasional packet.
And then I thought a new problem is sending packets to an
unresponsive client and filling up the output buffer and
blocking the server process.

So it seems that a possible solution to that problem is to
have a separate connection for keepalive packets that doesn't
block and doesn't interfere with normal client/server 
communication.

Now granted, it is possible that the primary connection could
die and the secondary is still alive.  So let's consider the
likely failure modes:

* physical network failure

In this case, I don't see how the secondary could survive while
the primary dies.

* client hangs or dies

If the client isn't reading keepalives from the server, 
eventually the server's send queue will fill and the server 
will see that the client is unresponsive.  The only way the 
client could fail on the primary while responding on the 
secondary is if it makes the connections in different threads, 
and the primary thread crashes somehow.  At that point, I would 
hope that the user would notice that the client has died and 
shut it down completely.  Otherwise, the client should just not
create a separate thread for responding to keepalives.

* transient network congestion

It's possible that a keepalive could be delayed past the 
expiration time, and the server would assume that the client 
is dead when it's really not.  Then it would close the client's
connection rather rudely.  But then, since there's no reliable
way to tell if a client is dead or not, your other option is to
consume all your connections on maybe-dead clients.

So what am I missing?




Re: [pgsql-advocacy] [HACKERS] Increased company involvement

2005-05-02 Thread Dave Held
 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED]
 Sent: Saturday, April 30, 2005 12:04 PM
 To: PostgreSQL advocacy
 Cc: Kris Jurka; Andrew Dunstan; PostgreSQL-development
 Subject: Re: [pgsql-advocacy] [HACKERS] Increased company involvement
 
 [...]
 The thing that limits centralization is that it is critical that
 any individual or company feel free to join the community efforts.
 When centralization happens, there is often an _in_ and an _out_
 group that is very bad for encouraging new members.
 [...]
 We don't want core to steer development any more than we want a
 centralized group to do that, because if we did, the next company
 that comes along and wants to enhance PostgreSQL or offer technical
 support services will feel they have to get approval/buy-in from
 the _in_ group, and that isn't a productive setup.  The fact that
 new companies getting involved can't find a central authority is a
 _good_ thing, if you think about it. It means that we have succeeded
 in building a community that allows people to join and feel a part
 right away, and they don't have to buy-in or play politics to do it.

Well, you make Postgres sound like a very democratic community, but
I'm afraid this is a fairy tale.  Aren't the people who approve
patches exactly the in group that you claim doesn't exist?  Aren't
they the people that you need buy-in from to really contribute to
Postgres?  The reason I make this point is because I know what a
democratic development community really looks like, and the Boost
community is one such example.  That truly *is* democratic, because
decisions are made as a group, and no fixed subset of members has 
an overriding veto.  The group has moderators, but they exist only
to moderate discussion on the mailing lists.  I'm not saying that
it is bad that Postgres is not democratic.  Postgres is a totally
different kind of beast than Boost, and probably benefits from 
having a few people ultimately decide its fate.  But let's call a 
spade a spade and not pretend that contributors don't have to get 
buy-in from core.




Re: [pgsql-advocacy] [HACKERS] Increased company involvement

2005-05-02 Thread Dave Held
 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED]
 Sent: Monday, May 02, 2005 12:17 PM
 To: PostgreSQL advocacy
 Cc: Dave Held; PostgreSQL-development
 Subject: Re: [pgsql-advocacy] [HACKERS] Increased company involvement
 
  [...]
  Really?  You have a different perspective than I see.  I have
  seen patches be accepted that had no core buy-in.  We accept 
  patches based on group feedback, not some closed approval
  process.
 
 Let me also ask for you to provide an example of the behavior you
 describe.

Well, I think there's numerous examples where someone suggests some
feature or idea, and Tom or one or two other core developers will
say: I don't like that idea, and then the proposer will more or
less give up on it because it is clear that it won't go anywhere.
So whether the process gets stopped at the patch submission level
or the feature proposal level isn't really relevant.  It seems pretty
clear that a handful of people decide the direction of Postgres,
and everyone else can either contribute to the features that have
been agreed to be acceptable and relevant, or they can fork their
own version.

Just watching the hackers list suggests to me that this is the norm,
rather than the exception.  I guess I'm interested to see which
patches have been accepted that the core developers opposed.  Now
don't get me wrong.  Sometimes there are good technical reasons why
feature A or B can't or shouldn't be added or even developed.  And
I don't suggest that patches lacking technical merit should not be
rejected.  But sometimes it seems that ideas with undetermined
merit get passed over because of a quick judgement based on 
intuition, and only if the proposer actively fights for it for a
while does it get reconsidered.

Of course, it would be quite a bit of work for me to review the
list and compile instances where I think this has occurred, but
only because of the tedium involved to make a minor point...not
because I think I would have difficulty finding evidence.  I'm just
saying that as an outsider, if I had a lot of resources to devote
to contributing to Postgres, I would only consider working on
approved TODO items or making sure I more or less had core buy-in
before writing any code.  I don't think it would be worth my
time to work on something that non-core users/developers might
like but core hackers don't.

Like I said, that's not necessarily a bad thing.  Postgres is a
piece of software with many interacting components, and there
needs to be some coordination to make sure it evolves in a 
sensible way.  But I think that implies that there must be and
is some de facto centralization of control, whether that is the
published ideology or not.




Re: [pgsql-advocacy] [HACKERS] Increased company involvement

2005-05-02 Thread Dave Held
 -Original Message-
 From: Joshua D. Drake [mailto:[EMAIL PROTECTED]
 Sent: Monday, May 02, 2005 12:33 PM
 To: Dave Held
 Cc: PostgreSQL-development; PostgreSQL advocacy
 Subject: Re: [pgsql-advocacy] [HACKERS] Increased company involvement
 
 [...]
 PostgreSQL is more of a Democratic Republic than an actual 
 democracy, but they do very well at it.

I buy that.  It is probably a fairly accurate description of
the Postgres community.  Everyone has a voice, but ultimately,
the Senate (i.e.: patch approvers) passes the laws.  Where
it differs is that the Senate is not necessarily democratically
elected. ;)

 Any person can bring a patch and submit it, any person in the 
 community can argue for it and any person can take the time to
 fix it to the specifications that core sets forth.

Which brings up an important point.  The core developers define
the structure in which change can occur.  If people think that
Postgres should move in a direction that affects that framework,
they have to convince core to redefine that specification.  It's
like writing new laws vs. amending the Constitution.  Even though
anyone can draft a bill and submit it to their representative,
it's ultimately Congress that makes the laws.  And while public
opinion can ultimately affect the actions of Congress, it is
still a sovereign body.  As Bruce himself said, companies that
wish to contribute must not assume that their work will be
integrated into Postgres.  The official stance is that there
only needs to be community buy-in, but it seems more realistic
that there needs to be core buy-in as well, at the least because
of the influence that core thinking has on the community itself.
That's not a bad thing per se, but it's definitely something that
contributors should consider.



Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement

2005-05-02 Thread Dave Held
 -Original Message-
 From: Josh Berkus [mailto:[EMAIL PROTECTED]
 Sent: Monday, May 02, 2005 1:21 PM
 To: Bruce Momjian
 Cc: Marc G. Fournier; PostgreSQL advocacy; Dave Held;
 PostgreSQL-development
 Subject: Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: 
 Increased
 company involvement
 
 [...]
 Hmmm ... why does everyone assume that Core does more than 
 what we do?  I think that most people would be surprised by
 how *little* traffic there is on the pgsql-core mailing list.   

Well, I never said that core runs around saving the world.  I
mostly made the point that core developers have special 
influence, and that should be considered when contributing to
Postgres, which is directly relevant to the point of the
thread, which was originally called Increased company 
involvement.

 Core decides on releases, and approves committers.  
 Occasionally we'll handle something which requires
 confidentiality, like a security issue or a new 
 corporate participant.

Which is also something that new would-be corporate 
contributors should know about.

 [...]
 Materially, what's accepted is decided through open
 discussion on the pgsql-hackers list; even Tom brings
 up his patches for discussion before commit, and I'd
 defy you to point to even one patch which was accepted
 by consensus on pgsql-hackers and not committed.

But this misses the point.  The point is that consensus is
often an iterative process, and even if a few people support
an idea at first, in the end, the weight of a few inner
circle people (whether they be core or patch approvers
or whatnot) tends to sway the consensus in a certain 
direction.  This isn't always bad, especially if those
core people simply know more about the internals of
Postgres and thus have better judgement.  It is bad if the person
making the proposal doesn't feel he/she has good odds in
defending the proposal and gives up without a fight.

 As you've already observed, if Tom doesn't like something 
 it's very unlikely to get through.  But that's true for
 a lot of major contributors; the consensus process we use
 provides ample opportunities to veto and slender
 opportunities to pass. 

This also misses another point.  I'm not saying that the 
current process is inherently flawed.  It's probably about as 
good as any OSS project.  My point is that it's not *democratic*,
and that outsiders wishing to contribute should understand
the dynamic of the process that is not explicitly and officially 
spelled out anywhere.

 [...]
 From my perspective, this is a good thing for a database
 system which can get easily broken by an ill-considered 
 patch.  It's *good* for us to be development-conservative.

Right.  I agree.  I'm not criticising the process as a whole,
and I've more or less made this exact point myself.

 So there is an insider group, but it's the group of major 
 contributors. 

That is exactly my point, but you said it better.

 Tom has the loudest voice because he writes the most code.  
 The fact that Tom, Bruce or Peter's veto is often as far as
 a proposal goes is simply because most of the pgsql-hackers
 subscribers simply don't involve themselves in the process
 unless it's one of their own pet features. 

Which is perfectly understandable.  You can probably guess that
most people who use Postgres haven't tried to implement an 
RDBMS themselves, and have only a shallow understanding of the 
details.

 And the important thing about the group of major contributors
 is that membership is open.

Which may be true philosophically, but in practice, most people
who contribute will not have the resources or motivation to
become a major contributor.  I do not mean to imply that this
is necessarily a bad thing; but I think it is the true state of
affairs, and part of the dynamic which must be understood by
someone considering investing in Postgres as a contributor.

 [...]
 If people want the acceptance process to be more democratic,
 then those people have to be willing to do the work of full 
 participation. 

That actually doesn't make it more democratic.  In a democracy,
everyone has an equal vote regardless of their status.  The point
is that a democracy is not always a priori the best form of 
organization.  What you describe is actually a meritocracy,
and for a project like Postgres, it makes a lot of sense.  But
that merely reinforces my point that contributors need to
understand that if their pet feature they create is not in line
with core thinking, they will have to earn the credibility to
get community buy-in.

 [...]
 (P.S. on a complete tangent, call a spade a spade is 
 actually a racist expression originating in the
 reconstruction-era South.  spade does not mean garden tool
 but is a derogatory slang term for black people.
 [...]

Interesting.  Duly noted.


Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement

2005-05-02 Thread Dave Held
 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED]
 Sent: Monday, May 02, 2005 3:33 PM
 To: Dave Held
 Cc: PostgreSQL advocacy; PostgreSQL-development
 Subject: Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: 
 Increased
 company involvement
 
 [...]
 Here is a new FAQ entry:
 
 <H3><A name="1.13">1.13</A>) Who controls PostgreSQL?</H3>
 
 <P>If you are looking for a PostgreSQL gatekeeper, 
 central committee, or controlling company, give up, because
 none exists.  We do have a core committee and CVS committers,
 but these groups are more for administrative purposes then
 control.  The project is directed by the open community of
 developers and users of PostgreSQL.  Everyone is welcome to
 subscribe and take part in the discussions.  (See the
 <a href="http://www.postgresql.org/docs/faqs.FAQ_DEV.html">
 Developer's FAQ</A> for information on how to get 
 involved in PostgreSQL development.)</P>
 
 Adjustments?

...are more for administrative purposes [then-than] control...

<p>Because PostgreSQL is a monolithic product, all of its features
must work together in tight harmony.  It is in the interests of 
the PostgreSQL community that new features be integrated in a way 
that preserves this harmony.  Thus, new feature proposals are
scrutinized and debated by the community to ensure that changes 
have sufficient technical merit.  Be prepared to defend your 
proposal, and don't assume that a privately developed contribution 
will automatically be accepted by the PostgreSQL community.  To 
maximize the chance of success in proposing a change, consider 
these suggestions:

* Propose your change/feature publicly - OSS is about community,
and a collection of contributors working independently without
communication is not a community; this avoids duplication of
effort and promotes collaboration/cooperation among parties
that have a common interest
* Research your proposal to see if it has already been discussed
on the mailing list
* Research your proposed solution to make sure it is the best of
breed - database technology is a very active subject of
academic research, and it is possible, if not likely, that
someone has written a paper on the topic
* Engage the community by participating in discussions and patch
reviews - your credibility as a contributor depends on your
willingness to contribute to the community in non-coding
ways as well
</p>

I realize that this runs a bit far afield from the original
question of Who controls PostgreSQL?, but I think it addresses
the points that someone who asks this question is likely to
want to know.  It also tackles the contribution question from
a higher level than the dev-faq.  Obviously, the bullet points
would be formatted as a list or some other appropriate HTML
construct.  And as a minor point, it would be nice if the
website validated to XHTML-strict, although XHTML-transitional
would be a good compromise.



Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement

2005-05-02 Thread Dave Held
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Monday, May 02, 2005 1:50 PM
 To: josh@agliodbs.com
 Cc: Bruce Momjian; Marc G. Fournier; PostgreSQL advocacy; Dave Held;
 PostgreSQL-development
 Subject: Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: 
 Increased
 company involvement
 
 
 [...]
 Our process is not democratic in the sense of any random 
 subscriber to the mailing lists having the same vote as a
 core member --- and I'll bet Boost doesn't run things that
 way either. 

Actually, it does, but it can afford to for very special reasons.
Because Boost is not about a single problem domain, there is no
real core of developers.  There are some who have contributed
more libraries, or larger libraries; but at the end of the day,
each review and submission is judged on its own merits.  Often,
the person submitting a new library for review is a domain expert
for that library; and people reviewing a library are also often
domain experts, even if they are a first-time reviewer.  So the
very nature of Boost allows it to be more democratic.  Because
Postgres is about a single problem domain, and because each
submission must work in concert with an extant whole, it has
totally different needs and a totally different type of community.
And because a database isn't exactly a modular beast like, say,
a web server, that limits the openness of the community further.
That is to say, there is a barrier to entry, but it isn't
capriciously imposed by the community members.  It's just a
necessary outcome of the nature of the project.  People who
want to contribute should understand this barrier and how it
works before they start writing code.

 What we have is pretty informal but I think it effectively
 gives more weight to the opinions of those more involved in
 the project; which seems a good way to operate.

For Postgres, I agree.

 But there isn't anyone here who has an absolute veto, nor
 contrarily anyone who can force things in unilaterally over
 strong objections.

Nor would one expect such a thing in a project that claims to
be OSS.  But ultimately persuasion is as much a part of 
consensus as merit, and people should recognize that fact
when contributing to the project.



Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-27 Thread Dave Held
 -Original Message-
 From: Gurmeet Manku [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, April 26, 2005 5:01 PM
 To: Simon Riggs
 Cc: Tom Lane; josh@agliodbs.com; Greg Stark; Marko Ristola;
 pgsql-perform; pgsql-hackers@postgresql.org; Utkarsh Srivastava;
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
 suggested?
 
 [...]
  2. In a single scan, it is possible to estimate n_distinct by using
 a very simple algorithm:
 
  Distinct sampling for highly-accurate answers to distinct value
   queries and event reports by Gibbons, VLDB 2001.
 
  http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf
 
 [...]

This paper looks the most promising, and isn't too different 
from what I suggested about collecting stats over the whole table
continuously.  What Gibbons does is give a hard upper bound on
the sample size by using a logarithmic technique for storing
sample information.  His technique appears to offer very good 
error bounds and confidence intervals as shown by tests on 
synthetic and real data.  I think it deserves a hard look from 
people hacking the estimator.



Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-27 Thread Dave Held
 -Original Message-
 From: Greg Stark [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, April 27, 2005 1:00 AM
 To: Tom Lane
 Cc: Rod Taylor; Greg Stark; pgsql-hackers@postgresql.org; 
 Gurmeet Manku
 Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
 suggested?
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  Rod Taylor [EMAIL PROTECTED] writes:
   If when we have partitions, that'll be good enough. If 
   partitions aren't available this would be quite painful
   to anyone with large tables -- much as the days of old
   used to be painful for ANALYZE.
  
  Yeah ... I am very un-enthused about these suggestions to 
  make ANALYZE go back to doing a full scan ...

I don't see why ANALYZE would always have to do a full scan.
Clearly, this statistic would only be useful to people who need
a very accurate n_distinct on tables where the current metric
does not work very well.  Applying a specialized solution to
every table doesn't seem like an efficient way to go about 
things.  Instead, the distinct sampling mechanism should be
purely optional, and probably purely separate from the vanilla
ANALYZE mechanism, because it works differently.  If it were
designed that way, the full table scan would be a one-time
cost that would not even need to be paid if the user turned
on this mechanism at table creation.  Thereafter, the statistic
would need to be updated incrementally, but that just 
distributes the cost of ANALYZE over the INSERT/UPDATE/DELETEs.
Obviously, it's a higher cost because you touch every record
that hits the table, but that's the price you pay for a good
n_distinct.

The block estimator should probably become the default, since
it works within the current ANALYZE paradigm of sampling the
data.

 [...]
 For most use cases users have to run vacuum occasionally. In 
 those cases vacuum analyze would be no worse than a straight
 normal vacuum.

And that's only if you do a full table scan every time.  In the
incremental implementation, there are no lump sum costs involved
except when the statistic is first initialized.

 Note that this algorithm doesn't require storing more data
 because of the large scan or performing large sorts per
 column. It's purely O(n) time and O(1) space.

And I think it should be emphasized that distinct sampling not
only gives you a good n_distinct for query planning, it also
gives you a very fast approximate answer for related aggregate
queries.  So you're getting more than just query tuning for that
cost.

 On the other hand, if you have tables you aren't vacuuming 
 that means you perform zero updates or deletes. In which case
 some sort of incremental statistics updating would be a good
 solution. A better solution even than sampling.

And for the large data warehousing situations where this mechanism
seems most useful, that would probably be the most common case.



Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-27 Thread Dave Held
 -Original Message-
 From: Josh Berkus [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, April 27, 2005 10:25 AM
 To: Andrew Dunstan
 Cc: Mischa Sandberg; pgsql-perform; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
 suggested?
 
 [...]
 Actually, it's more to characterize how large of a sample
 we need.  For example, if we sample 0.005 of disk pages, and
 get an estimate, and then sample another 0.005 of disk pages
 and get an estimate which is not even close to the first
 estimate, then we have an idea that this is a table which 
 defies analysis based on small samples.  

I buy that.

 Whereas if the two estimates are < 1.0 stdev apart, we can
 have good confidence that the table is easily estimated. 

I don't buy that.  A negative indication is nothing more than
proof by contradiction.  A positive indication is mathematical
induction over the set, which in this type of context is 
logically unsound.  There is no reason to believe that two
small samples with a small difference imply that a table is
easily estimated rather than that you got unlucky in your
samples.

 [...]
 Yes, actually.   We need 3 different estimation methods:
 1 for tables where we can sample a large % of pages
 (say, >= 0.1)
 1 for tables where we sample a small % of pages but are 
 easily estimated
 1 for tables which are not easily estimated but we can't 
 afford to sample a large % of pages.

I don't buy that the first and second need to be different
estimation methods.  I think you can use the same block
sample estimator for both, and simply stop sampling at
different points.  If you set the default to be a fixed
number of blocks, you could get a large % of pages on
small tables and a small % of pages on large tables, which
is exactly how you define the first two cases.  However,
I think such a default should also be overridable to a
% of the table or a desired accuracy.

Of course, I would recommend the distinct sample technique
for the third case.

 If we're doing sampling-based estimation, I really don't
 want people to lose sight of the fact that page-based random
 sampling is much less expensive than row-based random
 sampling.   We should really be focusing on methods which 
 are page-based.

Of course, that savings comes at the expense of having to
account for factors like clustering within blocks.  So block
sampling is more efficient, but can also be less accurate.
Nonetheless, I agree that of the sampling estimators, block
sampling is the better technique.



Re: [HACKERS] Woo hoo ... a whole new set of compiler headaches!!

2005-04-22 Thread Dave Held
 -Original Message-
 From: Dann Corbit [mailto:[EMAIL PROTECTED]
 Sent: Friday, April 22, 2005 1:08 PM
 To: Andrew Dunstan; Dave Held
 Cc: pgsql-hackers@postgresql.org
 Subject: RE: [HACKERS] Woo hoo ... a whole new set of compiler
 headaches!!
 
  From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
  [EMAIL PROTECTED] On Behalf Of Andrew Dunstan
  
  Dave Held wrote:
  
  I see the smiley, but moving to C++ isn't just about switching
  to the latest fad language.
  
  No, it's about moving to the fad language about 2 
  generations back ...

Except that C++ is hardly a fad language.  Most estimates place
the top 3 languages by number of programmers as C++, Java, and C,
with C++ and Java switching positions, and C lagging behind by a
decent margin.  And it's been this way for quite a while.

 Language wars are about as much fun as operating system wars.
 C and C++ are both nice languages.

My intent was not to start a language jihad.  I don't think C is a
bad language.  I just think C++ is better.

  [...]
  Unless you did a major rewrite it's hard to see any great 
  advantages.

There don't need to be great advantages.  Just enough to justify
the effort.  Take casts, for instance: C uses a single syntax for
all types of cast.  C++ breaks casting down into static_cast
(for casting to a type for which the opposite cast would be an
implicit conversion), dynamic_cast (for polymorphic types, which
don't exist in the C++ sense in the Postgres codebase, by definition),
const_cast (for casting away constness, or adding it), and
reinterpret_cast (for dangerous bit twiddling where you'd better
know the exact layout for each platform).  Probably most of the
casts in the Postgres codebase would be converted to static_cast.
And if the types were upgraded over time, many of those casts could
probably go away.  The dangerous casts would remain marked as
reinterpret_cast, and that would serve to highlight the portions
of the code that are probably platform-dependent and that probably
need inspection when porting to a new platform.

While the benefits of using the C++-style casts would only be
maximized if they were used everywhere, you still get incremental
benefit from converting them a few at a time.

Consider inline functions.  In C, you have to implement them as
macros, which eliminates your type safety.  In C++, you can get
both type safety and performance.  Take a concrete example: qsort().
In C, you must pass a function pointer to use this function, and
that function pointer gets dereferenced every time qsort() needs to
do a comparison.  That's a lot of overhead that is eliminated in
C++'s sort() function, which accepts a comparison functor that
can and often does get inlined.

  There are over 600,000 lines of code in Postgres by my rough
  count. The potential rewrite effort is enormous. A thorough
  job would probably consume a release cycle just on its own.
 
 You could use C++ as a better C with very little effort 
 (but there are C++ keywords sprinkled here and there, so it
 would be a good month of work for somebody).

My point exactly.  It's *not* a task that has to be tackled all
at once.  Once you have a total C++ codebase, converting it into
a C++ style can be done quite incrementally, with benefits
accruing with each update.

 [...]
  On the downside, some of us (including me) have much more
  experience in and ease with writing C than C++. I could
  certainly do it - I write plenty of Java, so OO isn't a closed
  book to me, far from it - but ramping up in it would take me
  at least some effort. I bet I'm not alone in that.
 
 This is the crux of the matter.  You will certainly not be alone
 here.  I (personally) prefer C++ to C, but I am comfortable in 
 either language.  However, if you have a team of 100 C programmers
 and a huge C project, it is a terrible mistake to ask them to use
 C++.

I disagree.  The C programmers could learn C++ rules one at a time.
The first rule would simply be to not use C++ keywords as 
identifiers.  That is really the minimum necessary to write C style
code in a C++ program.  The next might be to replace macro constants
with const ints.  The next might be to replace C-style casts with
C++-style.  There is really no need to throw the whole book at
the developer community all at once.  It might take a year or two
to get the codebase into idiomatic C++, but the developers would
have learned C++ quite easily without really noticing it.  I would
certainly not suggest something radical like replacing hand-rolled
containers with standard library equivalents.  *That's* the kind of
rewrite that should give any coder nightmares.

Even OOP-style encapsulation could be done incrementally.  You take
a few fields of some struct, make them private, add accessor 
functions, and update the references.  You don't have to hide all
the data all at once.  I know, because I've upgraded lots of C
code to C++, and it's not nearly as hard as the typical C 
programmer thinks.

Re: [HACKERS] Woo hoo ... a whole new set of compiler headaches!! :)

2005-04-22 Thread Dave Held
 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
 Sent: Friday, April 22, 2005 3:49 PM
 To: Dave Held
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Woo hoo ... a whole new set of compiler
 headaches!! :)
 
 [...]
 I recall saying something like this when we were being urged 
 to replace CVS with SubVersion/Arch/SomethingElse but I'll say
 it again - this decision should be made by the people who
 contribute the most. All the rest (including my contribution)
 is just noise, IMHO.

Well, I think it goes without saying that such a decision will
ultimately be made by the core developers.  But to say that nobody
should *suggest* changes seems a bit odd to me.  I mean, people
suggest changes to the design of Postgres almost daily, and some 
of them aren't even coders.  But if nobody outside of the core
developers suggests changes, that kind of takes some of the open
out of open source.  True, the source would still be open, but
there's a subtlety in the name that is similar to the free in
free software.  That's not to say that any given suggestion
is worthy of serious consideration; but to say that any suggestion
that doesn't come from the core is just noise to my ear doesn't
sound any different than Redmond saying any suggestion that doesn't
come from 1 Microsoft Way is just noise.

As an aside, I don't think that switching to SVN is a half bad
idea either. ;)



Re: [HACKERS] argtype_inherit() is dead code

2005-04-20 Thread Dave Held
 -Original Message-
 From: Jim C. Nasby [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, April 19, 2005 5:56 PM
 To: Christopher Browne
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] argtype_inherit() is dead code
 
 [...]
 On Sun, Apr 17, 2005 at 07:01:41PM -0400, Christopher Browne wrote:
  [...]
  Object Orientation is all about the notion of having data that
  is aware of its type, and where there can be a dispatching of
  methods against those types.
  
  There is already a perfectly functional ability to dispatch based
  on argument types.
  
  These essentials are there.

Well, if you go with Bjarne Stroustrup's formulation of OOP (which is,
of course, by no means exclusively authoritative), the core of OOP
is encapsulation, inheritance, and polymorphism.  Inherited tables
provide the second, overloaded functions provide the third, but
the security model is left providing the first.  However, I would say
that the first property is the most essential for OOP, because in my
view, OOP is about *data hiding*.  In particular, it's about separating
the implementation from the interface, and forcing users to access
objects through the interface.  While such a design philosophy is
*possible* with Postgres, it is by no means encouraged or *easy*.
Furthermore, it probably doesn't make sense in all contexts.  

One way to think about an object-relational database is as a set of
persistent objects stored in well-known containers.  While traditional
programming languages offer several common access methods for 
containers, the point of a query language is to offer an extremely
powerful and generalized container access system.  However, this
access system is really an implementation detail of the object system,
while at the same time being the primary means of object interaction.

In terms of manipulating data, it's really about as OOP as passing
around raw pointers to everything.  From this perspective, DBs will not
support OOP while SQL remains the primary access method; and there is
no reason to believe that people will give up SQL in favor of a more
OOP-like interface.

 Yes, but they're only there when it comes to storing data. There's
 nothing allowing you to cohesively combine code and data.

I agree entirely.  And I also agree that in many cases, there is no
sensible way to do so.  One of the ways in which DBs are different from
programming language objects is in the data decomposition.  Most PLs
have self-contained objects whose data is primarily localized within
one structure that is more or less contiguous in memory.  DBs, on the
other hand, tend to have objects that may span multiple tables, because
this is the most efficient way of storing the data.  In a way, the
relational model is the antithesis of the OOP model.  The central theme
of the relational model is *data-sharing*: the idea that data should
be decomposed and the common pieces factored out.  OOP, in contrast, says
that it is the *functionality* that should be factored out into a
minimal interface.

 An object should be able to have methods attached to it, for example.

I don't think that's sufficient.  To support encapsulation, you also
need to enforce access to the data through the method interface.  Else,
you can simulate methods with stored procedures.

 And that functionality is essentially missing. There's no way to
 present a combined set data and code that operates on that data.

That's encapsulation.  And it's missing.  But for a good reason.

 It doesn't really matter why this kind of functionality is missing;
 the fact that it is missing means it's much less likely that any of
 the OO stuff will be used.

Actually, it *does* matter why it's missing.  The reason it's missing
tells us why people don't use the OOP features of the DB.  What needs
to be done is to construct a consistent theory of how the relational
model and the OOP model can be integrated.  The OOP model is about
data integrity, maintaining object invariants, ensuring program
correctness, etc.  The relational model is about performance, storing
data efficiently, querying it efficiently, etc.  These are competing
goals, and it may well be that a good object-relational theory simply
develops a framework in which the tradeoffs are explicitly stated
and describes how to implement different points in the design space 
in a consistent way.  I realize that there is some existing work
with object-relational modelling, but my impression is that such
work is still fairly immature and scattered.

 I think the current limitations (foreign keys, and cross-table
 constraints) are issues as well. It might also help if the 
 docs had some info about how inherited tables worked 'under the
 covers', so people knew what kind of overhead they implied.

I don't think inherited tables work in an entirely intuitive way.  It
certainly doesn't help that viewing an inherited table through pgAdmin
shows records that aren't returned by an equivalent query.  I think
the problem is that 

Re: [HACKERS] pg_hba.conf

2005-04-18 Thread Dave Held
 -Original Message-
 From: ElayaRaja S [mailto:[EMAIL PROTECTED]
 Sent: Monday, April 18, 2005 1:38 PM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] pg_hba.conf
 
 
 Hi,
  I am using Red Hat Linux 9.  I have configured pg_hba.conf as:
 
 host    postgres    postgres    10.10.0.76    255.255.255.0    password
 
 If I try to connect with pgAdmin I am getting an exception:
 
 An error has occurred:
 
 Error connecting to the server: could not connect to server:
 Connection refused (0x274D/10061)
   Is the server running on host 10.10.0.76 and accepting
   TCP/IP connections on port 5432?
 
 Please help me.

The first bit of advice I can offer is to ask on the right list.  A
perusal of:

http://www.postgresql.org/community/lists/

should indicate that the pgsql-admin list would be a good list to
ask.  If you're not sure which list would be best, pgsql-general would
be a better default choice than -hackers.
Second, the error message is quite informative.  It says that the
client doesn't think you have a server listening on the default port.
Check your process list to make sure that you do.  Check to make
sure that you can connect to that host (try connecting to a different
service on the same server).  Check to make sure you are not getting
blocked by a firewall.
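The checklist above can be sketched as a small probe.  This is only an
illustration: can_connect is a hypothetical helper, not anything from
the PostgreSQL sources, and it distinguishes "nothing listening" from
"listening" the same way the client's error message does.

```c
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to open a TCP connection to ip:port and report whether anything
 * is listening there.  Returns 1 when the connect succeeds, 0 when it
 * is refused, times out, or the socket cannot be created -- i.e. the
 * same situations that produce "Connection refused" in the client.
 */
static int
can_connect(const char *ip, int port)
{
    struct sockaddr_in addr;
    int fd, ok;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short) port);
    addr.sin_addr.s_addr = inet_addr(ip);

    ok = (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0);
    close(fd);
    return ok;
}
```

Calling can_connect("10.10.0.76", 5432) from the client machine tells
you whether the postmaster is reachable before involving pgAdmin at all.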

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] ARC patent

2005-04-01 Thread Dave Held
 -Original Message-
 From: Marian POPESCU [mailto:[EMAIL PROTECTED]
 Sent: Friday, April 01, 2005 8:06 AM
 To: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] ARC patent
 
 Neil Conway [EMAIL PROTECTED] writes:
 
 
 FYI, IBM has applied for a patent on ARC (AFAICS the 
 patent application is still pending, although the USPTO
 site is a little hard to grok):
 
 Ugh.  We could hope that the patent wouldn't be granted, 
 but I think it unlikely, unless Jan is aware of prior art
 (like a publication predating the filing date).  I fear we'll
 have to change or remove that code.

Why not just ask IBM for a free license first?  After all, they put 
1,000+ patents in the public domain or something, didn't they?  I 
realize that they might use this technology in DB2, and don't want
to encourage competitors.  But IBM seems a lot more friendly to OSS
than most companies, and it doesn't seem like it would hurt to ask.
At the worst they say no and you just proceed as you would have
originally.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] ARC patent

2005-04-01 Thread Dave Held
 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED]
 Sent: Friday, April 01, 2005 10:23 AM
 To: Dave Held
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] ARC patent
 
 
 Dave Held wrote:
   -Original Message-
   From: Marian POPESCU [mailto:[EMAIL PROTECTED]
   Sent: Friday, April 01, 2005 8:06 AM
   To: pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] ARC patent
   
   Neil Conway [EMAIL PROTECTED] writes:
   
   
   FYI, IBM has applied for a patent on ARC (AFAICS the 
   patent application is still pending, although the USPTO
   site is a little hard to grok):
   
   Ugh.  We could hope that the patent wouldn't be granted, 
   but I think it unlikely, unless Jan is aware of prior art
   (like a publication predating the filing date).  I fear we'll
   have to change or remove that code.
  
  Why not just ask IBM for a free license first?  After all, they put 
  1,000+ patents in the public domain or something, didn't they?  I 
  realize that they might use this technology in DB2, and don't want
  to encourage competitors.  But IBM seems a lot more friendly to OSS
  than most companies, and it doesn't seem like it would hurt to ask.
  At the worst they say no and you just proceed as you would have
  originally.
 
 The problem is that they would have to license all commercial,
 closed-source distributions of PostgreSQL too, and I doubt 
 they would do
 that.

Why would they have to do that?  Why couldn't they just give a license
for OSS distributions of PostgreSQL, and make commercial distributions
obtain their own license for the ARC code?  Doesn't IBM hire lawyers
exactly for the purpose of writing complicated legal documents of this
nature? ;)  Or is it that the Postgres team wouldn't use an algorithm
that wasn't freely available to everyone?

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] Modifying COPY TO

2005-02-25 Thread Dave Held
I am interested in hacking COPY TO such that one can specify that
rows are copied in a certain index order.  I got as far as
src/backend/commands/copy.c:CopyTo(), and it looks like I would need
to modify the call to heap_beginscan() so that it uses a key.  However,
I couldn't figure out how to provide one, or if I'm even looking at the
right area.  Ideally, this behavior would be specified with a flag,
perhaps WITH INDEX index_name or WITH PRIMARY KEY, or something
similar.

The motivation for this change is as follows.  I have a fairly large
database (10 million+ records) that mirrors the data in a proprietary
system.  The only access to that data is through exported flat files.
Currently, those flat files are copied directly into a staging area in
the db via a COPY FROM, the actual tables are truncated, and the
staging data is inserted into the live tables.  Since the data is
read-only, it doesn't matter that it is recreated every day.  However,
as you can imagine, the import process takes quite a while (several
hours).  Also, rebuilding the db from scratch every day loses any
statistical information gathered from the execution of queries during
the day.

A possibility that I would like to pursue is to keep the staging data
from the previous day, do a COPY TO, import the new data into another
staging table with a COPY FROM, then export the fresh data with
another COPY TO.  Then, I can write a fast C/C++ program to do a
line-by-line comparison of each record, isolating the ones that have
changed from the previous day.  I can then emit those records in a
change file that should be relatively small and easy to apply.  Of
course, this scheme can only work if COPY TO emits the records in a
reliable order.
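The change-file extraction described above amounts to a merge walk over
the two sorted snapshots.  A minimal sketch, assuming both COPY TO
exports use the same stable ordering (strcmp order here); diff_sorted
and the pipe-delimited sample records are hypothetical, not part of any
PostgreSQL API:

```c
#include <string.h>

/*
 * Given yesterday's and today's snapshots as arrays of lines, both
 * sorted under strcmp(), collect every line that appears in the new
 * snapshot but not the old one -- i.e. records that were added or
 * changed.  A changed record shows up as its new line; deleted
 * records are simply skipped.  Returns the number of lines emitted.
 */
static size_t
diff_sorted(const char **oldv, size_t n_old,
            const char **newv, size_t n_new,
            const char **out)
{
    size_t i = 0, j = 0, k = 0;

    while (i < n_old && j < n_new)
    {
        int cmp = strcmp(oldv[i], newv[j]);

        if (cmp == 0)
        {
            i++;                    /* identical line: unchanged */
            j++;
        }
        else if (cmp < 0)
            i++;                    /* only in old snapshot: deleted */
        else
            out[k++] = newv[j++];   /* new or changed record */
    }
    while (j < n_new)
        out[k++] = newv[j++];       /* trailing additions */

    return k;
}
```

Each pass is O(n_old + n_new) and needs no hashing, so the comparison
step stays cheap even at 10 million records.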


Any assistance on this project would be greatly appreciated.  As best
I can see, I'm stuck on line 1053 of copy.c:

    scandesc = heap_beginscan(rel, mySnapshot, 0, NULL);

I suspect that I want it to look like this:

    scandesc = heap_beginscan(rel, mySnapshot, 1, key);

where 'key' is an appropriately constructed ScanKey.  It looks like I
want to call ScanKeyEntryInitialize(), but I'm not sure what parameters
I need to pass to it to get an index or the primary key.  I mostly need
help building the ScanKey object.  I think I can figure out how to hack
the custom option, etc.  I should mention that I am using the 7.4.7
codebase on Linux 2.4.


__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129