Re: [HACKERS] Oracle Style packages on postgres
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 11, 2005 2:22 PM To: Dave Held Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Oracle Style packages on postgres Dave Held [EMAIL PROTECTED] writes: /* * We check the catalog name and then ignore it. */ if (!isValidNamespace(name[0])) { if (strcmp(name[0], get_database_name(MyDatabaseId)) != 0) ereport(ERROR, Which more or less proves my point: the syntax is fundamentally ambiguous. Not at all. Ambiguity means that there are two equally valid parses. Under the semantics I proposed, schema names take precedence. That is, given: db: foo schema: bar schema: foo.bar The expression foo.bar.rel.col refers to schema foo.bar, and not to db foo, schema bar. If by fundamentally ambiguous, you mean there is no a priori reason to choose one set of semantics over another, I would tend to disagree, but the syntax as I proposed it is not ambiguous. We use precedence to eliminate otherwise valid parses all the time. I suppose people would learn not to use schema names that match the database they are in, but that doesn't make it a good idea to have sensible behavior depend on non-overlap of those names. There's nothing wrong with using a schema name that matches the db. The only confusion comes when you put nested elements at both the db level and schema level having the same names. Since I presume most people don't specify db names in their queries, having schemas take precedence makes the most sense to me. [ thinks for awhile ... ] OTOH, what if we pretended that two-level-nested schemas ARE catalogs in the sense that the SQL spec expects? Then we could get rid of the pro-forma special case here, which isn't ever likely to do anything more useful than throw an error anyway. 
Thus, we'd go back to the pre-7.3 notion that the current Postgres DB's name isn't part of the SQL naming scheme at all, and instead handle the spec's syntax requirements by setting up some conventions that make a schema act like what the spec says is a catalog. [...] I think this would be worse than not having nested schemas at all. It looks, feels, and smells like a hack. I think there should be a reasonable depth to schema nesting, but I think it should be much larger than 2. I think 8 is much more reasonable. One can argue that nested schemas are nothing more than syntactic sugar, and this is most definitely true. But as programming language design teaches us, syntactic sugar is everything. The better our tools can model our problem spaces, the better they can help us solve our problems. A way in which nested schemas are more than syntactic sugar is in the fact that they can provide a convenient means of additinoal security management. Rather than twiddling with the privileges on groups of objects within a schema, objects that should have similar privileges can be put in the same subschema. However, returning to the original topic of the thread, nested schemas are not nearly as interesting to me as the encapsulation provided by a package-like feature. To be honest, though, what tantalizes me is not the prospect of a package feature but an expansion of the Type system. As a reasonably popular production system, Postgres must necessarily be conservative. But its roots lay in experimentation, and vestiges of those roots can still be seen in its structure. Because of its maturity, Postgres is well positioned to implement some rather advanced concepts, but perhaps the most radical of them should be implemented in a fork rather than the main system. Traditionally, a database is seen as a warehouse of raw data. ODBMSes position themselves as the next generation by viewing a database as a collection of persistent, richly structured objects. 
Both views have strengths and weaknesses. Postgres takes an interesting middle ground position within the ORDBMS space. It is heavily relational with strong support for standard SQL and numerous query tuning options. But it also features an interesting number of rather non-relational concepts, like custom operator definitions, operator classes, user-defined conversions and types. However, it seems to me that these features are probably very underutilized. This is probably due to two reasons: 1) most programmers aren't used to being able to define custom operators in their favorite programming language, so the concept isn't familiar enough to them to try it in their DBMS. 2) The other features which support this aren't designed or presented in a cohesive manner that impresses the programmer that this is a compelling and superior way to go about things. The fact is, operator overloading is a *very* powerful way to program. In particular, it is one of the key factors in supporting generic programming in a natural way. People who are unsure
Re: [HACKERS] Oracle Style packages on postgres
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 10, 2005 11:42 PM To: Bruce Momjian Cc: Dave Held; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Oracle Style packages on postgres [...] There's been a lot of handwaving about nested schemas in this thread, but no one has explained how they could actually *work* given the SQL syntax rules. In general, a is a column from the current table set, a.b is a column b in table/alias a from the current query, a.b.c is a column c from table b in schema a, a.b.c.d is a column d from table c in schema b in catalog a, and any more than that is a syntax error. I do not see how to add nested schemas without creating unworkable ambiguities, not to say outright violations of the spec. Clearly nested schemas would violate the SQL spec, as do the numerous missing features in Postgres. Obviously, they would have to be a sort of non-conforming extension. It's an opportunity for Postgres to take the lead and influence the next standard, I guess. Unless the community decides that it's not worth the hassle, which seems much more likely. I am curious to know what the unworkable ambiguities are. I propose that if there is any ambiguity at all, just fail the parse and leave it to the user to write something sensible. Otherwise, it's just a matter of defining a precise precedence for resolving name scopes, which doesn't seem very tricky at all. That is, if a.b is the name of a schema b nested within a schema a, then a.b.c.d refers to a column d of table c in schema b in schema a. If a is not the name of a schema, then check to see if it's the name of a database. If it is, then a.b.c.d has the meaning you define above. If it's not, then it's an error. The rule is simple: when the identifier has more than two parts, search for the first part among the schemas first, and then the catalogs. For the parts after the first and before the last two, just search the appropriate schemas. 
As far as I can tell, this syntax is completely backwards-compatible with existing SQL syntax. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Oracle Style packages on postgres
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 11, 2005 10:55 AM To: Dave Held Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Oracle Style packages on postgres Dave Held [EMAIL PROTECTED] writes: The rule is simple: when the identifier has more than two parts, search for the first part among the schemas ^^^ first, and then the catalogs. This doesn't actually work, because there is already ambiguity as to which level the first name is. See for instance the comments in transformColumnRef(). I don't follow. switch (numnames) case 3 is unambiguous under either syntax. case 1 and 2 are unchanged under my proposed rules. It's really only case 4+ that is affected. And the change is as follows: if (numnames MAX_SCHEMA_DEPTH + 3) { ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg(improper qualified name (too many dotted names): %s, NameListToString(cref-fields; return NULL; } switch (numnames) { case 1: ... case 2: ... case 3: ... default: { char* name[MAX_SCHEMA_DEPTH + 3]; char** i; char** end = name + numnames; char* colname = name + numnames - 1; for (i = name; i != end; ++i) { /* definition of lnth() should be easy enough to infer */ *i = strVal(lnth(cref-fields)); } /* * We check the catalog name and then ignore it. */ if (!isValidNamespace(name[0])) { if (strcmp(name[0], get_database_name(MyDatabaseId)) != 0) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg(cross-database references are not implemented: %s, NameListToString(cref-fields; i = name + 1; numnames -= 3; } else { i = name; numnames -= 2;} /* * isValidNamespace() should work like LookupExplicitNamespace() * except that it should return false on failure instead of * raising an error */ /* Whole-row reference? 
*/ if (strcmp(end[-1], *) == 0) { node = transformWholeRowRef(pstate, i, numnames, end[-2]); break; } /* * Here I've changed the signature of transformWholeRowRef() to * accept a char** and an int for the schema names */ /* Try to identify as a twice-qualified column */ node = qualifiedNameToVar(pstate, i, numnames, end[-1], true); /* * And obviously we have to hack qualifiedNameToVar() similarly */ if (node == NULL) { /* Try it as a function call */ node = transformWholeRowRef(pstate, i, numnames, end[-2]); node = ParseFuncOrColumn(pstate, list_make1(makeString(end[-1])), list_make1(node), false, false, true); } break; } } What am I missing? __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Oracle Style packages on postgres
-Original Message- From: Bruce Momjian [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 10, 2005 8:43 AM To: Thomas Hallgren Cc: Tom Lane; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Oracle Style packages on postgres [...] I suppose. I think we should focus on the use cases for Oracle packages, rather than the specific functionality it provides. What things do people need PostgreSQL to do that it already doesn't do? Is that really the best way to go about things? Already RDBMSes are patchwork quilts of functionality. Is merely adding another patch the most elegant way to evolve the database? The problem is that Oracle et al are trying to be ORDBMSes and aren't exactly sure what the best way to go is. Instead of trying to formulate a rational plan for what an ORDBMS should even look like, they simply look at what would work with their existing infrastructure and tack on features. Then Postgres plays the copycat game. Instead of trying to play catch-up with Oracle, why not beat them at their own game? What packages provide is encapsulation. Hiding the data from the user and forcing him/her to use the public interface (methods). That is an important and admirable OO feature. Some people think that using the DB's security model can achieve the same thing. It can't, exactly, but there's an important lesson to be learned from the suggestion. The problem is that OOP is a *programming* paradigm, and a database is not a *programming language*. In a programming language, there really is no such thing as security. There is only visibility and accessibility. Private methods in an OOP language do not provide *security*; they only limit *accessibility*. Like so many other differences between the relational model and the OOP model, there is an impedance mismatch here. However, there is also opportunity. In an OOPL, you can say: Users can call this method from here, but not from there. What you *can't* say is: User X can call this method, but User Y cannot. 
As you can see, these are orthogonal concepts. You could call the first accessibility by location and the second accessibility by authentication. An ORDBMS should support both. Private does not respect your identity, only your calling location. An ACL does not respect your calling scope, only your identity. A system that has both is clearly more flexible than one that only has one or the other. Now what you need to keep in mind is that each visibility model serves a different purpose. The purpose of a security model is to limit *who* can see/touch certain data because the data has intrinsic value. The purpose of an accessibility model is to limit *where* and *how* data can be seen/touched in order to preserve *program invariants*. So if you have an object (or tuple!) that records the start and stop time of some process, it is probably a logical invariant that the stop time is greater than or equal to the start time. For this reason, in a PL, you would encapsulate these fields (attributes) and only provide controlled access to update them that checks and preserves the invariant, *no matter who you are*. You don't want a superuser violating this invariant any more than Sue User. Now you might object that constraints allow you to preserve invariants as well, and indeed they do. But constraints do not respect calling scope. Suppose there is a process that needs to update the timestamps in a way that temporarily breaks the invariant but restores it afterwards. The only way to effect this in a constraint environment is to drop the constraint, perform the operation, and restore it. However, dropping a constraint is not an ideal solution because there may be other unprivileged processes operating on the relation that still need the constraint to be enforced. There is no way to say: There is a priviledged class of methods that is allowed to violate this constraint because they are trusted to restore it upon completion. 
Note that this is different from saying There is a priviledged class of users that is allowed to violate this constraint. If you try to do something like give read-only access to everybody and only write access to one user and define that user to be the owner of the methods that update the data, you have to follow the convention that that user only operates through the defined interface, and doesn't hack the data directly. That's because user-level accessibility is not the same as scope- level accessibility. Whereas, if you define something like a package, and say: Package X is allowed full and complete access to relation Y, and stick the interface methods in X, you still have all the user-level security you want while preserving the invariants in the most elegant way. So you can think of a package as a scope in a programming language. It's like a user, but it is not a user. A user has privileges that cut across scopes. Now, whether packages should be
Re: [HACKERS] 'infinity' in GiST index
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 04, 2005 11:17 PM To: Oleg Bartunov Cc: Pgsql Hackers Subject: Re: [HACKERS] 'infinity' in GiST index [...] Seems like it's not really GiST's fault but a definitional problem for the timestamp datatype. Specifically, what does it mean to subtract two infinite timestamps? I find it hard to assign a value to any of these combinations: +infinity minus +infinity -infinity minus -infinity +infinity minus -infinity -infinity minus +infinity That's because you're talking about transfinite arithmetic, and subtraction is not defined therein. AKA the arithmetic of infinite cardinals. I've actually seen a few different formulations, some of which say that adding a finite number to an infinity results in a different number than the infinity, and some that say it is the original infinity. However, it seems that the most common formulation is the latter: w + 1 = w Where w is lower-case omega, or aleph_0. If we allowed subtraction, then we could subtract w from both sides and end up with 1 = 0, which would be an inconsistency. Also, -w makes sense when talking about the reals, but not when talking about transfinite arithmetic. There are no additive inverses, because that is the same as allowing subtraction, with the same result. Note that within real arithemtic, you can't do any math with infinity anyway. The first two can't really be identified with zero, and the last two are surely not representable are they? Not unless you change to another math system, which, of course, wouldn't be appropriate for this application. What's worse, a subtraction involving one infinite and one finite timestamp *is* well defined from a mathematical point of view, eg +infinity minus 'yesterday' = +infinity Actually not. When doing transfinite arithmetic, you can only add naturals to infinities. Otherwise, you're getting a form of subtraction, which will eventually lead to inconsistency. 
but I doubt GiST will be happy if we make the datatype behave that way... I guess it depends on why you want to take the difference. If you want to take some measure of distance, it might be useful to say that all infinite values of the same sign are at 0 distance from each other, in which case you would say that +w - +w = 0. Probably infinities of opposite signs should just be w apart (which is also mathematically consistent). __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Feature freeze date for 8.1
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 03, 2005 9:31 AM To: Hannu Krosing Cc: Heikki Linnakangas; Neil Conway; Oliver Jowett; [EMAIL PROTECTED]; Peter Eisentraut; Alvaro Herrera; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Feature freeze date for 8.1 [...] I am a tad worried about the possibility that if the client does nothing for long enough, the TCP output buffer will fill causing the backend to block at send(). A permanently blocked backend is bad news from a performance point of view (it degrades the sinval protocol for everyone else). So use MSG_DONTWAIT or O_NONBLOCK on the keepalive packets. That won't stop the buffer from getting filled up, but if you get an EAGAIN while sending a keepalive packet, you know the client is either dead or really busy. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement
-Original Message- From: Andrew Dunstan [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 7:05 PM To: [EMAIL PROTECTED] Cc: Dave Held; [EMAIL PROTECTED]; pgsql-hackers@postgresql.org Subject: Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement [...] I nat happy avout that last point - personally, I value most highly the views of those who contribute code or similar and least highly the views of those whose principal contribution is opinions. Maybe so, but if you were a new contributor, why would you write a bunch of code with no assurance that it would go anywhere? It seems wiser to invest your time familiarizing yourself with the ins and outs of the codebase and the coding style of patches by looking at other people's work. It also seems smarter to lurk and see what kinds of changes are likely to be considered. I doubt you would think highly of a newcomer that contributed code that was not in the style of the codebase and was for a feature not on the TODO list and that didn't get community buy-in first. But then, how do you get community buy-in if you don't contribute code, according to you? __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Feature freeze date for 8.1
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 03, 2005 12:39 PM To: Heikki Linnakangas Cc: Hannu Krosing; Neil Conway; Oliver Jowett; [EMAIL PROTECTED]; Peter Eisentraut; Alvaro Herrera; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Feature freeze date for 8.1 [...] BTW, the upthread proposal of just dropping the message (which is what O_NONBLOCK would do) doesn't work; it will lose encryption sync on SSL connections. How about an optional second connection to send keepalive pings? It could be unencrypted and non-blocking. If authentication is needed on the ping port (which it doesn't seem like it would need to be), it could be very simple, like this: * client connects to main port * server authenticates client normally * server sends nonce token for keepalive authentication * client connects to keepalive port * client sends nonce token on keepalive port * server associates matching keepalive connection with main connection * if server does not receive matching token within a small timeout, no keepalive support enabled for this session __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Feature freeze date for 8.1
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 03, 2005 3:36 PM To: Dave Held; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Feature freeze date for 8.1 [...] Yes, this looks like good.But ; 1. Do client interfaces (ODBC,JDBC OLEDB etc) need to be changed ? Only if they want to support the keepalive mechanism. It should be purely optional. 2. If a firewall is used, ppl need to know the second port number so mean that 2 parameters should be added to postgres the first is timeout value and the second is port number of the second port would be used for keepalive.. Sounds fine to me. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Feature freeze date for 8.1
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 03, 2005 4:20 PM To: Dave Held Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Feature freeze date for 8.1 Dave Held [EMAIL PROTECTED] writes: How about an optional second connection to send keepalive pings? It could be unencrypted and non-blocking. If authentication is needed on the ping port (which it doesn't seem like it would need to be), it could be very simple, like this: * client connects to main port * server authenticates client normally * server sends nonce token for keepalive authentication * client connects to keepalive port * client sends nonce token on keepalive port * server associates matching keepalive connection with main connection * if server does not receive matching token within a small timeout, no keepalive support enabled for this session This seems to have nothing whatever to do with the stated problem? I thought the problem was a server process that loses a connection to a client sticking around and consuming resources. And then I thought a possible solution was to try to see if the client is still alive by sending it an occasional packet. And then I thought a new problem is sending packets to an unresponsive client and filling up the output buffer and blocking the server process. So it seems that a possible solution to that problem is to have a separate connection for keepalive packets that doesn't block and doesn't interfere with normal client/server communication. Now granted, it is possible that the primary connection could die and the secondary is still alive. So let's consider the likely failure modes: * physical network failure In this case, I don't see how the secondary could survive while the primary dies. * client hangs or dies If the client isn't reading keepalives from the server, eventually the server's send queue will fill and the server will see that the client is unresponsive. 
The only way the client could fail on the primary while responding on the secondary is if it makes the connections in different threads, and the primary thread crashes somehow. At that point, I would hope that the user would notice that the client has died and shut it down completely. Otherwise, the client should just not create a separate thread for responding to keepalives. * transient network congestion It's possible that a keepalive could be delayed past the expiration time, and the server would assume that the client is dead when it's really not. Then it would close the client's connection rather rudely. But then, since there's no reliable way to tell if a client is dead or not, your other option is to consume all your connections on maybe-dead clients. So what am I missing? __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [pgsql-advocacy] [HACKERS] Increased company involvement
-Original Message- From: Bruce Momjian [mailto:[EMAIL PROTECTED] Sent: Saturday, April 30, 2005 12:04 PM To: PostgreSQL advocacy Cc: Kris Jurka; Andrew Dunstan; PostgreSQL-development Subject: Re: [pgsql-advocacy] [HACKERS] Increased company involvement [...] The thing that limits centralization is that it is critical that any individual or company feel free to join the community efforts. When centralization happens, there is often an _in_ and and _out_ group that is very bad for encouraging new members. [...] We don't want core to steer development anymore than we want a centralized group to do that, because if we did, the next company that comes along and wants to enhance PostgreSQL or offer technical support services will feel they have to get approval/buy-in from the _in_ group, and that isn't a productive setup. The fact that new companies getting involved can't find a central authority is a _good_ thing, if you think about it. It means that we have succeeded in building a community that allows people to join and feel a part right away, and they don't have to buy-in or play politics to do it. Well, you make Postgres sound like a very democratic community, but I'm afraid this is a fairy tale. Aren't the people who approve patches exactly the in group that you claim doesn't exist? Aren't they the people that you need buy-in from to really contribute to Postgres? The reason I make this point is because I know what a democratic development community really looks like, and the Boost community is one such example. That truly *is* democratic, because decisions are made as a group, and no fixed subset of members has an overriding veto. The group has moderators, but they exist only to moderate discussion on the mailing lists. I'm not saying that it is bad that Postgres is not democratic. Postgres is a totally different kind of beast than Boost, and probably benefits from having a few people ultimately decide its fate. 
But let's call a spade a spade and not pretend that contributors don't have to get buy-in from core. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [pgsql-advocacy] [HACKERS] Increased company involvement
-Original Message- From: Bruce Momjian [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 12:17 PM To: PostgreSQL advocacy Cc: Dave Held; PostgreSQL-development Subject: Re: [pgsql-advocacy] [HACKERS] Increased company involvement [...] Really? You have a different perspective than I see. I have seen patches be accepted that had no core buy-in. We accept patches based on group feedback, not some closed approval process. Let me also ask for you to provide an example of the behavior you describe. Well, I think there's numerous examples where someone suggests some feature or idea, and Tom or one or two other core developers will say: I don't like that idea, and then the proposer will more or less give up on it because it is clear that it won't go anywhere. So whether the process gets stopped at the patch submission level or the feature proposal level isn't really relevant. It seems pretty clear that a handful of people decide the direction of Postgres, and everyone else can either contribute to the features that have been agreed to be acceptable and relevant, or they can fork their own version. Just watching the hackers list suggests to me that this is the norm, rather than the exception. I guess I'm interested to see which patches have been accepted that the core developers opposed. Now don't get me wrong. Sometimes there are good technical reasons why feature A or B can't or shouldn't be added or even developed. And I don't suggest that patches lacking technical merit should not be rejected. But sometimes it seems that ideas with undetermined merit get passed over because of a quick judgement based on intuition, and only if the proposer actively fights for it for a while does it get reconsidered. Of course, it would be quite a bit of work for me to review the list and compile instances where I think this has occurred, but only because of the tedium involved to make a minor point...not because I think I would have difficulty finding evidence. 
I'm just saying that as an outsider, if I had a lot of resources to devote to contributing to Postgres, I would only consider working on approved TODO items or making sure I more or less had core buy-in before writing any code. I don't think it would be worth my time to work on something that non-core users/developers might like but core hackers don't. Like I said, that's not necessarily a bad thing. Postgres is a piece of software with many interacting components, and there needs to be some coordination to make sure it evolves in a sensible way. But I think that implies that there must be and is some de facto centralization of control, whether that is the published ideology or not. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [pgsql-advocacy] [HACKERS] Increased company involvement
-Original Message- From: Joshua D. Drake [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 12:33 PM To: Dave Held Cc: PostgreSQL-development; PostgreSQL advocacy Subject: Re: [pgsql-advocacy] [HACKERS] Increased company involvement [...] PostgreSQL is more of a Democratic Republic than an actual democracy but they do very well at it. I buy that. It is probably a fairly accurate description of the Postgres community. Everyone has a voice, but ultimately, the Senate (i.e., patch approvers) passes the laws. Where it differs is that the Senate is not necessarily democratically elected. ;) Any person can bring a patch and submit it, any person in the community can argue for it and any person can take the time to fix it to the specifications that core sets forth. Which brings up an important point. The core developers define the structure in which change can occur. If people think that Postgres should move in a direction that affects that framework, they have to convince core to redefine that specification. It's like writing new laws vs. amending the Constitution. Even though anyone can draft a bill and submit it to their representative, it's ultimately Congress that makes the laws. And while public opinion can ultimately affect the actions of Congress, it is still a sovereign body. As Bruce himself said, companies that wish to contribute must not assume that their work will be integrated into Postgres. The official stance is that there only needs to be community buy-in, but it seems more realistic that there needs to be core buy-in as well, not least because of the influence that core thinking has on the community itself. That's not a bad thing per se, but it's definitely something that contributors should consider.
Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement
-Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 1:21 PM To: Bruce Momjian Cc: Marc G. Fournier; PostgreSQL advocacy; Dave Held; PostgreSQL-development Subject: Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement [...] Hmmm ... why does everyone assume that Core does more than what we do? I think that most people would be surprised by how *little* traffic there is on the pgsql-core mailing list. Well, I never said that core runs around saving the world. I mostly made the point that core developers have special influence, and that should be considered when contributing to Postgres, which is directly relevant to this thread, originally titled Increased company involvement. Core decides on releases, and approves committers. Occasionally we'll handle something which requires confidentiality, like a security issue or a new corporate participant. Which is also something that new would-be corporate contributors should know about. [...] Materially, what's accepted is decided through open discussion on the pgsql-hackers list; even Tom brings up his patches for discussion before commit, and I'd defy you to point to even one patch which was accepted by consensus on pgsql-hackers and not committed. But this misses the point. The point is that consensus is often an iterative process, and even if a few people support an idea at first, in the end, the weight of a few inner circle people (whether they be core or patch approvers or whatnot) tends to sway the consensus in a certain direction. This isn't always bad, especially if those core people simply know more about the internals of Postgres and therefore have better judgement. It is bad if the person making the proposal doesn't feel he/she has good odds of defending the proposal and gives up without a fight. 
As you've already observed, if Tom doesn't like something it's very unlikely to get through. But that's true for a lot of major contributors; the consensus process we use provides ample opportunities to veto and slender opportunities to pass. This also misses another point. I'm not saying that the current process is inherently flawed. It's probably about as good as any OSS project. My point is that it's not *democratic*, and that outsiders wishing to contribute should understand the dynamic of the process that is not explicitly and officially spelled out anywhere. [...] From my perspective, this is a good thing for a database system which can get easily broken by an ill-considered patch. It's *good* for us to be development-conservative. Right. I agree. I'm not criticising the process as a whole, and I've more or less made this exact point myself. So there is an insider group, but it's the group of major contributors. That is exactly my point, but you said it better. Tom has the loudest voice because he writes the most code. The fact that Tom, Bruce or Peter's veto is often as far as a proposal goes is simply because most of the pgsql-hackers subscribers simply don't involve themselves in the process unless it's one of their own pet features. Which is perfectly understandable. You can probably guess that most people who use Postgres haven't tried to implement an RDBMS themselves, and have only a shallow understanding of the details. And the important thing about the group of major contributors is that membership is open. Which may be true philosophically, but in practice, most people who contribute will not have the resources or motivation to become a major contributor. I do not mean to imply that this is necessarily a bad thing; but I think it is the true state of affairs, and part of the dynamic which must be understood by someone considering investing in Postgres as a contributor. [...] 
If people want the acceptance process to be more democratic, then those people have to be willing to do the work of full participation. That actually doesn't make it more democratic. In a democracy, everyone has an equal vote regardless of their status. The point is that a democracy is not always a priori the best form of organization. What you describe is actually a meritocracy, and for a project like Postgres, it makes a lot of sense. But that merely reinforces my point that contributors need to understand that if the pet feature they create is not in line with core thinking, they will have to earn the credibility to get community buy-in. [...] (P.S. on a complete tangent, call a spade a spade is actually a racist expression originating in the reconstruction-era South. spade does not mean garden tool but is a derogatory slang term for black people. [...] Interesting. Duly noted.
Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement
-Original Message- From: Bruce Momjian [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 3:33 PM To: Dave Held Cc: PostgreSQL advocacy; PostgreSQL-development Subject: Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement [...] Here is a new FAQ entry: <H3><A name="1.13">1.13</A>) Who controls PostgreSQL?</H3><BR> <P>If you are looking for a PostgreSQL gatekeeper, central committee, or controlling company, give up, because none exists. We do have a core committee and CVS committers, but these groups are more for administrative purposes then control. The project is directed by the open community of developers and users of PostgreSQL. Everyone is welcome to subscribe and take part in the discussions. (See the <A href="http://www.postgresql.org/docs/faqs.FAQ_DEV.html">Developer's FAQ</A> for information on how to get involved in PostgreSQL development.)</P> Adjustments? ...are more for administrative purposes [then-than] control... <p>Because PostgreSQL is a monolithic product, all of its features must work together in tight harmony. It is in the interests of the PostgreSQL community that new features be integrated in a way that preserves this harmony. Thus, new feature proposals are scrutinized and debated by the community to ensure that changes have sufficient technical merit. Be prepared to defend your proposal, and don't assume that a privately developed contribution will automatically be accepted by the PostgreSQL community. To maximize the chance of success in proposing a change, consider these suggestions:
* Propose your change/feature publicly - OSS is about community, and a collection of contributors working independently without communication is not a community; this avoids duplication of effort and promotes collaboration/cooperation among parties that have a common interest
* Research your proposal to see if it has already been discussed on the mailing list
* Research your proposed solution to make sure it is the best of breed - database technology is a very active subject of academic research, and it is possible, if not likely, that someone has written a paper on the topic
* Engage the community by participating in discussions and patch reviews - your credibility as a contributor depends on your willingness to contribute to the community in non-coding ways as well</p>
I realize that this runs a bit far afield from the original question of Who controls PostgreSQL?, but I think it addresses the points that someone who asks this question is likely to want to know. It also tackles the contribution question from a higher level than the dev-faq. Obviously, the bullet points would be formatted as a list or some other appropriate HTML construct. And as a minor point, it would be nice if the website validated to XHTML-strict, although XHTML-transitional would be a good compromise.
Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Monday, May 02, 2005 1:50 PM To: josh@agliodbs.com Cc: Bruce Momjian; Marc G. Fournier; PostgreSQL advocacy; Dave Held; PostgreSQL-development Subject: Re: [pgsql-advocacy] [HACKERS] Decision Process WAS: Increased company involvement [...] Our process is not democratic in the sense of any random subscriber to the mailing lists having the same vote as a core member --- and I'll bet Boost doesn't run things that way either. Actually, it does, but it can afford to for very special reasons. Because Boost is not about a single problem domain, there is no real core of developers. There are some who have contributed more libraries, or larger libraries; but at the end of the day, each review and submission is judged on its own merits. Often, the person submitting a new library for review is a domain expert for that library; and people reviewing a library are also often domain experts, even if they are a first-time reviewer. So the very nature of Boost allows it to be more democratic. Because Postgres is about a single problem domain, and because each submission must work in concert with an extant whole, it has totally different needs and a totally different type of community. And because a database isn't exactly a modular beast like, say, a web server, that limits the openness of the community further. That is to say, there is a barrier to entry, but it isn't capriciously imposed by the community members. It's just a necessary outcome of the nature of the project. People who want to contribute should understand this barrier and how it works before they start writing code. What we have is pretty informal but I think it effectively gives more weight to the opinions of those more involved in the project; which seems a good way to operate. For Postgres, I agree. But there isn't anyone here who has an absolute veto, nor contrarily anyone who can force things in unilaterally over strong objections. 
Nor would one expect such a thing in a project that claims to be OSS. But ultimately persuasion is as much a part of consensus as merit, and people should recognize that fact when contributing to the project.
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
-Original Message- From: Gurmeet Manku [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 26, 2005 5:01 PM To: Simon Riggs Cc: Tom Lane; josh@agliodbs.com; Greg Stark; Marko Ristola; pgsql-perform; pgsql-hackers@postgresql.org; Utkarsh Srivastava; [EMAIL PROTECTED] Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested? [...] 2. In a single scan, it is possible to estimate n_distinct by using a very simple algorithm: Distinct sampling for highly-accurate answers to distinct value queries and event reports by Gibbons, VLDB 2001. http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf [...] This paper looks the most promising, and isn't too different from what I suggested about collecting stats over the whole table continuously. What Gibbons does is give a hard upper bound on the sample size by using a logarithmic technique for storing sample information. His technique appears to offer very good error bounds and confidence intervals as shown by tests on synthetic and real data. I think it deserves a hard look from people hacking the estimator.
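As a rough illustration of how such a bounded-memory estimator can work, here is a sketch of hash-based distinct sampling. The class and all its details are my own simplification for illustration, not code from the Gibbons paper or from Postgres:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Sketch of hash-based distinct sampling in the spirit of the paper cited
// above.  We keep only values whose hash ends in at least `level_` zero bits,
// so the sample holds roughly a 1/2^level_ fraction of the distinct values
// seen.  The hard bound on sample size is enforced by raising the level
// (halving the sample) -- the "logarithmic technique" that caps memory use.
class DistinctSampler {
public:
    explicit DistinctSampler(std::size_t cap) : cap_(cap) {}

    void add(const std::string& value) {
        std::uint64_t h = std::hash<std::string>{}(value);
        if (!in_domain(h))
            return;                          // outside the current sampling domain
        sample_.insert(h);                   // set semantics: duplicates are free
        while (sample_.size() > cap_ && level_ < 63) {
            ++level_;                        // shrink the domain by half...
            for (auto it = sample_.begin(); it != sample_.end(); ) {
                if (in_domain(*it)) ++it;
                else it = sample_.erase(it); // ...and prune the sample to match
            }
        }
    }

    // Estimated number of distinct values seen so far: scale the sample
    // back up by the fraction of the hash space it covers.
    std::uint64_t estimate() const {
        return static_cast<std::uint64_t>(sample_.size()) << level_;
    }

private:
    bool in_domain(std::uint64_t h) const {
        // true iff h has at least level_ trailing zero bits
        return (h & ((std::uint64_t{1} << level_) - 1)) == 0;
    }

    std::size_t cap_;
    unsigned level_ = 0;
    std::unordered_set<std::uint64_t> sample_;  // hashes of sampled values
};
```

In an incremental setting, every INSERT would call add(), and estimate() then yields n_distinct without any rescan, which is the behavior discussed below in the thread.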
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
-Original Message- From: Greg Stark [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 27, 2005 1:00 AM To: Tom Lane Cc: Rod Taylor; Greg Stark; pgsql-hackers@postgresql.org; Gurmeet Manku Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested? Tom Lane [EMAIL PROTECTED] writes: Rod Taylor [EMAIL PROTECTED] writes: If when we have partitions, that'll be good enough. If partitions aren't available this would be quite painful to anyone with large tables -- much as the days of old used to be painful for ANALYZE. Yeah ... I am very un-enthused about these suggestions to make ANALYZE go back to doing a full scan ... I don't see why ANALYZE would always have to do a full scan. Clearly, this statistic would only be useful to people who need a very accurate n_distinct on tables where the current metric does not work very well. Applying a specialized solution to every table doesn't seem like an efficient way to go about things. Instead, the distinct sampling mechanism should be purely optional, and probably purely separate from the vanilla ANALYZE mechanism, because it works differently. If it were designed that way, the full table scan would be a one-time cost that would not even need to be paid if the user turned on this mechanism at table creation. Thereafter, the statistic would need to be updated incrementally, but that just distributes the cost of ANALYZE over the INSERT/UPDATE/DELETEs. Obviously, it's a higher cost because you touch every record that hits the table, but that's the price you pay for a good n_distinct. The block estimator should probably become the default, since it works within the current ANALYZE paradigm of sampling the data. [...] For most use cases users have to run vacuum occasionally. In those cases vacuum analyze would be no worse than a straight normal vacuum. And that's only if you do a full table scan every time. 
In the incremental implementation, there are no lump sum costs involved except when the statistic is first initialized. Note that this algorithm doesn't require storing more data because of the large scan or performing large sorts per column. It's purely O(n) time and O(1) space. And I think it should be emphasized that distinct sampling not only gives you a good n_distinct for query planning, it also gives you a very fast approximate answer for related aggregate queries. So you're getting more than just query tuning for that cost. On the other hand, if you have tables you aren't vacuuming that means you perform zero updates or deletes. In which case some sort of incremental statistics updating would be a good solution. A better solution even than sampling. And for the large data warehousing situations where this mechanism seems most useful, that would probably be the most common case.
Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?
-Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 27, 2005 10:25 AM To: Andrew Dunstan Cc: Mischa Sandberg; pgsql-perform; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested? [...] Actually, it's more to characterize how large of a sample we need. For example, if we sample 0.005 of disk pages, and get an estimate, and then sample another 0.005 of disk pages and get an estimate which is not even close to the first estimate, then we have an idea that this is a table which defies analysis based on small samples. I buy that. Whereas if the two estimates are < 1.0 stdev apart, we can have good confidence that the table is easily estimated. I don't buy that. A negative indication is nothing more than proof by contradiction. A positive indication is mathematical induction over the set, which in this type of context is logically unsound. There is no reason to believe that two small samples with a small difference imply that a table is easily estimated rather than that you got unlucky in your samples. [...] Yes, actually. We need 3 different estimation methods:
1 for tables where we can sample a large % of pages (say, >= 0.1)
1 for tables where we sample a small % of pages but are easily estimated
1 for tables which are not easily estimated but we can't afford to sample a large % of pages.
I don't buy that the first and second need to be different estimation methods. I think you can use the same block sample estimator for both, and simply stop sampling at different points. If you set the default to be a fixed number of blocks, you could get a large % of pages on small tables and a small % of pages on large tables, which is exactly how you define the first two cases. However, I think such a default should also be overridable to a % of the table or a desired accuracy. Of course, I would recommend the distinct sample technique for the third case. 
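The fixed-block-budget default described above reduces to trivial arithmetic; this sketch is purely illustrative (the function name is hypothetical, not a Postgres API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// With a fixed budget of blocks to read, small tables get sampled heavily
// (up to 100%) and large tables lightly, which covers the first two cases
// above with a single estimator that just stops at different points.
double sample_fraction(std::uint64_t table_blocks, std::uint64_t block_budget) {
    if (table_blocks == 0)
        return 0.0;                         // empty table: nothing to sample
    return std::min(1.0, static_cast<double>(block_budget) /
                         static_cast<double>(table_blocks));
}
```

A 300-block budget reads an entire 100-block table but only 0.01% of a 3-million-block one; an override to a fixed percentage or target accuracy would simply replace this formula.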
If we're doing sampling-based estimation, I really don't want people to lose sight of the fact that page-based random sampling is much less expensive than row-based random sampling. We should really be focusing on methods which are page-based. Of course, that savings comes at the expense of having to account for factors like clustering within blocks. So block sampling is more efficient, but can also be less accurate. Nonetheless, I agree that of the sampling estimators, block sampling is the better technique.
Re: [HACKERS] Woo hoo ... a whole new set of compiler headaches!!
-Original Message- From: Dann Corbit [mailto:[EMAIL PROTECTED] Sent: Friday, April 22, 2005 1:08 PM To: Andrew Dunstan; Dave Held Cc: pgsql-hackers@postgresql.org Subject: RE: [HACKERS] Woo hoo ... a whole new set of compiler headaches!! From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Andrew Dunstan Dave Held wrote: I see the smiley, but moving to C++ isn't just about switching to the latest fad language. No, it's about moving to the fad language about 2 generations back ... Except that C++ is hardly a fad language. Most estimates place the top 3 languages by number of programmers as C++, Java, and C, with C++ and Java switching positions, and C lagging behind by a decent margin. And it's been this way for quite a while. Language wars are about as much fun as operating system wars. C and C++ are both nice languages. My intent was not to start a language jihad. I don't think C is a bad language. I just think C++ is better. [...] Unless you did a major rewrite it's hard to see any great advantages. There doesn't need to be great advantages. Just enough to justify the effort. Take casts, for instance: C uses a single syntax for all types of cast. C++ breaks casting down into static_casts (for upcasting to a type for which the opposite cast would be an implicit conversion), dynamic_cast (for polymorphic types, which don't exist in the C++ sense in the Postgres codebase, by definition), const_cast (for casting away constness, or adding it), and reinterpret_cast (for dangerous bit twiddling where you'd better know the exact layout for each platform). Probably most of the casts in the Postgres codebase would be converted to static_cast. And if the types were upgraded over time, many of those casts could probably go away. The dangerous casts would remain marked as reinterpret_cast, and that would serve to highlight the portions of the code that are probably platform-dependent and that probably need inspection when porting to a new platform. 
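A minimal illustration of that cast taxonomy (generic C++ written for this example, not code from the Postgres tree):

```cpp
#include <cassert>
#include <cstdint>

// static_cast: for conversions whose inverse would be implicit
// (int -> double is implicit, so double -> int is a static_cast).
int truncate_to_int(double d) {
    return static_cast<int>(d);          // C style would be: (int) d
}

// const_cast: the only cast that can add or remove const; typical at the
// boundary of a legacy API that takes char* but never writes through it.
char* unconst(const char* s) {
    return const_cast<char*>(s);         // writing through the result is UB
}

// reinterpret_cast: marks the layout-dependent bit twiddling that needs
// inspection when porting to a new platform, exactly as described above.
std::uintptr_t pointer_bits(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

Because each cast names its intent, a simple grep for reinterpret_cast would enumerate the platform-dependent spots in a converted codebase.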
While the benefits of using the C++-style casts would only be maximized if they were used everywhere, you still get incremental benefit from converting them a few at a time. Consider inline functions. In C, you have to implement them as macros, which eliminates your type safety. In C++, you can get both type safety and performance. Take a concrete example: qsort(). In C, you must pass a function pointer to use this function, and that function pointer gets dereferenced every time qsort() needs to do a comparison. That's a lot of overhead that is eliminated in C++'s sort() function, which accepts a comparison functor that can and often does get inlined. There are over 600,000 lines of code in Postgres by my rough count. The potential rewrite effort is enormous. A thorough job would probably consume a release cycle just on its own. You could use C++ as a better C with very little effort (but there are C++ keywords sprinkled here and there, so it would be a good month of work for somebody). My point exactly. It's *not* a task that has to be tackled all at once. Once you have a total C++ codebase, converting it into a C++ style can be done quite incrementally, with benefits accruing with each update. [...] On the downside, some of us (including me) have much more experience in and ease with writing C than C++. I could certainly do it - I write plenty of Java, so OO isn't a closed book to me, far from it - but ramping up in it would take me at least some effort. I bet I'm not alone in that. This is the crux of the matter. You will certainly not be alone here. I (personally) prefer C++ to C, but I am comfortable in either language. However, if you have a team of 100 C programmers and a huge C project, it is a terrible mistake to ask them to use C++. I disagree. The C programmers could learn C++ rules one at a time. The first rule would simply be to not use C++ keywords as identifiers. That is really the minimum necessary to write C style code in a C++ program. 
The next might be to replace macro constants with const ints. The next might be to replace C-style casts with C++-style. There is really no need to throw the whole book at the developer community all at once. It might take a year or two to get the codebase into idiomatic C++, but the developers would have learned C++ quite easily without really noticing it. I would certainly not suggest something radical like replacing hand-rolled containers with standard library equivalents. *That's* the kind of rewrite that should give any coder nightmares. Even OOP-style encapsulation could be done incrementally. You take a few fields of some struct, make them private, add accessor functions, and update the references. You don't have to hide all the data all at once. I know, because I've upgraded lots of C code to C++, and it's not nearly as hard as the typical C programmer thinks.
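The qsort()/sort() contrast mentioned earlier can be made concrete; this is a standalone sketch, not Postgres code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C style: qsort() dereferences this function pointer on every comparison,
// and the void* arguments discard type safety.
static int cmp_int(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

std::vector<int> sort_c_style(std::vector<int> v) {
    std::qsort(v.data(), v.size(), sizeof(int), cmp_int);
    return v;
}

// C++ style: the comparator is a typed functor/lambda that the compiler
// can (and typically does) inline, giving type safety and speed at once.
std::vector<int> sort_cpp_style(std::vector<int> v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}
```

Both produce the same ordering; the difference is that passing a wrong comparator type to sort() is a compile error, while qsort() would accept it silently.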
Re: [HACKERS] Woo hoo ... a whole new set of compiler headaches!! :)
-Original Message- From: Andrew Dunstan [mailto:[EMAIL PROTECTED] Sent: Friday, April 22, 2005 3:49 PM To: Dave Held Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Woo hoo ... a whole new set of compiler headaches!! :) [...] I recall saying something like this when we were being urged to replace CVS with SubVersion/Arch/SomethingElse but I'll say it again - this decision should be made by the people who contribute the most. All the rest (including my contribution) is just noise, IMHO. Well, I think it goes without saying that such a decision will ultimately be made by the core developers. But to say that nobody should *suggest* changes seems a bit odd to me. I mean, people suggest changes to the design of Postgres almost daily, and some of them aren't even coders. But if nobody outside of the core developers suggests changes, that kind of takes some of the open out of open source. True, the source would still be open, but there's a subtlety in the name that is similar to the free in free software. That's not to say that any given suggestion is worthy of serious consideration; but to say that any suggestion that doesn't come from the core is just noise to my ear doesn't sound any different than Redmond saying any suggestion that doesn't come from 1 Microsoft Way is just noise. As an aside, I don't think that switching to SVN is a half bad idea either. ;)
Re: [HACKERS] argtype_inherit() is dead code
-Original Message- From: Jim C. Nasby [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 19, 2005 5:56 PM To: Christopher Browne Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] argtype_inherit() is dead code [...] On Sun, Apr 17, 2005 at 07:01:41PM -0400, Christopher Browne wrote: [...] Object Orientation is all about the notion of having data that is aware of its type, and where there can be a dispatching of methods against those types. There is already a perfectly functional ability to dispatch based on argument types. These essentials are there. Well, if you go with Bjarne Stroustrup's formulation of OOP (which is, of course, by no means exclusively authoritative), the core of OOP is encapsulation, inheritance, and polymorphism. Inherited tables provide the second, overloaded functions provide the third, and the security model is left to provide the first. However, I would say that the first property is the most essential for OOP, because in my view, OOP is about *data hiding*. In particular, it's about separating the implementation from the interface, and forcing users to access objects through the interface. While such a design philosophy is *possible* with Postgres, it is by no means encouraged or *easy*. Furthermore, it probably doesn't make sense in all contexts. One way to think about an object-relational database is as a set of persistent objects stored in well-known containers. While traditional programming languages offer several common access methods for containers, the point of a query language is to offer an extremely powerful and generalized container access system. However, this access system is really an implementation detail of the object system, while at the same time being the primary means of object interaction. In terms of manipulating data, it's really about as OOP as passing around raw pointers to everything. 
From this perspective, DBs will not support OOP while SQL remains the primary access method; and there is no reason to believe that people will give up SQL in favor of a more OOP-like interface. Yes, but they're only there when it comes to storing data. There's nothing allowing you to cohesively combine code and data. I agree entirely. And I also agree that in many cases, there is no sensible way to do so. One of the ways in which DBs are different from programming language objects is in the data decomposition. Most PLs have self-contained objects whose data is primarily localized within one structure that is more or less contiguous in memory. DBs, on the other hand, tend to have objects that may span multiple tables, because this is the most efficient way of storing the data. In a way, the relational model is the antithesis of the OOP model. The central theme of the relational model is *data-sharing*: the idea that data should be decomposed and the common pieces factored out. Whereas, OOP says that it is the *functionality* that should be factored out into a minimal interface. An object should be able to have methods attached to it, for example. I don't think that's sufficient. To support encapsulation, you also need to enforce access to the data through the method interface. Otherwise, you can simulate methods with stored procedures. And that functionality is essentially missing. There's no way to present a combined set of data and code that operates on that data. That's encapsulation. And it's missing. But for a good reason. It doesn't really matter why this kind of functionality is missing; the fact that it is missing means it's much less likely that any of the OO stuff will be used. Actually, it *does* matter why it's missing. The reason it's missing tells us why people don't use the OOP features of the DB. What needs to be done is to construct a consistent theory of how the relational model and the OOP model can be integrated. 
The OOP model is about data integrity, maintaining object invariants, ensuring program correctness, etc. The relational model is about performance, storing data efficiently, querying it efficiently, etc. These are competing goals, and it may well be that a good object-relational theory simply develops a framework in which the tradeoffs are explicitly stated and describes how to implement different points in the design space in a consistent way. I realize that there is some existing work with object-relational modelling, but my impression is that such work is still fairly immature and scattered. I think the current limitations (foreign keys and cross-table constraints) are issues as well. It might also help if the docs had some info about how inherited tables worked 'under the covers', so people knew what kind of overhead they implied. I don't think inherited tables work in an entirely intuitive way. It certainly doesn't help that viewing an inherited table through pgAdmin shows records that aren't returned by an equivalent query. I think the problem is that
Re: [HACKERS] pg_hba.conf
-Original Message- From: ElayaRaja S [mailto:[EMAIL PROTECTED] Sent: Monday, April 18, 2005 1:38 PM To: pgsql-hackers@postgresql.org Subject: [HACKERS] pg_hba.conf Hi, I am using Redhat Linux 9. I had configured pg_hba.conf with: host postgres postgres 10.10.0.76 255.255.255.0 password If I try to connect with postgresql admin I am getting an exception: An error has occurred: Error connecting to the server: could not connect to server: Connection refused (0x274D/10061) Is the server running on host 10.10.0.76 and accepting TCP/IP connections on port 5432? Please help me. The first bit of advice I can offer is to ask on the right list. A perusal of: http://www.postgresql.org/community/lists/ should indicate that the pgsql-admin list would be a good list to ask. If you're not sure what list would be best, it seems that pgsql-general would be a better default choice than -hackers. Second, the error message is quite informative. It says that the client doesn't think you have a server listening on the default port. Check your process list to make sure that you do. Check to make sure that you can connect to that host (try connecting to a different service on the same server). Check to make sure you are not getting blocked by a firewall.
Re: [HACKERS] ARC patent
-Original Message- From: Marian POPESCU [mailto:[EMAIL PROTECTED] Sent: Friday, April 01, 2005 8:06 AM To: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] ARC patent Neil Conway [EMAIL PROTECTED] writes: FYI, IBM has applied for a patent on ARC (AFAICS the patent application is still pending, although the USPTO site is a little hard to grok): Ugh. We could hope that the patent wouldn't be granted, but I think it unlikely, unless Jan is aware of prior art (like a publication predating the filing date). I fear we'll have to change or remove that code. Why not just ask IBM for a free license first? After all, they put 1,000+ patents in the public domain or something, didn't they? I realize that they might use this technology in DB2, and don't want to encourage competitors. But IBM seems a lot more friendly to OSS than most companies, and it doesn't seem like it would hurt to ask. At the worst they say no and you just proceed as you would have originally.
Re: [HACKERS] ARC patent
-----Original Message-----
From: Bruce Momjian [mailto:[EMAIL PROTECTED]
Sent: Friday, April 01, 2005 10:23 AM
To: Dave Held
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] ARC patent

> Dave Held wrote:
> > Why not just ask IBM for a free license first? After all, they put
> > 1,000+ patents in the public domain or something, didn't they? [...]
> > At worst they say no, and you proceed as you would have originally.
>
> The problem is that they would have to license all commercial,
> closed-source distributions of PostgreSQL too, and I doubt they would
> do that.

Why would they have to do that? Why couldn't they just grant a license for OSS distributions of PostgreSQL and make commercial distributions obtain their own license for the ARC code? Doesn't IBM hire lawyers exactly for the purpose of writing complicated legal documents of this nature? ;) Or is it that the Postgres team wouldn't use an algorithm that wasn't freely available to everyone?
[HACKERS] Modifying COPY TO
I am interested in hacking COPY TO such that one can specify that rows are copied in a certain index order. I got as far as src/backend/commands/copy.c:CopyTo(), and it looks like I would need to modify the call to heap_beginscan() so that it uses a key. However, I couldn't figure out how to provide one, or whether I'm even looking at the right area. Ideally, this behavior would be specified with a flag, perhaps:

    WITH INDEX index_name

or

    WITH PRIMARY KEY

or something similar.

The motivation for this change is as follows. I have a fairly large database (10 million+ records) that mirrors the data in a proprietary system. The only access to that data is through exported flat files. Currently, those flat files are copied directly into a staging area in the db via a COPY FROM, the actual tables are truncated, and the staging data is inserted into the live tables. Since the data is read-only, it doesn't matter that it is recreated every day. However, as you can imagine, the import process takes quite a while (several hours). Also, rebuilding the db from scratch every day loses any statistical information gathered from the execution of queries during the day.

A possibility I would like to pursue is to keep the staging data from the previous day and export it with a COPY TO, import the new data into another staging table with a COPY FROM, then export the fresh data with another COPY TO. I can then write a fast C/C++ program to do a line-by-line comparison of each record, isolating the ones that have changed from the previous day, and emit those records in a change file that should be relatively small and easy to apply. Of course, this scheme can only work if COPY TO emits the records in a reliable order. Any assistance on this project would be greatly appreciated.
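The comparison step described above is straightforward once both snapshots are sorted the same way: a single merge pass over the two files finds every new or changed record without ever holding more than one record from each file. A minimal sketch in C (the record layout -- key, tab, payload -- and the in-memory string arrays are illustrative assumptions; the real program would stream the two COPY files line by line):

```c
#include <string.h>

/* Compare two "key<TAB>payload" records on the key column only. */
static int
key_cmp(const char *a, const char *b)
{
    while (*a && *a != '\t' && *a == *b)
    {
        a++;
        b++;
    }
    /* treat the tab (end of key) as a string terminator */
    unsigned char ca = (*a == '\t') ? '\0' : (unsigned char) *a;
    unsigned char cb = (*b == '\t') ? '\0' : (unsigned char) *b;
    return ca - cb;
}

/* One merge pass over two snapshots of the same table, both sorted on
 * the key column.  Fills changed[] with pointers into new_snap[] for
 * every record that is new or whose payload differs from old_snap[],
 * and returns how many it found.  changed[] must have room for n_new
 * entries. */
static int
diff_snapshots(const char **old_snap, int n_old,
               const char **new_snap, int n_new,
               const char **changed)
{
    int i = 0, j = 0, n = 0;

    while (j < n_new)
    {
        if (i >= n_old)
            changed[n++] = new_snap[j++];       /* key only in new file */
        else
        {
            int c = key_cmp(old_snap[i], new_snap[j]);

            if (c < 0)
                i++;                            /* key deleted; skip it */
            else if (c > 0)
                changed[n++] = new_snap[j++];   /* key only in new file */
            else
            {
                if (strcmp(old_snap[i], new_snap[j]) != 0)
                    changed[n++] = new_snap[j]; /* payload changed */
                i++;
                j++;
            }
        }
    }
    return n;
}
```

Deleted keys are simply skipped here; emitting explicit deletions for them would just be a symmetric action in the c < 0 branch. The merge only works if both dumps really are in the same key order, which is exactly why COPY TO needs to emit records in a reliable order.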
As best I can see, I'm stuck on line 1053 of copy.c:

    scandesc = heap_beginscan(rel, mySnapshot, 0, NULL);

I suspect that I want it to look like this:

    scandesc = heap_beginscan(rel, mySnapshot, 1, key);

where 'key' is an appropriately constructed ScanKey. It looks like I want to call ScanKeyEntryInitialize(), but I'm not sure what parameters I need to pass to it to get an index or the primary key. I mostly need help building the ScanKey object; I think I can figure out how to hack in the custom option, etc. I should mention that I am using the 7.4.7 codebase on Linux 2.4.
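One caveat worth noting: a ScanKey passed to heap_beginscan() only filters which tuples a sequential scan returns; it does not change the order they come back in. To get rows in index order, the scan has to go through the index itself. An untested sketch against the 7.4-era backend API (indexOid, which would come from resolving the WITH INDEX clause, and the surrounding locking and error handling are assumptions):

```
/* Sketch: replace CopyTo()'s heap scan with an index scan so tuples
 * come back in index order.  indexOid is assumed to be the OID of the
 * chosen index; rel and mySnapshot are as in CopyTo(). */
Relation        indexRel = index_open(indexOid);
IndexScanDesc   iscan = index_beginscan(rel, indexRel, mySnapshot,
                                        0, (ScanKey) NULL);
HeapTuple       tuple;

while ((tuple = index_getnext(iscan, ForwardScanDirection)) != NULL)
{
    /* emit this tuple exactly as CopyTo() does in its heap-scan loop */
}

index_endscan(iscan);
index_close(indexRel);
```

With zero scan keys the index scan visits every entry from one end of the index to the other, which is the "reliable order" the change-file scheme needs; note that an index scan over the whole table is considerably slower than a sequential scan plus an external sort.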