Re: [HACKERS] tsearch filenames unlikes special symbols and numbers
On 9/4/07, Tom Lane <[EMAIL PROTECTED]> wrote: [...] > But on further thought it strikes me that insisting on all lower case > doesn't eliminate case-sensitivity portability problems. For instance, > suppose the given parameter is 'foo' and the actual file name is > Foo.dict. This will work fine on Windows and will stop working when > moved to Unix. So I'm not sure we really buy much by rejecting > upper-case letters in the parameter --- all we do is constrain which > side of the fence you have to fix any mismatches on. And we picked the > side that only a DBA, rather than a plain SQL user, can fix. True, only a DBA can fix it. But only a DBA can screw it up. That seems reasonable to me. Furthermore fixing this mistake at the plain SQL user level in reality means auditing a code base for the construct, which is never fun. However if you wish to be paranoid, I believe that all filesystems of interest to PostgreSQL are at least case preserving. In which case on case sensitive filesystems you could check that the case of the stored filename matches what you want it to be. Now the problem of the filename having the case wrong can be detected on both Windows and Unix. Of course that check is a complication and slows things down. If all dictionary files have to be in a fixed directory, then you can easily add a cron job that scans that directory and fixes the case of any dictionary files that have upper case letters in their names. (Beware, there was once a bug in Windows where renaming Foo to foo accidentally deleted the file. It is therefore safer to rename Foo to bar then bar to foo. However this is a moot point since I doubt that anyone would actually run a brand new PostgreSQL database on an early version of NT 4.0...) Cheers, Ben ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Hash index todo list item
On 9/3/07, Tom Lane <[EMAIL PROTECTED]> wrote: > "Ben Tilly" <[EMAIL PROTECTED]> writes: > > That raises a very random thought. One of the nicer features of > > Oracle is the ability to have function-based indexes. So you could > > index, say, trim(lower(person.name)). > > > Is there any prospect of postgres aquiring that functionality? > > Uh, no, since it's already there; has been since Berkeley days ... Nice! I know of at least one DBA who is moving from Oracle to postgres who will be *very* happy to hear that. Ben ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Hash index todo list item
On 9/3/07, Gregory Stark <[EMAIL PROTECTED]> wrote: > > "Kenneth Marshall" <[EMAIL PROTECTED]> writes: > > > On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: > >> Kenneth Marshall <[EMAIL PROTECTED]> writes: > >> > ... This is the rough plan. Does anyone see anything critical that > >> > is missing at this point? > >> > >> Sounds pretty good. Let me brain-dump one item on you: one thing that > >> hash currently has over btree is the ability to handle index items up > >> to a full page. Now, if you go with a scheme that only stores hash > >> codes and not the underlying data, you can not only handle that but > >> improve on it; > > I think that would be a big selling point for hash indexes. It would let you > index even toasted data which are larger than a page. I'm not sure whether you > can make it work for unique indexes though. But for non-unique indexes I think > it would be a solid win and mean you cover a set of use cases quite distinct > from btree indexes. > > > - Hash lookup is O(1) while btree is O(logN). > > That's not really true. There's a tradeoff between insertion time and lookup > time. In order to get O(1) lookups you need to work pretty hard to maintain > the hash table including spending a lot of time reorganizing it when you grow > it. If you don't want to spend the time on inserts then you end up with > buckets and the hash table is basically just a linear speedup to whatever > algorithm you use to scan the buckets. These facts notwithstanding, average insert performance remains O(1) if you grow the hash exponentially every time it needs to be grown. Suppose, for example, that you use a power of 2 arrangement. Then the worst case scenario, right after a split, is that all of your keys had to be inserted, all had to be moved once, half had to be moved twice, a quarter 3 times, etc. So the ratio of moves to keys is 1 + 1/2 + 1/4 + ... which is a well-known geometric series converging on 2. True, when you cross the threshold a lot of work needs to be done. Life would be simpler if you could just put up a lock while you split the hash. You can't do that for a busy transactional database though. But if you want to be clever about it, you build into your hash implementation the intelligence to be able to have 1 or 2 hash locations to search. When they are both present, all inserts go into one of them, all deletes and updates are performed against both. Then you're able to have a background job reorganize your hash while the database continues to use it. > > - What about multi-column indexes? The current implementation > > only supports 1 column. > > That seems kind of weird. It seems obvious that you mix the three hashes > together which reduces it to the solved problem. That raises a very random thought. One of the nicer features of Oracle is the ability to have function-based indexes. So you could index, say, trim(lower(person.name)). There are a *lot* of practical situations where that comes in handy. The best workaround that I can think of for not having that is to have a column defined to hold the result of the function, maintain that column with a trigger, then index that column. Which works, but is inelegant. (It also requires storing completely redundant data.) Is there any prospect of postgres aquiring that functionality? Ben ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] tsearch filenames unlikes special symbols and numbers
On 9/3/07, Tom Lane <[EMAIL PROTECTED]> wrote: > Gregory Stark <[EMAIL PROTECTED]> writes: > > "Tom Lane" <[EMAIL PROTECTED]> writes: > >> I'm not convinced that . is issue-free. On most if not all versions of > >> Unix, > >> you are allowed to open a directory as a file and read the filenames it > >> contains. While I don't say it'd be easy to manage that through > >> tsearch, there's at least a potential for discovering the filenames > >> present in . and .. --- how much do we care about that? > > > Actually I don't think that's true any more, most file systems on most > > Unixen > > do not allow it. However it appears it's still the case for Solaris so it's > > still a good point. > > Actually, now that I've woken up a bit more, it is not a problem as > long as the tsearch code always appends some kind of file extension > to what the user gives, such as ".dict". It'll be impossible to name > "." or ".." with that addition. I don't know what you're discussing well enough to know if this is relevant, but what you just said is not always true. If there is any way to pass arbitrary binary data into your function call, then someone can pass in a string with nul in it. When that hits the OS API, your appended .dict won't be seen as part of the filename. (This is a common security oversight when calling C APIs from higher-level languages such as Perl. See http://artofhacking.com/files/phrack/phrack55/P55-07.TXT for more.) [...] Cheers, Ben ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] tsearch filenames unlikes special symbols and numbers
On 9/3/07, Tom Lane <[EMAIL PROTECTED]> wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > On the other hand, this means the name has to be quoted if it would be > > quoted as an SQL identifier, right? > > Something like that. I wasn't planning on rejecting uppercase letters, > though, which would be necessary if you wanted to be strict about > matching unquoted identifiers. > > There seems fairly clear use-case for allowing A-Z a-z 0-9 and > underscore (while CVS head rejects 0-9 and underscore). There also seem > to be good arguments for disallowing / \ : on various platforms, which > leaves us with some other punctuation in question, as well as the whole > matter of non-ASCII characters. I'm not sure whether we want to touch > the idea of non-ASCII; comments? The problem with allowing uppercase letters is that on some filesystems foo and Foo are the same file, and on others they are not. This may lead to obscure portability problems where code worked fine on Unix and then fails when the database is running on Windows. The approach that I'd suggest is allow a very restricted subset as an immediate solution (say a-z and 0-9), and plan to later allow arbitrary data to be passed in, then be encoded in some way before hitting disk. (And later need not be much later - such encodings are not that hard to write.) Cheers, Ben ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] SQL feature requests
On 8/23/07, Florian G. Pflug <[EMAIL PROTECTED]> wrote: > Ben Tilly wrote: > > On 8/22/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote: > >> On Aug 22, 2007, at 20:49 , Ben Tilly wrote: > >> > >>> If your implementation accepts: > >>> > >>> group by case when true then 'foo' end > >> What would that mean? Regardless of whether or not it's accepted, it > >> should have *some* meaning. > > > > To my eyes it has a very clear meaning, we're grouping on an > > expression that happens to be a constant. Which happens to be the > > same for all rows. Which is a spectacularly useless thing to actually > > do, but the ability to do it happens to be convenient when I'm looking > > for something to terminate a series of commas in a dynamically built > > query. > > Which is the same very clear meaning that "group by 1" has - we're > grouping on a expression which happens to be the constant 1. Hey, > wait a second. This isn't what "group by 1" means at all - it > rather means group by whatever the fist column in the select list is. Which feature shocked me when I first saw it in Oracle. It violated every expectation that I have. I also deliberately do NOT use that feature. Because it is not safe if someone else is possibly going to edit your query. Add a field in a natural place and, oops, your query just broke. Also I hate referring to things by position in code. Particularly when they are far away from each other as they may be in a large query. (I've written queries that are over 1000 lines long in the past.) > So, yes, "group by 'foo'" *seems* to have a very clear meaning - but > that clearness vanishes as soon as you take into account what "group by 1" > means. I'm happy to use "group by 'foo'::text" instead. Anyone else in my position will have to stumble on their own solution, but I don't think there are that many in my position. Cheers, Ben ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] SQL feature requests
On 8/23/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote: > On Aug 23, 2007, at 10:47 , Ben Tilly wrote: [...] > > Why does it seem like a bug to you? > > > > Turn it around, and tell me in what way is its behaviour surprising to > > someone who knows SQL. You asked to group on something that is the > > same for all rows. That group by condition did nothing. (Except > > rendered the syntax valid when it might not have been.) As I would > > expect. > > Considering that I expect the GROUP BY clause to include only column > references (or other groupings of column references), not > expressions. Whether or not the value is the same, it surprises me > that something other than a column reference is accepted at all. I > hadn't realized this behavior was accepted in PostgreSQL, but I learn > something new every day. That's obviously a very different point of view than mine. Over the last decade of using relational databases I've always assumed that any non-aggregate expression that is valid in a select is valid in a group by, and I've only rarely been surprised by this. (The only exceptions that I can list off of the top of my head are that integer constants can refer to columns by position, and text constants are not allowed by postgres.) > My non-rigorous way of thinking about GROUP BY is that it groups this > listed columns when the values of the listed columns are the same. An > expression that evaluates to anything other than a column name > doesn't provide any information about which column to consider > grouped, and expressions don't evaluate to column names, or > identifiers in general. If I understand you correctly, a GROUP BY > item that isn't a column name would be a value that's applied to all > columns, and the actual value is irrelevant—different values don't > change the result. That's not quite how it works. The better non-rigorous way of thinking about it is that any non-aggregate function you can put in a select is allowed in a group by. So if I group by trim(foo.bar), I will be grouping rows based on the distinct values of trim(foo.bar). So the values 'baz', ' baz', 'baz ' and ' baz ' would all be rolled up into one row in the group by query. But the value 'blat' would wind up in another row. The case of a constant expression is the logical (if normally useless) extension of this. > So the only purpose it would serve would be to prevent a trailing > comma from raising a syntax error: you'd still need to explicitly > list the other columns (unless the implementation behavior is changed > to extend the spec there as well). What this does is allow you to use > something like this (psuedocode): That is the only purpose of a constant expression is that. > group_columns = [ 'foo', 'bar', 'baz' ] > > group_column_list = '' > for col in group_columns { group_column_list += col + ',' } # > group_column_list = "foo,bar,baz," > > group_by_clause = "GROUP BY $group_column_list CASE WHEN TRUE THEN > 'quux' END" > > rather than > > group_column_list = join group_columns, ',' # group_column_list = > "foo,bar,baz" > group_by_clause = "GROUP BY $group_column_list" > > I still feel I'm missing something. If that's it, it seems like > something easy enough to handle in middleware. Sorry if it appears > I'm being dense. I've definitely learned things in this thread. That's mostly right. However don't forget the group_columns might be an empty list, and in that case you need to optionally not have a group by clause at all. (Yes, in some of my queries this is a very real possibility.) Plus a bit of context. This comes up for me in reports which are basically implemented as a series of queries using temp tables. So making the generation of SQL more convoluted significantly increases the complexity of the code. (Writing reports is most of my job, so I get to write lots and lots of these.) > > Furthermore ask yourself whether anyone who wrote that would likely > > have written it by accident. > > I don't see what that has to do with anything. There are plenty of > things I can write on purpose that would be nonsense. You might even > consider my posts as prime examples :) When you have a boundary case, sometimes you really want to answer the question, "Is this case likely to be a result of confusion?" Which question is particularly relevant in this case because my strong suspicion is that constant text expressions are banned in PostgreSQL explicitly because of fears that they are a result of confusion. Cheers, Ben ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SQL feature requests
On 8/23/07, Tom Lane <[EMAIL PROTECTED]> wrote: > "Chuck McDevitt" <[EMAIL PROTECTED]> writes: > > Tom, it isn't just a case of "convenience". When we are trying to > > convert users from another database (say Oracle for example) to > > PostgeSQL, one of the big stumbling blocks that slows down the work is > > all the little query changes that people have to make > > Well, if you're trying to sell it on the grounds of Oracle > compatibility, then it should actually *be* Oracle compatible. > What exactly do they do about the default-alias problem? To the best of my knowledge such subqueries are completely anonymous. There is no way to explicitly refer to them unless you provide an alias. Which is exactly the solution that was proposed twice in this thread, and has the further benefit of being forwards compatible with any reasonable future standard. As verification I asked a certified Oracle DBA. His understanding is that Oracle may choose to rewrite the query for you or not. If it does not rewrite the query, then it has an internal identifier but there is no way you can get to it. Cheers, Ben ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] SQL feature requests
On 8/22/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote: > > On Aug 22, 2007, at 20:36 , Ben Tilly wrote: [...] > > I can well believe that the standard says that you must accept > > subqueries with aliases. But does it say you must reject subqueries > > without aliases? I strongly doubt that. > > If I'm reading my draft copy of the SQL:2003 spec right (and there's > a good chance that I'm not, as it's not the easiest document for me > to parse), aliases *are* required. Again, it lays out very carefully the syntax that must be valid. It does not say that extensions to that syntax are not valid. Every database implements such extensions. [...] > > I have no clue what EnterpriseDB does. > > In case it wasn't clear, the reason I bring it up is that > EnterpriseDB, while working from a PostgreSQL base, strives for > Oracle compatibility. I got the reference. But I don't know what EnterpriseDB does - I've never used it. [...] > >> AIUI, Integers are only allowed because the SQL standard explicitly > >> allows you to refer to columns by the order they appear in the SELECT > >> list. Otherwise the GROUP BY items need to be column names. > > > > Need to be? > > > > The SQL-92 standard is clear that you must accept a list of column > > names. It is also clear that a column name must be be of the form > > field or table.field. > > The 2003 draft (same as above) seems to agree with the SQL92 standard: > [ large snippet of the draft elided ] > > There'd have to be a pretty strong reason to extend this, more than > just a convenience, I should think. It is already extended in postgres. For pretty good reasons. > > In no way, shape or form does that allow having > > terms like trim(foo.bar) in a group by. > > > > But every reasonable database that I know - including postgres - > > allows that. > > Can you give an example of something like this working in PostgreSQL? > I get an error when I try to use a text value in a GROUP BY clause. > (Or are you referring specifically to the CASE expression corner case?) This works in every reasonable database that I have tried it in: select trim(foo.bar), count(*) from foo group by trim(foo.bar) And yes, I have tried it in postgres. [...] > > Postgres explicitly disallows a constant character expression. But it > > allows the constant case expression that I gave. It would be nice for > > me to not have to remember that very obscure and convoluted case. > > I agree, and would move that it should be disallowed if there isn't a > reason for it to be maintained, for exactly the reason you give: > there shouldn't be such convoluted, and obscure corner case. There is an excellent reason to generally allow complex expressions in group by statements, and that reason is that many useful and reasonable queries won't work if you don't. Such as the one I gave above. [...] > > I don't know what the SQL spec says, but I know (having talked to > > other developers) that many people would find it very nice. > > Since I had the spec open, I tried to look at this as well, though I > must admit I found it very tough going. > > I think this is the key section: > > > 10.9 > > > > ... > > > > Syntax Rules > > > > ... > > > > 4) The argument source of an is > > Case: > > a) If AF is immediately contained in a > specification>, then a table or group of a grouped table as > > specified in Subclause 7.10, "", and Subclause 7.12, > > "". > > > > b) Otherwise, the collection of rows in the current row's window > > frame defined by the window structure descriptor identified by the > > that simply contains AF, as defined in Subclause > > 7.11, "". > > Now the stuff is pretty dense, and the clause> is currently impenetrable for me, so I just looked at the > and sections referenced above. I'm not surprised that the window clause section is impenetrable to you. Window clauses are part of the definition of analytic functions, which postgres does NOT yet implement. However they are on the todo list. Speaking personally, analytic functions are the single feature from Oracle that I've most missed when moving to postgres. I would be happy to explain what they are and how they should work either on the list or off to anyone who is interested in implementing them. However they are a topic for another thread, and probably for another week. (I'm out next week, and am loathe to open that can of worms just yet.) However I will note as a practical matter that implementing analytic functions will increase
Re: [HACKERS] SQL feature requests
On 8/23/07, Tom Lane <[EMAIL PROTECTED]> wrote: > "Ben Tilly" <[EMAIL PROTECTED]> writes: > > ... But then I need this (possibly > > empty) list to have a valid group by statement at the end. In Oracle > > I used to just write it like this: > > > SELECT ... > > GROUP BY $possible_term_1 > > $possible_term_2 > > $possible_term_3 > > 'end of possible groupings' > > FWIW, that will work if you write it as > > 'end of possible groupings'::text Ah, that is the solution that I'll move to. > I'm disinclined to accept it as-is because (a) it's highly likely to be > a mistake, and (b) there's no principled way to assign a datatype to the > expression, if we are to interpret it as an expression. > > Basically anything but an unadorned constant will work there. As long as that behaviour is guaranteed, I'm OK on this one. Ben ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] SQL feature requests
On 8/22/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote: > > On Aug 22, 2007, at 20:49 , Ben Tilly wrote: > > > If your implementation accepts: > > > > group by case when true then 'foo' end > > What would that mean? Regardless of whether or not it's accepted, it > should have *some* meaning. To my eyes it has a very clear meaning, we're grouping on an expression that happens to be a constant. Which happens to be the same for all rows. Which is a spectacularly useless thing to actually do, but the ability to do it happens to be convenient when I'm looking for something to terminate a series of commas in a dynamically built query. > It's not equivalent to GROUP BY "foo" I wouldn't want it to be. Strings and identifiers are very different things. [...] > *This* seems like a bug: > test=# select record_id > , count(observation_id) as bar > from observation > group by record_id > , case when true >then 'foo' > end; > record_id | bar > ---+- > 1 | 4 > 2 | 4 > 3 | 2 > (3 rows) Why does it seem like a bug to you? Turn it around, and tell me in what way is its behaviour surprising to someone who knows SQL. You asked to group on something that is the same for all rows. That group by condition did nothing. (Except rendered the syntax valid when it might not have been.) As I would expect. Furthermore ask yourself whether anyone who wrote that would likely have written it by accident. Cheers, Ben ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] SQL feature requests
On 8/22/07, Gregory Stark <[EMAIL PROTECTED]> wrote: > > "Ben Tilly" <[EMAIL PROTECTED]> writes: > > > Hopefully this is the right place for a few feature requests that > > would address some of the things that I've noticed in postgres. > > > > 1. Just a minor annoyance, but why must subqueries in FROM clauses > > have an alias? For instance suppose that I have an orders table, and > > one of the fields is userid. The following is unambiguous and is > > legal in Oracle: > > Thank you, this is one of my top pet peeves but when I proposed changing it I > was told nobody's complained. Now we have at least one user complaint, any > others out there? Always happy to complain. :-) > > 2. Why is 'non-integer constant in GROUP BY' an error? > > Hm... I was a bit surprised by this warning myself. IIRC there was an > implementation convenience issue. If your implementation accepts: group by case when true then 'foo' end how much harder can it be to accept: group by 'foo' ? > > 3. How hard would it be to have postgres ignore aliases in group by > > clauses? > > That sounds like a strange idea. It is a strange idea, but it makes dynamically building queries easier. Right now I'm following a strategy of storing what I'm going to insert in the select clause in one variable, and the group by clause in another. So I need 2 variables for each dynamic field that I might choose to group by and want to have a custom name for. With this change I would only need one variable. > > 4) Items 2 and 3 would both be made irrelevant if postgres did > > something that I'd really, really would like. Which is to assume that > > a query without a group by clause, but with an aggregate function in > > the select, should have an implicit group by clause where you group by > > all non-aggregate functions in the select. > > > > For example > > > > SELECT foo, count(*) > > FROM bar > > > > would be processed as: > > > > SELECT foo, count(*) > > FROM bar > > GROUP BY foo > > I agree this would be convenient but it seems too scary to actually go > anywhere. What would you group by in the case of: > > SELECT a+b, count(*) FROM bar > > Should it group by a,b or a+b ? It should group by a+b. Which is to say, every field in the select clause that currently triggers an error because it isn't in the group by clause. > Also, this might be a bit shocking for MySQL users who are accustomed to > MySQL's non-standard extension for the same syntax. There it's treated as an > assertion that the columns are equal for all records in a group or at least > that it doesn't matter which such value is returned, effectively equivalent to > our DISTINCT ON feature. I don't mind shocking MySQL users. ;-) But seriously, if that objection is the barrier then I'd be happy to see it be something that is explicitly turned on in the query. For instance: select autogroup bar, count(*) from foo If that was available then I for one would type autogroup a lot more often than group by. After all autogroup is about as hard to type, and I don't have to type the redundant list of fields in the group by. Cheers, Ben ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] SQL feature requests
On 8/22/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote: > > On Aug 22, 2007, at 18:45 , Ben Tilly wrote: > > > 1. Just a minor annoyance, but why must subqueries in FROM clauses > > have an alias? > > It's required by the SQL standard, AIUI. I wonder what EnterpriseDB > does? I can well believe that the standard says that you must accept subqueries with aliases. But does it say you must reject subqueries without aliases? I strongly doubt that. I have no clue what EnterpriseDB does. > > 2. Why is 'non-integer constant in GROUP BY' an error? > > > This works for now: > > > > case when true then true end > > > > but I don't know whether some future version of postgres might break > > my code by banning that as well. > > The PostgreSQL developers generally tries hard to preserve backwards > compatibility, so I doubt the case expression as you have it would go > away (though I'm kind of surprised it's allowed). Am I wrong in > thinking that Oracle would accept the same format PostgreSQL does? In The reason for my comparing to Oracle is that I used to work at an Oracle shop. I now work at a postgres shop. Portability is not my issue, just the annoyances that I experienced moving from one to the other. As for whether that case expression would go away, that it is allowed is such an obscure feature that I doubt anyone changing that code would notice if it was removed. > that case, couldn't you use whatever method works in PostgreSQL in > Oracle? I haven't checked the SQL standard, but it seems unlikely > it'd allow something like > > GROUP BY , , , ; That's not what Oracle accepts that postgres does not. What Oracle accepts is: ... GROUP BY 'foo'; > AIUI, Integers are only allowed because the SQL standard explicitly > allows you to refer to columns by the order they appear in the SELECT > list. Otherwise the GROUP BY items need to be column names. Need to be? The SQL-92 standard is clear that you must accept a list of column names. It is also clear that a column name must be be of the form field or table.field. In no way, shape or form does that allow having terms like trim(foo.bar) in a group by. But every reasonable database that I know - including postgres - allows that. The standard very wisely does not forbid extensions. Every database has extensions. In some cases, such as allowing trim(foo.bar) in a group by clause, some extensions are so common as to be a standard. (I don't have a copy of any later standards so I don't know whether that has since been explicitly allowed.) Therefore the real question is how much farther than the standard you go. Postgres explicitly disallows a constant character expression. But it allows the constant case expression that I gave. It would be nice for me to not have to remember that very obscure and convoluted case. > Both 1 and 2 seem to me to be places where Oracle is likely deviating > from the standard. If you're targeting Oracle, then using Oracle- > specific syntax might be warranted. If you're hoping to target more > than one possible backend, I'd think it be better to use more > portable syntax (e.g., SQL-standard syntax) than expecting other > DBMSs to follow another's deviations. That's not to say PostgreSQL > does not have non-standard syntax: in places, it does. But it does > try to hew very closely to the standard. The queries that I'm writing are not hoping to target more than one database at one company. > Again, I wonder what EnterpriseDB does in this case? No clue. > > 3. How hard would it be to have postgres ignore aliases in group by > > clauses? Per my comments above, I often build complex queries in > > code. I can't easily use the shortcut of referring to the select > > column by number because the position is hard to determine. So my > > code has to copy the select terms. But I can't copy them exactly > > because the select terms include lots of "...as foo" clauses that are > > not allowed in a group by. So I have to store very similar terms to > > use twice. > > Perhaps someone else knows what you're referring to here, but I'm > having a hard time without an example. Here's what I *think* you're > trying to say: What I'm trying to say is that it would be convenient for me to be able to write: select bar as "baz" , count(*) as "some count" from foo group by bar as "baz" That's not allowed right now because as is not allowed in a group by statement. [...] > > Which is to assume that > > a query without a group by clause, but with an aggregate function in > > the select, should have an implic
[HACKERS] SQL feature requests
Hopefully this is the right place for a few feature requests that would address some of the things that I've noticed in postgres. 1. Just a minor annoyance, but why must subqueries in FROM clauses have an alias? For instance suppose that I have an orders table, and one of the fields is userid. The following is unambiguous and is legal in Oracle: SELECT order_count , count(*) as people FROM ( SELECT count(*) as order_count FROM orders GROUP BY userid ) GROUP BY order_count It annoys me that it isn't legal in postgres. (Yes, I know how to fix the query. But it still is an annoyance, and it comes up fairly often in reporting purposes.) 2. Why is 'non-integer constant in GROUP BY' an error? I find it inconvenient. For reporting purposes I often have to dynamically build queries in code. An easy way to do that is just interpolate in a set of possible statements which will either be empty strings or have trailing commas. But then I need this (possibly empty) list to have a valid group by statement at the end. In Oracle I used to just write it like this: SELECT ... GROUP BY $possible_term_1 $possible_term_2 $possible_term_3 'end of possible groupings' In postgres I either have to use a different strategy to build up these strings, or else use a more complicated term to finish that off. This works for now: case when true then true end but I don't know whether some future version of postgres might break my code by banning that as well. 3. How hard would it be to have postgres ignore aliases in group by clauses? Per my comments above, I often build complex queries in code. I can't easily use the shortcut of referring to the select column by number because the position is hard to determine. So my code has to copy the select terms. But I can't copy them exactly because the select terms include lots of "...as foo" clauses that are not allowed in a group by. So I have to store very similar terms to use twice. It would be nice if I could just make the group by look like the select, and have the (obviously irrelevant) aliases just be ignored. 4) Items 2 and 3 would both be made irrelevant if postgres did something that I'd really, really would like. Which is to assume that a query without a group by clause, but with an aggregate function in the select, should have an implicit group by clause where you group by all non-aggregate functions in the select. For example SELECT foo, count(*) FROM bar would be processed as: SELECT foo, count(*) FROM bar GROUP BY foo If I write a query with an aggregate function in the select, better than 95% of the time this is the group by clause that I want. (This email has one of the few exceptions.) In the remaining cases I could easily add the extra stuff in the group by to the select without problems. Therefore if postgres could just insert the obvious group by clause in, I would never again write the words "group by" when working with postgres. And I predict that many other people would do the same. But it doesn't. So when working with postgres, just like every other database that I've used, I have to constantly type in group by clauses with entirely redundant information. (But they're not EXACTLY the same as the select clauses that they are redundant with...) Cheers, Ben ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster