Re: [HACKERS] tsearch filenames unlikes special symbols and numbers

2007-09-04 Thread Ben Tilly
On 9/4/07, Tom Lane <[EMAIL PROTECTED]> wrote:
[...]
> But on further thought it strikes me that insisting on all lower case
> doesn't eliminate case-sensitivity portability problems.  For instance,
> suppose the given parameter is 'foo' and the actual file name is
> Foo.dict.  This will work fine on Windows and will stop working when
> moved to Unix.  So I'm not sure we really buy much by rejecting
> upper-case letters in the parameter --- all we do is constrain which
> side of the fence you have to fix any mismatches on.  And we picked the
> side that only a DBA, rather than a plain SQL user, can fix.

True, only a DBA can fix it.  But only a DBA can screw it up.  That
seems reasonable to me.  Furthermore fixing this mistake at the plain
SQL user level in reality means auditing a code base for the
construct, which is never fun.

However if you wish to be paranoid, I believe that all filesystems of
interest to PostgreSQL are at least case preserving.  In which case on
case sensitive filesystems you could check that the case of the stored
filename matches what you want it to be.  Now the problem of the
filename having the case wrong can be detected on both Windows and
Unix.

Of course that check is a complication and slows things down.  If all
dictionary files have to be in a fixed directory, then you can easily
add a cron job that scans that directory and fixes the case of any
dictionary files that have upper case letters in their names.
(Beware, there was once a bug in Windows where renaming Foo to foo
accidentally deleted the file.  It is therefore safer to rename Foo to
bar then bar to foo.  However this is a moot point since I doubt that
anyone would actually run a brand new PostgreSQL database on an early
version of NT 4.0...)

Cheers,
Ben

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly
On 9/3/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Ben Tilly" <[EMAIL PROTECTED]> writes:
> > That raises a very random thought.  One of the nicer features of
> > Oracle is the ability to have function-based indexes.  So you could
> > index, say, trim(lower(person.name)).
>
> > Is there any prospect of postgres aquiring that functionality?
>
> Uh, no, since it's already there; has been since Berkeley days ...

Nice!

I know of at least one DBA who is moving from Oracle to postgres who
will be *very* happy to hear that.

Ben

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly
On 9/3/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
>
> "Kenneth Marshall" <[EMAIL PROTECTED]> writes:
>
> > On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
> >> Kenneth Marshall <[EMAIL PROTECTED]> writes:
> >> > ... This is the rough plan. Does anyone see anything critical that
> >> > is missing at this point?
> >>
> >> Sounds pretty good.  Let me brain-dump one item on you: one thing that
> >> hash currently has over btree is the ability to handle index items up
> >> to a full page.  Now, if you go with a scheme that only stores hash
> >> codes and not the underlying data, you can not only handle that but
> >> improve on it;
>
> I think that would be a big selling point for hash indexes. It would let you
> index even toasted data which are larger than a page. I'm not sure whether you
> can make it work for unique indexes though. But for non-unique indexes I think
> it would be a solid win and mean you cover a set of use cases quite distinct
> from btree indexes.
>
> > - Hash lookup is O(1) while btree is O(logN).
>
> That's not really true. There's a tradeoff between insertion time and lookup
> time. In order to get O(1) lookups you need to work pretty hard to maintain
> the hash table including spending a lot of time reorganizing it when you grow
> it. If you don't want to spend the time on inserts then you end up with
> buckets and the hash table is basically just a linear speedup to whatever
> algorithm you use to scan the buckets.

These facts notwithstanding, average insert performance remains O(1)
if you grow the hash exponentially every time it needs to be grown.
Suppose, for example, that you use a power of 2 arrangement.  Then the
worst case scenario, right after a split, is that all of your keys had
to be inserted, all had to be moved once, half had to be moved twice,
a quarter 3 times, etc.  So the ratio of moves to keys is 1 + 1/2 +
1/4 + ... which is a well-known geometric series converging on 2.

True, when you cross the threshold a lot of work needs to be done.
Life would be simpler if you could just put up a lock while you split
the hash.  You can't do that for a busy transactional database though.
 But if you want to be clever about it, you build into your hash
implementation the intelligence to be able to have 1 or 2 hash
locations to search.  When they are both present, all inserts go into
one of them, all deletes and updates are performed against both.  Then
you're able to have a background job reorganize your hash while the
database continues to use it.

> > - What about multi-column indexes? The current implementation
> >   only supports 1 column.
>
> That seems kind of weird. It seems obvious that you mix the three hashes
> together which reduces it to the solved problem.

That raises a very random thought.  One of the nicer features of
Oracle is the ability to have function-based indexes.  So you could
index, say, trim(lower(person.name)).  There are a *lot* of practical
situations where that comes in handy.  The best workaround that I can
think of for not having that is to have a column defined to hold the
result of the function, maintain that column with a trigger, then
index that column.  Which works, but is inelegant.  (It also requires
storing completely redundant data.)

Is there any prospect of postgres aquiring that functionality?

Ben

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] tsearch filenames unlikes special symbols and numbers

2007-09-03 Thread Ben Tilly
On 9/3/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> > "Tom Lane" <[EMAIL PROTECTED]> writes:
> >> I'm not convinced that . is issue-free.  On most if not all versions of 
> >> Unix,
> >> you are allowed to open a directory as a file and read the filenames it
> >> contains.  While I don't say it'd be easy to manage that through
> >> tsearch, there's at least a potential for discovering the filenames
> >> present in . and .. --- how much do we care about that?
>
> > Actually I don't think that's true any more, most file systems on most 
> > Unixen
> > do not allow it. However it appears it's still the case for Solaris so it's
> > still a good point.
>
> Actually, now that I've woken up a bit more, it is not a problem as
> long as the tsearch code always appends some kind of file extension
> to what the user gives, such as ".dict".  It'll be impossible to name
> "." or ".." with that addition.

I don't know what you're discussing well enough to know if this is
relevant, but what you just said is not always true.  If there is any
way to pass arbitrary binary data into your function call, then
someone can pass in a string with nul in it.  When that hits the OS
API, your appended .dict won't be seen as part of the filename.

(This is a common security oversight when calling C APIs from
higher-level languages such as Perl.  See
http://artofhacking.com/files/phrack/phrack55/P55-07.TXT for more.)

[...]

Cheers,
Ben

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] tsearch filenames unlikes special symbols and numbers

2007-09-03 Thread Ben Tilly
On 9/3/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > On the other hand, this means the name has to be quoted if it would be
> > quoted as an SQL identifier, right?
>
> Something like that.  I wasn't planning on rejecting uppercase letters,
> though, which would be necessary if you wanted to be strict about
> matching unquoted identifiers.
>
> There seems fairly clear use-case for allowing A-Z a-z 0-9 and
> underscore (while CVS head rejects 0-9 and underscore).  There also seem
> to be good arguments for disallowing / \ : on various platforms, which
> leaves us with some other punctuation in question, as well as the whole
> matter of non-ASCII characters.  I'm not sure whether we want to touch
> the idea of non-ASCII; comments?

The problem with allowing uppercase letters is that on some
filesystems foo and Foo are the same file, and on others they are not.
 This may lead to obscure portability problems where code worked fine
on Unix and then fails when the database is running on Windows.

The approach that I'd suggest is allow a very restricted subset as an
immediate solution (say a-z and 0-9), and plan to later allow
arbitrary data to be passed in, then be encoded in some way before
hitting disk.  (And later need not be much later - such encodings are
not that hard to write.)

Cheers,
Ben

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/23/07, Florian G. Pflug <[EMAIL PROTECTED]> wrote:
> Ben Tilly wrote:
> > On 8/22/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
> >> On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
> >>
> >>> If your implementation accepts:
> >>>
> >>>   group by case when true then 'foo' end
> >> What would that mean? Regardless of whether or not it's accepted, it
> >> should have *some* meaning.
> >
> > To my eyes it has a very clear meaning, we're grouping on an
> > expression that happens to be a constant.  Which happens to be the
> > same for all rows.  Which is a spectacularly useless thing to actually
> > do, but the ability to do it happens to be convenient when I'm looking
> > for something to terminate a series of commas in a dynamically built
> > query.
>
> Which is the same very clear meaning that "group by 1" has - we're
> grouping on a expression which happens to be the constant 1. Hey,
> wait a second. This isn't what "group by 1" means at all - it
> rather means group by whatever the fist column in the select list is.

Which feature shocked me when I first saw it in Oracle.  It violated
every expectation that I have.

I also deliberately do NOT use that feature.  Because it is not safe
if someone else is possibly going to edit your query.  Add a field in
a natural place and, oops, your query just broke.  Also I hate
referring to things by position in code.  Particularly when they are
far away from each other as they may be in a large query.  (I've
written queries that are over 1000 lines long in the past.)

> So, yes, "group by 'foo'" *seems* to have a very clear meaning - but
> that clearness vanishes as soon as you take into account what "group by 1"
> means.

I'm happy to use "group by 'foo'::text" instead.  Anyone else in my
position will have to stumble on their own solution, but I don't think
there are that many in my position.

Cheers,
Ben

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/23/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
> On Aug 23, 2007, at 10:47 , Ben Tilly wrote:
[...]
> > Why does it seem like a bug to you?
> >
> > Turn it around, and tell me in what way is its behaviour surprising to
> > someone who knows SQL.  You asked to group on something that is the
> > same for all rows.  That group by condition did nothing.  (Except
> > rendered the syntax valid when it might not have been.)  As I would
> > expect.
>
> Considering that I expect the GROUP BY clause to include only column
> references (or other groupings of column references), not
> expressions. Whether or not the value is the same, it surprises me
> that something other than a column reference is accepted at all. I
> hadn't realized this behavior was accepted in PostgreSQL, but I learn
> something new every day.

That's obviously a very different point of view than mine.  Over the
last decade of using relational databases I've always assumed that any
non-aggregate expression that is valid in a select is valid in a group
by, and I've only rarely been surprised by this.  (The only exceptions
that I can list off of the top of my head are that integer constants
can refer to columns by position, and text constants are not allowed
by postgres.)

> My non-rigorous way of thinking about GROUP BY is that it groups this
> listed columns when the values of the listed columns are the same. An
> expression that evaluates to anything other than a column name
> doesn't provide any information about which column to consider
> grouped, and expressions don't evaluate to column names, or
> identifiers in general. If I understand you correctly, a GROUP BY
> item that isn't a column name would be a value that's applied to all
> columns, and the actual value is irrelevant—different values don't
> change the result.

That's not quite how it works.  The better non-rigorous way of
thinking about it is that any non-aggregate function you can put in a
select is allowed in a group by.  So if I group by trim(foo.bar), I
will be grouping rows based on the distinct values of trim(foo.bar).
So the values 'baz', ' baz', 'baz ' and ' baz ' would all be rolled up
into one row in the group by query.  But the value 'blat' would wind
up in another row.

The case of a constant expression is the logical (if normally useless)
extension of this.

> So the only purpose it would serve would be to prevent a trailing
> comma from raising a syntax error: you'd still need to explicitly
> list the other columns (unless the implementation behavior is changed
> to extend the spec there as well). What this does is allow you to use
> something like this (psuedocode):

That is the only purpose of a constant expression is that.

> group_columns = [ 'foo', 'bar', 'baz' ]
>
> group_column_list = ''
> for col in group_columns { group_column_list += col + ',' } #
> group_column_list = "foo,bar,baz,"
>
> group_by_clause = "GROUP BY $group_column_list CASE WHEN TRUE THEN
> 'quux' END"
>
> rather than
>
> group_column_list = join group_columns, ',' # group_column_list =
> "foo,bar,baz"
> group_by_clause = "GROUP BY $group_column_list"
>
> I still feel I'm missing something. If that's it, it seems like
> something easy enough to handle in middleware. Sorry if it appears
> I'm being dense. I've definitely learned things in this thread.

That's mostly right.  However don't forget the group_columns might be
an empty list, and in that case you need to optionally not have a
group by clause at all.  (Yes, in some of my queries this is a very
real possibility.)

Plus a bit of context.  This comes up for me in reports which are
basically implemented as a series of queries using temp tables.  So
making the generation of SQL more convoluted significantly increases
the complexity of the code.  (Writing reports is most of my job, so I
get to write lots and lots of these.)

> > Furthermore ask yourself whether anyone who wrote that would likely
> > have written it by accident.
>
> I don't see what that has to do with anything. There are plenty of
> things I can write on purpose that would be nonsense. You might even
> consider my posts as prime examples :)

When you have a boundary case, sometimes you really want to answer the
question, "Is this case likely to be a result of confusion?"  Which
question is particularly relevant in this case because my strong
suspicion is that constant text expressions are banned in PostgreSQL
explicitly because of fears that they are a result of confusion.

Cheers,
Ben

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/23/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Chuck McDevitt" <[EMAIL PROTECTED]> writes:
> > Tom, it isn't just a case of "convenience".  When we are trying to
> > convert users from another database (say Oracle for example) to
> > PostgeSQL, one of the big stumbling blocks that slows down the work is
> > all the little query changes that people have to make
>
> Well, if you're trying to sell it on the grounds of Oracle
> compatibility, then it should actually *be* Oracle compatible.
> What exactly do they do about the default-alias problem?

To the best of my knowledge such subqueries are completely anonymous.
There is no way to explicitly refer to them unless you provide an
alias.  Which is exactly the solution that was proposed twice in this
thread, and has the further benefit of being forwards compatible with
any reasonable future standard.

As verification I asked a certified Oracle DBA.  His understanding is
that Oracle may choose to rewrite the query for you or not.  If it
does not rewrite the query, then it has an internal identifier but
there is no way you can get to it.

Cheers,
Ben

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/22/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
>
> On Aug 22, 2007, at 20:36 , Ben Tilly wrote:
[...]
> > I can well believe that the standard says that you must accept
> > subqueries with aliases.  But does it say you must reject subqueries
> > without aliases?  I strongly doubt that.
>
> If I'm reading my draft copy of the SQL:2003 spec right (and there's
> a good chance that I'm not, as it's not the easiest document for me
> to parse), aliases *are* required.

Again, it lays out very carefully the syntax that must be valid.  It
does not say that extensions to that syntax are not valid.  Every
database implements such extensions.

[...]
> > I have no clue what EnterpriseDB does.
>
> In case it wasn't clear, the reason I bring it up is that
> EnterpriseDB, while working from a PostgreSQL base, strives for
> Oracle compatibility.

I got the reference.  But I don't know what EnterpriseDB does - I've
never used it.

[...]
> >> AIUI, Integers are only allowed because the SQL standard explicitly
> >> allows you to refer to columns by the order they appear in the SELECT
> >> list. Otherwise the GROUP BY items need to be column names.
> >
> > Need to be?
> >
> > The SQL-92 standard is clear that you must accept a list of column
> > names.  It is also clear that a column name must be be of the form
> > field or table.field.
>
> The 2003 draft (same as above) seems to agree with the SQL92 standard:
>
[ large snippet of the draft elided ]
>
> There'd have to be a pretty strong reason to extend this, more than
> just a convenience, I should think.

It is already extended in postgres.  For pretty good reasons.

> >   In no way, shape or form does that allow having
> > terms like trim(foo.bar) in a group by.
> >
> > But every reasonable database that I know - including postgres -
> > allows that.
>
> Can you give an example of something like this working in PostgreSQL?
> I get an error when I try to use a text value in a GROUP BY clause.
> (Or are you referring specifically to the CASE expression corner case?)

This works in every reasonable database that I have tried it in:

  select trim(foo.bar), count(*)
  from foo
  group by trim(foo.bar)

And yes, I have tried it in postgres.

[...]
> > Postgres explicitly disallows a constant character expression.  But it
> > allows the constant case expression that I gave.  It would be nice for
> > me to not have to remember that very obscure and convoluted case.
>
> I agree, and would move that it should be disallowed if there isn't a
> reason for it to be maintained, for exactly the reason you give:
> there shouldn't be such convoluted, and obscure corner case.

There is an excellent reason to generally allow complex expressions in
group by statements, and that reason is that many useful and
reasonable queries won't work if you don't.  Such as the one I gave
above.

[...]
> > I don't know what the SQL spec says, but I know (having talked to
> > other developers) that many people would find it very nice.
>
> Since I had the spec open, I tried to look at this as well, though I
> must admit I found it very tough going.
>
> I think this is the key section:
>
> > 10.9 
> >
> > ...
> >
> > Syntax Rules
> >
> > ...
> >
> > 4) The argument source of an  is
> > Case:
> > a) If AF is immediately contained in a  > specification>, then a table or group of a grouped table as
> > specified in Subclause 7.10, "", and Subclause 7.12,
> > "".
> >
> > b) Otherwise, the collection of rows in the current row's window
> > frame defined by the window structure  descriptor identified by the
> >  that simply contains AF, as defined in Subclause
> > 7.11, "".
>
> Now the  stuff is pretty dense, and the  clause> is currently impenetrable for me, so I just looked at the
>  and  sections referenced above.

I'm not surprised that the window clause section is impenetrable to
you.  Window clauses are part of the definition of analytic functions,
which postgres does NOT yet implement.  However they are on the todo
list.

Speaking personally, analytic functions are the single feature from
Oracle that I've most missed when moving to postgres.  I would be
happy to explain what they are and how they should work either on the
list or off to anyone who is interested in implementing them.  However
they are a topic for another thread, and probably for another week.
(I'm out next week, and am loathe to open that can of worms just yet.)

However I will note as a practical matter that implementing analytic
functions will increase 

Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/23/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Ben Tilly" <[EMAIL PROTECTED]> writes:
> > ... But then I need this (possibly
> > empty) list to have a valid group by statement at the end.  In Oracle
> > I used to just write it like this:
>
> >   SELECT ...
> >   GROUP BY $possible_term_1
> > $possible_term_2
> > $possible_term_3
> > 'end of possible groupings'
>
> FWIW, that will work if you write it as
>
> 'end of possible groupings'::text

Ah, that is the solution that I'll move to.

> I'm disinclined to accept it as-is because (a) it's highly likely to be
> a mistake, and (b) there's no principled way to assign a datatype to the
> expression, if we are to interpret it as an expression.
>
> Basically anything but an unadorned constant will work there.

As long as that behaviour is guaranteed, I'm OK on this one.

Ben

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] SQL feature requests

2007-08-23 Thread Ben Tilly
On 8/22/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
>
> On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
>
> > If your implementation accepts:
> >
> >   group by case when true then 'foo' end
>
> What would that mean? Regardless of whether or not it's accepted, it
> should have *some* meaning.

To my eyes it has a very clear meaning, we're grouping on an
expression that happens to be a constant.  Which happens to be the
same for all rows.  Which is a spectacularly useless thing to actually
do, but the ability to do it happens to be convenient when I'm looking
for something to terminate a series of commas in a dynamically built
query.

> It's not equivalent to GROUP BY "foo"

I wouldn't want it to be.  Strings and identifiers are very different things.

[...]
> *This* seems like a bug:
> test=# select record_id
> , count(observation_id) as bar
> from observation
> group by record_id
> , case when true
>then 'foo'
>   end;
> record_id | bar
> ---+-
>   1 |   4
>   2 |   4
>   3 |   2
> (3 rows)

Why does it seem like a bug to you?

Turn it around, and tell me in what way is its behaviour surprising to
someone who knows SQL.  You asked to group on something that is the
same for all rows.  That group by condition did nothing.  (Except
rendered the syntax valid when it might not have been.)  As I would
expect.

Furthermore ask yourself whether anyone who wrote that would likely
have written it by accident.

Cheers,
Ben

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
On 8/22/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
>
> "Ben Tilly" <[EMAIL PROTECTED]> writes:
>
> > Hopefully this is the right place for a few feature requests that
> > would address some of the things that I've noticed in postgres.
> >
> > 1. Just a minor annoyance, but why must subqueries in FROM clauses
> > have an alias?  For instance suppose that I have an orders table, and
> > one of the fields is userid.  The following is unambiguous and is
> > legal in Oracle:
>
> Thank you, this is one of my top pet peeves but when I proposed changing it I
> was told nobody's complained. Now we have at least one user complaint, any
> others out there?

Always happy to complain. :-)

> > 2. Why is 'non-integer constant in GROUP BY' an error?
>
> Hm... I was a bit surprised by this warning myself. IIRC there was an
> implementation convenience issue.

If your implementation accepts:

  group by case when true then 'foo' end

how much harder can it be to accept:

  group by 'foo'

?

> > 3. How hard would it be to have postgres ignore aliases in group by
> > clauses?
>
> That sounds like a strange idea.

It is a strange idea, but it makes dynamically building queries
easier.  Right now I'm following a strategy of storing what I'm going
to insert in the select clause in one variable, and the group by
clause in another.  So I need 2 variables for each dynamic field that
I might choose to group by and want to have a custom name for.  With
this change I would only need one variable.

> > 4) Items 2 and 3 would both be made irrelevant if postgres did
> > something that I'd really, really would like.  Which is to assume that
> > a query without a group by clause, but with an aggregate function in
> > the select, should have an implicit group by clause where you group by
> > all non-aggregate functions in the select.
> >
> > For example
> >
> >   SELECT foo, count(*)
> >   FROM bar
> >
> > would be processed as:
> >
> >   SELECT foo, count(*)
> >   FROM bar
> >   GROUP BY foo
>
> I agree this would be convenient but it seems too scary to actually go
> anywhere. What would you group by in the case of:
>
> SELECT a+b, count(*) FROM bar
>
> Should it group by a,b or a+b ?

It should group by a+b.  Which is to say, every field in the select
clause that currently triggers an error because it isn't in the group
by clause.

> Also, this might be a bit shocking for MySQL users who are accustomed to
> MySQL's non-standard extension for the same syntax. There it's treated as an
> assertion that the columns are equal for all records in a group or at least
> that it doesn't matter which such value is returned, effectively equivalent to
> our DISTINCT ON feature.

I don't mind shocking MySQL users. ;-)

But seriously, if that objection is the barrier then I'd be happy to
see it be something that is explicitly turned on in the query.  For
instance:

  select autogroup bar, count(*) from foo

If that was available then I for one would type autogroup a lot more
often than group by.  After all autogroup is about as hard to type,
and I don't have to type the redundant list of fields in the group by.

Cheers,
Ben

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
On 8/22/07, Michael Glaesemann <[EMAIL PROTECTED]> wrote:
>
> On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
>
> > 1. Just a minor annoyance, but why must subqueries in FROM clauses
> > have an alias?
>
> It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
> does?

I can well believe that the standard says that you must accept
subqueries with aliases.  But does it say you must reject subqueries
without aliases?  I strongly doubt that.

I have no clue what EnterpriseDB does.

> > 2. Why is 'non-integer constant in GROUP BY' an error?
>
> >  This works for now:
> >
> >   case when true then true end
> >
> > but I don't know whether some future version of postgres might break
> > my code by banning that as well.
>
> The PostgreSQL developers generally tries hard to preserve backwards
> compatibility, so I doubt the case expression as you have it would go
> away (though I'm kind of surprised it's allowed). Am I wrong in
> thinking that Oracle would accept the same format PostgreSQL does? In

The reason for my comparing to Oracle is that I used to work at an
Oracle shop.  I now work at a postgres shop.  Portability is not my
issue, just the annoyances that I experienced moving from one to the
other.

As for whether that case expression would go away, that it is allowed
is such an obscure feature that I doubt anyone changing that code
would notice if it was removed.

> that case, couldn't you use whatever method works in PostgreSQL in
> Oracle? I haven't checked the SQL standard, but it seems unlikely
> it'd allow something like
>
> GROUP BY , , , ;

That's not what Oracle accepts  that postgres does not.  What Oracle accepts is:

  ...
  GROUP BY 'foo';

> AIUI, Integers are only allowed because the SQL standard explicitly
> allows you to refer to columns by the order they appear in the SELECT
> list. Otherwise the GROUP BY items need to be column names.

Need to be?

The SQL-92 standard is clear that you must accept a list of column
names.  It is also clear that a column name must be be of the form
field or table.field.  In no way, shape or form does that allow having
terms like trim(foo.bar) in a group by.

But every reasonable database that I know - including postgres - allows that.

The standard very wisely does not forbid extensions.  Every database
has extensions.  In some cases, such as allowing trim(foo.bar) in a
group by clause, some extensions are so common as to be a standard.
(I don't have a copy of any later standards so I don't know whether
that has since been explicitly allowed.)  Therefore the real question
is how much farther than the standard you go.

Postgres explicitly disallows a constant character expression.  But it
allows the constant case expression that I gave.  It would be nice for
me to not have to remember that very obscure and convoluted case.

> Both 1 and 2 seem to me to be places where Oracle is likely deviating
> from the standard. If you're targeting Oracle, then using Oracle-
> specific syntax might be warranted. If you're hoping to target more
> than one possible backend, I'd think it be better to use more
> portable syntax (e.g., SQL-standard syntax) than expecting other
> DBMSs to follow another's deviations. That's not to say PostgreSQL
> does not have non-standard syntax: in places, it does. But it does
> try to hew very closely to the standard.

The queries that I'm writing are not hoping to target more than one
database at one company.

> Again, I wonder what EnterpriseDB does in this case?

No clue.

> > 3. How hard would it be to have postgres ignore aliases in group by
> > clauses?  Per my comments above, I often build complex queries in
> > code.  I can't easily use the shortcut of referring to the select
> > column by number because the position is hard to determine.  So my
> > code has to copy the select terms.  But I can't copy them exactly
> > because the select terms include lots of "...as foo" clauses that are
> > not allowed in a group by.  So I have to store very similar terms to
> > use twice.
>
> Perhaps someone else knows what you're referring to here, but I'm
> having a hard time without an example. Here's what I *think* you're
> trying to say:

What I'm trying to say is that it would be convenient for me to be
able to write:

  select bar as "baz"
, count(*) as "some count"
  from foo
  group by bar as "baz"

That's not allowed right now because as is not allowed in a group by statement.

[...]
> > Which is to assume that
> > a query without a group by clause, but with an aggregate function in
> > the select, should have an implic

[HACKERS] SQL feature requests

2007-08-22 Thread Ben Tilly
Hopefully this is the right place for a few feature requests that
would address some of the things that I've noticed in postgres.

1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?  For instance suppose that I have an orders table, and
one of the fields is userid.  The following is unambiguous and is
legal in Oracle:

  SELECT order_count
, count(*) as people
  FROM (
  SELECT count(*) as order_count
  FROM orders
  GROUP BY userid
)
  GROUP BY order_count

It annoys me that it isn't legal in postgres.  (Yes, I know how to fix
the query.  But it still is an annoyance, and it comes up fairly often
in reporting purposes.)

2. Why is 'non-integer constant in GROUP BY' an error?

I find it inconvenient.  For reporting purposes I often have to
dynamically build queries in code.  An easy way to do that is just
interpolate in a set of possible statements which will either be empty
strings or have trailing commas.  But then I need this (possibly
empty) list to have a valid group by statement at the end.  In Oracle
I used to just write it like this:

  SELECT ...
  GROUP BY $possible_term_1
$possible_term_2
$possible_term_3
'end of possible groupings'

In postgres I either have to use a different strategy to build up
these strings, or else use a more complicated term to finish that off.
 This works for now:

  case when true then true end

but I don't know whether some future version of postgres might break
my code by banning that as well.

3. How hard would it be to have postgres ignore aliases in group by
clauses?  Per my comments above, I often build complex queries in
code.  I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine.  So my
code has to copy the select terms.  But I can't copy them exactly
because the select terms include lots of "...as foo" clauses that are
not allowed in a group by.  So I have to store very similar terms to
use twice.

It would be nice if I could just make the group by look like the
select, and have the (obviously irrelevant) aliases just be ignored.

4) Items 2 and 3 would both be made irrelevant if postgres did
something that I'd really, really would like.  Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.

For example

  SELECT foo, count(*)
  FROM bar

would be processed as:

  SELECT foo, count(*)
  FROM bar
  GROUP BY foo

If I write a query with an aggregate function in the select, better
than 95% of the time this is the group by clause that I want.  (This
email has one of the few exceptions.)  In the remaining cases I could
easily add the extra stuff in the group by to the select without
problems.  Therefore if postgres could just insert the obvious group
by clause in, I would never again write the words "group by" when
working with postgres.  And I predict that many other people would do
the same.

But it doesn't.  So when working with postgres, just like every other
database that I've used, I have to constantly type in group by clauses
with entirely redundant information.  (But they're not EXACTLY the
same as the select clauses that they are redundant with...)

Cheers,
Ben

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster