Re: [HACKERS] [SQL] Case Preservation disregarding case

2006-11-06 Thread Chuck McDevitt


-Original Message-
From: Stephan Szabo [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 31, 2006 10:23 AM
To: Chuck McDevitt
Cc: Tom Lane; beau hargis; pgsql-sql@postgresql.org;
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case

On Tue, 31 Oct 2006, Chuck McDevitt wrote:

> We treated quoted identifiers as case-specific, as the spec requires.
>
> In the catalog, we stored TWO columns... The column name with case
> converted as appropriate (as PostgreSQL already does), used for
looking
> up the attribute,
> And a second column, which was the column name with the case exactly
as
> entered by the user.

Wouldn't using that second column's value tend to often violate 5.2SR10
(at least that's the reference item in SQL92)? AFAICT, that rule
basically
says that the regular identifier is equivalent to the case-folded one
for
purposes of information and definition schema and similar purposes which
seems like it would be intended to include things like column labeling
for
output. There's a little bit of flexibility there on both similar
purposes
and equivalence, though.


Equivalent, yes.  But I can interpret that clause it mean I can show
either the case folded or non-case-folded value in the information
schema, as they are equivalent.

Anyway, we have many things that are "enhancements" beyond the spec, and
this could also be considered an enhancement.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [SQL] Case Preservation disregarding case

2006-11-06 Thread Chuck McDevitt
Sorry, my last mail wasn't well thought out.  Yes, the
information_schema needs the case-folded name (although it might be ok
to add additional columns to the information_schema for extra
information).

But, stepping back from all that, what is it the users want?

1)  When re-creating a CREATE TABLE statement from whatever catalog
info, they'd like the names to come back exactly as then entered them.
If I do:
 CREATE TABLE BobsTable (WeeklySales numeric(10,2),
"SomeStrangeName" int);

  They'd like to see exactly that when the CREATE TABLE gets
re-created, not what we do now:

   CREATE TABLE bobstable (weeklysales numeric(10,2),
"SomeStrangeName" int);

2)  When doing reports, they'd like the name as entered to be the title
of the column:
Select * from bobstable;  

  Would be nice if they saw this:
  WeeklySalesSomeStrangeName
  ------

   
For compatibility with existing apps and spec compliance, you'd still
want PQfname() to return the case-folded name.
But there isn't any reason you couldn't also return a "suggested title"
field (PQftitle?) which preserves the user's case.

You could also extend the concept of a PQftitle to make nicer titles for
expressions.  Instead of 
SELECT sum(WeeklySales) from BobsTable;

Producing "?column?" or somesuch to use in the report, it could return a
title like "sum(WeeklySales)"

-Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 31, 2006 10:38 PM
To: Chuck McDevitt
Cc: Stephan Szabo; beau hargis; pgsql-sql@postgresql.org;
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case

"Chuck McDevitt" <[EMAIL PROTECTED]> writes:
> Equivalent, yes.  But I can interpret that clause it mean I can show
> either the case folded or non-case-folded value in the information
> schema, as they are equivalent.

Well, that's an interesting bit of specs-lawyering, but I don't see
how you can defend it against these rules in SQL99 5.2:

21) For every  IB there is exactly one
corresponding case-normal form CNF. CNF is an 
derived from IB as follows.

Let n be the number of characters in IB. For i ranging from
1
(one) to n, the i-th character M(i) of IB is translated into
the
corresponding character or characters of CNF as follows.

Case:

a) If M(i) is a lower case character or a title case
character
  for which an equivalent upper case sequence U is defined
by
  Unicode, then let j be the number of characters in U; the
  next j characters of CNF are U.

b) Otherwise, the next character of CNF is M(i).

22) The case-normal form of the  of a  is used for purposes such as and including
determination of identifier equivalence, representation in
the Definition and Information Schemas, and representation
in
diagnostics areas.

NOTE 44 - Any lower-case letters for which there are no
upper-
case equivalents are left in their lower-case form.

Again, obviously we are not compliant because we fold to lower rather
than upper case, but I do not see how you can read (22) as not requiring
the information schema to show the upper-cased form.  The output of
functions such as PQfname() might be considered closer to diagnostics
info than information schema, but that's covered too.

But the really serious problem with what you propose is that it would
allow two table columns with names that the system considers distinct
to show as the same string in the information schema and diagnostic
outputs.  That can't be acceptable --- it's going to break any
application that does any nontrivial analysis of what it sees there,
not to mention that it violates various primary key constraints in
the information schema specification.

regards, tom lane



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [SQL] Case Preservation disregarding case

2006-11-06 Thread Chuck McDevitt
Oh... And Microsoft SQLServer does something similar.

At Greenplum, we've already gotten complaints from customers about this
when they were switching from MSSQL to GP's PostgreSQL-based database.

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 30, 2006 10:35 PM
To: Chuck McDevitt
Cc: beau hargis; pgsql-sql@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case
sensitivity?

"Chuck McDevitt" <[EMAIL PROTECTED]> writes:
> At Teradata, we certainly interpreted the spec to allow
case-preserving,
> but case-insensitive, identifiers.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [SQL] Case Preservation disregarding case

2006-11-06 Thread Chuck McDevitt
We treated quoted identifiers as case-specific, as the spec requires.

In the catalog, we stored TWO columns... The column name with case
converted as appropriate (as PostgreSQL already does), used for looking
up the attribute,
And a second column, which was the column name with the case exactly as
entered by the user.

So, your example would work just fine.


-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 30, 2006 10:35 PM
To: Chuck McDevitt
Cc: beau hargis; pgsql-sql@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case
sensitivity?

"Chuck McDevitt" <[EMAIL PROTECTED]> writes:
> At Teradata, we certainly interpreted the spec to allow
case-preserving,
> but case-insensitive, identifiers.

Really?

As I see it, the controlling parts of the SQL spec are (SQL99 sec 5.2)

26) A  and a  are
equivalent if the  of the 
(with every letter that is a lower-case letter replaced by
the
corresponding upper-case letter or letters) and the
 of the  (with all
occurrences of  replaced by  and all
occurrences of  replaced by ),
considered as the repetition of a 
that specifies a  of
SQL_IDENTIFIER
and an implementation-defined collation that is sensitive to
case, compare equally according to the comparison rules in
Subclause 8.2, "".

27) Two s are equivalent if their
s, considered as the repetition of a
 that specifies a 
of SQL_IDENTIFIER and an implementation-defined collation
that is sensitive to case, compare equally according to the
comparison rules in Subclause 8.2, "".

Note well the "sensitive to case" bits there.  Now consider

CREATE TABLE tab (
"foobar" int,
"FooBar" timestamp,
"FOOBAR" varchar(3)
);

We can *not* reject this as containing duplicate column names, else we
have certainly violated rule 27.  Now what will you do with

SELECT fooBar FROM tab;

?  The spec is unquestionably on the side of "you selected the varchar
column"; historical Postgres practice is on the side of "you selected
the int column".  AFAICS a case-insensitive approach would have to
fail with some "I can't identify which column you mean" error.  I am
interested to see where you find support for that in the spec...

regards, tom lane



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [SQL] Case Preservation disregarding case

2006-11-06 Thread Chuck McDevitt
At Teradata, we certainly interpreted the spec to allow case-preserving,
but case-insensitive, identifiers.
Users really liked it that way:  If you re-created a CREATE TABLE
statement from the catalog, you could get back exactly the case the user
had entered, but people using the table didn't need to worry about case.
And column titles in reports would have the nice case preserving
information.
Sort of like how Windows systems treat file names... The case is
preserved, but you don't need to know it to access the file.

I know UNIX users usually think "case-preserving with case-insensitive"
a foreign concept, but that doesn't mean the average user feels the
same.

If I want my column named "WeeklyTotalSales", I really don't want to
have to always quote it and type in the exact case.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
Sent: Monday, October 30, 2006 7:24 PM
To: beau hargis
Cc: pgsql-sql@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case
sensitivity?

beau hargis <[EMAIL PROTECTED]> writes:
> Considering the differences that already exist between database
systems and 
> their varying compliance with SQL and the various extensions that have
been 
> created, I do not consider that the preservation of case for
identifiers 
> would violate any SQL standard.

That's not how I read the spec.  It is true that we are not 100% spec
compliant, but that isn't a good argument for moving further away from
spec.  Not to mention breaking backwards compatibility with our
historical behavior.  The change you propose would fix your application
at the cost of breaking other people's applications.   Perhaps you
should consider fixing your app instead.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings