Re: indentifier case

2002-03-08 Thread Steffen Goeldner

Jeff Zucker wrote:
> 
> Tim Bunce wrote:
> >
> > On Mon, Mar 04, 2002 at 08:50:45AM -0800, Jeff Zucker wrote:
> >
> > > Hmm, I just checked the SQL92 standard and if I'm reading it correctly
> > > the situation is ugly.  Section 5.2.13 appears to say that a regular
> > > identifier and a delimited identifier are equivalent if the *upper case*
> > > version of the regular identifier compares *in a case sensitive manor*
> > > to the delimited identifier.
> >
> > Got a URL?
> 
> It's from the PDF of SQL92 I purchased from www.ansi.org.  But here's
> the text:
> 
>   13)
>   A  and a  are equiva-
>   lent if the  of the  (with
>   every letter that is a lower-case letter replaced by the equiva-
>   lent upper-case letter or letters) and thebody> of the  (with all occurrences of
>replaced by  and all occurrences ofblequote symbol> replaced by ), considered as
>   the repetition of a  that specifies a
>of SQL_TEXT and an implementation-
>   defined collation that is sensitive to case, compare equally
>   according to the comparison rules in Subclause 8.2, "   predicate>".

And SQL3 - Part 11: Information and Definition Schemas (SQL/Schemata)

  

has something to say about the representation of an identifier:

 The representation of an  in the base tables and views
 of the Information Schema is by a character string corresponding
 to its  (in the case of a )
 or its  (in the case of a ). Within this character string, any lower-case letter
 appearing in a  is replaced by the equivalent
 upper-case letter, and any  appearing in a
  is replaced by a .


Steffen




Re: indentifier case

2002-03-07 Thread Tim Bunce

Yeap, seems reasonable.

Tim.

On Wed, Mar 06, 2002 at 04:49:58PM -0800, Jeff Zucker wrote:
> Tim Bunce wrote:
> > 
> > On Wed, Mar 06, 2002 at 03:04:51PM -0800, Jeff Zucker wrote:
> >
> > > It's from the PDF of SQL92 I purchased from www.ansi.org.  But here's
> > > the text:
> > >
> > >   13)
> > >   A  and a  are equiva-
> > >   lent if the  of the  (with
> > >   every letter that is a lower-case letter replaced by the equiva-
> > >   lent upper-case letter or letters) and the  > >   body> of the  (with all occurrences of
> > >replaced by  and all occurrences of  > >   blequote symbol> replaced by ), considered as
> > >   the repetition of a  that specifies a
> > >of SQL_TEXT and an implementation-
> > >   defined collation that is sensitive to case, compare equally
> > >   according to the comparison rules in Subclause 8.2, " > >   predicate>".
> > >
> >
> > Perhaps, but the paragraph above seems too daft to be worth living with :)
> 
> Aha, you know what, I think I just figured out why it isn't daft and why
> your suggestion to use mixed case matching (BooB = "BooB") can't work:
> 
> Rule A - Regular identifiers are case INSENSITIVE
> Rule B - Delimited identifiers are case SENSITIVE
> 
> Therefore, these are the same as each other:
> 
>x1 boob
>x2 BooB
>x3 BOOB
> 
> And these are different from each other:
> 
>y1 "boob"
>y2 "BooB"
>y3 "BOOB"
> 
> If we allow group x to be equivalent to group y on the basis of mixed
> case, then we'd end up either with the three identifiers in group x
> different from each other (a violation of rule A) or the 3 identifiers
> in group y as the same as each other (a violation of rule B).  So we can
> never use mixed case to determiine the equivalence.  So we have to
> arbitrarily say that x1,x2,x3 as a group are all equal to one and only
> one of group y, either the lower case or the upper case.  The standards
> says that would be upper case to avoid confusion.
> 
> So, not so daft after all?
> 
> -- 
> Jeff



Re: indentifier case

2002-03-06 Thread Jeff Zucker

Tim Bunce wrote:
> 
> On Wed, Mar 06, 2002 at 03:04:51PM -0800, Jeff Zucker wrote:
>
> > It's from the PDF of SQL92 I purchased from www.ansi.org.  But here's
> > the text:
> >
> >   13)
> >   A  and a  are equiva-
> >   lent if the  of the  (with
> >   every letter that is a lower-case letter replaced by the equiva-
> >   lent upper-case letter or letters) and the  >   body> of the  (with all occurrences of
> >replaced by  and all occurrences of  >   blequote symbol> replaced by ), considered as
> >   the repetition of a  that specifies a
> >of SQL_TEXT and an implementation-
> >   defined collation that is sensitive to case, compare equally
> >   according to the comparison rules in Subclause 8.2, " >   predicate>".
> >
>
> Perhaps, but the paragraph above seems too daft to be worth living with :)

Aha, you know what, I think I just figured out why it isn't daft and why
your suggestion to use mixed case matching (BooB = "BooB") can't work:

Rule A - Regular identifiers are case INSENSITIVE
Rule B - Delimited identifiers are case SENSITIVE

Therefore, these are the same as each other:

   x1 boob
   x2 BooB
   x3 BOOB

And these are different from each other:

   y1 "boob"
   y2 "BooB"
   y3 "BOOB"

If we allow group x to be equivalent to group y on the basis of mixed
case, then we'd end up either with the three identifiers in group x
different from each other (a violation of rule A) or the 3 identifiers
in group y as the same as each other (a violation of rule B).  So we can
never use mixed case to determiine the equivalence.  So we have to
arbitrarily say that x1,x2,x3 as a group are all equal to one and only
one of group y, either the lower case or the upper case.  The standards
says that would be upper case to avoid confusion.

So, not so daft after all?

-- 
Jeff



Re: indentifier case

2002-03-06 Thread Tim Bunce

On Wed, Mar 06, 2002 at 03:04:51PM -0800, Jeff Zucker wrote:
> Tim Bunce wrote:
> > 
> > On Mon, Mar 04, 2002 at 08:50:45AM -0800, Jeff Zucker wrote:
> >
> > > Hmm, I just checked the SQL92 standard and if I'm reading it correctly
> > > the situation is ugly.  Section 5.2.13 appears to say that a regular
> > > identifier and a delimited identifier are equivalent if the *upper case*
> > > version of the regular identifier compares *in a case sensitive manor*
> > > to the delimited identifier.
> > 
> > Got a URL?
> 
> It's from the PDF of SQL92 I purchased from www.ansi.org.  But here's
> the text:
> 
>   13)
>   A  and a  are equiva-
>   lent if the  of the  (with
>   every letter that is a lower-case letter replaced by the equiva-
>   lent upper-case letter or letters) and thebody> of the  (with all occurrences of
>replaced by  and all occurrences ofblequote symbol> replaced by ), considered as
>   the repetition of a  that specifies a
>of SQL_TEXT and an implementation-
>   defined collation that is sensitive to case, compare equally
>   according to the comparison rules in Subclause 8.2, "   predicate>".
> 
> As I understand that, it means regular identifiers match delimited
> identifiers only if the delimited identifier is all upper case and the
> regular identifier, when uppercase is equal to it.  This would be
> regardless of the value of SQL_IDENTIFIER_CASE.
> 
> > Why not go the SQL_IDENTIFIER_CASE = SQL_IC_MIXED route and say
> > that BooG == "BooG". That's fairly natural and obvious and I think
> > it would cause least problems in the long run. And wouldn't that
> > also suit DBD::CSV users?
> 
> That makes sense but I think the paragraph above means something like
> this:
> 
> boog <> "boog"
> BOOG <> "boog"
> boog == "BOOG"
> BOOG == "BOOG"

Perhaps, but the paragraph above seems too daft to be worth living with :)

Tim.



Re: indentifier case

2002-03-06 Thread Jeff Zucker

Tim Bunce wrote:
> 
> On Mon, Mar 04, 2002 at 08:50:45AM -0800, Jeff Zucker wrote:
>
> > Hmm, I just checked the SQL92 standard and if I'm reading it correctly
> > the situation is ugly.  Section 5.2.13 appears to say that a regular
> > identifier and a delimited identifier are equivalent if the *upper case*
> > version of the regular identifier compares *in a case sensitive manor*
> > to the delimited identifier.
> 
> Got a URL?

It's from the PDF of SQL92 I purchased from www.ansi.org.  But here's
the text:

  13)
  A  and a  are equiva-
  lent if the  of the  (with
  every letter that is a lower-case letter replaced by the equiva-
  lent upper-case letter or letters) and the  of the  (with all occurrences of
   replaced by  and all occurrences of  replaced by ), considered as
  the repetition of a  that specifies a
   of SQL_TEXT and an implementation-
  defined collation that is sensitive to case, compare equally
  according to the comparison rules in Subclause 8.2, "".

As I understand that, it means regular identifiers match delimited
identifiers only if the delimited identifier is all upper case and the
regular identifier, when uppercase is equal to it.  This would be
regardless of the value of SQL_IDENTIFIER_CASE.

> Why not go the SQL_IDENTIFIER_CASE = SQL_IC_MIXED route and say
> that BooG == "BooG". That's fairly natural and obvious and I think
> it would cause least problems in the long run. And wouldn't that
> also suit DBD::CSV users?

That makes sense but I think the paragraph above means something like
this:

boog <> "boog"
BOOG <> "boog"
boog == "BOOG"
BOOG == "BOOG"

-- 
Jeff



Re: indentifier case

2002-03-04 Thread Tim Bunce

On Mon, Mar 04, 2002 at 08:50:45AM -0800, Jeff Zucker wrote:
> Tim Bunce wrote:
> > 
> > On Sun, Mar 03, 2002 at 12:41:54PM -0800, Jeff Zucker wrote:
> > >
> > >  $dbh->do(q/ CREATE TABLE x ( OOg INT, "BOOg" CHAR ) /);
> > >
> > >  $sth=$dbh->prepare(q/ SELECT  boog  FROM x /); #4 NO SUCH COLUMN
> > >  $sth=$dbh->prepare(q/ SELECT  BOOg  FROM x /); #5 NO SUCH COLUMN
> > 
> >
> > I think maybe #5 would work for a system where
> > SQL_IDENTIFIER_CASE = SQL_IC_MIXED
> > I'm not sure. And you could argue that #4 would work as well
> 
> Hmm, I just checked the SQL92 standard and if I'm reading it correctly
> the situation is ugly.  Section 5.2.13 appears to say that a regular
> identifier and a delimited identifier are equivalent if the *upper case*
> version of the regular identifier compares *in a case sensitive manor*
> to the delimited identifier.

Got a URL?

> So this would be the way it should work
> according to the standard
> 
>$dbh->do(q/ CREATE TABLE x ( "BOOG" CHAR ) /);
>$sth=$dbh->prepare(q/ SELECT  BooG  FROM x /);  # OK
>$sth=$dbh->prepare(q/ SELECT  boog  FROM x /);  # OK
> 
>$dbh->do(q/ CREATE TABLE x ( "BooG" CHAR ) /);
>$sth=$dbh->prepare(q/ SELECT  BOOG  FROM x /);  # NO SUCH COLUMN
>$sth=$dbh->prepare(q/ SELECT  BooG  FROM x /);  # NO SUCH COLUMN
>$sth=$dbh->prepare(q/ SELECT  boog  FROM x /);  # NO SUCH COLUMN
> 
> That's a little too wierd for me.

That matches Oracle, ie SQL_IDENTIFIER_CASE = SQL_IC_UPPER.

> It would mean that only delimited
> identifiers that are all upper case can be compared to regular
> identifiers.  Unless it seems a gross violation, I think I will keep the
> behaviour that regular identifiers are never equivalent to delimited
> identifiers. -- if it's quoted then you must use quotes to refer to it.

That's a little too wierd for me :)

Why not go the SQL_IDENTIFIER_CASE = SQL_IC_MIXED route and say
that BooG == "BooG". That's fairly natural and obvious and I think
it would cause least problems in the long run. And wouldn't that
also suit DBD::CSV users?

Tim.



Re: indentifier case

2002-03-04 Thread Jeff Zucker

Tim Bunce wrote:
> 
> On Sun, Mar 03, 2002 at 12:41:54PM -0800, Jeff Zucker wrote:
> >
> >  $dbh->do(q/ CREATE TABLE x ( OOg INT, "BOOg" CHAR ) /);
> >
> >  $sth=$dbh->prepare(q/ SELECT  boog  FROM x /); #4 NO SUCH COLUMN
> >  $sth=$dbh->prepare(q/ SELECT  BOOg  FROM x /); #5 NO SUCH COLUMN
> 
>
> I think maybe #5 would work for a system where
> SQL_IDENTIFIER_CASE = SQL_IC_MIXED
> I'm not sure. And you could argue that #4 would work as well

Hmm, I just checked the SQL92 standard and if I'm reading it correctly
the situation is ugly.  Section 5.2.13 appears to say that a regular
identifier and a delimited identifier are equivalent if the *upper case*
version of the regular identifier compares *in a case sensitive manor*
to the delimited identifier.  So this would be the way it should work
according to the standard

   $dbh->do(q/ CREATE TABLE x ( "BOOG" CHAR ) /);
   $sth=$dbh->prepare(q/ SELECT  BooG  FROM x /);  # OK
   $sth=$dbh->prepare(q/ SELECT  boog  FROM x /);  # OK

   $dbh->do(q/ CREATE TABLE x ( "BooG" CHAR ) /);
   $sth=$dbh->prepare(q/ SELECT  BOOG  FROM x /);  # NO SUCH COLUMN
   $sth=$dbh->prepare(q/ SELECT  BooG  FROM x /);  # NO SUCH COLUMN
   $sth=$dbh->prepare(q/ SELECT  boog  FROM x /);  # NO SUCH COLUMN

That's a little too wierd for me.  It would mean that only delimited
identifiers that are all upper case can be compared to regular
identifiers.  Unless it seems a gross violation, I think I will keep the
behaviour that regular identifiers are never equivalent to delimited
identifiers. -- if it's quoted then you must use quotes to refer to it.

> So it defaults to lowercasing unquoted ids (SQL_IC_LOWER) like
> Oracle defaults to uppercasing them (SQL_IC_UPPER). Personally I
> think it makes sense to default to one or the other. Doing mixed
> case matching always feels weird to me.  But maybe I've not spent
> enough time on Windows/DOS boxes :)

I guess I am going to stick with mixed case for the simple reason that
there is already a broad user base of DBD::CSV users who have numerous
scripts which would break if I change that behavior.  Though for
portability, they'd be wise to not depend on that.

> A driver that conforms strictly to SQL-92 should return:
> SQL_IDENTIFIER_CASE = anything *except* SQL_IC_SENSITIVE
> SQL_QUOTED_IDENTIFIER_CASE = SQL_IC_SENSITIVE

Yep, I had my eye on that.

> And don't forget that a key feature of quoting an identifier is that
> it can then include whitespace and other 'special' characters.

Yep, and "" to represent " inside "...".  And can start with numbers or
be the same as reserved word e.g. this is legal:

  q/ SELECT "SELECT" FROM "FROM","WHERE" WHERE ... /

-- 
Jeff "Jeff"



Re: indentifier case

2002-03-04 Thread Tim Bunce

On Sun, Mar 03, 2002 at 12:41:54PM -0800, Jeff Zucker wrote:
> Tim Bunce wrote:
> > 
> >
> > Does double-quoting work, and provide case sensitivity (as per the standard)?
> 
> Let me just check wavelengths here.  Delimited (double quoted)
> identifiers *must* be case sensitive and stored in mixed case.

To be SQL-92 conformant, yes. See below.

> Other
> (non-quoted) identifiers *must not* be case sensitive but can also be
> stored in mixed case (that's left to the implementation).

To be SQL-92 conformant, yes. See below.

> So going that route, this is how I have things working in DBD::CSV:
> 
>  $dbh->do(q/ CREATE TABLE x ( OOg INT, "BOOg" CHAR ) /);
> 
>  $sth=$dbh->prepare(q/ SELECT  oog   FROM x /);  #1 OK
>  $sth=$dbh->prepare(q/ SELECT "BOOg" FROM x /);  #2 OK
>  $sth=$dbh->prepare(q/ SELECT "boog" FROM x /);  #3 NO SUCH COLUMN
>  $sth=$dbh->prepare(q/ SELECT  boog  FROM x /);  #4 NO SUCH COLUMN
>  $sth=$dbh->prepare(q/ SELECT  BOOg  FROM x /);  #5 NO SUCH COLUMN

Well that certainly matches what Oracle would do :)

I think maybe #5 would work for a system where
SQL_IDENTIFIER_CASE = SQL_IC_MIXED
I'm not sure. And you could argue that #4 would work as well
on the basis that if the only diference is letter case and if
the identifiers are not case sensitive (ie for matching) then
lettercase only differences don't matter.


>  my $sth=$dbh->prepare("SELECT * FROM x");
>  print join ',' @{$sth->{NAME}}#6  prints OOg,BOOg

Oracle would print OOG,BOOg because it uppercases unquoted identifiers.

>  $sth->fetchrow_hashref->{OOg} # OK
>  $sth->fetchrow_hashref->{BOOg}# OK
>  $sth->fetchrow_hashref->{oog} # NO SUCH HASH KEY
>  $sth->fetchrow_hashref->{boog}# NO SUCH HASH KEY
>  $sth->fetchrow_hashref->{q/"BOOg"/}   # NO SUCH HASH KEY
> 
> Is that how it should work assuming I want to store in mixed case? (all
> this leaving aside aliases).

Yes, "assuming I want to store in mixed case", ie:
SQL_IDENTIFIER_CASE = SQL_IC_MIXED
But of course people should use $h->{FetchHashKey} = 'NAME_lc' or
'NAME_uc' :)

> DBD::Pg differs from the above in that it uses oog and BOOg in the
> {NAME} and hashref, otherwise the same.

So it defaults to lowercasing unquoted ids (SQL_IC_LOWER) like
Oracle defaults to uppercasing them (SQL_IC_UPPER). Personally I
think it makes sense to default to one or the other. Doing mixed
case matching always feels weird to me.  But maybe I've not spent
enough time on Windows/DOS boxes :)

> DBD::ODBC with Access uses OOg and BOOg in the {NAME} and hashref as
> above, but permits 3,4,5!

4 and 5 are probably okay. But 3 is stretching things a bit! :)

FYI, the relevants parts of the ODBC GetInfo spec are:

SQL_IDENTIFIER_CASE =
SQL_IC_UPPERnot case sensitive, stored in uppercase
SQL_IC_LOWERnot case sensitive, stored in lowercase
SQL_IC_SENSITIVEcase sensitive, stored mixedcase
SQL_IC_MIXEDnot case sensitive, stored mixedcase

plus the same set again for SQL_QUOTED_IDENTIFIER_CASE.

A driver that conforms strictly to SQL-92 should return:
SQL_IDENTIFIER_CASE = anything *except* SQL_IC_SENSITIVE
SQL_QUOTED_IDENTIFIER_CASE = SQL_IC_SENSITIVE

And don't forget that a key feature of quoting an identifier is that
it can then include whitespace and other 'special' characters.

Tim.