Re: [sqlite] "Standard SQL" ?

2020-02-02 Thread Markus Winand


> On 1 Feb 2020, at 23:31, James K. Lowden  wrote:
> 
> Similarly, last I checked, no SQL standard supported LIMIT for SELECT.  

Just FYI:

The the functionality of LIMIT was added to the SQL standard ISO/IEC 9075 with 
the 2008 update in form of

   FETCH FIRST … ROWS ONLY

This clause also support some variants like FETCH FIRST 10 PERCENT ONLY and 
also a WITH TIES modifier.

All of that is still there in the current version of the SQL standard (from 
2016).

I would say LIMIT is widely supported, but it is not standard SQL, maybe common 
SQL.

> 
> On the positive side, some parts of SQL haven't changed since the Late
> Bronze Age.  "SELECT *" still means all columns; "FROM" still takes a
> table argument, whether a tablename, view, or expression.  "WHERE"
> operates on values "before" aggregation; "HAVING" on values "after"
> aggregation.  Any implementation that operates any other way does not
> implement standard SQL.  
> 
> SQL is hardly unique in this regard.  We also refer to "the" C standard
> library, to "Posix", and to "the" C or C++ standard.  Like SQL, there
> are many such and (also like SQL) some parts are unchanged since the
> beginning while, just as usefully, some that weren't part of the
> first standard haven't changed since they were introduced.  

When you say “many standards” do you mean the different releases those 
standards have?

IMHO, there is only one SQL standard, namely ISO/IEC 9075. The current and 
technically only valid version is that of 2016 (even though an extension was 
added in 2019).

-markus
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Generated columns and COLLATE in the AS parens

2020-01-31 Thread Markus Winand
Let me first explain how collations work in standard SQL. I think my answers 
below make more sense then.

In the SQL standard….

- character string **types** have properties such as fix-length vs. variable 
length, the length (limit), character set and also the collation.

- columns have types, which include the collation if it is a character string 
type.
  However, it is not the column that has a collation. The column has a type, 
which might have a collation.

- values have a static type (the so-called “declared type”). Static means it is 
determined at “compile time” from the syntax and the data dictionary.

- Expressions also have a declared type. More elaborate: the result value of an 
expression has a type that is statically determined (again  the “declared 
type”).

- expressions of which the declared type is a character string type, will also 
have all the properties of character string types (including the collation).

- When character strings are compared, the effective collation is determined 
from the declared types of the operands.
   The SQL standard defines rules how to do that (e.g. in 9075-2 9.15 
"Collation determination”)
   The most important rule is the "collation derivation” order: explicit, 
implicit, none. That’s the rule that says
   the COLLATE clause on expressions (“explicit” derivation) is stronger than 
the COLLATE clause following type names (“implicit” derivation).

But note that the last statement is not the reason the collation of the 
generation expression takes precedence of the data type — IT DOESN’T!


As far as I know, this is basically unchanged sind SQL-92. If you do not have 
access to the current release of the standard (of 2016, to which my references 
apply), you can also have a look at SQL-92 here (search for "4.2  Character 
strings”):

https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

Generated columns where introduced with SQL:2003 and are thus not in the linked 
document.


Putting these mechanics of working with collations in context of generated 
columns:

- Generated columns need a type like any other column.
  If that type happens to be a character string type, the type includes the 
collation as part of the types properties.

- Even though generated columns need a type, they allow skipping the type in 
the definition of the generated column—that’s very unique to generated columns.
  E.g. specifying a DEFAULT clause does not lift the requirement to explicitly 
state the type of the column.

- If a generated column doesn’t have an explicit data type mentioned, the the 
data type of the result of the expression is used.
  Whatever data type that is, whatever properties that type has.
  If it happens to be a character string type, it will also have a “collation” 
property.

However, the declared type of the expression (including its properties) is only 
relevant if the column definition doesn’t explicitly state a data type.

These are two different cases in standard SQL:

  C1 TEXT GENERATED ALWAYS AS (…)
  C2  GENERATED ALWAYS AS (…)

The type of C1 is TEXT, including all its default properties.
The type of C2 is the type of the result of the expression, including all its 
properties.

The “what’s new” paper for SQL:2003 mentions that case explicitly.

http://sigmodrecord.org/publications/sigmodRecord/0403/E.JimAndrew-standard.pdf

The example on the last page:

> CREATE TABLE EMPLOYEES (
>  EMP_ID INTEGER,
>  SALARY DECIMAL(7,2),
>  BONUS DECIMAL(7,2),
>  TOTAL_COMP GENERATED ALWAYS AS (SALARY + BONUS)
>  )
> 
> TOTAL_COMP is a generated column of the EMPLOYEES table. The data type of the 
> TOTAL_COMP is the data type of the expression (SALARY_BONUS).

I think that’s a typo and should read (SALARY + BONUS).

> Users may optionally specify a data type for a generated column, in which 
> case the specified data type must match with the data type of the associated 
> expression.

The last part is not 100% in line with the current standard. The types don’t 
need to match, they need to be assignable. I don’t know if this is just a 
simplification for this paper or if SQL:2003 really hat that requirement.

Further comments to your statements below…

> On 30 Jan 2020, at 23:06, Keith Medcalf  wrote:
> 
> 
> On: Wednesday, 29 January, 2020 06:45, Markus Winand 
>  wrote:
> 
>> I think there might be a glitch in the way SQLite 3.31.x derives the
>> collation information from the expression of a generated column.
> 
>> In particular, COLLATE inside the AS parens seems to be ignored, but it
>> is honoured after the parens:
> 
> Carrying the COLLATE from an expression into the column definition is 
> incorrect.  The definition of a generated column is:
> 
>  [type affinity] [GENERATED ALWAYS AS ()] [COLLATE 
> ] [ ...]
> 
> so why would the so including a COLLATE as part of the expression applies to 
> the expression,

Re: [sqlite] "Standard SQL" ?

2020-01-30 Thread Markus Winand
> On 30 Jan 2020, at 21:12, Keith Medcalf  wrote:
> 
> On Thursday, 30 January, 2020 12:20, Simon Slavin  
> wrote:
> 
>> I would appreciate your help.  Reading a technical article today, I came
>> across a casual reference to "Standard SQL" as if it was a well-known
>> thing.  This worried me since I've never heard the term and I'm meant to
>> know about such things.

I’m an author using that term.

What I personally mean with “Standard SQL” is the language described by ISO/IEC 
9075.

(Sorry, I’m saying ISO/IEC 9075 a lot in my response to avoid the term 
“standard” and the related confusion).

> 
> I would say that "Standard SQL" is that variant of SQL is that "minimal 
> common subset" dialect of SQL that is understood by every single 
> implementation claiming to be "SQL compliant" that has ever existed until the 
> present.  In other words, it is the dialect of SQL that will be understood by 
> *any* software claiming to be "SQL compliant".

I think the word for what you describe is “Core SQL”.

I wrote about it here: https://modern-sql.com/standard/levels

> 
>> It doesn't seem to refer to the official standard for SQL, which is huge
>> and contains a plethora of features implemented once or never.  The
>> author seemed to think it was a sort of 'core SQL' – features identically
>> implemented by all, or most, of the well-known SQL engines.

Google, for example, is also using the term “Standard SQL” for products like 
BigQuery. Although Googles “Standard SQL” is more close to ISO/IEC 9075 that 
the other “SQL” dialects they offer, they are sill not conforming to ISO/IEC 
9075 in pretty basic ways.

> 
> There is a core SQL.  It is not defined.  

It is defined in ISO/IEC 9075. There are tables called (quoting):
“Feature taxonomy and definition for mandatory features”, contains a 
taxonomy of the features of SQL language in Core SQL that are specified in this 
part of ISO/IEC 9075.

There is an explicit list of mandatory features for Core SQL. And there are 
many optional features.


> Most SQL (particularly DDL) is "implementation specific".  About the only 
> things that comprise "standard SQL" are the DML constructs:

ISO/IEC 9075 also defines DDL.

Generally ISO/IEC 9075 only describes semantics, but no implementation details. 
E.g. CREATE TABLE is defined, CREATE INDEX not.

ISO/IEC 9075 allows language extensions as long as they use a syntax **not** 
described in ISO/IEC 9075.

> 
> INSERT INTO  () VALUES ();
> SELECT  FROM  [WHERE ] [GROUP BY  list> [HAVING ]] [ORDER BY ]
> DELETE FROM  [WHERE ]
> UPDATE  SET  = [,  = ]... [WHERE 
> ]
> 
>> The one possibility I can think of is SQL:1999.  This is the first
>> version which has features marked as 'mandatory' or 'optional'.  A full
>> implementation of all mandatory features could, I suppose, be called
>> "Standard SQL", but I've never heard of that term being used for that.
> 
> This is very Johnny-lately.  I think the first standard was SQL-85 but even 
> that was chock full of vendor implementation specifics and light on being 
> "standard".  Subsequent versions of the SQL Standard committee output simply 
> became more a practice in documenting vendor implementation specifics rather 
> than defining a standard.  Most standards suffer from this problem.
> 
>> Have any of you been using this term for a meaning other than "Fully
>> conforming to SQL:2019 (or whatever version you think current) ?  Do you
>> have documentation somewhere ?  Or are my suspicions correct and there's
>> no such thing ?
> 
> Yes. "standard SQL" is that subset of SQL that is understood by every 
> implemetation ever without making changes.

Well, that’s now how I’m using this term. Not does Google use it in this way.

-markus

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Generated columns and COLLATE in the AS parens

2020-01-30 Thread Markus Winand


> On 30 Jan 2020, at 18:20, Richard Hipp  wrote:
> 
> On 1/30/20, Markus Winand  wrote:
>> 
>> Unfortunately, the “what would PostgreSQL do” approach doesn’t provide
>> guidance here.
> 
> Maybe it does.  PostgreSQL doesn't allow typeless columns, but it does
> allow columns with unspecified collating sequences, does it not?  

If you do not specify a COLLATE clause in a column definition, the default 
collation of the effective character set is used.

The name ‘default’ is still a collation. Quoting 
https://www.postgresql.org/docs/current/collation.html:

The collation of an expression can be the "default" collation, 
which means the locale settings defined for the database. 


> What
> if you have a normal column X with some collating sequence C and then
> a generated column Y that as just "AS(X)”.

It depends on whether the generated column specifies a  or not (see 
below).


>  If you do comparisons on
> column Y, which collating sequence does it use - the default or C?
> 
> Can you run that experiment for us?

David Raymond did.

Here is what happens, split into the three relevant cases.

> testing=> create table foo (
> testing(> a text,
> testing(> b text collate "C",
> testing(> c text collate "en-US-x-icu”,
> […]
> attnum | attname | attgenerated |  collname
> +-+--+-
>  1 | a   |  | default
>  2 | b   |  | C
>  3 | c   |  | en-US-x-icu

Pretty obvious.

The next columns:

> testing(> d text generated always as (a) stored,
> testing(> e text generated always as (b) stored,
> testing(> f text generated always as (c) stored,

>  4 | d   | s| default
>  5 | e   | s| default
>  6 | f   | s| default

The generated column definitions mention a , in that case the types 
character set’s default collation is used — coincidentally called “default”. 
Thus, it is the same case as column “a”.

Finally:

> testing(> g text collate "C" generated always as (c) stored,
> testing(> h text generated always as (c collate "C") stored,
> testing(> i text collate "C" generated always as (c collate "en-US-x-icu") 
> stored
> testing(> );

>  7 | g   | s| C
>  8 | h   | s| default
>  9 | i   | s| C

Again,  is specified, thus this collation is used. The collation of 
“h” is not “C” because there is an explicit type definition “text”, which 
includes a character set and a default collation.

The collation of the expression is really only relevant if there is no  (and no COLLATE outside the expression).

> 
> Or maybe you are thinking the collating sequence of the expression in
> the AS clause should only be carried through into the generated column
> if it is explicitly stated, and not implied?

There is only one case when the type of the expression, including the character 
set and collation should be carried through into the generated column: if the 
generated column doesn’t explicitly define a type.

> 
> What happens if there is a collating sequence specified in the AS
> clause and also another collating sequence on the column definitions?
> 
>  CREATE TABLE t1(x TEXT, y TEXT AS (x COLLATE nocase) COLLATE rtree);
> 
> Which collating sequence should be used for "SELECT * FROM t1 WHERE y='xyz';"?

There are three places where a COLLATE clause in the definition of a generated 
column is allowed:

(1) After the data type (see 6.1,  in the BNF)
(2) Inside the expression (see 6.31,  in the BNF)
(3) At the very end (see 11.4,  in the BNF).

There is a syntax rule prohibiting (1) and (3) being used at the same time 
(11.4 SR12b). The same rule says that the effect of (1) and (3) is the same:

>   • Otherwise,  shall not be both specified in  type> and immediately contained in . If  
> is immediately contained in , then it is equivalent to 
> specifying an equivalent  in . 

This is the case you were just are asking about.

Case (2) might affect the collation of the expression, which is only relevant 
in case there is neither (1) or (3) specified. In that case, also the collation 
is taken from the expression—via the data type (11.4 SR13c).

> 13)  The declared type of the column is
> Case:
>   • a)  If  is specified, then that data type. If  clause> is also specified, then the declared type of  
> shall be assignable to the declared type of the column.
>   • b)  If  is specified, […]
>   • c)  If  is specified, then the declared type of GE.

As you see, as soon as a  is specified, the collation of the 
expression is irrelevant. This is what can be demonstrated in PostgreS

Re: [sqlite] Generated columns and COLLATE in the AS parens

2020-01-30 Thread Markus Winand

> On 29 Jan 2020, at 14:59, Richard Hipp  wrote:
> 
> On 1/29/20, Markus Winand  wrote:
>> Hi!
>> 
>> I think there might be a glitch in the way SQLite 3.31.x derives the
>> collation information from the expression of a generated column.
> 
> I think the current behavior is correct.
> 
> If you want a column to have a non-standard collating sequence, you
> should add a COLLATE constraint to that column definition.  The fact
> that there is a COLLATE operator on the expression that determines the
> value of that column seems irrelevant.
> 
> Consider this:
> 
> CREATE TABLE t1(a TEXT DEFAULT('xyzzy' COLLATE nocase));
> 
> Would you expect the COLLATE operator in the DEFAULT clause to change
> the collating sequence associated with column a?  Why should a
> GENERATED ALWAYS AS constraint work differently from a DEFAULT
> constraint?

The critical point here is that SQLite accepts a GENERATED clause without 
explicit type:

CREATE TABLE tmp (str VARCHAR(255), str_nc1 GENERATED ALWAYS AS (str COLLATE 
NOCASE), str_nc2 GENERATED ALWAYS AS (str) COLLATE NOCASE);

In that case, the type of the generated column should taken from the result 
type of the expression[0]. As collations are part of the character string 
types[1], also the collation is taken from the expression. The COLLATE clause 
applied to expressions (as opposed to following the name of a data type) just 
changes the collation in of the character string type of that expression[2].

If the GENERATED clause explicitly sets a data type, then the collation of the 
expression is irrelevant as it doesn’t affect the type of the generated column, 
similar to your example with DEFAULT (it is only a value assignment in that 
case).

At the end everything boils down to this question:
What is the type, including the character set and collation, of generated 
columns that don’t specify a type explicitly? I think the only sensible answer 
is that it is the type of the expression, including its character set and 
collation.

Unfortunately, the “what would PostgreSQL do” approach doesn’t provide guidance 
here as PostgreSQL requires an explicit type for generated columns (and so do 
MySQL and MariaDB). In SQL Server, however, it works like I would expect it.

I guess the SQLite approach to types might affect all of my reasoning, but in 
other cases like the following it seems that SQLite also “transports” the 
applicable collation together with the result (type?) of an expression:

sqlite> CREATE TABLE tmp (str VARCHAR(255));
sqlite> INSERT INTO tmp VALUES ('a'), ('A'), ('b'), ('B');
sqlite> SELECT * FROM (SELECT str COLLATE NOCASE FROM tmp) ORDER BY str;
a
A
b
B

-markus

References to SQL:2016, Part 2. I know that this is not a strong argument to 
you, but nevertheless ;)

[0] 11.4 SR 13 c: 
• If  is specified, then the declared type of GE. 
 Whereas GE is defined in SR 10 a:
• Let GE be the  contained in GC.

[1] 4.2.1
   A character string type descriptor contains:
   [….]
• —  The catalog name, schema name, and collation name of the collation 
of the character string type.

[2] 6.31 SR 4a:
• If  is specified, then the declared type of the 
 is the declared type of the , except that 
the declared type collation is the collation identified by , 
and its collation derivation is explicit.


> 
> -- 
> D. Richard Hipp
> drh at sqlite.org
> 

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Generated columns and COLLATE in the AS parens

2020-01-29 Thread Markus Winand
Hi!

I think there might be a glitch in the way SQLite 3.31.x derives the collation 
information from the expression of a generated column.

In particular, COLLATE inside the AS parens seems to be ignored, but it is 
honoured after the parens:

SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE tmp (str VARCHAR(255), str_nc1 GENERATED ALWAYS AS (str 
COLLATE NOCASE), str_nc2 GENERATED ALWAYS AS (str) COLLATE NOCASE);
sqlite> INSERT INTO tmp VALUES ('a'), ('A'), ('b'), ('B');
sqlite> SELECT * FROM tmp ORDER BY str;
A|A|A
B|B|B
a|a|a
b|b|b
sqlite> SELECT * FROM tmp ORDER BY str_nc1;
A|A|A
B|B|B
a|a|a
b|b|b
sqlite> SELECT * FROM tmp ORDER BY str_nc2;
a|a|a
A|A|A
b|b|b
B|B|B
sqlite> SELECT * FROM tmp ORDER BY (str COLLATE NOCASE);
a|a|a
A|A|A
b|b|b
B|B|B
sqlite>

I believe the "ORDER BY str_nc1” should yield the same row order as the last 
two queries.

-markus
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users