Re: WL#946 and Changing time literal format

2009-01-25 Thread Peter Gulutzan
Hi Monty,

Michael Widenius wrote:
 Hi!
 
 Peter == Peter Gulutzan peter.gulut...@sun.com writes:
 
 Peter Hi all,
 Peter On 01/15/2009 03:11 PM Peter Gulutzan wrote:
 
 For a TIME or DATETIME or TIMESTAMP literal, one can use
 '.' instead of ':' and one can skip leading fields. For example:
 INSERT INTO t (datetime_column) VALUES ('1001.01.01 11.22');
 For default MySQL changes the value to  '1001-01-01 11:22:00'.

 The MySQL Reference Manual calls this relaxed form.
 http://dev.mysql.com/doc/refman/5.1/en/using-date.html

 For WL#946 TIME/TIMESTAMP/DATETIME with fractional seconds,
 '.' means something else: decimal point. The natural reading
 of '11.22' is going to be '11.22 seconds' for many people.
 The WL#946 HLS says That [relaxed form] will no longer be
 possible, '.' must indicate that a fraction follows.

 I think we should consider these alternatives now:
 
 1. Deprecate use of '.' as a substitute for standard
 punctuation characters. The other relaxed form stuff
 can remain. The manual should say don't use '.' etc..
 
 Peter Roy says it is not strictly necessary.
 
 Peter Bernt says I would go for 1) 
 
 Peter Konstantin says This should be OK.
 
 Peter Unless more comments appear before January 31, this is the
 Peter winning option. Trudy wrote guidelines in 2006
 Peter https://inside.mysql.com/wiki/DeprecatingServerFeatures
 Peter I will try to follow them, except that instead of getting
 Peter approval from architecture team I will ask ServerPT.
 
 As Dmitri pointed out, we shouldn't deprecate '.' as substitute for
 dates.
 

Yes, '.' is fine in a date. But how do we know it's in a date?
Answer: because all the fields are there. So this applies for 2) not 1).

 Another things is that we should stop making decisions about
 incompatible changes without listening to the MySQL users.  They know
 more than we how MySQL is used and they are directly affected of any
 incompatible change we force upon them.
 

If you refer to the deprecation guidelines that I cited, you'll see
that there is a requirement to consult interested parties (Support,
PS) before submitting to a committee, etc.

 2. Insist that '.' will continue to be a substitute for
 standard punctuation characters if any field is missing,
 but '.' will mean decimal if and only if all fields are
 present and have no substitutions. Thus '11.22' means
 11 hours 22 minutes but '00:00:11.22' means 11.22
 seconds.
 
 Peter Roy said option 2) will work quite well.
 
 The question here is how PostgreSQL and ANSI does this and also what
 is the logical interpretation of the number.
 

PostgreSQL accepts various forms but '.' can only mean decimal point.
http://www.postgresql.org/docs/8.3/static/datatype-datetime.html

Bernt has answered re ANSI.

Roy has also answered re ANSI.

I doubt that there would be agreement about logical interpretation.

 I think that if we go with 1), 11.22 should mean 11.22 seconds for a
 time field.
 

Yes, if we can't use '.' as a substitute for other punctuation
characters, then it's safe to have it in the place the
standard allows, as a decimal point. That's the point of 1).

And now for Dmitri's question, which was:
AFAIK usage of '.' as a field separator in dates is quite wide spread
in real world (although I am not not sure how often it is used in this
role in SQL statements executed by our users).
May be it makes sense to keep '.' as allowed separator in date part
and prohibit it only in time part of datetime value?

Yes, provided it's possible to distinguish the date part and the
time part. That could be done by looking for ' ', or by using
one set of rules for DATE and another set of rules for TIME.
But I believe the essential idea is: when you know which field
it is by noting whether it's first / second / third / fourth / etc.
within the string, then you don't need to worry about choice of
punctuation character.

So '1.1.1 1.1.1.1' could be correctly interpreted as
'0001-01-01 01:01:01.1'. In fact that's happening now, already:

mysql select cast('1-1-1 1.1.1.1' as datetime);
+---+
| cast('1-1-1 1.1.1.1' as datetime) |
+---+
| 0001-01-01 01:01:01.10|
+---+
1 row in set (0.01 sec)

 cut
 
 3. Add a new mode, @@sql_mode=monty's_revenge. If it's on
 (which will never be the default), then relaxed mode
 is still possible (you can skip fields and you can use
 any punctuation other than '.'), but '.' means decimal
 point, so '11.22' means '11.22 seconds'.
 
 Peter Roy said it is not strictly necessary.
 Peter Bernt said with the addition of 3) for backwards compatability.
 Peter Konstantin said No new sql modes please.
 
 I agree with Konstantin that we should avoid new sql modes as much as
 possible. However, if we do break a lot of applications when we
 deprecate the usage of . as a separator for TIME fields, then we
 should add a temporary mode to help people move their applications
 forward until

Re: Hungarian collation

2007-01-31 Thread Peter Gulutzan
On Tue, 2006-10-17 at 10:15 -0600, Peter Gulutzan wrote:
 Hi,
 
 MySQL is looking for an authoritative, official statement
 which states all the current Hungarian collation rules.
 Please let other MySQL-using Hungarians (especially if you
 know a user group in Hungary) know about these
 questions. Best of all would be a translation of the
 Hungarian government standard, if there is one.
cut

I'm happy to say that several Hungarian experts
(acknowledged in the document below) helped us.
The proper direction for MySQL is clearer now.

I'm unhappy to say that implementing the new
Hungarian collations will be difficult. One
of the rules requires support for multiple
weights. That is a separate task for which
MySQL will need a long, long time.

The rest of this letter is Worklog task
high level specification WL#2993 titled
Add Hungarian collations for cp1250, latin2,
utf8, ucs2.

MySQL failed to find an authoritative, official statement
which states all the current Hungarian collation rules.
There is one, but it's not translated from Hungarian
(see References section later in this document).

MySQL also asked for help from Hungarians on lists.mysql.com.
Thanks to Peter Ritzinger, Csongor Fagyal, József Kozell,
András Bártházi, ImRe, and all others who helped.
This document reflects their interpretation of the standard
and their opinions, though no consensus exists.

1. Not deprecating latin2_hungarian_ci

The current latin2_hungarian_ci collation
is a chart in sql/share/charsets/latin2.xml.
Alexander Barkov has provided an easy-to-read web page: 
http://myoffice.izhnet.ru/bar/~bar/charts/latin2_hungarian_ci.html
This collation is unlike the Hungarian dictionaries,
collation descriptions, or products that we've seen.
For example the first letter is:
Latin Capital Letter A
= Latin Small Letter A
= control Single Shift 3
= No-Break Space
= Latin Small Letter L with caron
= Latin Capital Letter A with acute
= Latin Small Letter A with acute
But there is no reason that small L with caron
(which is Slovak not Hungarian) ever sorts with
A, there's some dispute whether A with acute
should sort with A, and all other
accented variants of A should be in this list too.
MySQL considered deprecating this
collation (which implies that MySQL will eventually
remove it), after introducing a new and more correct
one. However, that deprecation will not occur.

2. Hungarian alphabet chart

This is the chart of the Hungarian alphabet;
a á b c cs d dz dzs e é f g gy h i í j k l ly m n
ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs

Unicode names of special characters in the chart are:
á U+00E1 LATIN SMALL LETTER A WITH ACUTE
é U+00E9 LATIN SMALL LETTER E WITH ACUTE
í U+00ED LATIN SMALL LETTER I WITH ACUTE
ó U+00F3 LATIN SMALL LETTER O WITH ACUTE
ö U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
ő U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
ú U+00FA LATIN SMALL LETTER U WITH ACUTE
ü U+00FC LATIN SMALL LETTER U WITH DIAERESIS
ű U+0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE

Each of the items in the chart is a Hungarian
letter. The chart is in order, except for special
cases noted in later sections. Each character has
an upper case equivalent (not shown) which has the
same position in the collating sequence -- all the
new collations will be case insensitive.

The DOUBLE ACUTE letters ő and ű are sometimes shown
as õ and û but that is either an error or a workaround
for character sets that do not contain ő and ű.
MySQL will treat all letters that do not appear
on the chart as non-Hungarian letters.

3. Special short/long vowel pairs rule

These are short/long vowel pairs:
a á, e é, i í, o ó, ö ő, u ú, ü ű
That is, á is the long form of a, and so on. 

There is a special rule for these short/long
vowel pairs:
long = short usually (primary rule),
but long  short if all other letters are equal (secondary rule).

Therefore:
'BÁ''BA' is true
'BÁ''BAC' is false

To put it another way:
First compare the strings without paying attention to
short-versus-long difference. Then, if they're equal,
compare them again paying attention to short-versus-long
difference.

Comparison is left-to-right. Therefore:
'BÁA''BAÁ' is true

This rule is controversial. Some Hungarians disagree.
Also, Simonsen's rules say A-acute  A, etc.:
http://std.dkuug.dk/i18n/locales/hu_HU
Also, there has been argument about the same thing for glibc:
http://sources.redhat.com/ml/libc-locales/2005-q4/msg2.html
But there seems to be agreement that the short/long
vowel pairs rule is standard and MySQL will have to follow it.

3. Special digraph/trigraph rule

A digraph is two letters treated as a single letter.
A trigraph is three letters treated as a single letter.

Hungarian has eight digraphs: cs dz gy ly ny sz ty zs,
and one trigraph: dzs. These letters appear according
to the order in the chart in Section 2. Therefore
'GYA'  'GZA' is true

4. Special double digraph/trigraph rule

There is a further detail when you see the first part of a
digraph followed by the digraph

RE: Hungarian collation

2006-10-30 Thread Peter Gulutzan
Hi,

On Thu, 2006-10-19 at 18:02 +0300, [EMAIL PROTECTED] wrote:
  From: Peter Gulutzan [EMAIL PROTECTED]
  
  MySQL is looking for an authoritative, official statement which states 
  all the current Hungarian collation rules.
 
 According to the Reference Level Description of the hungarian language (ISBN
 9634206441 or the hungarian version on line:
 http://bme-tk.bme.hu/other/kuszob/hangok.htm ) the rules are 
 the following:
 

Apparently http://bme-tk.bme.hu/other/kuszob/hangok.htm is an
educational site (something to do with the council of Europe)
as opposed to an official standards site, if I'm understanding
correctly.

 - The basic order of the alphabet is a á b c cs d dz dzs e é f g gy h i í j
 k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs
 - For the short-long vowel pairs (a á, e é, i í, o ó, ö ő, u ú, ü ű)  long =
 short usually, but long  short if all else 
 is equal. E.g., kád  kar  kár  kard

So far, this seems to be the opinion of a majority, although not
everyone describes the rule the same way. If MySQL adopts this rule,
SELECT * FROM t WHERE column1 = 'kár';
will not return rows where column1 = 'kar'. But perhaps
SELECT * FROM t WHERE column LIKE 'ká%'
will return rows where column1 = 'kar'

 - The long double consonants are sorting as if they would have been
 expanded.  I.e., ggy as gygy, nny as nyny

So 'ccs sorts with cscs' is true, i.e. ccs  cds

I expect that there is no rule which could apply for all LIKE searches.

 - Composit words are sorted according to word parts. I.e., meggyújt  meglát
  megy  meggy
 

I don't see a way to determine what is a composite word. So MySQL would
return meglát  megy  meggy  meggyújt

 An alternative collation sometimes used (in libraries, and some dictionaries
 and lexica) is according to the basic latin alphabet, whit the accented
 letters having the same value as the not accented.  Or anything in between.
 E.g., honoring the digraphs and the trigraph, but leaving the accents out of
 the business.
  
 I hope this helps.
 

Yes, and thank you. I'm grateful for the help MySQL is getting on this
question. We are still hoping for more responses.

 ImRe
 
 
-- 
Peter Gulutzan, Senior Software Architect
MySQL AB, www.mysql.com
Office: +1 780 472-6838
Mobile: +1 780 904-0297
VoIP:   +1 408 213-6654



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Hungarian collation

2006-10-17 Thread Peter Gulutzan
Hi,

MySQL is looking for an authoritative, official statement
which states all the current Hungarian collation rules.
Please let other MySQL-using Hungarians (especially if you
know a user group in Hungary) know about these
questions. Best of all would be a translation of the
Hungarian government standard, if there is one.

MySQL has received several complaints/suggestions about
Hungarian collation. For example these three people
contacted us via a public MySQL mailing list or bugs forum:
RITZINGER Peter (http://bugs.mysql.com/bug.php?id=12519)
BÁRTHÁZI András (http://lists.mysql.com/mysql/191427)
Csongor Fagyal (http://bugs.mysql.com/bug.php?id=22337)
In what follows I will refer to what seems to be agreed,
and what seems to be disputed.

The current latin2_hungarian_ci collation
is a chart in sql/share/charsets/latin2.xml,
and Mr Barkov has provided an easy-to-read web page: 
http://myoffice.izhnet.ru/bar/~bar/charts/latin2_hungarian_ci.html
This collation is unlike the Hungarian dictionaries,
collation descriptions, or products that we've seen.
For example the first letter is:
Latin Capital Letter A
= Latin Small Letter A
= control Single Shift 3
= No-Break Space
= Latin Small Letter L with caron
= Latin Capital Letter A with acute
= Latin Small Letter A with acute
But there is no reason that small L with caron
(which is Slovak not Hungarian) ever sorts with
A, there's some dispute whether A with acute
should sort with A, and all other
accented variants of A should be in this list too.
It is likely that MySQL will deprecate this
collation (which implies that MySQL will eventually
remove it), after introducing a new and more correct
one.

Most people agree that this is the Hungarian alphabet;
a á b c cs d dz dzs e é f g gy h i í j k l ly m n
ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs

(The DOUBLE ACUTE letters ő and ű are sometimes shown
as õ and û but I suspect that is a conversion error.)

Some people also say there's a secondary sort
rule for these short/long vowel pairs:
a á, e é, i í, o ó, ö ő, u ú, ü ű
For these pairs, long = short usually, but long  short
if all else is equal.
I have seen comments showing that Oracle seems to follow this rule:
'BÁ''BA' is true
'BÁ''BAC' is false
but the commenter, though Hungarian, didn't like what Oracle did.
(thread 'nlssort' on comp.databases.oracle.server 2002-11-10)
One commenter wrote to us about a similar thing, saying
it's a mistake that SELECT 'hal' LIKE 'hál' is true.
Unfortunately, the same person also disagrees, saying
that we should have two collations, one where
long  short, one where long = short.
I have also seen Simonsen's rules:
http://std.dkuug.dk/i18n/locales/hu_HU
They suggest that A-acute  A, etc.
I have also seen argument about the same thing for glibc:
http://sources.redhat.com/ml/libc-locales/2005-q4/msg2.html

Apparently all Hungarians agree that these digraphs are letters:
cs dz dzs gy ly ny sz ty zs
That's bad but not very bad. MySQL handles digraphs in Spanish.
There is also one trigraph:
dzs
That's very bad. Luckily dzs is rare, it's mostly for
English words with a j sound (bridge is 'briddz',
gin is 'dzsinn') (so I'm told).

There is a special rule when you see the first part of a
digraph followed by the digraph. For example, in 'ggy',
'g' is the first part of 'gy' and it's followed by 'gy'
... and MySQL treats it as a repetition of the digraph, i.e.
as if it's 'gygy'. This applies to all the letters listed
in the previous paragraph, so:
ccs = cscs, ddz = dzdz, ddsz = dzsdzs, ggy = gygy,
lly = lyly, nny = nyny, ssz = szsz, tty = tyty, zzs = zszs.
For example, Mr Ritzinger says that
'tty  tz' because tty is expanded to tyty.
I know that other products handle the situation, but I've
seen them called double compressions, which worries me --
do some people think that 'cscs sorts with ccs' rather than
'ccs sorts with cscs'?

A collation which follows the single-character rules, but
ignores digraphs and trigraphs, sounds somewhat like what I
see in Kaplan's remarks on Microsoft's Hungarian Technical Sort:
http://blogs.msdn.com/michkap/archive/2005/11/26/495072.aspx
One of the above-listed people would accept this, he says he doesn't
care about digraphs or trigraphs. But I have no idea whether
Microsoft was following some technical standard.

All characters outside the Hungarian alphabet should be done
according to UCA 4.0.0 (until MySQL switches to the newer UCA).

For Unicode support, I suggest names for the new collations
should be: ucs2_hungarian2_ci, utf8_hungarian2_ci. The only
other character sets that may have Hungarian collations are
latin2 and cp1250.

Our concern at this time is only for the primary sort, the
collation necessary for searches. The secondary sort or
tertiary sort rules, the ones that affect only ORDER BY,
are of interest but will only be of importance in the future.

-- 
Peter Gulutzan, Senior Software Architect
MySQL AB, www.mysql.com
Office: +1 780 472-6838
Mobile: +1 780 904-0297
VoIP:   +1 408 213

Re: Need help comparing MySQL to MS SQL Server

2003-11-18 Thread Peter Gulutzan
KEVIN ZEMBOWER wrote:

The organization hired an outside consultant to
evaluate which SQL engine to go with. This is
what he sent to us:

...


SQL Server 2000 is a complete Relational Database
Management System (RDBMS) that also includes integrated
analysis functionality for OLAP and data mining. SQL Server
2000 meets the data and analysis storage requirements of the
largest data processing systems and commercial Web sites, yet
at the same time can provide easy-to-use data storage services
to an individual or small business.


I don't know whether the consultant wrote this himself, or
if it came from somewhere.

Answer:

It came from somewhere. The above paragraph is a
word-for-word quote from this Microsoft document:

http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/deploy/upgrdmigrate/mysql.asp



Regards,
PeterG


-- 
Peter Gulutzan, Software Architect
MySQL AB, www.mysql.com


Are you MySQL certified?  www.mysql.com/certification


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



MySQL: standard SQL compliancy: primary keys: default: should be NOT NULL

2003-06-05 Thread Peter Gulutzan
Hi,   
   
We agree that statements of the form  
CREATE TABLE table-name (column1 INT PRIMARY KEY)  
should be legal -- it should not be necessary to say  
CREATE TABLE table-name (column1 INT PRIMARY KEY NOT NULL)  
  
The requirement, that primary keys should explicitly be  
declared as NOT NULL, was once necessary: that is the  
SQL-92 entry-level requirement.  In SQL-92  
intermediate, and in SQL-99, the NOT NULL is implied. 
MySQL is now moving to SQL-99 compliance, therefore 
(1) it's true, NOT NULL should be assumed 
(2) the behaviour has been changed, starting with version 
4.0.13. 
 
So, in the current MySQL release, 
CREATE TABLE t (s1 INT PRIMARY KEY) 
does not return an error message. 
 
Regards, 
Peter Gulutzan 
2003-06-04 
   
--
For technical support contracts, visit https://order.mysql.com/?ref=mpgu   
   __  ___ ___   __   
  /  |/  /_ __/ __/ __ \/ /Mr. Peter Gulutzan [EMAIL PROTECTED]   
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Software Architect   
/_/  /_/\_, /___/\___\_\___/   Edmonton, Canada   
   ___/   www.mysql.com   Office: +1 780 472-6838   
   

-
This mail sent through IMP: http://horde.org/imp/

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]