Re: WL#946 and Changing time literal format
Hi Monty, Michael Widenius wrote: Hi! Peter == Peter Gulutzan peter.gulut...@sun.com writes: Peter Hi all, Peter On 01/15/2009 03:11 PM Peter Gulutzan wrote: For a TIME or DATETIME or TIMESTAMP literal, one can use '.' instead of ':' and one can skip leading fields. For example: INSERT INTO t (datetime_column) VALUES ('1001.01.01 11.22'); For default MySQL changes the value to '1001-01-01 11:22:00'. The MySQL Reference Manual calls this relaxed form. http://dev.mysql.com/doc/refman/5.1/en/using-date.html For WL#946 TIME/TIMESTAMP/DATETIME with fractional seconds, '.' means something else: decimal point. The natural reading of '11.22' is going to be '11.22 seconds' for many people. The WL#946 HLS says That [relaxed form] will no longer be possible, '.' must indicate that a fraction follows. I think we should consider these alternatives now: 1. Deprecate use of '.' as a substitute for standard punctuation characters. The other relaxed form stuff can remain. The manual should say don't use '.' etc.. Peter Roy says it is not strictly necessary. Peter Bernt says I would go for 1) Peter Konstantin says This should be OK. Peter Unless more comments appear before January 31, this is the Peter winning option. Trudy wrote guidelines in 2006 Peter https://inside.mysql.com/wiki/DeprecatingServerFeatures Peter I will try to follow them, except that instead of getting Peter approval from architecture team I will ask ServerPT. As Dmitri pointed out, we shouldn't deprecate '.' as substitute for dates. Yes, '.' is fine in a date. But how do we know it's in a date? Answer: because all the fields are there. So this applies for 2) not 1). Another things is that we should stop making decisions about incompatible changes without listening to the MySQL users. They know more than we how MySQL is used and they are directly affected of any incompatible change we force upon them. If you refer to the deprecation guidelines that I cited, you'll see that there is a requirement to consult interested parties (Support, PS) before submitting to a committee, etc. 2. Insist that '.' will continue to be a substitute for standard punctuation characters if any field is missing, but '.' will mean decimal if and only if all fields are present and have no substitutions. Thus '11.22' means 11 hours 22 minutes but '00:00:11.22' means 11.22 seconds. Peter Roy said option 2) will work quite well. The question here is how PostgreSQL and ANSI does this and also what is the logical interpretation of the number. PostgreSQL accepts various forms but '.' can only mean decimal point. http://www.postgresql.org/docs/8.3/static/datatype-datetime.html Bernt has answered re ANSI. Roy has also answered re ANSI. I doubt that there would be agreement about logical interpretation. I think that if we go with 1), 11.22 should mean 11.22 seconds for a time field. Yes, if we can't use '.' as a substitute for other punctuation characters, then it's safe to have it in the place the standard allows, as a decimal point. That's the point of 1). And now for Dmitri's question, which was: AFAIK usage of '.' as a field separator in dates is quite wide spread in real world (although I am not not sure how often it is used in this role in SQL statements executed by our users). May be it makes sense to keep '.' as allowed separator in date part and prohibit it only in time part of datetime value? Yes, provided it's possible to distinguish the date part and the time part. That could be done by looking for ' ', or by using one set of rules for DATE and another set of rules for TIME. But I believe the essential idea is: when you know which field it is by noting whether it's first / second / third / fourth / etc. within the string, then you don't need to worry about choice of punctuation character. So '1.1.1 1.1.1.1' could be correctly interpreted as '0001-01-01 01:01:01.1'. In fact that's happening now, already: mysql select cast('1-1-1 1.1.1.1' as datetime); +---+ | cast('1-1-1 1.1.1.1' as datetime) | +---+ | 0001-01-01 01:01:01.10| +---+ 1 row in set (0.01 sec) cut 3. Add a new mode, @@sql_mode=monty's_revenge. If it's on (which will never be the default), then relaxed mode is still possible (you can skip fields and you can use any punctuation other than '.'), but '.' means decimal point, so '11.22' means '11.22 seconds'. Peter Roy said it is not strictly necessary. Peter Bernt said with the addition of 3) for backwards compatability. Peter Konstantin said No new sql modes please. I agree with Konstantin that we should avoid new sql modes as much as possible. However, if we do break a lot of applications when we deprecate the usage of . as a separator for TIME fields, then we should add a temporary mode to help people move their applications forward until
Re: Hungarian collation
On Tue, 2006-10-17 at 10:15 -0600, Peter Gulutzan wrote: Hi, MySQL is looking for an authoritative, official statement which states all the current Hungarian collation rules. Please let other MySQL-using Hungarians (especially if you know a user group in Hungary) know about these questions. Best of all would be a translation of the Hungarian government standard, if there is one. cut I'm happy to say that several Hungarian experts (acknowledged in the document below) helped us. The proper direction for MySQL is clearer now. I'm unhappy to say that implementing the new Hungarian collations will be difficult. One of the rules requires support for multiple weights. That is a separate task for which MySQL will need a long, long time. The rest of this letter is Worklog task high level specification WL#2993 titled Add Hungarian collations for cp1250, latin2, utf8, ucs2. MySQL failed to find an authoritative, official statement which states all the current Hungarian collation rules. There is one, but it's not translated from Hungarian (see References section later in this document). MySQL also asked for help from Hungarians on lists.mysql.com. Thanks to Peter Ritzinger, Csongor Fagyal, József Kozell, András Bártházi, ImRe, and all others who helped. This document reflects their interpretation of the standard and their opinions, though no consensus exists. 1. Not deprecating latin2_hungarian_ci The current latin2_hungarian_ci collation is a chart in sql/share/charsets/latin2.xml. Alexander Barkov has provided an easy-to-read web page: http://myoffice.izhnet.ru/bar/~bar/charts/latin2_hungarian_ci.html This collation is unlike the Hungarian dictionaries, collation descriptions, or products that we've seen. For example the first letter is: Latin Capital Letter A = Latin Small Letter A = control Single Shift 3 = No-Break Space = Latin Small Letter L with caron = Latin Capital Letter A with acute = Latin Small Letter A with acute But there is no reason that small L with caron (which is Slovak not Hungarian) ever sorts with A, there's some dispute whether A with acute should sort with A, and all other accented variants of A should be in this list too. MySQL considered deprecating this collation (which implies that MySQL will eventually remove it), after introducing a new and more correct one. However, that deprecation will not occur. 2. Hungarian alphabet chart This is the chart of the Hungarian alphabet; a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs Unicode names of special characters in the chart are: á U+00E1 LATIN SMALL LETTER A WITH ACUTE é U+00E9 LATIN SMALL LETTER E WITH ACUTE í U+00ED LATIN SMALL LETTER I WITH ACUTE ó U+00F3 LATIN SMALL LETTER O WITH ACUTE ö U+00F6 LATIN SMALL LETTER O WITH DIAERESIS ő U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE ú U+00FA LATIN SMALL LETTER U WITH ACUTE ü U+00FC LATIN SMALL LETTER U WITH DIAERESIS ű U+0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE Each of the items in the chart is a Hungarian letter. The chart is in order, except for special cases noted in later sections. Each character has an upper case equivalent (not shown) which has the same position in the collating sequence -- all the new collations will be case insensitive. The DOUBLE ACUTE letters ő and ű are sometimes shown as õ and û but that is either an error or a workaround for character sets that do not contain ő and ű. MySQL will treat all letters that do not appear on the chart as non-Hungarian letters. 3. Special short/long vowel pairs rule These are short/long vowel pairs: a á, e é, i í, o ó, ö ő, u ú, ü ű That is, á is the long form of a, and so on. There is a special rule for these short/long vowel pairs: long = short usually (primary rule), but long short if all other letters are equal (secondary rule). Therefore: 'BÁ''BA' is true 'BÁ''BAC' is false To put it another way: First compare the strings without paying attention to short-versus-long difference. Then, if they're equal, compare them again paying attention to short-versus-long difference. Comparison is left-to-right. Therefore: 'BÁA''BAÁ' is true This rule is controversial. Some Hungarians disagree. Also, Simonsen's rules say A-acute A, etc.: http://std.dkuug.dk/i18n/locales/hu_HU Also, there has been argument about the same thing for glibc: http://sources.redhat.com/ml/libc-locales/2005-q4/msg2.html But there seems to be agreement that the short/long vowel pairs rule is standard and MySQL will have to follow it. 3. Special digraph/trigraph rule A digraph is two letters treated as a single letter. A trigraph is three letters treated as a single letter. Hungarian has eight digraphs: cs dz gy ly ny sz ty zs, and one trigraph: dzs. These letters appear according to the order in the chart in Section 2. Therefore 'GYA' 'GZA' is true 4. Special double digraph/trigraph rule There is a further detail when you see the first part of a digraph followed by the digraph
RE: Hungarian collation
Hi, On Thu, 2006-10-19 at 18:02 +0300, [EMAIL PROTECTED] wrote: From: Peter Gulutzan [EMAIL PROTECTED] MySQL is looking for an authoritative, official statement which states all the current Hungarian collation rules. According to the Reference Level Description of the hungarian language (ISBN 9634206441 or the hungarian version on line: http://bme-tk.bme.hu/other/kuszob/hangok.htm ) the rules are the following: Apparently http://bme-tk.bme.hu/other/kuszob/hangok.htm is an educational site (something to do with the council of Europe) as opposed to an official standards site, if I'm understanding correctly. - The basic order of the alphabet is a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs - For the short-long vowel pairs (a á, e é, i í, o ó, ö ő, u ú, ü ű) long = short usually, but long short if all else is equal. E.g., kád kar kár kard So far, this seems to be the opinion of a majority, although not everyone describes the rule the same way. If MySQL adopts this rule, SELECT * FROM t WHERE column1 = 'kár'; will not return rows where column1 = 'kar'. But perhaps SELECT * FROM t WHERE column LIKE 'ká%' will return rows where column1 = 'kar' - The long double consonants are sorting as if they would have been expanded. I.e., ggy as gygy, nny as nyny So 'ccs sorts with cscs' is true, i.e. ccs cds I expect that there is no rule which could apply for all LIKE searches. - Composit words are sorted according to word parts. I.e., meggyújt meglát megy meggy I don't see a way to determine what is a composite word. So MySQL would return meglát megy meggy meggyújt An alternative collation sometimes used (in libraries, and some dictionaries and lexica) is according to the basic latin alphabet, whit the accented letters having the same value as the not accented. Or anything in between. E.g., honoring the digraphs and the trigraph, but leaving the accents out of the business. I hope this helps. Yes, and thank you. I'm grateful for the help MySQL is getting on this question. We are still hoping for more responses. ImRe -- Peter Gulutzan, Senior Software Architect MySQL AB, www.mysql.com Office: +1 780 472-6838 Mobile: +1 780 904-0297 VoIP: +1 408 213-6654 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Hungarian collation
Hi, MySQL is looking for an authoritative, official statement which states all the current Hungarian collation rules. Please let other MySQL-using Hungarians (especially if you know a user group in Hungary) know about these questions. Best of all would be a translation of the Hungarian government standard, if there is one. MySQL has received several complaints/suggestions about Hungarian collation. For example these three people contacted us via a public MySQL mailing list or bugs forum: RITZINGER Peter (http://bugs.mysql.com/bug.php?id=12519) BÁRTHÁZI András (http://lists.mysql.com/mysql/191427) Csongor Fagyal (http://bugs.mysql.com/bug.php?id=22337) In what follows I will refer to what seems to be agreed, and what seems to be disputed. The current latin2_hungarian_ci collation is a chart in sql/share/charsets/latin2.xml, and Mr Barkov has provided an easy-to-read web page: http://myoffice.izhnet.ru/bar/~bar/charts/latin2_hungarian_ci.html This collation is unlike the Hungarian dictionaries, collation descriptions, or products that we've seen. For example the first letter is: Latin Capital Letter A = Latin Small Letter A = control Single Shift 3 = No-Break Space = Latin Small Letter L with caron = Latin Capital Letter A with acute = Latin Small Letter A with acute But there is no reason that small L with caron (which is Slovak not Hungarian) ever sorts with A, there's some dispute whether A with acute should sort with A, and all other accented variants of A should be in this list too. It is likely that MySQL will deprecate this collation (which implies that MySQL will eventually remove it), after introducing a new and more correct one. Most people agree that this is the Hungarian alphabet; a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs (The DOUBLE ACUTE letters ő and ű are sometimes shown as õ and û but I suspect that is a conversion error.) Some people also say there's a secondary sort rule for these short/long vowel pairs: a á, e é, i í, o ó, ö ő, u ú, ü ű For these pairs, long = short usually, but long short if all else is equal. I have seen comments showing that Oracle seems to follow this rule: 'BÁ''BA' is true 'BÁ''BAC' is false but the commenter, though Hungarian, didn't like what Oracle did. (thread 'nlssort' on comp.databases.oracle.server 2002-11-10) One commenter wrote to us about a similar thing, saying it's a mistake that SELECT 'hal' LIKE 'hál' is true. Unfortunately, the same person also disagrees, saying that we should have two collations, one where long short, one where long = short. I have also seen Simonsen's rules: http://std.dkuug.dk/i18n/locales/hu_HU They suggest that A-acute A, etc. I have also seen argument about the same thing for glibc: http://sources.redhat.com/ml/libc-locales/2005-q4/msg2.html Apparently all Hungarians agree that these digraphs are letters: cs dz dzs gy ly ny sz ty zs That's bad but not very bad. MySQL handles digraphs in Spanish. There is also one trigraph: dzs That's very bad. Luckily dzs is rare, it's mostly for English words with a j sound (bridge is 'briddz', gin is 'dzsinn') (so I'm told). There is a special rule when you see the first part of a digraph followed by the digraph. For example, in 'ggy', 'g' is the first part of 'gy' and it's followed by 'gy' ... and MySQL treats it as a repetition of the digraph, i.e. as if it's 'gygy'. This applies to all the letters listed in the previous paragraph, so: ccs = cscs, ddz = dzdz, ddsz = dzsdzs, ggy = gygy, lly = lyly, nny = nyny, ssz = szsz, tty = tyty, zzs = zszs. For example, Mr Ritzinger says that 'tty tz' because tty is expanded to tyty. I know that other products handle the situation, but I've seen them called double compressions, which worries me -- do some people think that 'cscs sorts with ccs' rather than 'ccs sorts with cscs'? A collation which follows the single-character rules, but ignores digraphs and trigraphs, sounds somewhat like what I see in Kaplan's remarks on Microsoft's Hungarian Technical Sort: http://blogs.msdn.com/michkap/archive/2005/11/26/495072.aspx One of the above-listed people would accept this, he says he doesn't care about digraphs or trigraphs. But I have no idea whether Microsoft was following some technical standard. All characters outside the Hungarian alphabet should be done according to UCA 4.0.0 (until MySQL switches to the newer UCA). For Unicode support, I suggest names for the new collations should be: ucs2_hungarian2_ci, utf8_hungarian2_ci. The only other character sets that may have Hungarian collations are latin2 and cp1250. Our concern at this time is only for the primary sort, the collation necessary for searches. The secondary sort or tertiary sort rules, the ones that affect only ORDER BY, are of interest but will only be of importance in the future. -- Peter Gulutzan, Senior Software Architect MySQL AB, www.mysql.com Office: +1 780 472-6838 Mobile: +1 780 904-0297 VoIP: +1 408 213
Re: Need help comparing MySQL to MS SQL Server
KEVIN ZEMBOWER wrote: The organization hired an outside consultant to evaluate which SQL engine to go with. This is what he sent to us: ... SQL Server 2000 is a complete Relational Database Management System (RDBMS) that also includes integrated analysis functionality for OLAP and data mining. SQL Server 2000 meets the data and analysis storage requirements of the largest data processing systems and commercial Web sites, yet at the same time can provide easy-to-use data storage services to an individual or small business. I don't know whether the consultant wrote this himself, or if it came from somewhere. Answer: It came from somewhere. The above paragraph is a word-for-word quote from this Microsoft document: http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/deploy/upgrdmigrate/mysql.asp Regards, PeterG -- Peter Gulutzan, Software Architect MySQL AB, www.mysql.com Are you MySQL certified? www.mysql.com/certification -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
MySQL: standard SQL compliancy: primary keys: default: should be NOT NULL
Hi, We agree that statements of the form CREATE TABLE table-name (column1 INT PRIMARY KEY) should be legal -- it should not be necessary to say CREATE TABLE table-name (column1 INT PRIMARY KEY NOT NULL) The requirement, that primary keys should explicitly be declared as NOT NULL, was once necessary: that is the SQL-92 entry-level requirement. In SQL-92 intermediate, and in SQL-99, the NOT NULL is implied. MySQL is now moving to SQL-99 compliance, therefore (1) it's true, NOT NULL should be assumed (2) the behaviour has been changed, starting with version 4.0.13. So, in the current MySQL release, CREATE TABLE t (s1 INT PRIMARY KEY) does not return an error message. Regards, Peter Gulutzan 2003-06-04 -- For technical support contracts, visit https://order.mysql.com/?ref=mpgu __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Peter Gulutzan [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Software Architect /_/ /_/\_, /___/\___\_\___/ Edmonton, Canada ___/ www.mysql.com Office: +1 780 472-6838 - This mail sent through IMP: http://horde.org/imp/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]