Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Ur, are you comparing the number of SELECT statements with TRIM to the number of CREATE statements with `CHAR`? > I looked up our usage logs (sorry I can't share this publicly) and trim has at least four orders of magnitude higher usage than char. We need to discuss further what to do. This thread is what

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
BTW I'm not opposing us sticking to the SQL standard (in general I'm for it). I was merely pointing out that the argument "if we deviate from the SQL standard in any way, we are 'wrong' or 'incorrect'" is itself flawed when plenty of other popular database systems also deviate from

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
I looked up our usage logs (sorry I can't share this publicly) and trim has at least four orders of magnitude higher usage than char. On Mon, Mar 16, 2020 at 5:27 PM, Dongjoon Hyun < dongjoon.h...@gmail.com > wrote: > > Thank you, Stephen and Reynold. > > > To Reynold. > > > The way I see

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Thank you, Stephen and Reynold. To Reynold: I see the following a little differently. > CHAR is an undocumented data type without clearly defined semantics. Let me describe it from an Apache Spark user's point of view. Apache Spark started to claim `HiveContext` (and `hql/hiveql`

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Stephen Coy
Hi there, I’m kind of new around here, but I have had experience with all of the so-called “big iron” databases such as Oracle, IBM DB2 and Microsoft SQL Server, as well as PostgreSQL. They all support the notion of “ANSI padding” for CHAR columns - which means that such columns are always
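
For readers following the thread, a minimal sketch of the "ANSI padding" behavior described above; the table and column names are made up for illustration, and the comments assume a database with standard CHAR pad-on-store and PAD SPACE comparison semantics (e.g. Oracle, DB2, SQL Server):

```sql
-- Hypothetical table: CHAR(5) blank-pads shorter values on write.
CREATE TABLE char_demo (code CHAR(5));
INSERT INTO char_demo VALUES ('ab');

-- The stored value is 'ab   ' (padded to 5 characters), and under PAD SPACE
-- comparison semantics the trailing blanks are ignored, so this still matches.
SELECT code FROM char_demo WHERE code = 'ab';
```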

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
I haven't spent enough time thinking about it to give a strong opinion, but this is of course very different from TRIM. TRIM is a publicly documented function with two arguments, and we silently swapped the two arguments. And trim has also been quite commonly used for a long time. CHAR is an
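
To make the TRIM comparison concrete, a hedged sketch of the forms involved; the per-release argument orders are deliberately not restated here, since the silent swap is exactly the concern being raised:

```sql
-- Unambiguous forms: these read the same way in every release.
SELECT TRIM('  spark  ');                  -- 'spark'
SELECT TRIM(BOTH 'x' FROM 'xxsparkxx');    -- 'spark'
SELECT TRIM(TRAILING 'x' FROM 'sparkxx');  -- 'spark'

-- Two-argument shorthand: which position is the source string and which is
-- the set of characters to trim is the part that was swapped, so the same
-- call could silently return different results across releases.
-- SELECT TRIM(arg1, arg2);
```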

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
Hi, Reynold. (And +Michael Armbrust) If you think so, do you think it's okay that we change the return value silently? If so, I'm wondering why we reverted the `TRIM` functions. > Are we sure "not padding" is "incorrect"? Bests, Dongjoon. On Sun, Mar 15, 2020 at 11:15 PM Gourav Sengupta

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Gourav Sengupta
Hi, 100% agree with Reynold. Regards, Gourav Sengupta On Mon, Mar 16, 2020 at 3:31 AM Reynold Xin wrote: > Are we sure "not padding" is "incorrect"? > > I don't know whether ANSI SQL actually requires padding, but plenty of > databases don't actually pad. > >

Re: FYI: The evolution on `CHAR` type behavior

2020-03-15 Thread Reynold Xin
Are we sure "not padding" is "incorrect"? I don't know whether ANSI SQL actually requires padding, but plenty of databases don't actually pad. https://docs.snowflake.net/manuals/sql-reference/data-types-text.html (
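
A small sketch of how the two interpretations diverge; the table is hypothetical and each comment states which behavior it assumes:

```sql
CREATE TABLE t (c CHAR(5));
INSERT INTO t VALUES ('ab');

-- Padding interpretation (classic ANSI CHAR): the value is stored as 'ab   '.
-- No-padding interpretation (e.g. Snowflake, where CHAR is a synonym for a
-- length-limited VARCHAR): the value is stored as 'ab'.
SELECT c FROM t;
```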

Re: FYI: The evolution on `CHAR` type behavior

2020-03-15 Thread Dongjoon Hyun
Hi, Reynold. Please see the following for the context. https://issues.apache.org/jira/browse/SPARK-31136 "Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax" I raised the above issue according to the new rubric, and the banning was the proposed alternative to reduce
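
For context on why SPARK-30098 surfaces the `CHAR` question, a hedged sketch in Spark SQL: without an explicit provider clause, the default provider decides which write path, and therefore which `CHAR` padding behavior, the table gets (the behavioral notes reflect the situation discussed in this thread, not a guaranteed current behavior):

```sql
-- Explicit Hive SerDe table: CHAR(3) values are blank-padded on write.
CREATE TABLE t_hive (c CHAR(3)) STORED AS PARQUET;

-- Explicit datasource table: CHAR(3) was treated like STRING, with no padding.
CREATE TABLE t_ds (c CHAR(3)) USING PARQUET;

-- No provider clause: SPARK-30098 changed which of the two behaviors this
-- statement silently picks up, which is what SPARK-31136 proposes to revert.
CREATE TABLE t_default (c CHAR(3));
```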

Re: FYI: The evolution on `CHAR` type behavior

2020-03-14 Thread Reynold Xin
I don’t understand this change. Wouldn’t this “ban” confuse the hell out of both new and old users? For old users, their old code that worked with char(3) would now stop working. For new users, depending on whether the underlying metastore char(3) is either supported but different from ansi
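
As an illustration of the concern (hypothetical DDL, not taken from the proposal itself): under a ban, a statement like the first one below would start failing, and existing users would have to rewrite it along the lines of the second.

```sql
-- Previously accepted (with provider-dependent padding behavior); a ban would
-- reject it outright.
CREATE TABLE users (id INT, country_code CHAR(3));

-- Possible rewrite existing users would be forced into.
CREATE TABLE users (id INT, country_code STRING);
```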

FYI: The evolution on `CHAR` type behavior

2020-03-14 Thread Dongjoon Hyun
Hi, All. Apache Spark has suffered from a known consistency issue in `CHAR` type behavior across its usages and configurations. However, the behavior has been gradually evolving toward consistency inside Apache Spark because we don't have `CHAR` officially. The following is the
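
A minimal sketch of the kind of inconsistency being referred to, assuming the Spark behavior discussed in this thread (table names are illustrative): the same `CHAR(5)` column pads or does not pad depending on whether the table goes through the Hive SerDe path or the native datasource path.

```sql
CREATE TABLE t_hive (c CHAR(5)) STORED AS PARQUET;  -- Hive SerDe path
CREATE TABLE t_ds   (c CHAR(5)) USING PARQUET;      -- native datasource path

INSERT INTO t_hive VALUES ('ab');
INSERT INTO t_ds   VALUES ('ab');

SELECT length(c) FROM t_hive;  -- 5: value blank-padded to 'ab   '
SELECT length(c) FROM t_ds;    -- 2: CHAR treated as STRING, no padding
```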