Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Reynold Xin
I agree it sucks. We started with some decision that might have made sense back in 2013 (let's use Hive as the default source, and guess what, pick the slowest possible serde by default). We have been paying that debt ever since. Thanks for bringing this thread up though. We don't have a clear

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Dongjoon Hyun
Technically, I have been suffering from (1) `CREATE TABLE` for a long time (since 2017) because of its many differences. So, I had a wrong assumption about the implications of that "(2) FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", Reynold. I admit that. You may not feel in the

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Reynold Xin
You are joking when you said "informed widely and discussed in many ways twice", right? This thread doesn't even talk about char/varchar: https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E (Yes it talked about changing the

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Dongjoon Hyun
+1 for Wenchen's suggestion. I believe that the difference and effects are informed widely and discussed in many ways twice. First, this was shared last December. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", 2019/12/06

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-19 Thread Jungtaek Lim
Anything would be OK if the create table DDL provides a "clear way" to expect the table provider "before" they run the query. Great news that it doesn't require major rework - looking forward to the PR. Thanks again for jumping in and sorting this out. - Jungtaek Lim (HeartSaVioR) On Fri, Mar 20, 2020

Re: Spark-3.0 - performance degradation

2020-03-19 Thread beliefer
I tested it and could not reproduce the issue. I built Spark-3.1.0 and Spark-2.3.1. After many tests, I found little difference between them; each wins some runs and loses others. And judging from the event timeline, Spark-3.1.0 looks more accurate. -- Sent from:

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-19 Thread Ryan Blue
I have an update to the parser that unifies the CREATE TABLE rules. It took surprisingly little work to get the parser updated to produce CreateTableStatement and CreateTableAsSelectStatement with the Hive info. And the only fields I needed to add to those statements were serde: SerdeInfo and

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-19 Thread Wenchen Fan
Big +1 to have one single unified CREATE TABLE syntax. In general, we can say there are 2 ways to specify the table provider: USING clause and ROW FORMAT/STORED AS clause. These 2 ways are mutually exclusive. If none is specified, it implicitly indicates USING defaultSource. I'm fine with a few
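
The provider rule described in the message above could be sketched roughly as follows. This is only an illustration of the stated rule, not Spark's parser code: the function `resolve_provider`, its parameters, the `"hive"` return value for the ROW FORMAT/STORED AS path, and the `"parquet"` default are all assumptions made for the example.

```python
from typing import Optional


def resolve_provider(
    using: Optional[str] = None,
    row_format: Optional[str] = None,
    stored_as: Optional[str] = None,
    default_source: str = "parquet",  # assumed stand-in for the configured default source
) -> str:
    """Return the table provider implied by the clauses of a CREATE TABLE statement.

    Hypothetical helper: it models only the rule from the thread, namely that
    USING and ROW FORMAT/STORED AS are mutually exclusive, and that omitting
    both means USING <default source>.
    """
    hive_style = row_format is not None or stored_as is not None

    # The two ways of naming a provider must not be combined.
    if using is not None and hive_style:
        raise ValueError("USING cannot be combined with ROW FORMAT / STORED AS")

    if using is not None:
        return using

    # Assumption: Hive-style clauses map to a provider named "hive".
    if hive_style:
        return "hive"

    # Neither clause given: fall back to the default source.
    return default_source
```

With a rule of this shape, the provider can be read off the statement text alone, which is the "clear way" asked for elsewhere in the thread.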