[
https://issues.apache.org/jira/browse/SPARK-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhenhua Wang updated SPARK-11398:
---------------------------------
Description:
1. def dialectClassName in HiveContext is unnecessary.
In HiveContext, if conf.dialect == "hiveql", getSQLDialect() returns new
HiveQLDialect(this); otherwise it falls back to super.getSQLDialect(). Inside
super.getSQLDialect(), the call to dialectClassName dispatches to the override
in HiveContext, but since conf.dialect is not "hiveql" at that point, the
override simply returns super.dialectClassName.
So the "classOf[HiveQLDialect].getCanonicalName" branch of def dialectClassName
in HiveContext is never reached.
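To make the dead branch concrete, here is a minimal, self-contained Scala model
of the dispatch described above (a sketch with stand-in class names and string
results; not the actual Spark source):

class BaseContext(var dialect: String) {
  def dialectClassName: String =
    if (dialect == "sql") "DefaultParserDialect" else dialect
  def getSQLDialect(): String = s"instantiate(${dialectClassName})"
}

class HiveLikeContext(d: String) extends BaseContext(d) {
  override def getSQLDialect(): String =
    if (dialect == "hiveql") "new HiveQLDialect(this)" // "hiveql" fully handled here
    else super.getSQLDialect()                         // only reached when dialect != "hiveql"
  override def dialectClassName: String =
    if (dialect == "hiveql") "HiveQLDialect"           // dead branch: guarded out above
    else super.dialectClassName
}

object DeadBranchDemo extends App {
  // The "hiveql" case never consults dialectClassName at all:
  assert(new HiveLikeContext("hiveql").getSQLDialect() == "new HiveQLDialect(this)")
  // Any other dialect reaches the override via dynamic dispatch, but falls through:
  assert(new HiveLikeContext("sql").getSQLDialect() == "instantiate(DefaultParserDialect)")
}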
2. When we start bin/spark-sql, the default context is HiveContext, and the
corresponding dialect is hiveql.
However, if we type "set spark.sql.dialect;", the result is "sql", which is
inconsistent with the actual dialect and is misleading. For example, we can run
SQL such as "create table", which is only allowed in hiveql, yet the dialect
conf reports "sql".
Although this problem does not cause any execution error, it is misleading to
Spark SQL users, so we should fix it.
In this PR, instead of overriding def dialect in the conf of HiveContext, I set
SQLConf.DIALECT directly to "hiveql", so that "set spark.sql.dialect;" returns
"hiveql" rather than "sql". After the change, we can still switch HiveContext
to the "sql" dialect via "set spark.sql.dialect=sql": conf.dialect then becomes
"sql", because in SQLConf, def dialect reads its value from the stored settings
via getConf, and the stored value is now "sql".
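The behavior difference can be illustrated with a minimal model of a settings
map (a sketch with made-up names such as MiniConf; not the actual patch):

import scala.collection.mutable

class MiniConf {
  protected val settings = mutable.Map[String, String]()
  def setConf(key: String, value: String): Unit = settings(key) = value
  def getConf(key: String, default: String): String = settings.getOrElse(key, default)
  def dialect: String = getConf("spark.sql.dialect", "sql")
}

object DialectDemo extends App {
  // Old approach: override only the getter. The settings map never holds the key,
  // so reading the stored setting (what "set spark.sql.dialect;" does) reports "sql".
  val oldConf = new MiniConf {
    override def dialect: String = getConf("spark.sql.dialect", "hiveql")
  }
  assert(oldConf.dialect == "hiveql")                          // actual dialect
  assert(oldConf.getConf("spark.sql.dialect", "sql") == "sql") // misleading readback

  // New approach: write "hiveql" into the settings map once; getter and readback
  // agree, and "set spark.sql.dialect=sql" later overwrites the same entry.
  val newConf = new MiniConf
  newConf.setConf("spark.sql.dialect", "hiveql")
  assert(newConf.dialect == "hiveql")
  assert(newConf.getConf("spark.sql.dialect", "sql") == "hiveql")
  newConf.setConf("spark.sql.dialect", "sql")
  assert(newConf.dialect == "sql")
}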
> unnecessary def dialectClassName in HiveContext, and misleading dialect conf at the start of spark-sql
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-11398
> URL: https://issues.apache.org/jira/browse/SPARK-11398
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Zhenhua Wang
> Priority: Minor
>