Miklos Szurap created HIVE-25466:
------------------------------------

             Summary: Trim spaces from db and table names
                 Key: HIVE-25466
                 URL: https://issues.apache.org/jira/browse/HIVE-25466
             Project: Hive
          Issue Type: Bug
          Components: Parser, SQL
            Reporter: Miklos Szurap


If we create databases and tables with leading/trailing whitespaces (using 
backticks) the behavior is inconsistent and leads to multiple problems.

Creating database with spaces makes it part of the database name, from then on 
they must be used with backticks.
{code}
0: jdbc:hive2://hs2> create database `mydb1 `;
INFO  : OK
0: jdbc:hive2://hs2> desc database `mydb1 `;
Location: "/warehouse/tablespace/external/hive/mydb1 .db"
{code}
With leading spaces the database can be created, but it can't be referenced 
anymore:
{code}
0: jdbc:hive2://hs2> create database ` mydb2`;
INFO  : OK
0: jdbc:hive2://hs2> desc database ` mydb2`;
Error: Error while compiling statement: FAILED: SemanticException [Error 
10072]: Database does not exist:  mydb2 (state=42000,code=10072)

0: jdbc:hive2://hs2> !outputformat xmlattr
0: jdbc:hive2://hs2> show databases;
<resultset>
  <result database_name=" mydb2"/>
  <result database_name="default"/>
  <result database_name="information_schema"/>
  <result database_name="mydb1 "/>
  <result database_name="sys"/>
</resultset>
{code}
For tables the spaces are trimmed - the tables are created without leading or 
trailing spaces in their names. However as the below example shows the space is 
kept within the table location.
{code}
0: jdbc:hive2://hs2> create external table `mytbl1 ` (col1 string);
INFO  : OK
0: jdbc:hive2://hs2> desc formatted `mytbl1 `;
Location: "hdfs://namenode:8020/warehouse/tablespace/external/hive/mytbl1 "

0: jdbc:hive2://hs2> create external table ` mytbl2` (col1 string);
INFO  : OK
0: jdbc:hive2://hs2> desc formatted ` mytbl2`;
Location: "hdfs://namenode:8020/warehouse/tablespace/external/hive/ mytbl2"

0: jdbc:hive2://hs2> show tables;
<resultset>
  <result tab_name="mytbl1"/>
  <result tab_name="mytbl2"/>
</resultset>
{code}
Interestingly during table creation or other operations like "use" the database 
name's is trimmed.
{code}
0: jdbc:hive2://hs2> create database mydb3;
INFO  : OK
0: jdbc:hive2://hs2> create table ` mydb3`.`mytbl3` (col1 string);
INFO  : OK
0: jdbc:hive2://hs2> use `  mydb3  `;
INFO  : OK
0: jdbc:hive2://hs2> show tables;
<resultset>
  <result tab_name="mytbl3"/>
</resultset>
{code}
One can validate with hdfs commands that the locations have even the trailing 
spaces. Keeping the space in the HDFS location is inconsistent with the table 
names and also confusing in multiple ways (like you cannot see the trailing 
space), and sounds a very bad pattern.

This is even more problematic because in the underlying HMS database the 
{{NOTIFICATION_LOG}} entries are created _with_ the spaces as it has been 
passed in the SQL statement even if the table name is trimmed - which is 
providing incorrect information to the other components relying on the 
{{NOTIFICATION_LOG}}.

Hive should trim completely the database and table names in the SQL statements 
- without propagating that forward.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to