Sahil Takiar created HIVE-15367:
-----------------------------------

             Summary: CTAS with LOCATION should write temp data under location directory rather than database location
                 Key: HIVE-15367
                 URL: https://issues.apache.org/jira/browse/HIVE-15367
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar


For regular CTAS queries, temp data from the SELECT query is written to a 
staging directory under the database location. The code that controls this is in 
{{SemanticAnalyzer.java}}:

{code}
// allocate a temporary output dir on the location of the table
String tableName = getUnescapedName((ASTNode) ast.getChild(0));
String[] names = Utilities.getDbTableName(tableName);
Path location;
try {
  Warehouse wh = new Warehouse(conf);
  // Use destination table's db location.
  String destTableDb = qb.getTableDesc() != null ? qb.getTableDesc().getDatabaseName() : null;
  if (destTableDb == null) {
    destTableDb = names[0];
  }
  location = wh.getDatabasePath(db.getDatabase(destTableDb));
} catch (MetaException e) {
  throw new SemanticException(e);
}
{code}

However, CTAS queries allow specifying a {{LOCATION}} for the new table. It's 
possible for this location to be on a different filesystem than the database 
location. If this happens, temp data will be written to the database filesystem 
and then copied to the table filesystem in {{MoveTask}}.
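
To illustrate when the copy becomes necessary, here is a minimal sketch that compares two locations by URI scheme and authority. The class name, the {{sameFileSystem}} helper, and the example URIs are hypothetical and are not taken from Hive. The sketch only assumes that two locations with a different scheme or authority live on different filesystems, so moving data between them requires a copy rather than a rename.

```java
import java.net.URI;

public class FileSystemCheck {

    // Hypothetical helper: treat two locations as being on the same filesystem
    // when both scheme (e.g. hdfs, s3a) and authority (e.g. host:port) match.
    static boolean sameFileSystem(URI a, URI b) {
        return equalsIgnoreCase(a.getScheme(), b.getScheme())
            && equalsIgnoreCase(a.getAuthority(), b.getAuthority());
    }

    // Null-safe, case-insensitive string comparison.
    private static boolean equalsIgnoreCase(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        // Example URIs are made up for illustration.
        URI dbLocation = URI.create("hdfs://nn1:8020/user/hive/warehouse/sales.db");
        URI tableLocation = URI.create("s3a://example-bucket/tables/orders");

        if (!sameFileSystem(dbLocation, tableLocation)) {
            System.out.println("Staging under the database location would force a cross-filesystem copy");
        }
    }
}
```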

This extra copying of data can drastically hurt performance. Rather than 
always using the database location as the staging dir for CTAS queries, Hive 
should first check whether an explicit {{LOCATION}} is specified in the CTAS 
query. If there is, staging data should be stored under the {{LOCATION}} 
directory.
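
The proposed selection logic could look roughly like the following standalone sketch. It does not use Hive classes. {{CtasStagingLocation}}, {{chooseStagingParent}}, and the example URIs are hypothetical names introduced for illustration, and an actual change to {{SemanticAnalyzer.java}} may look quite different.

```java
import java.net.URI;
import java.util.Optional;

public class CtasStagingLocation {

    // Hypothetical helper: prefer the explicit LOCATION from the CTAS query,
    // and fall back to the database location when none was given.
    static URI chooseStagingParent(Optional<URI> explicitLocation, URI databaseLocation) {
        return explicitLocation.orElse(databaseLocation);
    }

    public static void main(String[] args) {
        // Example URIs are made up for illustration.
        URI dbLocation = URI.create("hdfs://nn1:8020/user/hive/warehouse/sales.db");
        URI tableLocation = URI.create("s3a://example-bucket/tables/orders");

        // CTAS with LOCATION: stage next to the final table data.
        System.out.println(chooseStagingParent(Optional.of(tableLocation), dbLocation));

        // CTAS without LOCATION: keep the current behaviour.
        System.out.println(chooseStagingParent(Optional.empty(), dbLocation));
    }
}
```

With staging data on the same filesystem as the final table directory, the move at the end of the query no longer has to copy data across filesystems.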



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
