[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617271#comment-14617271 ]

Sushanth Sowmyan commented on HIVE-11118:
-----------------------------------------

I have a question here. I'll open a separate bug if needed, but if this is just
a misunderstanding on my part, that won't be necessary.

From the patch, I see the following bit:

{code}
private void ensureFileFormatsMatch(TableSpec ts, URI fromURI) throws SemanticException {
  Class<? extends InputFormat> destInputFormat = ts.tableHandle.getInputFormatClass();
  // Other file formats should do similar check to make sure file formats match
  // when doing LOAD DATA .. INTO TABLE
  if (OrcInputFormat.class.equals(destInputFormat)) {
    Path inputFilePath = new Path(fromURI);
    try {
      FileSystem fs = FileSystem.get(fromURI, conf);
      // just creating orc reader is going to do sanity checks to make sure its valid ORC file
      OrcFile.createReader(fs, inputFilePath);
    } catch (FileFormatException e) {
      throw new SemanticException(ErrorMsg.INVALID_FILE_FORMAT_IN_LOAD.getMsg("Destination" +
          " table is stored as ORC but the file being loaded is not a valid ORC file."));
    } catch (IOException e) {
      throw new SemanticException("Unable to load data to destination table." +
          " Error: " + e.getMessage());
    }
  }
}
{code}

Now, it's entirely possible that the table in question is an ORC table while the
partition being loaded is in another format, such as Text, since Hive supports
tables whose partitions use different formats. In fact, this is a likely
scenario when replicating a table that used to be Text but has since been
converted to ORC, so that only the new partitions are ORC. In that case, the
destination table will be a MANAGED_TABLE declared as ORC, but import will try
to load a Text partition into it, and the check above would reject that file
even though it is valid for its partition.

Shouldn't this check use the input format of the target partition (from the
partition spec) rather than the table's input format, so that it works in that
scenario?
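A minimal sketch of the partition-aware lookup the question suggests, assuming a simplified in-memory model: `TableInfo`, `FormatResolver`, and the map from partition spec to format are hypothetical stand-ins, not Hive's `Table`/`Partition` API. The expected format comes from the target partition when one is known and falls back to the table's format otherwise; as in the patch, only ORC is then validated.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration only; not Hive's Table/Partition classes.
final class TableInfo {
    final String inputFormat;                                     // table-level format
    final Map<String, String> partitionFormats = new HashMap<>(); // partition spec -> format

    TableInfo(String inputFormat) {
        this.inputFormat = inputFormat;
    }
}

final class FormatResolver {
    static final String ORC = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
    static final String TEXT = "org.apache.hadoop.mapred.TextInputFormat";

    // Prefer the format recorded for the target partition; fall back to the
    // table's format when the partition is unknown (e.g. a brand-new partition)
    // or when the load does not target a partition at all (partitionSpec == null).
    static String expectedInputFormat(TableInfo table, String partitionSpec) {
        if (partitionSpec != null) {
            String partFormat = table.partitionFormats.get(partitionSpec);
            if (partFormat != null) {
                return partFormat;
            }
        }
        return table.inputFormat;
    }

    // Mirrors the patch: only ORC destinations are validated.
    static boolean needsOrcValidation(TableInfo table, String partitionSpec) {
        return ORC.equals(expectedInputFormat(table, partitionSpec));
    }

    public static void main(String[] args) {
        TableInfo table = new TableInfo(ORC);
        table.partitionFormats.put("ds=2015-01-01", TEXT); // legacy Text partition
        System.out.println(needsOrcValidation(table, "ds=2015-01-01")); // false
        System.out.println(needsOrcValidation(table, "ds=2015-07-01")); // true
    }
}
```

With this resolution, the legacy Text partition from the replication scenario skips the ORC check, while a new partition still inherits the table's ORC format and is validated.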

> Load data query should validate file formats with destination tables
> --------------------------------------------------------------------
>
>                 Key: HIVE-11118
>                 URL: https://issues.apache.org/jira/browse/HIVE-11118
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>             Fix For: 1.3.0, 2.0.0
>
>         Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, 
> HIVE-11118.4.patch, HIVE-11118.patch
>
>
> LOAD DATA LOCAL INPATH queries do not do any validation with respect to file
> format. If the destination table is ORC and we try to load files that are not
> ORC, the load will succeed, but querying such tables will result in runtime
> exceptions. We can do some simple sanity checks to prevent loading files that
> do not match the destination table's file format.
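One example of a "simple sanity check" is to inspect the file's leading bytes, since ORC files begin with the ASCII magic `ORC`. The sketch below is a hypothetical helper (`OrcMagicCheck` is not part of Hive) and assumes Java 11+ for `InputStream.readNBytes(int)`. It is much weaker than building a reader with `OrcFile.createReader`, as the patch does: a truncated or corrupt ORC file would still pass a header-only test.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Hive: a cheap header-only ORC check.
final class OrcMagicCheck {
    // ORC files start with the three ASCII bytes "ORC".
    static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    // True if the given leading bytes start with the ORC magic.
    static boolean startsWithOrcMagic(byte[] head) {
        return head.length >= ORC_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, ORC_MAGIC.length), ORC_MAGIC);
    }

    // Reads at most three bytes from the file and compares them to the magic.
    static boolean hasOrcHeader(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return startsWithOrcMagic(in.readNBytes(ORC_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            boolean orc = hasOrcHeader(Paths.get(arg));
            System.out.println(arg + ": " + (orc ? "ORC header" : "not ORC"));
        }
    }
}
```

A header test like this could serve as a fast first pass before the full reader-based check, but it should not replace it.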



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
