[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617271#comment-14617271 ]

Sushanth Sowmyan commented on HIVE-8:

I have a question here. I will open another bug if need be, but if it's a simple misunderstanding, it won't matter.

From the patch, I see the following bit:

{code}
private void ensureFileFormatsMatch(TableSpec ts, URI fromURI) throws SemanticException {
  Class<? extends InputFormat> destInputFormat = ts.tableHandle.getInputFormatClass();
  // Other file formats should do similar check to make sure file formats match
  // when doing LOAD DATA .. INTO TABLE
  if (OrcInputFormat.class.equals(destInputFormat)) {
    Path inputFilePath = new Path(fromURI);
    try {
      FileSystem fs = FileSystem.get(fromURI, conf);
      // just creating orc reader is going to do sanity checks to make sure its valid ORC file
      OrcFile.createReader(fs, inputFilePath);
    } catch (FileFormatException e) {
      throw new SemanticException(ErrorMsg.INVALID_FILE_FORMAT_IN_LOAD.getMsg("Destination" +
          " table is stored as ORC but the file being loaded is not a valid ORC file."));
    } catch (IOException e) {
      throw new SemanticException("Unable to load data to destination table." +
          " Error: " + e.getMessage());
    }
  }
}
{code}

Now, it's entirely possible that the table in question is an ORC table, but the partition being loaded is of another format, such as Text. Hive supports mixed partition scenarios. In fact, this is a likely scenario when replicating a table that used to be Text but has been converted to ORC, so that all new partitions will be ORC. In that case, the destination table will be a MANAGED_TABLE and will be an ORC table, but import will try to load a Text partition onto it.

Shouldn't this check refer to a partitionspec rather than the table's inputformat to work with that scenario?
Load data query should validate file formats with destination tables

Key: HIVE-8
URL: https://issues.apache.org/jira/browse/HIVE-8
Project: Hive
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Fix For: 1.3.0, 2.0.0
Attachments: HIVE-8.2.patch, HIVE-8.3.patch, HIVE-8.4.patch, HIVE-8.patch

Load data local inpath queries do not do any validation with respect to file format. If the destination table is ORC and we try to load files that are not ORC, the load will succeed, but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading files that do not match the destination table's file format.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
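As an illustration of the kind of "simple sanity check" the description mentions, here is a minimal sketch in plain Java that tests whether a local file starts with the three-byte ORC header magic ("ORC"). This is not what the patch does: the patch relies on OrcFile.createReader, which performs a fuller validation. The class and method names below (OrcMagicSketch, startsWithOrcMagic, looksLikeOrc) are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Hive or the ORC library.
final class OrcMagicSketch {
    // ORC files begin with the three ASCII bytes "ORC".
    private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes start with the ORC magic. */
    static boolean startsWithOrcMagic(byte[] head) {
        return head.length >= ORC_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, ORC_MAGIC.length), ORC_MAGIC);
    }

    /** Reads up to three bytes from the start of a local file and checks them. */
    static boolean looksLikeOrc(Path file) throws IOException {
        byte[] head = new byte[ORC_MAGIC.length];
        int off = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    break; // file is shorter than the magic
                }
                off += n;
            }
        }
        return startsWithOrcMagic(Arrays.copyOf(head, off));
    }

    public static void main(String[] args) throws IOException {
        // Write a small CSV file and confirm it is rejected.
        Path csv = Files.createTempFile("sample", ".csv");
        Files.write(csv, "id,name\n1,a\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikeOrc(csv)); // false
        Files.delete(csv);
    }
}
```

A header check like this only rejects obviously wrong files; a file that passes could still have a corrupt footer, which is why opening it with a real reader is the stronger test.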
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617264#comment-14617264 ]

Sushanth Sowmyan commented on HIVE-8:

Thanks, [~leftylev]! Added.
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617317#comment-14617317 ]

Prasanth Jayachandran commented on HIVE-8:

[~sushanth] Thanks for looking into this. Yes, it's entirely possible that the table desc is ORC and the partition desc is of another format. Hive supports that. I missed that part when I put up the patch. For partitioned tables, the check should use the partition desc instead of the table desc throughout. Can you please create a separate bug for it? I will address it shortly.
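The selection rule discussed in this exchange (validate against the partition's format when the load targets an existing partition, otherwise fall back to the table's format) can be sketched roughly as follows. This is a self-contained illustration only: TableDesc, partitionFormats, and expectedFormat are hypothetical stand-ins and do not correspond to Hive's actual classes or to the eventual fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a table with per-partition storage formats; not Hive's API.
final class LoadFormatSketch {
    static final class TableDesc {
        final String tableFormat; // format new partitions will use
        final Map<String, String> partitionFormats = new HashMap<>();

        TableDesc(String tableFormat) {
            this.tableFormat = tableFormat;
        }
    }

    /**
     * Picks the format a loaded file should be validated against:
     * the existing partition's own format if there is one, else the table's.
     */
    static String expectedFormat(TableDesc table, Optional<String> partitionName) {
        return partitionName
            .map(p -> table.partitionFormats.getOrDefault(p, table.tableFormat))
            .orElse(table.tableFormat);
    }

    public static void main(String[] args) {
        // Table was converted to ORC, but an older partition is still Text.
        TableDesc table = new TableDesc("ORC");
        table.partitionFormats.put("ds=2014-01-01", "TEXT");

        System.out.println(expectedFormat(table, Optional.of("ds=2014-01-01"))); // TEXT
        System.out.println(expectedFormat(table, Optional.of("ds=2015-07-01"))); // ORC
        System.out.println(expectedFormat(table, Optional.empty()));             // ORC
    }
}
```

With this rule, loading a Text file into the older partition would pass validation, while the same file loaded into a new partition or an unpartitioned ORC table would be rejected.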
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604843#comment-14604843 ]

Lefty Leverenz commented on HIVE-8:

Nudge: This needs to show Fix Versions 1.3.0 and 2.0.0. (Commits 49da35903f8334d6dd0c597563c34388772914cc d373962de475ea9f3ef7b2594fbc5d8488636af0.)
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603885#comment-14603885 ]

Hive QA commented on HIVE-8:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742179/HIVE-8.4.patch

{color:green}SUCCESS:{color} +1 9030 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4402/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4402/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4402/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742179 - PreCommit-HIVE-TRUNK-Build