[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables

2015-07-07 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617271#comment-14617271
 ] 

Sushanth Sowmyan commented on HIVE-11118:
-----------------------------------------

I have a question here - I will open another bug if need be, but if it's a 
simple misunderstanding, it won't matter.

From the patch, I see the following bit:

{code}
private void ensureFileFormatsMatch(TableSpec ts, URI fromURI) throws SemanticException {
  Class<? extends InputFormat> destInputFormat = ts.tableHandle.getInputFormatClass();
  // Other file formats should do similar check to make sure file formats match
  // when doing LOAD DATA .. INTO TABLE
  if (OrcInputFormat.class.equals(destInputFormat)) {
    Path inputFilePath = new Path(fromURI);
    try {
      FileSystem fs = FileSystem.get(fromURI, conf);
      // just creating orc reader is going to do sanity checks to make sure its valid ORC file
      OrcFile.createReader(fs, inputFilePath);
    } catch (FileFormatException e) {
      throw new SemanticException(ErrorMsg.INVALID_FILE_FORMAT_IN_LOAD.getMsg("Destination" +
          " table is stored as ORC but the file being loaded is not a valid ORC file."));
    } catch (IOException e) {
      throw new SemanticException("Unable to load data to destination table." +
          " Error: " + e.getMessage());
    }
  }
}
{code}

Now, it's entirely possible that the table in question is an ORC table, but the 
partition being loaded is of another format, such as Text - Hive supports mixed 
partition scenarios. In fact, this is a likely scenario when replicating a 
table that used to be Text but has been converted to ORC, so that all new 
partitions are ORC. In that case, the destination table will be a 
MANAGED_TABLE and an ORC table, but import will try to load a Text partition 
onto it.

Shouldn't this refer to a partitionspec rather than the table's inputformat for 
this check to work with that scenario?
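
To make the mixed-partition case concrete, here is a minimal, self-contained sketch of the lookup being asked about: validate against the partition's format when the partition exists, and fall back to the table's format otherwise. The class and method names (PartitionFormatResolver, TableModel, expectedFormat) are hypothetical stand-ins and not Hive APIs, and the fallback for a not-yet-existing partition is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in types for illustration; none of these are Hive classes.
final class PartitionFormatResolver {

    /** Minimal model of a table: a default format plus per-partition overrides. */
    static final class TableModel {
        final String tableFormat;
        final Map<String, String> partitionFormats = new HashMap<>();

        TableModel(String tableFormat) {
            this.tableFormat = tableFormat;
        }
    }

    /**
     * Returns the format a loaded file should be validated against.
     * partitionName may be null for a load into an unpartitioned table.
     */
    static String expectedFormat(TableModel table, String partitionName) {
        if (partitionName == null) {
            return table.tableFormat;
        }
        // Assumption: a partition that does not exist yet inherits the table-level format.
        return table.partitionFormats.getOrDefault(partitionName, table.tableFormat);
    }

    public static void main(String[] args) {
        TableModel t = new TableModel("ORC");
        t.partitionFormats.put("dt=2014-01-01", "TEXT"); // old partition, written before the conversion
        System.out.println(expectedFormat(t, "dt=2014-01-01")); // existing Text partition
        System.out.println(expectedFormat(t, "dt=2015-07-07")); // new partition, uses table format
        System.out.println(expectedFormat(t, null));            // unpartitioned load
    }
}
```

With a check keyed on the table format alone, the first case would be validated as ORC and a legitimate Text partition would be rejected.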

 Load data query should validate file formats with destination tables
 --------------------------------------------------------------------

                 Key: HIVE-11118
                 URL: https://issues.apache.org/jira/browse/HIVE-11118
             Project: Hive
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Prasanth Jayachandran
            Assignee: Prasanth Jayachandran
             Fix For: 1.3.0, 2.0.0

         Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, 
                      HIVE-11118.4.patch, HIVE-11118.patch


 Load data local inpath queries do not do any validation wrt file format. If 
 the destination table is ORC and we try to load files that are not ORC, 
 the load will succeed but querying such tables will result in runtime 
 exceptions. We can do some simple sanity checks to prevent loading of files 
 that do not match the destination table file format.
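
 As an illustration of what a "simple sanity check" can look like, the sketch below tests whether a local file starts with the three-byte "ORC" magic that ORC files begin with. This is not what the attached patch does (the patch relies on OrcFile.createReader); the class and method names here are hypothetical, and a header check alone does not prove the rest of the file is valid.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Hive or the ORC library.
final class OrcMagicCheck {

    // ORC files begin with the ASCII bytes "ORC".
    private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file's first bytes equal the ORC magic. */
    static boolean looksLikeOrc(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes (Java 11+) returns fewer bytes only at end of stream.
            byte[] header = in.readNBytes(ORC_MAGIC.length);
            return Arrays.equals(header, ORC_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        Path candidate = Files.createTempFile("load-data-sample", ".txt");
        try {
            Files.write(candidate, "id,name\n1,a\n".getBytes(StandardCharsets.US_ASCII));
            if (!looksLikeOrc(candidate)) {
                System.out.println("File being loaded does not look like a valid ORC file.");
            }
        } finally {
            Files.deleteIfExists(candidate);
        }
    }
}
```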



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables

2015-07-07 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617264#comment-14617264
 ] 

Sushanth Sowmyan commented on HIVE-11118:
-----------------------------------------

Thanks, [~leftylev]! Added.


[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables

2015-07-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617317#comment-14617317
 ] 

Prasanth Jayachandran commented on HIVE-11118:
----------------------------------------------

[~sushanth] Thanks for looking into this. Yes, it's entirely possible that the 
table desc is ORC and the partition desc is of another format; Hive supports 
that. I missed that part when I put up the patch. For partitioned tables, the 
check should use the partition desc instead of the table desc throughout. Can 
you please create a separate bug for it? I will address it shortly.


[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables

2015-06-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604843#comment-14604843
 ] 

Lefty Leverenz commented on HIVE-11118:
---------------------------------------

Nudge:  This needs to show Fix Versions 1.3.0 and 2.0.0.

(Commits 49da35903f8334d6dd0c597563c34388772914cc and 
d373962de475ea9f3ef7b2594fbc5d8488636af0.)


[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables

2015-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603885#comment-14603885
 ] 

Hive QA commented on HIVE-11118:
--------------------------------



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12742179/HIVE-11118.4.patch

{color:green}SUCCESS:{color} +1 9030 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4402/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4402/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4402/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12742179 - PreCommit-HIVE-TRUNK-Build
