[jira] [Commented] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it

2015-07-23 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639272#comment-14639272
 ] 

Mithun Radhakrishnan commented on HIVE-11344:
-

Ah, that's a good point. I didn't realize that {{HCatSplit}} or {{PartInfo}} 
might be serialized in situations other than M/R / Tez serialization of splits.

At the time I wrote this, I did intend to check {{partitionSchema}}, 
{{inputFormatClassName}}, etc. for null, in their respective getters, and 
return the values from {{this.tableInfo}}. One "optimization" too far.

+1 to Solution (a).
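
For illustration, the getter-side null check mentioned above could look roughly like the sketch below. This is not the actual HCatalog code: {{TableInfoSketch}}, {{PartInfoSketch}} and {{dropFieldsSharedWithTable}} are hypothetical, simplified stand-ins, and only one field ({{inputFormatClassName}}) is modelled.

```java
// Hypothetical, simplified stand-in for the table-level metadata.
final class TableInfoSketch {
    final String inputFormatClassName;

    TableInfoSketch(String inputFormatClassName) {
        this.inputFormatClassName = inputFormatClassName;
    }
}

// Hypothetical, simplified stand-in for the partition-level metadata.
final class PartInfoSketch {
    private final TableInfoSketch tableInfo;
    // May become null once a value that duplicates the table-level one is dropped.
    private String inputFormatClassName;

    PartInfoSketch(TableInfoSketch tableInfo, String inputFormatClassName) {
        this.tableInfo = tableInfo;
        this.inputFormatClassName = inputFormatClassName;
    }

    // Mimics the "compression" step: null out the field if it matches the table's.
    void dropFieldsSharedWithTable() {
        if (inputFormatClassName != null
                && inputFormatClassName.equals(tableInfo.inputFormatClassName)) {
            inputFormatClassName = null;
        }
    }

    // Getter that falls back to the table-level value when the field was nulled out.
    String getInputFormatClassName() {
        return inputFormatClassName != null
                ? inputFormatClassName
                : tableInfo.inputFormatClassName;
    }
}

class GetterFallbackSketch {
    public static void main(String[] args) {
        TableInfoSketch table = new TableInfoSketch("org.example.SomeInputFormat");
        PartInfoSketch part = new PartInfoSketch(table, "org.example.SomeInputFormat");
        part.dropFieldsSharedWithTable();
        // The getter still returns a usable value after the field was dropped.
        System.out.println(part.getInputFormatClassName());
    }
}
```

With a fallback like this, a caller that reads the object after the nulling step would still see the table-level value instead of null.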

> HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are 
> unusable after it
> ---
>
> Key: HIVE-11344
> URL: https://issues.apache.org/jira/browse/HIVE-11344
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-11344.patch
>
>
> HIVE-9845 introduced a form of compression for HCatSplits: when serializing, 
> it compares the fields of each PartInfo against its TableInfo and, where the 
> two are identical, nulls out that field in the PartInfo so that the same 
> information is not serialized twice.
> A side effect is that the PartInfo object becomes unusable once 
> HCatSplit.write has been called.
> This does not affect M/R directly: M/R does not know about PartInfo objects, 
> and once the split is serialized, the HCatSplit is recreated on the backend 
> by deserialization, which restores the split and its PartInfo objects. It 
> does, however, affect framework users of HCat that mimic M/R and then use the 
> PartInfo objects to instantiate distinct readers.
> Thus, PartInfo must remain usable after HCatSplit.write is called.
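
To make the requirement concrete, here is a minimal sketch of a write path that avoids repeating table-level values without touching the in-memory object. It is an illustration under stated assumptions, not the attached patch: {{SplitSketch}}, its fields and {{roundTrip}} are hypothetical names, and a single string field stands in for the real PartInfo/TableInfo contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a split carrying table- and partition-level metadata.
final class SplitSketch {
    final String tableInputFormat;
    final String partInputFormat; // never modified by write()

    SplitSketch(String tableInputFormat, String partInputFormat) {
        this.tableInputFormat = tableInputFormat;
        this.partInputFormat = partInputFormat;
    }

    // Writes the partition value only when it differs from the table value.
    // The decision is recorded as a flag in the stream, not by nulling a field.
    void write(DataOutputStream out) throws IOException {
        out.writeUTF(tableInputFormat);
        boolean sameAsTable = partInputFormat.equals(tableInputFormat);
        out.writeBoolean(sameAsTable);
        if (!sameAsTable) {
            out.writeUTF(partInputFormat);
        }
    }

    // Restores the partition value from the table value when the flag is set.
    static SplitSketch read(DataInputStream in) throws IOException {
        String table = in.readUTF();
        boolean sameAsTable = in.readBoolean();
        String part = sameAsTable ? table : in.readUTF();
        return new SplitSketch(table, part);
    }

    // Convenience helper: serialize to bytes and deserialize a fresh copy.
    static SplitSketch roundTrip(SplitSketch split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                split.write(out);
            }
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class NonMutatingWriteSketch {
    public static void main(String[] args) {
        SplitSketch split =
                new SplitSketch("org.example.TableFormat", "org.example.TableFormat");
        SplitSketch copy = SplitSketch.roundTrip(split);
        // Both the original and the deserialized copy remain usable.
        System.out.println(split.partInputFormat + " / " + copy.partInputFormat);
    }
}
```

The point of the sketch is only that the space saving can live in the serialized form, so a caller that keeps using the original object after write sees the same values as before.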



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it

2015-07-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638172#comment-14638172
 ] 

Hive QA commented on HIVE-11344:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12746652/HIVE-11344.patch

{color:green}SUCCESS:{color} +1 9257 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4698/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4698/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4698/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12746652 - PreCommit-HIVE-TRUNK-Build
