[jira] [Commented] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it
[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639272#comment-14639272 ]

Mithun Radhakrishnan commented on HIVE-11344:
---------------------------------------------

Ah, that's a good point. I didn't realize that {{HCatSplit}} or {{PartInfo}} might be serialized in situations other than M/R / Tez serialization of splits.

At the time I wrote this, I did intend to check {{partitionSchema}}, {{inputFormatClassName}}, etc. for null, in their respective getters, and return the values from {{this.tableInfo}}. One "optimization" too far.

+1 to Solution (a).

> HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11344
>                 URL: https://issues.apache.org/jira/browse/HIVE-11344
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-11344.patch
>
>
> HIVE-9845 introduced a notion of compression for HCatSplits so that when
> serializing, it finds commonalities between PartInfo and TableInfo objects,
> and if the two are identical, it nulls out that field in PartInfo, thus
> making sure that when PartInfo is then serialized, info is not repeated.
> This, however, has the side effect of making the PartInfo object unusable if
> HCatSplit.write has been called.
> While this does not affect M/R directly, since they do not know about the
> PartInfo objects and once serialized, the HCatSplit object is recreated by
> deserializing on the backend, which does restore the split and its PartInfo
> objects, this does, however, affect framework users of HCat that try to mimic
> M/R and then use the PartInfo objects to instantiate distinct readers.
> Thus, we need to make it so that PartInfo is still usable after
> HCatSplit.write is called.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
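For readers following the thread, here is a rough Java sketch of the idea in the comment above: the getters fall back to the table-level value when a partition-level field is null, and serialization skips repeated values without nulling anything on the live object. The classes {{TableInfoSketch}} and {{PartInfoSketch}}, the methods {{toBytes}} and {{fromBytes}}, and the wire layout are all hypothetical simplifications. This is not the HCatalog code, not the attached patch, and not necessarily what "Solution (a)" refers to.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for table-level metadata (not the real TableInfo). */
final class TableInfoSketch {
    final String inputFormatClassName;

    TableInfoSketch(String inputFormatClassName) {
        this.inputFormatClassName = inputFormatClassName;
    }
}

/** Hypothetical stand-in for PartInfo; a single field is shown for brevity. */
final class PartInfoSketch {
    // null means "identical to the table-level value".
    private final String inputFormatClassName;
    private final TableInfoSketch tableInfo;

    PartInfoSketch(String inputFormatClassName, TableInfoSketch tableInfo) {
        this.inputFormatClassName = inputFormatClassName;
        this.tableInfo = tableInfo;
    }

    /** Falls back to the table-level value when the own field is null. */
    String getInputFormatClassName() {
        return inputFormatClassName != null
                ? inputFormatClassName
                : tableInfo.inputFormatClassName;
    }

    /** Writes a compressed form; this object is left unchanged. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            String value = getInputFormatClassName();
            boolean differs = !value.equals(tableInfo.inputFormatClassName);
            out.writeBoolean(differs);   // flag: is a partition-level value present?
            if (differs) {
                out.writeUTF(value);     // only written when it differs from the table
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds a PartInfoSketch; the table-level info is supplied by the caller. */
    static PartInfoSketch fromBytes(byte[] bytes, TableInfoSketch tableInfo) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            String own = in.readBoolean() ? in.readUTF() : null;
            return new PartInfoSketch(own, tableInfo);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a framework that calls {{toBytes()}} and then keeps using the same object still gets a valid value from {{getInputFormatClassName()}}, because the space saving happens in the byte stream and not in the object's fields.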
[jira] [Commented] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it
[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638172#comment-14638172 ]

Hive QA commented on HIVE-11344:
--------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12746652/HIVE-11344.patch

{color:green}SUCCESS:{color} +1 9257 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4698/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4698/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4698/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12746652 - PreCommit-HIVE-TRUNK-Build