[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567336#comment-13567336 ]

Namit Jain commented on HIVE-3833:
----------------------------------

[~jakobhoman], this was definitely not intentional. Unfortunately, there was no test case, so I missed this.
Can you provide a complete test case? I will take a look.

> object inspectors should be initialized based on partition metadata
> -------------------------------------------------------------------
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
> Fix For: 0.11.0
>
> Attachments: hive.3833.10.patch, hive.3833.11.patch, hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch,
> hive.3833.16.path, hive.3833.17.patch, hive.3833.18.patch, hive.3833.19.patch, hive.3833.1.patch, hive.3833.20.patch,
> hive.3833.21.patch, hive.3833.22.patch, hive.3833.23.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch,
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch
>
>
> Currently, different partitions can be picked up for the same input split based on the
> serdes etc. Also, we don't allow changing the schema for LazyColumnarBinarySerDe.
> Instead, different partitions should be part of the same split only if the
> partition schemas exactly match. The operator tree object inspectors should be based
> on the partition schema. That would give greater flexibility and also help with
> using the binary serde with RCFile.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
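The issue description asks that partitions share an input split only when their schemas match exactly. A minimal sketch of that grouping rule follows, in plain Java. The class name, the method name, and the use of a schema string as the grouping key are all assumptions made for illustration; none of this is Hive's actual split-generation code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hive's classes: each partition name maps to a schema string.
final class SplitGroupingSketch {

    // Partitions end up in the same group only when their schema strings are exactly equal.
    static Map<String, List<String>> groupBySchema(Map<String, String> schemaByPartition) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : schemaByPartition.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }
}
```

Under this rule, two partitions declared as `key:int,value:string` could be combined, while a third declared as `key:string,value:string` would be kept apart.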
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566913#comment-13566913 ]

Jakob Homan commented on HIVE-3833:
-----------------------------------

This patch has broken Avro (and probably HBase and Cassandra) for partitioned tables since it no longer passes the table properties down to the serde:
{noformat}
+Properties partProps =
+    (pd.getPartSpec() == null || pd.getPartSpec().isEmpty()) ?
+    pd.getTableDesc().getProperties() : pd.getProperties();
{noformat}
Was this intentional? If so, it's a breaking change and should be marked as such. If not, since it's not been in a release yet, can we revert the patch? See HIVE-3953.
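To make the report above concrete: with the quoted ternary, a partitioned table's serde sees only `pd.getProperties()`, so any key that exists only at table level is no longer visible. The sketch below shows one way to keep table-level keys visible, by overlaying partition properties on table properties. It is an illustration only, not the fix that was applied in Hive; `SerdePropsSketch` and `mergedProps` are hypothetical names, and only `java.util.Properties` is used.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hive: overlay partition-level properties on
// table-level ones so that table-only keys remain visible to the serde.
final class SerdePropsSketch {

    static Properties mergedProps(Properties tableProps, Properties partProps) {
        Properties merged = new Properties();
        merged.putAll(tableProps);       // start from the table-level view
        if (partProps != null) {
            merged.putAll(partProps);    // partition-level values win on conflict
        }
        return merged;
    }
}
```

For example, a table-level key such as `avro.schema.url` (used here purely as an example) would survive the merge even when the partition properties do not carry it, while a key present in both would take the partition's value.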
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562697#comment-13562697 ]

Hudson commented on HIVE-3833:
------------------------------

Integrated in Hive-trunk-hadoop2 #86 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/86/])
HIVE-3833 : object inspectors should be initialized based on partition metadata (Namit Jain via Ashutosh Chauhan) (Revision 1438111)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438111
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/common/ObjectPair.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecMapper.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ObjectPair.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/metadata/TestPartition.java
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat10.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat11.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat12.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat13.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat14.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat8.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat9.q
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin11.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin12.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin13.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/columnstats_partlvl.q.out
* /hive/trunk/ql/src/test/results/clientpositive/combine2_hadoop20.q.out
* /hive/trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_map_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_sort_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input23.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input42.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join26.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join33.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join_map_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/load_dyn_part8.q.out
* /hive/trunk/ql/src/test/resul
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562545#comment-13562545 ]

Hudson commented on HIVE-3833:
------------------------------

Integrated in Hive-trunk-h0.21 #1935 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1935/])
HIVE-3833 : object inspectors should be initialized based on partition metadata (Namit Jain via Ashutosh Chauhan) (Revision 1438111)

Result = SUCCESS
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438111
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562009#comment-13562009 ]

Hudson commented on HIVE-3833:
------------------------------

Integrated in hive-trunk-hadoop1 #41 (See [https://builds.apache.org/job/hive-trunk-hadoop1/41/])
HIVE-3833 : object inspectors should be initialized based on partition metadata (Namit Jain via Ashutosh Chauhan) (Revision 1438111)

Result = ABORTED
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438111
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561535#comment-13561535 ]

Namit Jain commented on HIVE-3833:
----------------------------------

Yes, the tests passed for me with the .23 patch.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560983#comment-13560983 ]

Ashutosh Chauhan commented on HIVE-3833:
----------------------------------------

+1 for the latest patch. Is .23 the latest complete patch? Running tests on that now.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559982#comment-13559982 ]

Ashutosh Chauhan commented on HIVE-3833:
----------------------------------------

bq. For the above, it is fairly difficult to address. In a follow-up, I can add a serde level property, which indicates that the serde can handle different datatypes (for eg. lazySimpleSerde) - if all the partitions of the table have serde's with this property, then we can use identityConverter. This is kind of hacky, and am not sure if it is useful, since it should not be a common case. Usually, the partition schema should match the table schema.

I think this really is a common case. Folks usually change the serde of an existing table when they find a better file format, or sometimes when there is a better serde, both of which are rare occurrences. So, I think we need to think about optimizing this case. I agree, though, that the approach you suggested is hacky. We need to think of a better approach, probably in a follow-up JIRA.

Also, thanks for updating the patch. Some more comments on the latest patch are on Phabricator.

Also, are we going to lose any lazy aspects of deserialization here? I guess not, because we are just wiring up OIs, but I want to make sure. Can you verify?
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559745#comment-13559745 ]

Namit Jain commented on HIVE-3833:
----------------------------------

The tests finished fine.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559572#comment-13559572 ]

Namit Jain commented on HIVE-3833:
----------------------------------

bq. In case of identity converter, there is no conversion cost, but in case of non-identity this will be worse than current impl, since converter will examine every single column value, which wasn't the case earlier. However, it's not clear how expensive this would be?

The above is fairly difficult to address. In a follow-up, I can add a serde-level property which indicates that the serde can handle different datatypes (e.g. LazySimpleSerDe); if all the partitions of the table have serdes with this property, then we can use the identityConverter. This is kind of hacky, and I am not sure it is useful, since it should not be a common case. Usually, the partition schema should match the table schema.
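The follow-up idea in this comment, a serde-level property that marks a serde as able to handle differing datatypes, can be sketched as a simple eligibility check. This is only an illustration of the proposal as described in the thread, not code from any patch. The class name, the method name, and the representation of "serdes with this property" as a set of names are all assumptions.

```java
import java.util.List;
import java.util.Set;

// Hypothetical check, not Hive code: the identity converter is allowed only when
// every partition's serde is in the set of serdes flagged as type-tolerant.
final class IdentityConverterEligibilitySketch {

    static boolean canUseIdentityConverter(List<String> partitionSerdes, Set<String> tolerantSerdes) {
        for (String serde : partitionSerdes) {
            if (!tolerantSerdes.contains(serde)) {
                return false;   // one non-tolerant partition forces real conversion
            }
        }
        return true;
    }
}
```

A single partition whose serde lacks the flag is enough to rule out the shortcut, which matches the "all the partitions of the table" wording of the proposal.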
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559568#comment-13559568 ]

Namit Jain commented on HIVE-3833:
----------------------------------

Addressed the comments including the last one.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559487#comment-13559487 ]

Ashutosh Chauhan commented on HIVE-3833:
----------------------------------------

Comments on Phabricator.
* I have made a bunch of requests to rename functions; feel free to use better names than the ones I suggested.
* I have not reviewed the new tests that you have added. I assume you have verified those.
* The pm.retrieveAll() change in ObjectStore() is of concern to me. If the comments I made there are valid, please take time to see if we can do something better there.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558827#comment-13558827 ]

Namit Jain commented on HIVE-3833:
----------------------------------

Refreshed, tests passed.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557284#comment-13557284 ]

Namit Jain commented on HIVE-3833:
----------------------------------

Only if the two schemas are different; otherwise the converter is identityConverter, so no conversion takes place.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557258#comment-13557258 ]

Ashutosh Chauhan commented on HIVE-3833:
----------------------------------------

Could this possibly result in a performance hit (CPU)? Earlier, data was deserialized per the table schema; now it will first be deserialized per the partition schema and then converted to comply with the table schema.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555910#comment-13555910 ]

Namit Jain commented on HIVE-3833:
----------------------------------

[~ashutoshc], I am not refreshing, so some of the test results may need to be updated. Refreshing should not lead to major code changes, so you can still review the code changes.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554837#comment-13554837 ]

Namit Jain commented on HIVE-3833:
----------------------------------

Today, pathToPartitionInfo actually contains path -> TableInfo, and numPartitions is part of TableInfo. Since I have changed it to be path -> partitionInfo, numPartitions is going away.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554828#comment-13554828 ]

Ashutosh Chauhan commented on HIVE-3833:
----------------------------------------

Thanks for explaining. Makes sense. I will take a closer look at this tomorrow. Briefly looking at the diffs, {{numPartitions}} is getting removed from many .q.out files. That looks like a loss of info. What's the reason for that?
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554811#comment-13554811 ]

Namit Jain commented on HIVE-3833:
----------------------------------

bq. Seems to me, this patch will take away the flexibility of combining partitions of different schemas in one split. That sounds like lesser flexibility instead of more.

No. I am not sure whether I added a test for that, but that should be possible. We know when a partition is being changed.

bq. Shouldn't we be fixing LazyColumnarBinarySerde in that case, instead of restricting combining of partitions of different schemas in one split?

Combining partitions is not the problem. The problem is that any binary serde will use the datatypes for serialization, i.e. it will have different storage for int and string; otherwise, what is the point of it being binary? In that case, unless we use the partition schema (instead of the table schema), we can get wrong results.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554785#comment-13554785 ]

Ashutosh Chauhan commented on HIVE-3833:
----------------------------------------

bq. Instead of that, different partitions should be part of the same split, only if the partition schemas exactly match. That would give greater flexibility

Seems to me, this patch will take away the flexibility of combining partitions of different schemas in one split. That sounds like lesser flexibility instead of more.

bq. And, we dont allow to change the schema for LazyColumnarBinarySerDe.

Shouldn't we be fixing LazyColumnarBinarySerde in that case, instead of restricting combining of partitions of different schemas in one split?
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552825#comment-13552825 ]

Namit Jain commented on HIVE-3833:
----------------------------------

Refreshed, all the tests passed.
https://issues.apache.org/jira/secure/attachment/12564718/hive.3833.14.patch contains all the changes.
The phabricator entry does not contain the changes for the test results compiler files, since it was exceeding the limit.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540400#comment-13540400 ]

Namit Jain commented on HIVE-3833:
----------------------------------

Running tests now.
The basic idea is: use partition metadata to read the data, then convert it to use table metadata. The rest of the stack does not need to know about the conversion.
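The idea of reading with the partition schema and then converting to the table schema can be sketched as follows. This is only an illustration, not the code from the patch: the class PartitionToTableConverter, the ColType enum, and the methods converterFor and convert are hypothetical names, and rows are modeled as plain Object[] rather than through Hive's object inspectors. When the two schemas are equal the sketch returns an identity function, so the row is passed through untouched.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: rows are read using the partition schema and then
// converted so that downstream code only ever sees the table schema.
class PartitionToTableConverter {
    // Hypothetical, minimal set of column types for the illustration.
    enum ColType { STRING, INT }

    // Returns the identity function when the schemas match, otherwise a
    // per-column converter from the partition layout to the table layout.
    static Function<Object[], Object[]> converterFor(List<ColType> partSchema,
                                                     List<ColType> tableSchema) {
        if (partSchema.equals(tableSchema)) {
            return Function.identity();
        }
        return row -> {
            Object[] out = new Object[tableSchema.size()];
            for (int i = 0; i < out.length; i++) {
                // Columns missing from the partition become null.
                Object value = i < row.length ? row[i] : null;
                out[i] = convert(value, tableSchema.get(i));
            }
            return out;
        };
    }

    // Converts one value to the target type; unparsable values become null.
    static Object convert(Object value, ColType target) {
        if (value == null) {
            return null;
        }
        switch (target) {
            case INT:
                if (value instanceof Integer) {
                    return value;
                }
                try {
                    return Integer.valueOf(value.toString().trim());
                } catch (NumberFormatException e) {
                    return null;
                }
            default:
                return value.toString();
        }
    }

    public static void main(String[] args) {
        // Partition was written with (string, string); table is now (int, string).
        List<ColType> part = Arrays.asList(ColType.STRING, ColType.STRING);
        List<ColType> table = Arrays.asList(ColType.INT, ColType.STRING);
        Object[] row = converterFor(part, table).apply(new Object[] {"238", "val_238"});
        System.out.println(Arrays.toString(row)); // prints [238, val_238]
    }
}
```

Choosing the converter once per input, rather than per row, keeps the cost of the schema comparison out of the row-processing loop.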
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539888#comment-13539888 ]

Namit Jain commented on HIVE-3833:
----------------------------------

https://reviews.facebook.net/D7653
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539588#comment-13539588 ]

Namit Jain commented on HIVE-3833:
----------------------------------

The object inspectors need to be initialized based on partition metadata.
That leaves us with the following options:
1. Create an operator tree per partition.
2. Create a dummy operator after table scan (which converts the partition data into table data). This operator will be different for different inputs.
Option 2 seems like the better option.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539203#comment-13539203 ]

Namit Jain commented on HIVE-3833:
----------------------------------

The possible options are either to not allow the schema to be changed with LazyColumnarSerDe (only allow additions), or to use partition metadata for the inspectors.
[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata
[ https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539201#comment-13539201 ]

Namit Jain commented on HIVE-3833:
----------------------------------

Consider the following test:

set hive.input.format = org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
create table partition_test_partitioned(key string, value string) partitioned by (dt string) stored as rcfile;
alter table partition_test_partitioned set serde 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe';
insert overwrite table partition_test_partitioned partition(dt='1') select * from src where key = 238;
alter table partition_test_partitioned change key key int;

The query:

select * from partition_test_partitioned where dt is not null;

returns:

50 val_238 1
50 val_238 1

This is due to the fact that the key column was serialized as a string column in the existing partition, and is now being read as an integer because the table schema has changed.
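One plausible way to see where the value 50 comes from is sketched below. It rests on two assumptions that this thread does not confirm: that the binary serde stores the string "238" as its raw bytes, and that it reads an int as a Hadoop-style variable-length integer. The class VIntMisreadDemo and the method decodeVInt are hypothetical; decodeVInt is a simplified re-implementation written for this illustration, not a call into Hive or Hadoop.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: bytes written for a string column are decoded as if
// they held a variable-length int, which silently yields a wrong number.
class VIntMisreadDemo {
    // Simplified decoder for a Hadoop-style variable-length int.
    // A first byte in [-112, 127] is the value itself; smaller first bytes
    // encode the sign and how many big-endian payload bytes follow.
    static int decodeVInt(byte[] buf) {
        byte first = buf[0];
        if (first >= -112) {
            return first;
        }
        boolean negative = first < -120;
        int extra = negative ? (-120 - first) : (-112 - first);
        long value = 0;
        for (int i = 1; i <= extra; i++) {
            value = (value << 8) | (buf[i] & 0xFF);
        }
        return (int) (negative ? ~value : value);
    }

    public static void main(String[] args) {
        // Assumed on-disk form of the string key "238": the bytes '2', '3', '8'.
        byte[] stored = "238".getBytes(StandardCharsets.UTF_8);
        // The first byte is '2' (ASCII 50), which decodes as a one-byte int.
        System.out.println(decodeVInt(stored)); // prints 50
    }
}
```

Under these assumptions the reader consumes only the first byte, the character '2' with ASCII code 50, which is consistent with the 50 shown in the query output above.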