[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567336#comment-13567336
 ] 

Namit Jain commented on HIVE-3833:
--

[~jakobhoman], this was definitely not intentional. Unfortunately, there was no 
test case, so I missed this.
Can you provide a complete test case? I will take a look.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
> Fix For: 0.11.0
>
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.16.path, hive.3833.17.patch, hive.3833.18.patch, 
> hive.3833.19.patch, hive.3833.1.patch, hive.3833.20.patch, 
> hive.3833.21.patch, hive.3833.22.patch, hive.3833.23.patch, 
> hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, 
> hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch
>
>
> Currently, different partitions can be picked up for the same input split
> based on their serdes, etc., and we don't allow the schema to be changed for
> LazyColumnarBinarySerDe.
> Instead, different partitions should be part of the same split only if the
> partition schemas match exactly. The operator tree object inspectors should
> be based on the partition schema. That would give greater flexibility and
> also help in using a binary serde with RCFile.
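
The grouping rule in the description above (partitions share a split only when their schemas match exactly) can be sketched roughly as follows. This is an illustration only, not code from the patch: PartitionInfo, schemaKey and groupBySchema are hypothetical names, and a schema is reduced here to two plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitGroupingSketch {

    // Hypothetical stand-in for a partition's metadata; not a Hive class.
    static final class PartitionInfo {
        final String name;
        final String columns;      // e.g. "id,name"
        final String columnTypes;  // e.g. "int,string"

        PartitionInfo(String name, String columns, String columnTypes) {
            this.name = name;
            this.columns = columns;
            this.columnTypes = columnTypes;
        }

        // Two partitions may share a split only if this key is identical.
        String schemaKey() {
            return columns + "|" + columnTypes;
        }
    }

    // Groups partition names by exact schema, preserving first-seen order.
    static Map<String, List<String>> groupBySchema(List<PartitionInfo> parts) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (PartitionInfo p : parts) {
            groups.computeIfAbsent(p.schemaKey(), k -> new ArrayList<>()).add(p.name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<PartitionInfo> parts = Arrays.asList(
            new PartitionInfo("ds=1", "id,name", "int,string"),
            new PartitionInfo("ds=2", "id,name", "int,string"),
            new PartitionInfo("ds=3", "id,name,email", "int,string,string"));
        // ds=1 and ds=2 land in one group; ds=3 gets its own.
        System.out.println(groupBySchema(parts));
    }
}
```

Each resulting group could then get object inspectors built from its own partition schema instead of from the table schema.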

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-30 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566913#comment-13566913
 ] 

Jakob Homan commented on HIVE-3833:
---

This patch has broken Avro (and probably HBase and Cassandra) for partitioned 
tables since it no longer passes the table properties down to the serde:
{noformat}
+Properties partProps =
+(pd.getPartSpec() == null || pd.getPartSpec().isEmpty()) ?
+pd.getTableDesc().getProperties() : pd.getProperties();
{noformat}
Was this intentional?  If so, it's a breaking change and should be marked as 
such.  If not, since it's not been in a release yet, can we revert the patch?  
See HIVE-3953.
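
To make the reported problem concrete, here is a minimal sketch using only java.util.Properties. It assumes that a partition's properties do not repeat table-level settings; the key avro.schema.url and the path are used purely as an example of such a setting. selectAsInPatch mirrors the selection quoted above, while mergeTableAndPartition is a hypothetical alternative for illustration, not the fix that was adopted.

```java
import java.util.Properties;

final class SerdePropertiesSketch {

    // Mirrors the quoted selection: a partitioned input sees only partition properties.
    static Properties selectAsInPatch(boolean hasPartSpec,
                                      Properties tableProps,
                                      Properties partProps) {
        return hasPartSpec ? partProps : tableProps;
    }

    // Hypothetical alternative: start from table properties, let the partition override.
    static Properties mergeTableAndPartition(Properties tableProps, Properties partProps) {
        Properties merged = new Properties();
        merged.putAll(tableProps);
        merged.putAll(partProps);
        return merged;
    }

    public static void main(String[] args) {
        Properties tableProps = new Properties();
        // Example table-level setting (assumed to be absent from partition properties).
        tableProps.setProperty("avro.schema.url", "hdfs:///schemas/example.avsc");
        tableProps.setProperty("columns", "id,name");

        Properties partProps = new Properties();
        partProps.setProperty("columns", "id,name,email");

        Properties selected = selectAsInPatch(true, tableProps, partProps);
        // Prints "null": the table-level key never reaches the serde.
        System.out.println(selected.getProperty("avro.schema.url"));

        Properties merged = mergeTableAndPartition(tableProps, partProps);
        System.out.println(merged.getProperty("avro.schema.url"));
        System.out.println(merged.getProperty("columns"));
    }
}
```

A serde that reads its configuration from table properties would, under this assumption, find nothing to initialize from once a partition spec is present.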



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562697#comment-13562697
 ] 

Hudson commented on HIVE-3833:
--

Integrated in Hive-trunk-hadoop2 #86 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/86/])
HIVE-3833 : object inspectors should be initialized based on partition 
metadata (Namit Jain via Ashutosh Chauhan) (Revision 1438111)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438111
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/common/ObjectPair.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecMapper.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ObjectPair.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/metadata/TestPartition.java
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat10.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat11.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat12.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat13.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat14.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat8.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat9.q
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin11.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin12.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin13.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/columnstats_partlvl.q.out
* /hive/trunk/ql/src/test/results/clientpositive/combine2_hadoop20.q.out
* /hive/trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_map_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_sort_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input23.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input42.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input_part9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join26.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join33.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join_map_ppr.q.out
* /hive/trunk/ql/src/test/results/clientpositive/load_dyn_part8.q.out
* /hive/trunk/ql/src/test/resul

[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562545#comment-13562545
 ] 

Hudson commented on HIVE-3833:
--

Integrated in Hive-trunk-h0.21 #1935 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1935/])
HIVE-3833 : object inspectors should be initialized based on partition 
metadata (Namit Jain via Ashutosh Chauhan) (Revision 1438111)

 Result = SUCCESS
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438111

[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562009#comment-13562009
 ] 

Hudson commented on HIVE-3833:
--

Integrated in hive-trunk-hadoop1 #41 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/41/])
HIVE-3833 : object inspectors should be initialized based on partition 
metadata (Namit Jain via Ashutosh Chauhan) (Revision 1438111)

 Result = ABORTED
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438111

[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-24 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561535#comment-13561535
 ] 

Namit Jain commented on HIVE-3833:
--

Yes, the tests passed for me for .23



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560983#comment-13560983
 ] 

Ashutosh Chauhan commented on HIVE-3833:


+1 for the latest patch. Is .23 the latest complete patch? Running tests on 
that now.




[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559982#comment-13559982
 ] 

Ashutosh Chauhan commented on HIVE-3833:


bq. For the above, it is fairly difficult to address. In a follow-up, I can add 
a serde-level property that indicates that the serde can handle different 
datatypes (e.g., LazySimpleSerDe). If all the partitions of the table have 
serdes with this property, then we can use identityConverter. This is kind of 
hacky, and I am not sure it is useful, since it should not be a common case. 
Usually, the partition schema should match the table schema.

I think this really is a common case. Folks usually change the serde of an 
existing table when they find a better file format, or sometimes when a better 
serde comes along, both of which are rare occurrences. So, I think we need to 
think about optimizing this case, though I agree the approach you suggested is 
hacky. We need to think of a better approach, probably in a follow-up JIRA.

Also, thanks for updating the patch. Some more comments on the latest patch are 
on Phabricator. Also, are we going to lose any lazy aspects of deserialization 
here? I guess not, because we are just wiring up OIs, but I want to make sure. 
Can you verify?




[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559745#comment-13559745
 ] 

Namit Jain commented on HIVE-3833:
--

The tests finished fine.



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559572#comment-13559572
 ] 

Namit Jain commented on HIVE-3833:
--

bq. In the case of the identity converter, there is no conversion cost, but in 
the non-identity case this will be worse than the current implementation, since 
the converter will examine every single column value, which wasn't the case 
earlier. However, it's not clear how expensive this would be.

For the above, it is fairly difficult to address. In a follow-up, I can add a 
serde-level property that indicates that the serde can handle different 
datatypes (e.g., LazySimpleSerDe). If all the partitions of the table have 
serdes with this property, then we can use identityConverter. This is kind of 
hacky, and I am not sure it is useful, since it should not be a common case. 
Usually, the partition schema should match the table schema.
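
The cost trade-off discussed here can be sketched in plain Java. This is a simplified illustration with hypothetical names (chooseConverter, convert) and schemas reduced to lists of type names; it does not use Hive's object inspector or converter classes. When the partition and table types match, the row is passed through untouched; otherwise every column value is visited.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class ConverterChoiceSketch {

    // Picks a row converter from partition column types to table column types.
    static Function<Object[], Object[]> chooseConverter(List<String> partTypes,
                                                        List<String> tableTypes) {
        if (partTypes.equals(tableTypes)) {
            // Identity case: the same row object is returned, no per-column work.
            return Function.identity();
        }
        // Non-identity case: every column of every row is examined.
        return row -> {
            Object[] out = new Object[tableTypes.size()];
            for (int i = 0; i < out.length; i++) {
                Object value = i < row.length ? row[i] : null; // missing columns become null
                out[i] = convert(value, tableTypes.get(i));
            }
            return out;
        };
    }

    // Toy conversion covering only the two target types used in this sketch.
    static Object convert(Object value, String targetType) {
        if (value == null) {
            return null;
        }
        switch (targetType) {
            case "string": return value.toString();
            case "bigint": return ((Number) value).longValue(); // assumes a numeric input
            default:       return value;
        }
    }

    public static void main(String[] args) {
        Function<Object[], Object[]> same =
            chooseConverter(Arrays.asList("int", "string"), Arrays.asList("int", "string"));
        Object[] row = {1, "a"};
        System.out.println(same.apply(row) == row); // true: nothing was copied

        Function<Object[], Object[]> widen =
            chooseConverter(Arrays.asList("int"), Arrays.asList("bigint", "string"));
        System.out.println(Arrays.toString(widen.apply(new Object[] {7})));
    }
}
```

The serde-level property proposed above would, in these terms, be a way to take the identity branch even when the declared types differ, on the serde's promise that it can handle the difference itself.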



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559568#comment-13559568
 ] 

Namit Jain commented on HIVE-3833:
--

Addressed the comments including the last one.





[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559487#comment-13559487
 ] 

Ashutosh Chauhan commented on HIVE-3833:


Comments on phabricator.
  * I have made a bunch of requests to rename functions; feel free to use better 
names than what I suggested if you like.
  * I have not reviewed the new tests that you have added. I assume you have 
verified those.
  * pm.retrieveAll() change in ObjectStore() is of concern to me. If the 
comments I made there are valid, please take time to see if we can do something 
better there.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.16.path, hive.3833.17.patch, hive.3833.1.patch, hive.3833.2.patch, 
> hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, hive.3833.6.patch, 
> hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558827#comment-13558827
 ] 

Namit Jain commented on HIVE-3833:
--

Refreshed, tests passed.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.16.path, hive.3833.17.patch, hive.3833.1.patch, hive.3833.2.patch, 
> hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, hive.3833.6.patch, 
> hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557284#comment-13557284
 ] 

Namit Jain commented on HIVE-3833:
--

A conversion is needed only if the two schemas differ; otherwise it is an identityConverter.
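
To illustrate the point above, here is a minimal sketch in Python rather than Hive's Java. The names `get_converter` and `identity_converter`, the `CASTS` table, and the schema representation (a list of `(column, type)` pairs) are all hypothetical and are not Hive's API. The sketch only shows that the converter can be chosen once per partition, so rows from partitions whose schema already matches the table schema pay no per-row conversion cost.

```python
from typing import Callable, List, Tuple

# Hypothetical schema representation: [(column name, type name), ...]
Schema = List[Tuple[str, str]]
Row = list

# Illustrative subset of type casts; a real system would cover many more types.
CASTS = {"int": int, "string": str}


def identity_converter(row: Row) -> Row:
    """Return the row untouched, so matching schemas add no conversion work."""
    return row


def get_converter(partition_schema: Schema, table_schema: Schema) -> Callable[[Row], Row]:
    """Choose a converter once per partition instead of deciding per row."""
    if partition_schema == table_schema:
        return identity_converter

    # Schemas differ: cast each column to the type the table schema expects.
    casts = [CASTS[type_name] for _, type_name in table_schema]

    def convert(row: Row) -> Row:
        return [cast(value) for cast, value in zip(casts, row)]

    return convert
```

With this shape, the CPU cost raised in the question is paid only by partitions whose schema has drifted from the table schema.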

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557258#comment-13557258
 ] 

Ashutosh Chauhan commented on HIVE-3833:


Could this possibly result in a performance hit (CPU)? Earlier, data was 
deserialized per table schema; now it will first be deserialized per partition 
schema and then converted to comply with the table schema.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555910#comment-13555910
 ] 

Namit Jain commented on HIVE-3833:
--

[~ashutoshc], I am not refreshing, so some of the test results may need to be 
updated.
Refreshing should not lead to major code changes, so you can still review the 
code changes.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554837#comment-13554837
 ] 

Namit Jain commented on HIVE-3833:
--

Today, pathToPartitionInfo actually contains path -> TableInfo, and
numPartitions is part of TableInfo.

Since I have changed it to be path -> partitionInfo, numPartitions is going 
away.
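
A rough sketch of what a path -> partitionInfo mapping allows, written in Python for brevity. `PartitionInfo`, `find_partition_info`, and the warehouse paths in the usage note are hypothetical illustrations, not Hive classes or real locations. The idea shown is only that each input path resolves to the metadata of its own partition, including that partition's schema.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Tuple


@dataclass(frozen=True)
class PartitionInfo:
    """Hypothetical per-partition metadata: its spec and its own column schema."""
    partition_spec: str
    schema: Tuple[Tuple[str, str], ...]


def find_partition_info(path_to_partition_info: Dict[str, PartitionInfo],
                        file_path: str) -> PartitionInfo:
    """Walk up from a file path until a registered partition directory is found."""
    current = PurePosixPath(file_path)
    while True:
        info = path_to_partition_info.get(str(current))
        if info is not None:
            return info
        if current == current.parent:
            # Reached the root without finding a registered partition.
            raise KeyError(file_path)
        current = current.parent
```

For example, with entries for `/warehouse/t/dt=1` and `/warehouse/t/dt=2`, a file under `/warehouse/t/dt=2/` resolves to the second entry and therefore to that partition's schema.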

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554828#comment-13554828
 ] 

Ashutosh Chauhan commented on HIVE-3833:


Thanks for explaining. Makes sense. I will take a closer look at this tomorrow. 
Looking briefly at the diffs, {{numPartitions}} is being removed from many 
.q.out files. That looks like a loss of information. What's the reason for that?

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554811#comment-13554811
 ] 

Namit Jain commented on HIVE-3833:
--

bq. Seems to me, this patch will take away the flexibility of combining 
partitions of different schemas in one split. That sounds like lesser 
flexibility instead of more.

No, I am not sure whether I added a test for that, but that should be possible. 
We know when a partition is being changed.

bq. Shouldn't we be fixing LazyColumnarBinarySerde in that case, instead of 
restricting combining of partitions of different schemas in one split?

That is not the problem (combining partitions). The problem is that any binary 
serde will use the datatypes for serialization, i.e., it will have different 
storage for int and string; otherwise, what is the point of it being binary? 
In that case, unless we use the partition schema (instead of the
table schema), we can get wrong results.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554785#comment-13554785
 ] 

Ashutosh Chauhan commented on HIVE-3833:


bq. Instead of that, different partitions should be part of the same split, 
only if the partition schemas exactly match. That would give greater flexibility

Seems to me, this patch will take away the flexibility of combining partitions 
of different schemas in one split. That sounds like lesser flexibility instead 
of more.

bq.  And, we dont allow to change the schema for LazyColumnarBinarySerDe.
Shouldn't we be fixing LazyColumnarBinarySerde in that case, instead of 
restricting combining of partitions of different schemas in one split?


> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-14 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552825#comment-13552825
 ] 

Namit Jain commented on HIVE-3833:
--

Refreshed, all the tests passed.

https://issues.apache.org/jira/secure/attachment/12564718/hive.3833.14.patch 
contains all the changes.

The Phabricator entry does not contain the changes for the test results compiler 
files, since that was exceeding the limit.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540400#comment-13540400
 ] 

Namit Jain commented on HIVE-3833:
--

Running tests now.

The basic idea is: use partition metadata to read the data, then convert it to 
use table metadata. The rest of the
stack does not need to know about the conversion.
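
The two-step idea can be sketched as follows, in Python for brevity. Everything here is an illustration under stated assumptions: the toy binary format (4-byte big-endian ints, length-prefixed UTF-8 strings) and the function names `serialize`, `deserialize`, `convert`, and `read_row` are hypothetical and do not reflect Hive's SerDe or ObjectInspector interfaces.

```python
import struct

CASTS = {"int": int, "string": str}   # illustrative subset of types


def serialize(row, schema):
    """Toy binary writer: ints as 4-byte big-endian, strings as length-prefixed UTF-8."""
    out = bytearray()
    for value, (_, type_name) in zip(row, schema):
        if type_name == "int":
            out += struct.pack(">i", value)
        else:
            data = value.encode("utf-8")
            out += struct.pack(">H", len(data)) + data
    return bytes(out)


def deserialize(raw, schema):
    """Decode bytes using the schema they were written with."""
    row, pos = [], 0
    for _, type_name in schema:
        if type_name == "int":
            (value,) = struct.unpack_from(">i", raw, pos)
            pos += 4
        else:
            (length,) = struct.unpack_from(">H", raw, pos)
            pos += 2
            value = raw[pos:pos + length].decode("utf-8")
            pos += length
        row.append(value)
    return row


def convert(row, table_schema):
    """Cast each value to the type the table schema expects."""
    return [CASTS[type_name](value) for value, (_, type_name) in zip(row, table_schema)]


def read_row(raw, partition_schema, table_schema):
    """Step 1: read with partition metadata. Step 2: convert to table metadata."""
    return convert(deserialize(raw, partition_schema), table_schema)
```

Callers of `read_row` only ever see rows typed according to the table schema, which is the sense in which the rest of the stack does not need to know about the conversion.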

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539888#comment-13539888
 ] 

Namit Jain commented on HIVE-3833:
--

https://reviews.facebook.net/D7653

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.1.patch



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-26 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539588#comment-13539588
 ] 

Namit Jain commented on HIVE-3833:
--

The object inspectors need to be initialized based on partition metadata.
That leaves us with the following options:
1. Create an operator tree per partition
2. Create a dummy operator after table scan (which converts the partition data 
into table data).
   This operator will be different for different inputs.

Option 2. seems like a better option.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539203#comment-13539203
 ] 

Namit Jain commented on HIVE-3833:
--

The possible options are to not allow the schema to be changed with 
LazyColumnarSerDe (only allow additions),
or use partition metadata for inspectors.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539201#comment-13539201
 ] 

Namit Jain commented on HIVE-3833:
--

Consider the following test:

set hive.input.format = org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

create table partition_test_partitioned(key string, value string) partitioned 
by (dt string) stored as rcfile;

alter table partition_test_partitioned set serde 
'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe';
insert overwrite table partition_test_partitioned partition(dt='1') select * 
from src where key = 238;

alter table partition_test_partitioned change key key int; 


The query:
select * from partition_test_partitioned where dt is not null;

returns:

50  val_238 1
50  val_238 1

This is due to the fact that the key column was serialized as a string column, 
and is now being read as an integer.
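
One reading that is consistent with the value 50 above, offered as an assumption rather than something verified against the SerDe source: if the string "238" is stored as its raw characters and the reader decodes integers with a Hadoop-style variable-length encoding, then the first byte, the character '2' (ASCII 50), decodes as a complete one-byte integer. The Python sketch below mimics that decoding; `decode_vint` is a hypothetical helper written for this illustration.

```python
def decode_vint(data: bytes) -> int:
    """Decode one Hadoop-style variable-length integer from the start of data.

    Assumption: first bytes in the range -112..127 (as a signed byte) hold the
    value directly; smaller first bytes encode the sign and the number of
    payload bytes that follow.
    """
    first = data[0] - 256 if data[0] > 127 else data[0]   # reinterpret as signed byte
    if first >= -112:
        return first                                      # single-byte value
    negative = first < -120
    length = (-120 - first) if negative else (-112 - first)
    value = int.from_bytes(data[1:1 + length], "big")
    return ~value if negative else value


# The key was written as the string "238" but is read back as an int.
print(decode_vint("238".encode("ascii")))   # prints 50, the ASCII code of '2'
```

Under the same assumption, an integer 238 written by the integer encoder would occupy the two bytes `0x8f 0xee`, so the string and integer representations are not interchangeable.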

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
