[jira] Commented: (HIVE-1538) FilterOperator is applied twice with ppd on.
[ https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925777#action_12925777 ]

Amareshwari Sriramadasu commented on HIVE-1538:
---

bq. I think the solution is to collect the operators who are contributing the predicates for final candidates of predicate pushdown and remove them from the final operator graph.

This does not work as I thought earlier, because not all of the predicates in a FilterOperator may be pushed down. We might have to reconstruct the FilterOperator with the un-pushed predicates.

FilterOperator is applied twice with ppd on.

Key: HIVE-1538
URL: https://issues.apache.org/jira/browse/HIVE-1538
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

With hive.optimize.ppd set to true, FilterOperator is applied twice, and the second operator always seems to filter zero rows.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
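The idea of rebuilding a filter from its un-pushed predicates can be sketched as follows. This is only an illustration under simplifying assumptions: predicates are modelled as plain strings joined by AND, and the class and method names (ResidualFilterSketch, residual) are hypothetical, not part of Hive's operator or optimizer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; these are not Hive classes.
final class ResidualFilterSketch {

    /**
     * Returns the conjuncts of a filter that were NOT pushed down,
     * preserving their original order. An empty result means the
     * original filter is fully redundant and could be dropped.
     */
    static List<String> residual(List<String> conjuncts, Set<String> pushed) {
        List<String> remaining = new ArrayList<>();
        for (String conjunct : conjuncts) {
            if (!pushed.contains(conjunct)) {
                remaining.add(conjunct);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Assumed example: two conjuncts were pushed, one was not.
        List<String> filter = Arrays.asList("a.key > 10", "b.value = 'x'", "rand() < 0.5");
        Set<String> pushed = new HashSet<>(Arrays.asList("a.key > 10", "b.value = 'x'"));
        System.out.println(residual(filter, pushed)); // prints [rand() < 0.5]
    }
}
```

In this model, removing the contributing operator outright would lose the third conjunct, which matches the concern raised in the comment: the operator can only be removed when the residual list is empty, and otherwise it has to be rebuilt from what remains.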
[jira] Updated: (HIVE-1694) Accelerate query execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikhil Deshpande updated HIVE-1694:
---

Status: Patch Available (was: Open)

This is a patch to demonstrate query performance gains using indexes (added in HIVE-417). The patch is against the latest Hive trunk.

ChangeLog for the patch:
- Implements a new rewrite for a certain set of queries with GROUP BY, to speed up those queries by running them on index data instead of the base table.
- Implements a skeleton generic rewrite engine.
- Implements the rewrite rule for the set of GroupBy queries mentioned above. More details are in the class comment of GbToCompactSumIdxRewrite.
- The rewrite currently has to be enabled explicitly with the flag hive.ql.rw.gb_to_idx.
- Modifies the metastore metadata API for getting some index info.
- Modifies the QB metadata parse-block code to add some rewrite assist methods.
- Inserts a rewrite hook into the Semantic Analyzer.
- Fixes a bug in ql QTestUtil to clean up indexed tables properly.
- Contains a new test for the Group By rewrite using indexes: ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q

Quick performance test results on a very small Hadoop cluster: 2 queries (chosen to demonstrate perf gains) run on the TPC-H benchmark data lineitem table. Timings are in seconds; data set size (1M, 1G etc.) is the TPC-H scale factor.

{noformat}
-------------------------------------------------------
               1M        1G        10G       30G
-------------------------------------------------------
q1_no_idx      24.161    76.790    506.005   1551.555
q1_with_idx    21.268    27.292    35.502    86.133
-------------------------------------------------------
q2_no_idx      73.660    130.587   764.619   2146.423
q2_with_idx    69.393    75.493    92.867    190.619
-------------------------------------------------------
{noformat}

Description of the Hadoop cluster used for the above perf test:
- 2 server class machines (each box: CentOS 5.x Linux, 5 SAS disks in RAID5, 16GB RAM)
- 2-node Hadoop cluster (0.20.2), un-tuned and un-optimized, data not partitioned and clustered, Hive tables stored in row-store format, HDFS replication factor: 2
- Sun JDK 1.6 (server mode JVM, JVM_HEAP_SIZE: 4GB RAM)
- Queries on TPC-H data (lineitem table: 70% of TPC-H data size, e.g. TPC-H 30GB data: 21GB lineitem, ~180 million tuples)

These changes are being maintained at http://github.com/prafullat/hive

Accelerate query execution using indexes

Key: HIVE-1694
URL: https://issues.apache.org/jira/browse/HIVE-1694
Project: Hive
Issue Type: New Feature
Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande

The index building patch (HIVE-417) is checked into trunk; this JIRA issue tracks supporting indexes in the Hive compiler and execution engine for SELECT queries.

This is in reference to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating a separate JIRA issue for tracking index usage in the optimizer and query execution.

The aim of this effort is to use indexes to accelerate query execution (for a certain class of queries). E.g.
- Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?)
- Joins (index based joins)
- Group By, Order By and other misc cases

The proposal is multi-step:
1. Building index based operators, compiler and execution engine changes
2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.)

This JIRA initially focuses on the first step. It is expected to hold the information about index based plans and operator implementations for the above mentioned cases.
[jira] Updated: (HIVE-1756) failures in fatal.q in TestNegativeCliDriver
[ https://issues.apache.org/jira/browse/HIVE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HIVE-1756:
-

Attachment: Hive-1756.patch

remove fatal.q

failures in fatal.q in TestNegativeCliDriver

Key: HIVE-1756
URL: https://issues.apache.org/jira/browse/HIVE-1756
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Attachments: Hive-1756.patch

This is probably caused by HIVE-1641
[jira] Resolved: (HIVE-1756) failures in fatal.q in TestNegativeCliDriver
[ https://issues.apache.org/jira/browse/HIVE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-1756.
--

Resolution: Fixed
Fix Version/s: 0.7.0
Hadoop Flags: [Reviewed]

Committed. Thanks Liyin

failures in fatal.q in TestNegativeCliDriver

Key: HIVE-1756
URL: https://issues.apache.org/jira/browse/HIVE-1756
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
Fix For: 0.7.0
Attachments: Hive-1756.patch

This is probably caused by HIVE-1641
Build failed in Hudson: Hive-trunk-h0.20 #406
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/406/changes

Changes:

[namit] HIVE-1757 Test cleanup for 1641 (Liyin Tang via namit)

[namit] HIVE-1755 Update broken test outputs due to 1641 (He Yongqiang via namit)

[namit] HIVE-474 Support for distinct selection on two or more columns (Amareshwari Sriramadasu via namit)

--
[...truncated 15243 lines...]
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/complex.seq
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/json.txt
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] rmr: cannot remove phttps://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/data/warehouse/srcpart/ds=2008-04-08/hr=11: No such file or directory.
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] rmr: cannot remove phttps://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/data/warehouse/srcpart/ds=2008-04-08/hr=12: No such file or directory.
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] rmr: cannot remove phttps://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/data/warehouse/srcpart/ds=2008-04-09/hr=11: No such file or directory.
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] rmr: cannot remove phttps://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/data/warehouse/srcpart/ds=2008-04-09/hr=12: No such file or directory.
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket0.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data
Re: release 0.6.0 wrapup
Ed mentioned in IRC that he would not be able to get to this for a few days, so I'll see if I can update it today so that you can announce the release.

JVS

On Oct 27, 2010, at 10:19 PM, Carl Steinbach wrote:

Hey,

I'd like to reference Hive Releases page in the 0.6.0 release announcement email. Ed, can you please update this page? (http://hive.apache.org/releases.html)

It looks like the links for the old releases need to be updated, and for the 0.6.0 release we should provide a link to the JIRA release notes page instead of to the CHANGES document, i.e. https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843styleName=Htmlversion=12314524

Thanks.

Carl

On Tue, Oct 26, 2010 at 7:01 PM, Carl Steinbach c...@cloudera.com wrote:

Carl, as release manager, can you send out the release announcement once everything is ready? I'll be at ApacheCon US next week in Atlanta and will be spreading the word on the release there.

Will do!
[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase
[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925915#action_12925915 ]

John Sichi commented on HIVE-1634:
--

Thanks Basab, I'm going to try to take a look at this one next week.

Allow access to Primitive types stored in binary format in HBase

Key: HIVE-1634
URL: https://issues.apache.org/jira/browse/HIVE-1634
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Basab Maulik
Assignee: Basab Maulik
Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java

This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property hbase.columns.storage.types = -,b,b,b,b,b,b,b,b is a specification of the storage option for the corresponding column in the serde property hbase.columns.mapping. Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for HBase handler for additional examples.

There is also a table property hbase.table.default.storage.type = string to specify a table level default storage type. The other valid specification is binary. The table level default is overridden by a column level specification.

This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.

Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below.

hive> create external table TestHiveHBaseExternalTable
      (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double)
      stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
      tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.691 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1   NULL   NULL   NULL   NULL   NULL   Test-String   NULL   NULL
Time taken: 0.346 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.139 seconds
hive> create external table TestHiveHBaseExternalTable
      (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double)
      stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      with serdeproperties (
        "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
        "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
      tblproperties (
        "hbase.table.name" = "TestHiveHBaseExternalTable",
        "hbase.table.default.storage.type" = "string");
OK
Time taken: 0.139 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1   true   -128   -32768   -2147483648   -9223372036854775808   Test-String   -2.1793132E-11   2.01345E291
Time taken: 0.151 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.154 seconds
hive> create external table TestHiveHBaseExternalTable
      (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double)
      stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      with serdeproperties (
        "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
        "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
      tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.347 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1   true   -128   -32768   -2147483648   -9223372036854775808   Test-String   -2.1793132E-11   2.01345E291
Time taken: 0.245 seconds
hive>
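To make the storage specifiers more concrete, here is a minimal sketch of how a hbase.columns.storage.types value could be resolved against the table default, and how the same int looks in binary versus string storage. The class and method names are hypothetical and this is not the patch's code. The binary form uses java.nio.ByteBuffer in its default big-endian order, which is assumed here to match the 4-byte layout produced by HBase's Bytes.toBytes(int). Map specifiers such as 's:b' are deliberately not handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not code from the HIVE-1634 patch.
final class StorageTypeSketch {

    /**
     * Resolves a comma separated list of "-", "s" and "b" specifiers to
     * "string" or "binary", using tableDefault wherever "-" appears.
     */
    static List<String> resolve(String spec, String tableDefault) {
        List<String> resolved = new ArrayList<>();
        for (String part : spec.split(",")) {
            switch (part.trim()) {
                case "-": resolved.add(tableDefault); break;
                case "s": resolved.add("string"); break;
                case "b": resolved.add("binary"); break;
                default:
                    throw new IllegalArgumentException("unknown storage specifier: " + part);
            }
        }
        return resolved;
    }

    /** Binary storage: 4 bytes, big-endian. */
    static byte[] intAsBinary(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** String storage: the UTF-8 bytes of the decimal representation. */
    static byte[] intAsString(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(resolve("-,b,b,b,b,b,-,b,b", "string"));
        System.out.println(intAsBinary(Integer.MIN_VALUE).length); // prints 4
        System.out.println(intAsString(Integer.MIN_VALUE).length); // prints 11
    }
}
```

Under this reading, the first query in the session above would return NULL for the numeric columns because the cells hold binary bytes while the table falls back to string storage, and the later queries succeed once those columns are marked 'b'.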
[jira] Created: (HIVE-1758) optimize group by hash map memory
optimize group by hash map memory
-

Key: HIVE-1758
URL: https://issues.apache.org/jira/browse/HIVE-1758
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong

The hash map used by map-side Group By consumes a lot of memory, which reduces its effectiveness. We can use some of the optimizations from map-join to reduce the memory footprint:

    class KeyWrapper {
      int hashcode;
      ArrayList<Object> keys;
      // decide whether this is already in hashmap (keys in hashmap are deepcopied
      // version, and we need to use 'currentKeyObjectInspector').
      boolean copy = false;

1. Change keys to an array
2. Optimize the scenario where keys is of a small size (1, 2), etc.

Let us start profiling it and take it from there.
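As a rough sketch of the first suggestion, a key wrapper can hold its keys in a plain Object[] and compute the hash code once. The class name ArrayKeyWrapper is hypothetical and this is not Hive's KeyWrapper. It makes only a shallow copy of the array, whereas the snippet above mentions deep-copied keys and an object inspector, neither of which is modelled here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Hive's KeyWrapper.
final class ArrayKeyWrapper {
    private final Object[] keys;
    private final int hashcode; // computed once, reused on every lookup

    ArrayKeyWrapper(Object... keys) {
        this.keys = keys.clone(); // shallow copy of the key array
        this.hashcode = Arrays.hashCode(this.keys);
    }

    @Override
    public int hashCode() {
        return hashcode;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ArrayKeyWrapper
            && Arrays.equals(keys, ((ArrayKeyWrapper) other).keys);
    }

    public static void main(String[] args) {
        // Count rows per (key1, key2) group, as a map-side aggregation would.
        Map<ArrayKeyWrapper, Integer> counts = new HashMap<>();
        counts.merge(new ArrayKeyWrapper("a", 1), 1, Integer::sum);
        counts.merge(new ArrayKeyWrapper("a", 1), 1, Integer::sum);
        counts.merge(new ArrayKeyWrapper("b", 1), 1, Integer::sum);
        System.out.println(counts.size()); // prints 2
    }
}
```

Compared with an ArrayList, an array avoids the list object's own header and bookkeeping fields for every entry in the hash map. Whether that saving matters in practice is exactly what the profiling proposed in the issue would need to confirm.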
[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE INDEX
[ https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1498:
-

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Marquis et al!

support IDXPROPERTIES on CREATE INDEX
-

Key: HIVE-1498
URL: https://issues.apache.org/jira/browse/HIVE-1498
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Marquis Wang
Fix For: 0.7.0
Attachments: 1498.2.patch, 1498.patch, hive-1498.prelim.patch

It's partially there in the grammar but not hooked in; should work pretty much the same as TBLPROPERTIES.