[jira] Commented: (HIVE-1538) FilterOperator is applied twice with ppd on.

2010-10-28 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925777#action_12925777
 ] 

Amareshwari Sriramadasu commented on HIVE-1538:
---

bq. I think the solution is to collect the operators that contribute the 
predicates for the final predicate pushdown candidates and remove them from 
the final operator graph.
This does not work as I thought earlier, because not all of the predicates in a 
FilterOperator may be pushed. We might have to reconstruct the 
FilterOperator with the un-pushed predicates.
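
A minimal sketch of that reconstruction, under loud assumptions: predicates are modelled as plain strings, and ResidualFilter, unpushed and canRemove are hypothetical names, not Hive classes or methods. It only illustrates the idea of keeping the conjuncts that were not pushed and dropping the filter when nothing is left. The example predicates, and which of them counts as pushed, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not Hive code: a filter is a list of conjuncts (strings).
final class ResidualFilter {
    private ResidualFilter() {}

    /** Returns the conjuncts that were NOT pushed down, in their original order. */
    static List<String> unpushed(List<String> conjuncts, Set<String> pushed) {
        List<String> residual = new ArrayList<>();
        for (String conjunct : conjuncts) {
            if (!pushed.contains(conjunct)) {
                residual.add(conjunct);
            }
        }
        return residual;
    }

    /** True when every conjunct was pushed, so the original filter could be removed. */
    static boolean canRemove(List<String> conjuncts, Set<String> pushed) {
        return unpushed(conjuncts, pushed).isEmpty();
    }

    public static void main(String[] args) {
        // Illustrative predicates; the second is assumed here not to be pushed.
        List<String> conjuncts = Arrays.asList("a.key > 10", "rand() < 0.5");
        Set<String> pushed = Collections.singleton("a.key > 10");
        System.out.println(unpushed(conjuncts, pushed));
        System.out.println(canRemove(conjuncts, pushed));
    }
}
```

With this shape, a filter whose residual list is empty is the only one that can be removed outright; any other filter would be rebuilt from the residual conjuncts.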

 FilterOperator is applied twice with ppd on.
 

 Key: HIVE-1538
 URL: https://issues.apache.org/jira/browse/HIVE-1538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 With hive.optimize.ppd set to true, FilterOperator is applied twice, and it 
 seems the second operator always filters zero rows.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1694) Accelerate query execution using indexes

2010-10-28 Thread Nikhil Deshpande (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Deshpande updated HIVE-1694:
---

Status: Patch Available  (was: Open)

This is a patch to demonstrate query performance gains using indexes
(added in HIVE-417). The patch is against the latest Hive trunk.

ChangeLog for the patch:
- Implements a new rewrite for a certain set of queries with GROUP BY to speed 
up those queries by running them on index data instead of the base table.
- Implements a skeleton generic rewrite engine.
- Implements the rewrite rule for the set of GROUP BY queries mentioned above. 
More details are in the class comment of GbToCompactSumIdxRewrite.
- The rewrite currently needs to be enabled explicitly with the flag 
hive.ql.rw.gb_to_idx.
- Modifies the metastore & metadata API for getting some index info.
- Modifies the QB metadata & parseblock code to add some rewrite assist methods.
- Inserts a rewrite hook into the Semantic Analyzer.
- Fixes a bug in ql QTestUtil to clean up indexed tables properly.
- Contains a new test for the Group By rewrite using indexes: 
ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q

Quick performance test results on a very small Hadoop cluster:

2 queries (chosen to demonstrate perf gains) run on TPC-H benchmark data 
lineitem table.

Timings in seconds, data set size (1M, 1G etc.) is TPC-H scale factor.
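{noformat}
---------------------------------------------------
                  1M       1G      10G       30G
---------------------------------------------------
  q1_no_idx   24.161   76.790  506.005  1551.555
q1_with_idx   21.268   27.292   35.502    86.133
---------------------------------------------------
  q2_no_idx   73.660  130.587  764.619  2146.423
q2_with_idx   69.393   75.493   92.867   190.619
---------------------------------------------------
{noformat}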
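
To read the timings above as speedups, one can divide the no-index time by the with-index time at each scale factor. The sketch below does that for the first query only; IndexSpeedup and speedup are hypothetical names introduced for illustration, and the numbers are copied from the table.

```java
// Hypothetical helper for reading the timing table; not part of the patch.
final class IndexSpeedup {
    private IndexSpeedup() {}

    /** Ratio of the run time without the index to the run time with it. */
    static double speedup(double noIdxSeconds, double withIdxSeconds) {
        return noIdxSeconds / withIdxSeconds;
    }

    public static void main(String[] args) {
        String[] scale = {"1M", "1G", "10G", "30G"};
        // Timings in seconds for query 1, copied from the table.
        double[] q1NoIdx = {24.161, 76.790, 506.005, 1551.555};
        double[] q1WithIdx = {21.268, 27.292, 35.502, 86.133};
        for (int i = 0; i < scale.length; i++) {
            System.out.printf("%-4s q1 speedup: %.1fx%n",
                    scale[i], speedup(q1NoIdx[i], q1WithIdx[i]));
        }
    }
}
```

The ratio grows with the scale factor: the gain is small at 1M and roughly 18x at 30G for this query.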

Hadoop cluster description used for above perf test:
- 2 server class machines (each box: CentOS 5.x Linux, 5 SAS disks in RAID5, 
16GB RAM)
- 2-node Hadoop cluster (0.20.2), un-tuned and un-optimized, data not 
partitioned and clustered, Hive tables stored in row-store format, HDFS 
replication factor: 2
- Sun JDK 1.6 (server mode JVM, JVM_HEAP_SIZE:4GB RAM)
- Queries on TPC-H Data (lineitem table: 70% of TPC-H data size, e.g. TPC-H 
30GB data: 21GB lineitem, ~180Million tuples)


These changes are being maintained at http://github.com/prafullat/hive

 Accelerate query execution using indexes
 

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande

 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in the optimizer & query 
 execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 a certain class of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the above 
 mentioned cases. 




[jira] Updated: (HIVE-1756) failures in fatal.q in TestNegativeCliDriver

2010-10-28 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1756:
-

Attachment: Hive-1756.patch

remove fatal.q

 failures in fatal.q in TestNegativeCliDriver
 

 Key: HIVE-1756
 URL: https://issues.apache.org/jira/browse/HIVE-1756
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
 Attachments: Hive-1756.patch


 This is probably caused by HIVE-1641




[jira] Resolved: (HIVE-1756) failures in fatal.q in TestNegativeCliDriver

2010-10-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1756.
--

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]

Committed. Thanks Liyin

 failures in fatal.q in TestNegativeCliDriver
 

 Key: HIVE-1756
 URL: https://issues.apache.org/jira/browse/HIVE-1756
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: Hive-1756.patch


 This is probably caused by HIVE-1641




Build failed in Hudson: Hive-trunk-h0.20 #406

2010-10-28 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/406/changes

Changes:

[namit] HIVE-1757 Test cleanup for 1641
(Liyin Tang via namit)

[namit] HIVE-1755 Update broken test outputs due to 1641
(He Yongqiang via namit)

[namit] HIVE-474 Support for distinct selection on two or more columns
(Amareshwari Sriramadasu via namit)

--
[...truncated 15243 lines...]
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/complex.seq
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/json.txt
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out
 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] rmr: cannot remove 
phttps://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/data/warehouse/srcpart/ds=2008-04-08/hr=11:
 No such file or directory.
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] rmr: cannot remove 
phttps://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/data/warehouse/srcpart/ds=2008-04-08/hr=12:
 No such file or directory.
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] rmr: cannot remove 
phttps://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/data/warehouse/srcpart/ds=2008-04-09/hr=11:
 No such file or directory.
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] rmr: cannot remove 
phttps://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/data/warehouse/srcpart/ds=2008-04-09/hr=12:
 No such file or directory.
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket0.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data 

Re: release 0.6.0 wrapup

2010-10-28 Thread John Sichi
Ed mentioned in IRC that he would not be able to get to this for a few days, so 
I'll see if I can update it today so that you can announce the release.

JVS

On Oct 27, 2010, at 10:19 PM, Carl Steinbach wrote:

 Hey,
 
 I'd like to reference the Hive Releases page in the 0.6.0 release announcement
 email. Ed, can you please update this page? 
 (http://hive.apache.org/releases.html)
 It looks like the links for the old releases need to be updated, and for the 
 0.6.0 release
 we should provide a link to the JIRA release notes page instead of to the 
 CHANGES document,
 i.e. 
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&styleName=Html&version=12314524
 
 Thanks.
 
 Carl
 
 
 On Tue, Oct 26, 2010 at 7:01 PM, Carl Steinbach c...@cloudera.com wrote:
 
 Carl, as release manager, can you send out the release announcement once 
 everything is ready?  I'll be at ApacheCon US next week in Atlanta and will 
 be spreading the word on the release there.
 
 Will do! 
 



[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2010-10-28 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925915#action_12925915
 ] 

John Sichi commented on HIVE-1634:
--

Thanks Basab, I'm going to try to take a look at this one next week.

 Allow access to Primitive types stored in binary format in HBase
 

 Key: HIVE-1634
 URL: https://issues.apache.org/jira/browse/HIVE-1634
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Basab Maulik
Assignee: Basab Maulik
 Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java


 This addresses HIVE-1245 in part, for atomic or primitive types.
 The serde property hbase.columns.storage.types = -,b,b,b,b,b,b,b,b is a 
 specification of the storage option for the corresponding column in the serde 
 property hbase.columns.mapping. Allowed values are '-' for table default, 
 's' for standard string storage, and 'b' for binary storage as would be 
 obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families 
 use a colon separated pair such as 's:b' for the key and value part 
 specifiers respectively. See the test cases and queries for HBase handler for 
 additional examples.
 There is also a table property hbase.table.default.storage.type = string 
 to specify a table level default storage type. The other valid specification 
 is binary. The table level default is overridden by a column level 
 specification.
 This control is available for the boolean, tinyint, smallint, int, bigint, 
 float, and double primitive types. The attached patch also relaxes the 
 mapping of map types to HBase column families to allow any primitive type to 
 be the map key.
 Attached is a program for creating a table and populating it in HBase. The 
 external table in Hive can access the data as shown in the example below.
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties ("hbase.columns.mapping" =
     ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.691 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1   NULL   NULL   NULL   NULL   NULL   Test-String   NULL   NULL
 Time taken: 0.346 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.139 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties (
     "hbase.columns.mapping" =
       ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b")
   tblproperties (
     "hbase.table.name" = "TestHiveHBaseExternalTable",
     "hbase.table.default.storage.type" = "string");
 OK
 Time taken: 0.139 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1   true   -128   -32768   -2147483648   -9223372036854775808   Test-String   -2.1793132E-11   2.01345E291
 Time taken: 0.151 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.154 seconds
 hive> create external table TestHiveHBaseExternalTable
   (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    c_int int, c_long bigint, c_string string, c_float float, c_double double)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties (
     "hbase.columns.mapping" =
       ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b")
   tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.347 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1   true   -128   -32768   -2147483648   -9223372036854775808   Test-String   -2.1793132E-11   2.01345E291
 Time taken: 0.245 seconds
 hive>
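
The storage-type rules described in the issue ('-' for the table default, 's' for string, 'b' for binary, and a colon-separated pair for map columns) can be sketched as a small resolver. This is only an illustration of the rules as stated above: StorageTypes, resolvePart and resolve are hypothetical names, and the code is not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the patch's code: resolves per-column storage types.
final class StorageTypes {
    private StorageTypes() {}

    /** Resolves one specifier: "-" is the table default, "s" string, "b" binary. */
    static String resolvePart(String part, String tableDefault) {
        switch (part) {
            case "-": return tableDefault;
            case "s": return "string";
            case "b": return "binary";
            default:
                throw new IllegalArgumentException("unknown storage type: " + part);
        }
    }

    /**
     * Resolves a comma-separated spec such as "-,b,s:b". A "key:value" pair
     * (used for map columns) has each side resolved independently.
     */
    static List<String> resolve(String spec, String tableDefault) {
        if (!tableDefault.equals("string") && !tableDefault.equals("binary")) {
            throw new IllegalArgumentException("bad table default: " + tableDefault);
        }
        List<String> resolved = new ArrayList<>();
        for (String column : spec.split(",")) {
            String[] parts = column.trim().split(":");
            if (parts.length > 2) {
                throw new IllegalArgumentException("bad specifier: " + column);
            }
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) {
                    sb.append(':');
                }
                sb.append(resolvePart(parts[i], tableDefault));
            }
            resolved.add(sb.toString());
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Column-level specifiers override the table-level default.
        System.out.println(resolve("-,b,b,b,b,b,-,b,b", "string"));
    }
}
```

A column marked '-' falls back to the table-level default, which matches the third example above where the string column keeps string storage while the other columns are read as binary.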




[jira] Created: (HIVE-1758) optimize group by hash map memory

2010-10-28 Thread Namit Jain (JIRA)
optimize group by hash map memory
-

 Key: HIVE-1758
 URL: https://issues.apache.org/jira/browse/HIVE-1758
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong


Group By map side's hash map consumes a lot of memory, thereby decreasing its 
effectiveness.

We can use some of the optimizations from map-join to reduce the memory 
footprint:

  class KeyWrapper {
    int hashcode;
    ArrayList<Object> keys;
    // decide whether this is already in hashmap (keys in hashmap are deepcopied
    // version, and we need to use 'currentKeyObjectInspector').
    boolean copy = false;
  }

1. Change keys to an array.
2. Optimize the case where the number of keys is small (1, 2, etc.).

Let us start by profiling it and take it from there.
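
As a rough illustration of the first suggestion, here is a sketch of an array-backed key that caches its hash code. ArrayKey is a hypothetical class written for this note, not Hive's KeyWrapper, and it leaves out the object-inspector and deep-copy handling mentioned in the comment above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hive's KeyWrapper: array-backed key with a cached hash.
final class ArrayKey {
    private final Object[] keys;
    private final int hashcode; // computed once, reused on every hash map probe

    ArrayKey(Object... keys) {
        this.keys = keys.clone(); // defensive copy so later row reuse cannot change the key
        this.hashcode = Arrays.hashCode(this.keys);
    }

    @Override
    public int hashCode() {
        return hashcode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ArrayKey)) {
            return false;
        }
        ArrayKey other = (ArrayKey) o;
        // Cheap hash comparison first, element-wise comparison only on a match.
        return hashcode == other.hashcode && Arrays.equals(keys, other.keys);
    }

    public static void main(String[] args) {
        // Count rows per (col1, col2) group, as a map-side group by would.
        Map<ArrayKey, Long> counts = new HashMap<>();
        String[][] rows = {{"a", "x"}, {"b", "y"}, {"a", "x"}};
        for (String[] row : rows) {
            counts.merge(new ArrayKey((Object[]) row), 1L, Long::sum);
        }
        System.out.println(counts.get(new ArrayKey("a", "x")));
    }
}
```

The second suggestion (special-casing one or two keys) could replace the array with plain fields for those sizes; whether that pays off is exactly what the profiling would have to show.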




[jira] Updated: (HIVE-1498) support IDXPROPERTIES on CREATE INDEX

2010-10-28 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1498:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed.  Thanks Marquis et al!


 support IDXPROPERTIES on CREATE INDEX
 -

 Key: HIVE-1498
 URL: https://issues.apache.org/jira/browse/HIVE-1498
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Marquis Wang
 Fix For: 0.7.0

 Attachments: 1498.2.patch, 1498.patch, hive-1498.prelim.patch


 It's partially there in the grammar but not hooked in; should work pretty 
 much the same as TBLPROPERTIES.
