[jira] [Updated] (HIVE-10495) Hive index creation code throws NPE if index table is null

2015-05-08 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-10495:
---
Attachment: (was: HIVE-10495.1.patch)

 Hive index creation code throws NPE if index table is null
 --

 Key: HIVE-10495
 URL: https://issues.apache.org/jira/browse/HIVE-10495
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Bing Li
Assignee: Bing Li

 The stack trace would be:
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_index(HiveMetaStore.java:2870)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
 at java.lang.reflect.Method.invoke(Method.java:611)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
 at $Proxy9.add_index(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createIndex(HiveMetaStoreClient.java:962)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

2015-05-08 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533955#comment-14533955
 ] 

Jesus Camacho Rodriguez commented on HIVE-9069:
---

[~mmokhtar], it seems the plan is right and the predicates are correctly pushed 
to the sources. Could you let me know which predicate should still be pushed 
down that is not? Thanks

 Simplify filter predicates for CBO
 --

 Key: HIVE-9069
 URL: https://issues.apache.org/jira/browse/HIVE-9069
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.14.1


 Simplify predicates for disjunctive predicates so that they can get pushed down 
 to the scan.
 Looks like this is still an issue, some of the filters can be pushed down to 
 the scan.
 {code}
 set hive.cbo.enable=true
 set hive.stats.fetch.column.stats=true
 set hive.exec.dynamic.partition.mode=nonstrict
 set hive.tez.auto.reducer.parallelism=true
 set hive.auto.convert.join.noconditionaltask.size=32000
 set hive.exec.reducers.bytes.per.reducer=1
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
 set hive.support.concurrency=false
 set hive.tez.exec.print.summary=true
 explain  
 select  substr(r_reason_desc,1,20) as r
,avg(ws_quantity) wq
,avg(wr_refunded_cash) ref
,avg(wr_fee) fee
  from web_sales, web_returns, web_page, customer_demographics cd1,
   customer_demographics cd2, customer_address, date_dim, reason 
  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
and web_sales.ws_item_sk = web_returns.wr_item_sk
and web_sales.ws_order_number = web_returns.wr_order_number
and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
and reason.r_reason_sk = web_returns.wr_reason_sk
and
(
 (
  cd1.cd_marital_status = 'M'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = '4 yr Degree'
  and 
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 100.00 and 150.00
 )
or
 (
  cd1.cd_marital_status = 'D'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = 'Primary' 
  and
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 50.00 and 100.00
 )
or
 (
  cd1.cd_marital_status = 'U'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = 'Advanced Degree'
  and
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 150.00 and 200.00
 )
)
and
(
 (
  ca_country = 'United States'
  and
  ca_state in ('KY', 'GA', 'NM')
  and ws_net_profit between 100 and 200  
 )
 or
 (
  ca_country = 'United States'
  and
  ca_state in ('MT', 'OR', 'IN')
  and ws_net_profit between 150 and 300  
 )
 or
 (
  ca_country = 'United States'
  and
  ca_state in ('WI', 'MO', 'WV')
  and ws_net_profit between 50 and 250  
 )
)
 group by r_reason_desc
 order by r, wq, ref, fee
 limit 100
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 9 - Map 1 (BROADCAST_EDGE)
 Reducer 3 - Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
 Reducer 4 - Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
 Reducer 5 - Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
 Reducer 6 - Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
 Reducer 7 - Reducer 6 (SIMPLE_EDGE)
 Reducer 8 - Reducer 7 (SIMPLE_EDGE)
   DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: web_page
   filterExpr: wp_web_page_sk is not null (type: boolean)
   Statistics: Num rows: 4602 Data size: 2696178 Basic stats: 
 COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: wp_web_page_sk is not null (type: boolean)
 Statistics: Num rows: 4602 Data size: 18408 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: wp_web_page_sk (type: int)
   

[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

2015-05-08 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533971#comment-14533971
 ] 

Mostafa Mokhtar commented on HIVE-9069:
---

[~jcamachorodriguez]

Check web_sales for instance: it has the following predicates, all of which can 
be pushed down to the scan as a PPD filter.
The same applies to customer_demographics cd1 and customer_address: 
customer_demographics cd1 doesn't get any filter pushed down, while 
customer_address gets {code} ca_country = 'United States' {code} pushed.

{code}
and
   (
(
ws_sales_price between 100.00 and 150.00
)
   or
(
ws_sales_price between 50.00 and 100.00
)
   or
(
ws_sales_price between 150.00 and 200.00
)
   )
   and
   (
(
ws_net_profit between 100 and 200  
)
or
(
ws_net_profit between 150 and 300  
)
or
(
ws_net_profit between 50 and 250  
)
   )
{code}

 Simplify filter predicates for CBO
 --

 Key: HIVE-9069
 URL: https://issues.apache.org/jira/browse/HIVE-9069
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.14.1



[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-08 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534035#comment-14534035
 ] 

Lefty Leverenz commented on HIVE-9736:
--

Doc note:  This adds configuration parameter 
*hive.authprovider.hdfs.liststatus.batch.size* to HiveConf.java, so it needs to 
be documented in the wiki (for whatever release it ends up in).

* [Configuration Properties -- Authentication/Authorization | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Authentication/Authorization]

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.
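The batching idea above can be sketched in plain Java. This is illustrative only, not Hive's actual code: the class and method names (BatchingAuthorizer, toBatches) are hypothetical, and in the real provider each batch would presumably feed a single FileSystem.listStatus(Path[]) call instead of one call per directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of one namenode call per partition
// directory, group directories into fixed-size batches and issue one
// call per batch. Names here are illustrative, not Hive's API.
class BatchingAuthorizer {
    // Split the full list of partition paths into batches of at most
    // batchSize entries; each batch would map to a single bulk
    // listStatus() call in the real provider.
    static List<List<String>> toBatches(List<String> paths, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += batchSize) {
            batches.add(paths.subList(i, Math.min(i + batchSize, paths.size())));
        }
        return batches;
    }
}
```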





[jira] [Commented] (HIVE-9456) Make Hive support unicode with MSSQL as Metastore backend

2015-05-08 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534092#comment-14534092
 ] 

Lefty Leverenz commented on HIVE-9456:
--

Does this need documentation?

 Make Hive support unicode with MSSQL as Metastore backend
 -

 Key: HIVE-9456
 URL: https://issues.apache.org/jira/browse/HIVE-9456
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Fix For: 1.2.0

 Attachments: HIVE-9456.1.patch, HIVE-9456.2.patch, HIVE-9456.3.patch, 
 HIVE-9456.branch-1.2.patch


 There are significant issues when Hive uses MSSQL as metastore backend to 
 support unicode, since MSSQL handles varchar and nvarchar datatypes 
 differently. Hive 0.14 metastore mssql script DDL was using varchar as 
 datatype, which can't handle multi-bytes/unicode characters, e.g., Chinese 
 chars. This JIRA is going to track implementation of unicode support in that 
 case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10588) implement hashCode method for HWISessionItem

2015-05-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534187#comment-14534187
 ] 

Hive QA commented on HIVE-10588:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731033/HIVE-10588.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8919 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3808/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3808/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3808/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731033 - PreCommit-HIVE-TRUNK-Build

 implement hashCode method for HWISessionItem
 

 Key: HIVE-10588
 URL: https://issues.apache.org/jira/browse/HIVE-10588
 Project: Hive
  Issue Type: Improvement
  Components: Web UI
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10588.1.patch, rb33796.patch


 HWISessionItem overwrites equals method but not hashCode method.
 It violates java contract below:
 If two objects are equal according to the equals(Object) method, then 
 calling the hashCode method on each of the two objects must produce the same 
 integer result.
 Currently equals and compareTo methods use sessionName in their 
 implementation.
 sessionName.hashCode() can be used in HWISessionItem.hashCode as well.
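A minimal sketch of the suggested fix, under the assumption (stated in the description) that equals() compares sessionName. The class below is a simplified stand-in, not the real HWISessionItem:

```java
// Simplified stand-in for HWISessionItem: when equals() is based on
// sessionName, hashCode() must be derived from sessionName too, so
// that equal objects produce the same hash code (Object contract).
class SessionItem {
    private final String sessionName;

    SessionItem(String sessionName) {
        this.sessionName = sessionName;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (other == null || getClass() != other.getClass()) {
            return false;
        }
        return sessionName.equals(((SessionItem) other).sessionName);
    }

    @Override
    public int hashCode() {
        // Mirrors equals(): same sessionName, same hash code.
        return sessionName.hashCode();
    }
}
```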





[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-08 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-9736:
-
Labels: TODOC1.2  (was: )

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch







[jira] [Commented] (HIVE-10621) serde typeinfo equals methods are not symmetric

2015-05-08 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534003#comment-14534003
 ] 

Alexander Pivovarov commented on HIVE-10621:


testCliDriver_encryption_insert_partition_static failed in many recent builds.
So, tests look good

 serde typeinfo equals methods are not symmetric
 ---

 Key: HIVE-10621
 URL: https://issues.apache.org/jira/browse/HIVE-10621
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10621.1.patch, rb33880.patch


 A correct equals method implementation should start with
 {code}
   if (this == other) {
 return true;
   }
   if (other == null || getClass() != other.getClass()) {
 return false;
   }
 {code}
 DecimalTypeInfo, PrimitiveTypeInfo, VarcharTypeInfo, CharTypeInfo, 
 HiveDecimalWritable equals method implementation starts with
 {code}
   if (other == null || !(other instanceof class_name)) {
 return false;
   }
 {code}
 - first of all, the check for null is redundant
 - the second issue is that the other instanceof class_name check is not 
 symmetric.
 The contract of equals() implies that a.equals(b) is true if and only if 
 b.equals(a) is true.
 The current implementation violates this contract.
 e.g.
 DecimalTypeInfo instanceof PrimitiveTypeInfo is true
 but
 PrimitiveTypeInfo instanceof DecimalTypeInfo is false
 See more details here 
 http://stackoverflow.com/questions/6518534/equals-method-overrides-equals-in-superclass-and-may-not-be-symmetric
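The asymmetry described above can be reproduced with two small stand-in classes (illustrative only, not Hive's actual TypeInfo hierarchy): an instanceof-based equals() in both a base class and a subclass makes base.equals(derived) true while derived.equals(base) is false.

```java
// Base class with an instanceof-based equals(): it accepts subclass
// instances, so a BaseInfo considers a DerivedInfo with the same
// typeName equal to itself...
class BaseInfo {
    final String typeName;
    BaseInfo(String typeName) { this.typeName = typeName; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof BaseInfo)) {
            return false;
        }
        return typeName.equals(((BaseInfo) other).typeName);
    }

    @Override
    public int hashCode() { return typeName.hashCode(); }
}

// ...but the subclass's instanceof check rejects the base class, so
// the relation is not symmetric. Using getClass() comparison in both
// classes (as in the correct pattern quoted above) avoids this.
class DerivedInfo extends BaseInfo {
    DerivedInfo(String typeName) { super(typeName); }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof DerivedInfo)) {
            return false;
        }
        return typeName.equals(((DerivedInfo) other).typeName);
    }
}
```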





[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

2015-05-08 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534098#comment-14534098
 ] 

Jesus Camacho Rodriguez commented on HIVE-9069:
---

The case for the condition {{ca_country='United States'}} is a bit different, 
since it can get effectively pushed down.

What you are proposing for {{ws_sales_price}} would actually be a reduction, 
right? Pushing the predicates down, but still leaving the filter with the 
condition on top. For instance, you could push to the scan the condition
{noformat}
ws_sales_price between 50.00 and 200.00
{noformat}
but you still need to leave the other conditions in the tree in order to keep 
the correct semantics e.g.
{noformat}
(
 cd1.cd_marital_status = 'M'
 and
 cd1.cd_marital_status = cd2.cd_marital_status
 and
 cd1.cd_education_status = '4 yr Degree'
 and 
 cd1.cd_education_status = cd2.cd_education_status
 and
 ws_sales_price between 100.00 and 150.00
) or ...
{noformat}
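The reduction being discussed can be sketched numerically: the three ws_sales_price ranges [100,150], [50,100], and [150,200] collapse to the single enclosing range [50,200], which is what could be pushed to the scan while the full disjunction stays in the plan. A plain-Java illustration (not Hive optimizer code; the class name is hypothetical):

```java
// Computes the smallest single range that contains all of the given
// ranges; this is the condition that could be pushed to the scan as a
// coarse pre-filter, while the original disjunction is kept on top
// for correct semantics.
class RangeEnvelope {
    // Each element of ranges is {low, high}.
    static double[] envelope(double[][] ranges) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double[] r : ranges) {
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        return new double[] { lo, hi };
    }
}
```

For the query above, the ws_sales_price ranges yield [50, 200], and the ws_net_profit ranges [100,200], [150,300], [50,250] would likewise yield [50, 300].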

 Simplify filter predicates for CBO
 --

 Key: HIVE-9069
 URL: https://issues.apache.org/jira/browse/HIVE-9069
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.14.1



[jira] [Updated] (HIVE-10627) Queries fail with Failed to breakup Windowing invocations into Groups

2015-05-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10627:
---
Attachment: HIVE-10627.02.patch

New patch addresses comments by [~jpullokkaran]

 Queries fail with Failed to breakup Windowing invocations into Groups
 -

 Key: HIVE-10627
 URL: https://issues.apache.org/jira/browse/HIVE-10627
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-10627.01.patch, HIVE-10627.01.patch, 
 HIVE-10627.02.patch, HIVE-10627.patch


 TPC-DS query 51 fails with "Failed to breakup Windowing invocations into 
 Groups. At least 1 group must only depend on input columns. Also check for 
 circular dependencies."
 {code}
 explain  
 WITH web_v1 as (
 select
   ws_item_sk item_sk, d_date, sum(ws_sales_price),
   sum(sum(ws_sales_price))
   over (partition by ws_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from web_sales
 ,date_dim
 where ws_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ws_item_sk is not NULL
 group by ws_item_sk, d_date),
 store_v1 as (
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date)
  select  *
 from (select item_sk
  ,d_date
  ,web_sales
  ,store_sales
  ,max(web_sales)
  over (partition by item_sk order by d_date rows between unbounded 
 preceding and current row) web_cumulative
  ,max(store_sales)
  over (partition by item_sk order by d_date rows between unbounded 
 preceding and current row) store_cumulative
  from (select case when web.item_sk is not null then web.item_sk else 
 store.item_sk end item_sk
  ,case when web.d_date is not null then web.d_date else 
 store.d_date end d_date
  ,web.cume_sales web_sales
  ,store.cume_sales store_sales
from web_v1 web full outer join store_v1 store on (web.item_sk = 
 store.item_sk
   and web.d_date = 
 store.d_date)
   )x )y
 where web_cumulative > store_cumulative
 order by item_sk
 ,d_date
 limit 100;
 {code}
 Exception 
 {code}
 org.apache.hadoop.hive.ql.parse.SemanticException: Failed to breakup 
 Windowing invocations into Groups. At least 1 group must only depend on input 
 columns. Also check for circular dependencies. 
 Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 
 0:-1 Invalid column reference '$f2' 
   at 
 org.apache.hadoop.hive.ql.parse.WindowingComponentizer.next(WindowingComponentizer.java:94)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genWindowingPlan(SemanticAnalyzer.java:11538)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8514)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8472)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9304)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9189)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9210)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9592)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)
   at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)
 

[jira] [Commented] (HIVE-10628) Incorrect result when vectorized native mapjoin is enabled using null safe operator <=>

2015-05-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534352#comment-14534352
 ] 

Hive QA commented on HIVE-10628:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731058/HIVE-10628.01.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3809/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3809/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3809/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731058 - PreCommit-HIVE-TRUNK-Build

 Incorrect result when vectorized native mapjoin is enabled using null safe 
 operator <=>
 

 Key: HIVE-10628
 URL: https://issues.apache.org/jira/browse/HIVE-10628
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10628.01.patch


 Incorrect results for this query:
 {noformat}
 select count(*) from store_sales ss join store_returns sr on (sr.sr_item_sk 
 = ss.ss_item_sk and sr.sr_customer_sk = ss.ss_customer_sk and 
 sr.sr_item_sk = ss.ss_item_sk) where ss.ss_net_paid  1000;
 {noformat}





[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-08 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: HIVE-10190.12.patch

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, 
 HIVE-10190.10.patch, HIVE-10190.11.patch, HIVE-10190.12.patch


 {code}
 public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
 String astTree = ast.toStringTree();
 // if any of following tokens are present in AST, bail out
 String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE };
 for (String token : tokens) {
   if (astTree.contains(token)) {
 return false;
   }
 }
 return true;
   }
 {code}
 This is an issue for a SQL query which is bigger in AST form than in text 
 (~700kb).
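The performance concern above comes from rendering the whole AST to a string before a substring search. A cheaper alternative is to walk the tree and compare node tokens directly, short-circuiting on the first banned token. The sketch below is illustrative only: Node is a stand-in for ASTNode, not Hive's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative alternative to toStringTree().contains(...): traverse
// the tree and compare tokens directly, stopping at the first match,
// so a huge AST is never serialized to a string.
class AstScan {
    static class Node {
        final String token;
        final List<Node> children = new ArrayList<>();
        Node(String token) { this.token = token; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Returns false as soon as any banned token is found in the tree.
    static boolean validate(Node node, String... banned) {
        for (String t : banned) {
            if (node.token.equals(t)) {
                return false;
            }
        }
        for (Node child : node.children) {
            if (!validate(child, banned)) {
                return false;
            }
        }
        return true;
    }
}
```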





[jira] [Commented] (HIVE-10255) Parquet PPD support TIMESTAMP

2015-05-08 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534711#comment-14534711
 ] 

Sergio Peña commented on HIVE-10255:


Is there a way to detect which data type the Parquet file is using for Timestamp 
(int96 or timestamp_millis), and use a specific leaf filter for it?
I am just wondering whether we can support older versions of Parquet.

 Parquet PPD support TIMESTAMP
 -

 Key: HIVE-10255
 URL: https://issues.apache.org/jira/browse/HIVE-10255
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-10255-parquet.1.patch, HIVE-10255-parquet.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10253) Parquet PPD support DATE

2015-05-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534719#comment-14534719
 ] 

Sergio Peña commented on HIVE-10253:


+1

 Parquet PPD support DATE
 

 Key: HIVE-10253
 URL: https://issues.apache.org/jira/browse/HIVE-10253
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-10253-parquet.patch, HIVE-10253.patch


 Hive should handle the DATE data type when generating and pushing the 
 predicate to Parquet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10624) Update the initial script to make beeline bucked cli as default and allow user choose old hive cli by env

2015-05-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534647#comment-14534647
 ] 

Hive QA commented on HIVE-10624:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731079/HIVE-10624.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8919 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3811/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3811/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3811/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731079 - PreCommit-HIVE-TRUNK-Build

 Update the initial script to make beeline bucked cli as default and allow 
 user choose old hive cli by env
 -

 Key: HIVE-10624
 URL: https://issues.apache.org/jira/browse/HIVE-10624
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-10624.patch


 As discussed in the dev list, we should update the script to make the new 
 Beeline-based CLI the default and allow users to switch to the old CLI via an 
 environment variable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10256) Filter row groups based on the block statistics in Parquet

2015-05-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534715#comment-14534715
 ] 

Sergio Peña commented on HIVE-10256:


Is this method name correct: {{recordReader.getFiltedBlocks()}}? Shouldn't it be 
{{getFilteredBlocks}}?

 Filter row groups based on the block statistics in Parquet
 --

 Key: HIVE-10256
 URL: https://issues.apache.org/jira/browse/HIVE-10256
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-10256-parquet.patch


 In Parquet PPD, row groups that do not match the predicate should be 
 eliminated. See {{TestOrcSplitElimination}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10639) create SHA1 UDF

2015-05-08 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10639:
---
Attachment: HIVE-10639.2.patch

patch #2
- removed copyBytes operation which should improve performance

 create SHA1 UDF
 ---

 Key: HIVE-10639
 URL: https://issues.apache.org/jira/browse/HIVE-10639
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10639.1.patch, HIVE-10639.2.patch


 Calculates an SHA-1 160-bit checksum for string and binary arguments, as 
 described in RFC 3174 (Secure Hash Algorithm). The value is returned as a 
 string of 40 hex digits, or NULL if the argument is NULL.
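 As a sketch of the described semantics (this is the standard Java 
 MessageDigest API, not necessarily the patch's code; Sha1Udf is a hypothetical 
 name): SHA-1 of a string rendered as 40 hex digits, with NULL in, NULL out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha1Udf {
    // SHA-1 of a string, returned as 40 lowercase hex digits,
    // or null if the argument is null.
    static String sha1(String input) {
        if (input == null) {
            return null;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is mandated for every JVM, so this cannot happen in practice
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```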



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8696) HCatClientHMSImpl doesn't use a Retrying-HiveMetastoreClient.

2015-05-08 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534863#comment-14534863
 ] 

Thiruvel Thirumoolan commented on HIVE-8696:


Thanks Sushanth!

 HCatClientHMSImpl doesn't use a Retrying-HiveMetastoreClient.
 -

 Key: HIVE-8696
 URL: https://issues.apache.org/jira/browse/HIVE-8696
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Thiruvel Thirumoolan
 Fix For: 1.2.0

 Attachments: HIVE-8696.1.patch, HIVE-8696.2.patch, HIVE-8696.3.patch, 
 HIVE-8696.4.patch, HIVE-8696.5.patch, HIVE-8696.poc.patch


 The HCatClientHMSImpl doesn't use a RetryingHiveMetastoreClient. Users of the 
 HCatClient API that log in through keytabs will fail without retry, when 
 their TGTs expire.
 The fix is inbound. 
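 A retrying wrapper in the spirit of RetryingHiveMetastoreClient can be 
 sketched with a JDK dynamic proxy. This is a simplified illustration (fixed 
 attempt count, no backoff, no Kerberos re-login), not Hive's actual 
 implementation; the Flaky interface is hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class RetryingProxy {
    // Wrap an interface so every call is retried up to maxAttempts times
    // on failure -- a simplified sketch of a retrying client proxy
    // (no backoff, no credential re-login; assumes maxAttempts >= 1).
    @SuppressWarnings("unchecked")
    static <T> T retrying(Class<T> iface, T delegate, int maxAttempts) {
        InvocationHandler handler = (proxy, method, args) -> {
            Throwable last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    return method.invoke(delegate, args);
                } catch (InvocationTargetException e) {
                    last = e.getCause();   // unwrap and remember the real failure
                }
            }
            throw last;                    // all attempts failed
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, handler);
    }
}

// Hypothetical client interface used only for demonstration.
interface Flaky {
    int call();
}
```

 The real fix additionally needs retry-on-expired-credentials logic, which a 
 plain exception-count loop like this does not capture.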



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO

2015-05-08 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534736#comment-14534736
 ] 

Mostafa Mokhtar commented on HIVE-9069:
---

[~jcamachorodriguez]
Yes, this is correct. 

 Simplify filter predicates for CBO
 --

 Key: HIVE-9069
 URL: https://issues.apache.org/jira/browse/HIVE-9069
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.14.1


 Simplify disjunctive predicates so that they can get pushed down to the scan.
 Looks like this is still an issue; some of the filters could be pushed down to 
 the scan but are not.
 {code}
 set hive.cbo.enable=true
 set hive.stats.fetch.column.stats=true
 set hive.exec.dynamic.partition.mode=nonstrict
 set hive.tez.auto.reducer.parallelism=true
 set hive.auto.convert.join.noconditionaltask.size=32000
 set hive.exec.reducers.bytes.per.reducer=1
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
 set hive.support.concurrency=false
 set hive.tez.exec.print.summary=true
 explain  
 select  substr(r_reason_desc,1,20) as r
,avg(ws_quantity) wq
,avg(wr_refunded_cash) ref
,avg(wr_fee) fee
  from web_sales, web_returns, web_page, customer_demographics cd1,
   customer_demographics cd2, customer_address, date_dim, reason 
  where web_sales.ws_web_page_sk = web_page.wp_web_page_sk
and web_sales.ws_item_sk = web_returns.wr_item_sk
and web_sales.ws_order_number = web_returns.wr_order_number
and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998
and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk 
and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk
and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk
and reason.r_reason_sk = web_returns.wr_reason_sk
and
(
 (
  cd1.cd_marital_status = 'M'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = '4 yr Degree'
  and 
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 100.00 and 150.00
 )
or
 (
  cd1.cd_marital_status = 'D'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = 'Primary' 
  and
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 50.00 and 100.00
 )
or
 (
  cd1.cd_marital_status = 'U'
  and
  cd1.cd_marital_status = cd2.cd_marital_status
  and
  cd1.cd_education_status = 'Advanced Degree'
  and
  cd1.cd_education_status = cd2.cd_education_status
  and
  ws_sales_price between 150.00 and 200.00
 )
)
and
(
 (
  ca_country = 'United States'
  and
  ca_state in ('KY', 'GA', 'NM')
  and ws_net_profit between 100 and 200  
 )
 or
 (
  ca_country = 'United States'
  and
  ca_state in ('MT', 'OR', 'IN')
  and ws_net_profit between 150 and 300  
 )
 or
 (
  ca_country = 'United States'
  and
  ca_state in ('WI', 'MO', 'WV')
  and ws_net_profit between 50 and 250  
 )
)
 group by r_reason_desc
 order by r, wq, ref, fee
 limit 100
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 9 - Map 1 (BROADCAST_EDGE)
 Reducer 3 - Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE)
 Reducer 4 - Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
 Reducer 5 - Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
 Reducer 6 - Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 
 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE)
 Reducer 7 - Reducer 6 (SIMPLE_EDGE)
 Reducer 8 - Reducer 7 (SIMPLE_EDGE)
   DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: web_page
   filterExpr: wp_web_page_sk is not null (type: boolean)
   Statistics: Num rows: 4602 Data size: 2696178 Basic stats: 
 COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: wp_web_page_sk is not null (type: boolean)
 Statistics: Num rows: 4602 Data size: 18408 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: wp_web_page_sk (type: int)
   outputColumnNames: _col0
   Statistics: Num rows: 4602 Data size: 18408 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce 

[jira] [Commented] (HIVE-10641) create CRC32 UDF

2015-05-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534869#comment-14534869
 ] 

Hive QA commented on HIVE-10641:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731095/HIVE-10641.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8914 tests executed
*Failed tests:*
{noformat}
TestSparkClient - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3812/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3812/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3812/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731095 - PreCommit-HIVE-TRUNK-Build

 create CRC32 UDF
 

 Key: HIVE-10641
 URL: https://issues.apache.org/jira/browse/HIVE-10641
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10641.1.patch


 CRC32 computes a cyclic redundancy check value for a string or binary argument 
 and returns a bigint value. The result is NULL if the argument is NULL.
 MySQL has a similar function: 
 https://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_crc32
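 The described semantics map directly onto java.util.zip.CRC32; a sketch 
 (Crc32Udf is a hypothetical name, not the patch's class):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32Udf {
    // CRC-32 of a string as a long (bigint), or null when the
    // argument is null -- matching the NULL-in/NULL-out contract.
    static Long crc32(String input) {
        if (input == null) {
            return null;
        }
        CRC32 crc = new CRC32();
        crc.update(input.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

 The MySQL manual's example value for CRC32('MySQL') is 3259397556, which this 
 sketch reproduces since both use the standard CRC-32 polynomial.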



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10580) Fix impossible cast in GenericUDF.getConstantLongValue

2015-05-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534767#comment-14534767
 ] 

Ashutosh Chauhan commented on HIVE-10580:
-

This method is actually not used at all. We can just remove it.

 Fix impossible cast in GenericUDF.getConstantLongValue
 --

 Key: HIVE-10580
 URL: https://issues.apache.org/jira/browse/HIVE-10580
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10580.1.patch


 line 548-549
 {code}
 if (constValue instanceof IntWritable) {
   v = ((LongWritable) constValue).get();
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10656) Beeline set var=value not carrying over to queries

2015-05-08 Thread Reuben Kuhnert (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535085#comment-14535085
 ] 

Reuben Kuhnert commented on HIVE-10656:
---

This appears to be a problem with variable ambiguity:

{code}
set key=value
{code} 

expands to: 
{code}
set hiveconf:key=value 
{code}

however,

{code}
select * from ${key}
{code}

expands to:

{code}
select * from ${hiveconf:key}
{code}

The question is basically: should we allow users to enter ambiguous properties, 
and if so, should {{key}} default to {{hiveconf:key}} or {{hivevar:key}}?
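To make the ambiguity concrete, here is a hypothetical resolver that prefers the 
hivevar: namespace over hiveconf: for unqualified names. This is one possible 
policy for illustration only, not Beeline's actual behavior:

```java
import java.util.Map;

class VarResolver {
    // Hypothetical sketch of the ambiguity: an unqualified ${key}
    // could resolve from either the hivevar or hiveconf namespace.
    // Here unqualified names try hivevar first, then hiveconf --
    // one possible policy, not Hive's actual behavior.
    static String resolve(String key, Map<String, String> props) {
        if (key.contains(":")) {
            return props.get(key);  // fully qualified, e.g. "hiveconf:key"
        }
        String v = props.get("hivevar:" + key);
        return v != null ? v : props.get("hiveconf:" + key);
    }
}
```

Whichever default is chosen, a fully qualified reference stays unambiguous, so 
the open question only affects bare {{key}} lookups.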

 Beeline set var=value not carrying over to queries
 --

 Key: HIVE-10656
 URL: https://issues.apache.org/jira/browse/HIVE-10656
 Project: Hive
  Issue Type: Bug
Reporter: Reuben Kuhnert
Priority: Minor

 After performing a {{set name=value}}, I would expect that the variable name 
 would carry over to all locations within the session. It appears to work when 
 querying the value via {{set;}}, but not when trying to run actual SQL 
 statements.
 Example:
 {code}
 0: jdbc:hive2://localhost:1 set foo;
 +--+--+
 |   set|
 +--+--+
 | foo=bar  |
 +--+--+
 1 row selected (0.932 seconds)
 0: jdbc:hive2://localhost:1 select * from ${foo};
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10001]: Line 1:14 Table not found 'bar' (state=42S02,code=10001)
 0: jdbc:hive2://localhost:1 show tables;
 ++--+
 |  tab_name  |
 ++--+
 | my |
 | purchases  |
 ++--+
 2 rows selected (0.437 seconds)
 0: jdbc:hive2://localhost:1 set foo=my;
 No rows affected (0.017 seconds)
 0: jdbc:hive2://localhost:1 set foo;
 +-+--+
 |   set   |
 +-+--+
 | foo=my  |
 +-+--+
 1 row selected (0.02 seconds)
 0: jdbc:hive2://localhost:1 select * from ${foo};
 select * from ${foo};
 Error: Error while compiling statement: FAILED: SemanticException [Error 
 10001]: Line 1:14 Table not found 'bar' (state=42S02,code=10001)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10643) Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following)

2015-05-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10643:

Attachment: HIVE-10643.patch

 Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers 
 (1 for number of preceding and 1 for number of following)
 ---

 Key: HIVE-10643
 URL: https://issues.apache.org/jira/browse/HIVE-10643
 Project: Hive
  Issue Type: Sub-task
  Components: PTF-Windowing
Reporter: Aihua Xu
Assignee: Aihua Xu
Priority: Minor
 Attachments: HIVE-10643.patch


 The functionality should not be affected. Instead of passing two numbers (one 
 for the # of preceding rows and one for the # of following rows), we will pass 
 a WindowFrameDef object around. In the following subtasks, it will be used for 
 the cases of {{rows between x preceding and y preceding}} and {{rows between 
 x following and y following}}.
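 A minimal sketch of what such a frame object might look like (hypothetical 
 simplified classes, not Hive's actual WindowFrameDef): bundling two 
 direction-plus-offset boundaries can express frames like {{rows between 2 
 preceding and 1 preceding}}, which a pair of plain preceding/following counts 
 cannot.

```java
class WindowFrame {
    enum Direction { PRECEDING, CURRENT, FOLLOWING }

    // Hypothetical boundary: a direction plus an offset amount. Two
    // ints (amtPreceding, amtFollowing) cannot express
    // "2 PRECEDING AND 1 PRECEDING"; a pair of boundaries can.
    static final class Boundary {
        final Direction direction;
        final int amount;

        Boundary(Direction direction, int amount) {
            this.direction = direction;
            this.amount = amount;
        }

        // signed offset relative to the current row
        int offset() {
            switch (direction) {
                case PRECEDING: return -amount;
                case FOLLOWING: return amount;
                default:        return 0;  // CURRENT row
            }
        }
    }

    final Boundary start;
    final Boundary end;

    WindowFrame(Boundary start, Boundary end) {
        this.start = start;
        this.end = end;
    }

    // number of rows covered by the frame, start..end inclusive
    int rowCount() {
        return end.offset() - start.offset() + 1;
    }
}
```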



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10651) ORC file footer cache should be bounded

2015-05-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535357#comment-14535357
 ] 

Sergey Shelukhin commented on HIVE-10651:
-

This /might/ also affect LLAP when running w/o IO elevator.

 ORC file footer cache should be bounded
 ---

 Key: HIVE-10651
 URL: https://issues.apache.org/jira/browse/HIVE-10651
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Minor
 Attachments: HIVE-10651.1.patch


 ORC's file footer cache is currently unbounded and is a soft-reference cache. 
 The cache size obtained from the config is only used to set the initial 
 capacity. We should bound the cache to keep it from growing too big and to 
 get predictable performance.
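 One standard way to bound such a cache is an access-ordered LinkedHashMap with 
 an eviction hook. A sketch under the assumption that the bound is an entry 
 count (the actual patch may bound by weight or use a different cache library):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedFooterCache<K, V> extends LinkedHashMap<K, V> {
    // A size-bounded LRU map: the least-recently-accessed entry is
    // evicted once maxEntries is exceeded, giving a predictable
    // footprint -- unlike an unbounded soft-reference cache, whose
    // eviction timing depends on GC pressure.
    private final int maxEntries;

    BoundedFooterCache(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder=true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```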



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10643) Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following)

2015-05-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10643:

Attachment: (was: HIVE-10643.patch)

 Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers 
 (1 for number of preceding and 1 for number of following)
 ---

 Key: HIVE-10643
 URL: https://issues.apache.org/jira/browse/HIVE-10643
 Project: Hive
  Issue Type: Sub-task
  Components: PTF-Windowing
Reporter: Aihua Xu
Assignee: Aihua Xu
Priority: Minor
 Attachments: HIVE-10643.patch


 The functionality should not be affected. Instead of passing two numbers (one 
 for the # of preceding rows and one for the # of following rows), we will pass 
 a WindowFrameDef object around. In the following subtasks, it will be used for 
 the cases of {{rows between x preceding and y preceding}} and {{rows between 
 x following and y following}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10657) Remove copyBytes operation from MD5 UDF

2015-05-08 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10657:
---
Description: 
The current MD5 UDF implementation uses the Apache Commons DigestUtils.md5Hex 
method to get the MD5 hex. DigestUtils does not provide an md5Hex method with 
the signature (byte[], start, length); this is why the copyBytes method was 
added to UDFMd5 to get a byte[] from BytesWritable.

To avoid copying bytes from BytesWritable to a new byte array, we can use the 
Java MessageDigest API directly: MessageDigest has the method 
update(byte[], start, length).

  was:
Current implementation uses Apache Commons  DigestUtils.md5Hex method to get 
md5 hex.
DigestUtils does not provide md5Hex method with signature (byte[], start, 
length). This is why copyBytes method was added to get bytes[] from 
BytesWritable.

To avoid copying bytes from BytesWritable to new byte array we can use java 
MessageDigest API directly.
MessageDigest has method update(byte[], start, length)


 Remove copyBytes operation from MD5 UDF
 ---

 Key: HIVE-10657
 URL: https://issues.apache.org/jira/browse/HIVE-10657
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor

 The current MD5 UDF implementation uses the Apache Commons DigestUtils.md5Hex 
 method to get the MD5 hex. DigestUtils does not provide an md5Hex method with 
 the signature (byte[], start, length); this is why the copyBytes method was 
 added to UDFMd5 to get a byte[] from BytesWritable.
 To avoid copying bytes from BytesWritable to a new byte array, we can use the 
 Java MessageDigest API directly: MessageDigest has the method 
 update(byte[], start, length).
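 The no-copy digest described above can be sketched directly against the JDK 
 API (Md5NoCopy is a hypothetical name, not the patch's class):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5NoCopy {
    // Digest a sub-range of a buffer via
    // MessageDigest.update(byte[], offset, length), avoiding the
    // intermediate array copy that DigestUtils.md5Hex(byte[]) forces
    // when the payload sits inside a larger backing buffer.
    static String md5Hex(byte[] buf, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(buf, offset, length);
            StringBuilder hex = new StringBuilder(32);
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is mandated for every JVM, so this cannot happen in practice
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

 This mirrors the BytesWritable case, where getBytes() returns a backing array 
 that may be longer than getLength().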



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10563) MiniTezCliDriver tests ordering issues

2015-05-08 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10563:
-
Attachment: HIVE-10563.4.patch

uploading the rebased patch. 

Thanks
Hari

 MiniTezCliDriver tests ordering issues
 --

 Key: HIVE-10563
 URL: https://issues.apache.org/jira/browse/HIVE-10563
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10563.1.patch, HIVE-10563.2.patch, 
 HIVE-10563.3.patch, HIVE-10563.4.patch


 There are a bunch of tests related to TestMiniTezCliDriver which exhibit 
 ordering issues when run on CentOS/Windows/OS X.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10657) Remove copyBytes operation from MD5 UDF

2015-05-08 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10657:
---
Attachment: HIVE-10657.1.patch

patch #1

 Remove copyBytes operation from MD5 UDF
 ---

 Key: HIVE-10657
 URL: https://issues.apache.org/jira/browse/HIVE-10657
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10657.1.patch


 The current MD5 UDF implementation uses the Apache Commons DigestUtils.md5Hex 
 method to get the MD5 hex. DigestUtils does not provide an md5Hex method with 
 the signature (byte[], start, length); this is why the copyBytes method was 
 added to UDFMd5 to get a byte[] from BytesWritable.
 To avoid copying bytes from BytesWritable to a new byte array, we can use the 
 Java MessageDigest API directly: MessageDigest has the method 
 update(byte[], start, length).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10626) Spark plan needs to be updated [Spark Branch]

2015-05-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535343#comment-14535343
 ] 

Hive QA commented on HIVE-10626:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12730949/HIVE-10626.2-spark.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8721 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did 
not produce a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not 
produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more
 - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/850/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/850/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-850/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12730949 - PreCommit-HIVE-SPARK-Build

 Spark plan needs to be updated [Spark Branch]
 

 Key: HIVE-10626
 URL: https://issues.apache.org/jira/browse/HIVE-10626
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-10626-spark.patch, HIVE-10626.1-spark.patch, 
 HIVE-10626.2-spark.patch


 The basic patch from [HIVE-8858] was committed; the latest patch needs to be 
 committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10643) Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers (1 for number of preceding and 1 for number of following)

2015-05-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10643:

Attachment: HIVE-10643.patch

 Refactoring Windowing for sum() to pass WindowFrameDef instead of two numbers 
 (1 for number of preceding and 1 for number of following)
 ---

 Key: HIVE-10643
 URL: https://issues.apache.org/jira/browse/HIVE-10643
 Project: Hive
  Issue Type: Sub-task
  Components: PTF-Windowing
Reporter: Aihua Xu
Assignee: Aihua Xu
Priority: Minor
 Attachments: HIVE-10643.patch


 The functionality should not be affected. Instead of passing two numbers (one 
 for the # of preceding rows and one for the # of following rows), we will pass 
 a WindowFrameDef object around. In the following subtasks, it will be used for 
 the cases of {{rows between x preceding and y preceding}} and {{rows between 
 x following and y following}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10591) Support limited integer type promotion in ORC

2015-05-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10591:
-
Attachment: HIVE-10591.3.patch

 Support limited integer type promotion in ORC
 -

 Key: HIVE-10591
 URL: https://issues.apache.org/jira/browse/HIVE-10591
 Project: Hive
  Issue Type: New Feature
Affects Versions: 1.3.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch, 
 HIVE-10591.2.patch, HIVE-10591.3.patch, HIVE-10591.3.patch, HIVE-10591.3.patch


 ORC currently does not support schema-on-read. If we alter an ORC table's 
 'int' type to 'bigint' and then query the altered table, a ClassCastException 
 will be thrown, as the schema on read from the table descriptor will expect 
 LongWritable whereas ORC will return IntWritable based on the file schema 
 stored within the ORC file. OrcSerde currently doesn't do any type conversions 
 or type promotions in the inner loop, for performance reasons. Since 
 smallints, ints and bigints are stored in the same way in ORC, it should be 
 possible to allow such type promotions without hurting performance. The 
 following type promotions can be supported without any casting:
 smallint -> int
 smallint -> bigint
 int -> bigint
 Tinyint promotion is not possible without casting, as tinyints are stored 
 using the RLE byte writer whereas smallints, ints and bigints are stored 
 using the RLE integer writer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases

2015-05-08 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10542:
--
Fix Version/s: 1.3.0
   1.2.0

 Full outer joins in tez produce incorrect results in certain cases
 --

 Key: HIVE-10542
 URL: https://issues.apache.org/jira/browse/HIVE-10542
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Blocker
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10542.1.patch, HIVE-10542.2.patch, 
 HIVE-10542.3.patch, HIVE-10542.4.patch, HIVE-10542.5.patch, 
 HIVE-10542.6.patch, HIVE-10542.7.patch, HIVE-10542.8.patch, HIVE-10542.9.patch


 If there are no records for one of the tables in the full outer join, we do 
 not read the other input and end up not producing rows that we should.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10568) Select count(distinct()) can have more optimal execution plan

2015-05-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10568:

Attachment: HIVE-10568.2.patch

Addressed review comments.

 Select count(distinct()) can have more optimal execution plan
 -

 Key: HIVE-10568
 URL: https://issues.apache.org/jira/browse/HIVE-10568
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Logical Optimizer
Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 
 0.13.0, 0.14.0, 1.0.0, 1.1.0
Reporter: Mostafa Mokhtar
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, 
 HIVE-10568.patch, HIVE-10568.patch


 {code:sql}
 select count(distinct ss_ticket_number) from store_sales;
 {code}
 can be rewritten as
 {code:sql}
 select count(1) from (select distinct ss_ticket_number from store_sales) a;
 {code}
 which may run up to 3x faster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10394) LLAP: Notify AM of pre-emption

2015-05-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10394:
-
Attachment: (was: HIVE-10394.1.patch)

 LLAP: Notify AM of pre-emption
 --

 Key: HIVE-10394
 URL: https://issues.apache.org/jira/browse/HIVE-10394
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran

 The AM should be notified that pre-empted tasks were killed/interrupted by the 
 system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10660) Fix typo in Type.getType(TTypeId) exception message

2015-05-08 Thread Keegan Witt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Witt updated HIVE-10660:
---
Attachment: HIVE-10660.patch

 Fix typo in Type.getType(TTypeId) exception message
 ---

 Key: HIVE-10660
 URL: https://issues.apache.org/jira/browse/HIVE-10660
 Project: Hive
  Issue Type: Bug
Reporter: Keegan Witt
Assignee: Keegan Witt
Priority: Trivial
 Attachments: HIVE-10660.patch


 {{org.apache.hive.service.cli.Type.getType(org.apache.hive.service.cli.thrift.TTypeId)}}
 throws an _IllegalArgumentException_ with 'Unrecognized' misspelled as 
 'Unregonized'.





[jira] [Commented] (HIVE-10658) ACID operation expose encrypted data

2015-05-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535431#comment-14535431
 ] 

Sergio Peña commented on HIVE-10658:


Doesn't ACID get the scratch directory from the Context object?

The {{SemanticAnalyzer.getMetaData()}} method gets the encrypted or /tmp 
directory from getStagingDirectoryPathname() and sets the value on the Context.
This might help. See the line {{Path stagingPath = 
getStagingDirectoryPathname(qb);}}

 ACID operation expose encrypted data
 

 Key: HIVE-10658
 URL: https://issues.apache.org/jira/browse/HIVE-10658
 Project: Hive
  Issue Type: Sub-task
Reporter: Eugene Koifman

 Insert/Update/Delete operations all use temporary tables.
 The data in temp tables is stored under hive.exec.scratchdir, which is not 
 usually encrypted.  This is a similar issue to using the scratchdir for staging 
 query results.





[jira] [Commented] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535528#comment-14535528
 ] 

Sushanth Sowmyan commented on HIVE-9736:


I did not find this in the precommit queue, so I've manually added it in now : 
build#3815 should test this.

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have many associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.
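 The intended effect of batching can be sketched generically. The names
 list_status_single/list_status_batch below are hypothetical stand-ins for
 namenode round-trips, not the real DistributedFileSystem API:

```python
# Hypothetical sketch: batching per-directory metadata calls into one call.

def list_status_single(fs, path):
    # one simulated namenode round-trip per directory
    fs["calls"] += 1
    return fs["dirs"][path]

def list_status_batch(fs, paths):
    # one simulated namenode round-trip for the whole batch
    fs["calls"] += 1
    return [fs["dirs"][p] for p in paths]

fs = {"calls": 0,
      "dirs": {"/t/dt=20150101/region=%d" % r: {"owner": "hive"}
               for r in range(100)}}
paths = sorted(fs["dirs"])

for p in paths:                           # unbatched: 100 round-trips
    list_status_single(fs, p)
unbatched_calls = fs["calls"]

fs["calls"] = 0
statuses = list_status_batch(fs, paths)   # batched: 1 round-trip
batched_calls = fs["calls"]

assert unbatched_calls == 100 and batched_calls == 1
assert len(statuses) == 100
```

 Authorization of each FileStatus then happens client-side over the batched
 results, which is where the proposed savings come from.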





[jira] [Commented] (HIVE-9544) Error dropping fully qualified partitioned table - Internal error processing get_partition_names

2015-05-08 Thread Dipankar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535530#comment-14535530
 ] 

Dipankar commented on HIVE-9544:


An alternative way of doing this is:
hive -hiveconf schema=mydb -e 'drop table ${hiveconf:schema}.my_table_name'

i.e., pass the schema/database name as a Hive conf variable.
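The substitution Hive performs for ${hiveconf:...} placeholders can be
approximated as follows. The function below is an illustrative sketch, not
Hive's actual variable-substitution code:

```python
import re

def substitute_hiveconf(sql, conf):
    """Replace ${hiveconf:key} placeholders with values from a conf dict."""
    return re.sub(r"\$\{hiveconf:([^}]+)\}",
                  lambda m: conf[m.group(1)], sql)

query = "drop table ${hiveconf:schema}.my_table_name"
result = substitute_hiveconf(query, {"schema": "mydb"})
assert result == "drop table mydb.my_table_name"
```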

 Error dropping fully qualified partitioned table - Internal error processing 
 get_partition_names
 

 Key: HIVE-9544
 URL: https://issues.apache.org/jira/browse/HIVE-9544
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
 Environment: HDP 2.2
Reporter: Hari Sekhon
Priority: Minor

 When attempting to drop a partitioned table using a fully qualified name I 
 get this error:
 {code}
 hive -e 'drop table myDB.my_table_name;'
 Logging initialized using configuration in 
 file:/etc/hive/conf/hive-log4j.properties
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. 
 org.apache.thrift.TApplicationException: Internal error processing 
 get_partition_names
 {code}
 It succeeds if I instead do:
 {code}hive -e 'use myDB; drop table my_table_name;'{code}
 Regards,
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon





[jira] [Comment Edited] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535528#comment-14535528
 ] 

Sushanth Sowmyan edited comment on HIVE-9736 at 5/8/15 9:06 PM:


I did not find this in the precommit queue, so I've manually added it in now : 
build#3833 should test this.


was (Author: sushanth):
I did not find this in the precommit queue, so I've manually added it in now : 
build#3815 should test this.

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have many associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.





[jira] [Updated] (HIVE-10394) LLAP: Notify AM of pre-emption

2015-05-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10394:
-
Attachment: HIVE-10394.1.patch

 LLAP: Notify AM of pre-emption
 --

 Key: HIVE-10394
 URL: https://issues.apache.org/jira/browse/HIVE-10394
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10394.1.patch


 Pre-empted tasks should be notified to AM as killed/interrupted by system.





[jira] [Resolved] (HIVE-6424) webhcat.jar no longer includes webhcat-lo4j.properties

2015-05-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-6424.
--
Resolution: Implemented
  Assignee: Eugene Koifman

The same changes as in the attached patch are already present in the codebase.

 webhcat.jar no longer includes webhcat-lo4j.properties
 --

 Key: HIVE-6424
 URL: https://issues.apache.org/jira/browse/HIVE-6424
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure, WebHCat
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: hive6424.patch


 Before the Maven switch, webhcat-log4j.properties and webhcat-default.xml were 
 at the root of hive-webhcat-0.13.0-SNAPSHOT.jar.  They are no longer there.





[jira] [Commented] (HIVE-10651) ORC file footer cache should be bounded

2015-05-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535371#comment-14535371
 ] 

Sergey Shelukhin commented on HIVE-10651:
-

+1

 ORC file footer cache should be bounded
 ---

 Key: HIVE-10651
 URL: https://issues.apache.org/jira/browse/HIVE-10651
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Priority: Minor
 Attachments: HIVE-10651.1.patch


 ORC's file footer cache is currently unbounded and is a soft-reference cache. 
 The cache size read from the config is only used to set the initial capacity. 
 We should bound the cache to keep it from growing too big and to get 
 predictable performance.
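 A bounded cache of the kind proposed can be sketched with an LRU eviction
 policy. This Python sketch only illustrates the bounding behaviour; it is not
 the ORC footer cache itself:

```python
from collections import OrderedDict

class BoundedCache:
    """Tiny LRU cache: once capacity is reached, the least recently
    used entry is evicted, so memory use stays predictable."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the LRU entry

cache = BoundedCache(capacity=2)
cache.put("footer-a", b"...")
cache.put("footer-b", b"...")
cache.get("footer-a")          # touch a, so footer-b becomes LRU
cache.put("footer-c", b"...")  # exceeds capacity, evicts footer-b
assert cache.get("footer-b") is None
assert cache.get("footer-a") is not None
```

 Unlike a soft-reference cache, whose size depends on GC pressure, a hard bound
 like this gives a predictable memory footprint.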





[jira] [Commented] (HIVE-10548) Remove dependency to s3 repository in root pom

2015-05-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535482#comment-14535482
 ] 

Hive QA commented on HIVE-10548:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731098/HIVE-10548.2.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8915 tests executed
*Failed tests:*
{noformat}
TestSchedulerQueue - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3813/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3813/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3813/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731098 - PreCommit-HIVE-TRUNK-Build

 Remove dependency to s3 repository in root pom
 --

 Key: HIVE-10548
 URL: https://issues.apache.org/jira/browse/HIVE-10548
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Szehon Ho
Assignee: Chengxiang Li
 Attachments: HIVE-10548.2.patch, HIVE-10548.2.patch, HIVE-10548.patch








[jira] [Updated] (HIVE-10394) LLAP: Notify AM of pre-emption

2015-05-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10394:
-
Attachment: (was: HIVE-10394.1.patch)

 LLAP: Notify AM of pre-emption
 --

 Key: HIVE-10394
 URL: https://issues.apache.org/jira/browse/HIVE-10394
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran

 Pre-empted tasks should be notified to AM as killed/interrupted by system.





[jira] [Updated] (HIVE-10394) LLAP: Notify AM of pre-emption

2015-05-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10394:
-
Attachment: HIVE-10394.1.patch

The notification to the AM is yet to be hooked up. This patch currently adds 
pre-emption of requests that are already in the wait queue.

 LLAP: Notify AM of pre-emption
 --

 Key: HIVE-10394
 URL: https://issues.apache.org/jira/browse/HIVE-10394
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10394.1.patch


 Pre-empted tasks should be notified to AM as killed/interrupted by system.





[jira] [Updated] (HIVE-10660) Fix typo in Type.getType(TTypeId) exception message

2015-05-08 Thread Keegan Witt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Witt updated HIVE-10660:
---
Description: 
{{org.apache.hive.service.cli.Type.getType(org.apache.hive.service.cli.thrift.TTypeId)}}
 throws an _IllegalArgumentException_ with 'Unrecognized' misspelled as 
'Unregonized'.  (was: 
{{org.apache.hive.service.cli.Type.getType(org.apache.hive.service.cli.thrift.TTypeId)}}
 throws an _IllegalArgumentException_ with 'Unrecognized' misspelled as 
'Unrecognized'.)

 Fix typo in Type.getType(TTypeId) exception message
 ---

 Key: HIVE-10660
 URL: https://issues.apache.org/jira/browse/HIVE-10660
 Project: Hive
  Issue Type: Bug
Reporter: Keegan Witt
Assignee: Keegan Witt
Priority: Trivial
 Attachments: HIVE-10660.patch


 {{org.apache.hive.service.cli.Type.getType(org.apache.hive.service.cli.thrift.TTypeId)}}
  throws an _IllegalArgumentException_ with 'Unrecognized' misspelled as 
 'Unregonized'.





[jira] [Commented] (HIVE-10568) Select count(distinct()) can have more optimal execution plan

2015-05-08 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535603#comment-14535603
 ] 

Laljo John Pullokkaran commented on HIVE-10568:
---

+1

 Select count(distinct()) can have more optimal execution plan
 -

 Key: HIVE-10568
 URL: https://issues.apache.org/jira/browse/HIVE-10568
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Logical Optimizer
Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 
 0.13.0, 0.14.0, 1.0.0, 1.1.0
Reporter: Mostafa Mokhtar
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, 
 HIVE-10568.patch, HIVE-10568.patch


 {code:sql}
 select count(distinct ss_ticket_number) from store_sales;
 {code}
 can be rewritten as
 {code:sql}
 select count(1) from (select distinct ss_ticket_number from store_sales) a;
 {code}
 which may run up to 3x faster





[jira] [Commented] (HIVE-10639) create SHA1 UDF

2015-05-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534462#comment-14534462
 ] 

Hive QA commented on HIVE-10639:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731070/HIVE-10639.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 8914 tests executed
*Failed tests:*
{noformat}
TestSparkClient - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3810/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3810/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3810/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731070 - PreCommit-HIVE-TRUNK-Build

 create SHA1 UDF
 ---

 Key: HIVE-10639
 URL: https://issues.apache.org/jira/browse/HIVE-10639
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10639.1.patch


 Calculates an SHA-1 160-bit checksum for the string and binary, as described 
 in RFC 3174 (Secure Hash Algorithm). The value is returned as a string of 40 
 hex digits, or NULL if the argument was NULL.
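 The described semantics map directly onto a standard SHA-1 implementation. A
 sketch of the NULL-safe behaviour in Python (not the Hive UDF itself):

```python
import hashlib

def sha1_udf(value):
    """Return the 40-hex-digit SHA-1 of a string/bytes, or None for NULL."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hashlib.sha1(value).hexdigest()

assert sha1_udf(None) is None
# RFC 3174 test vector for "abc"
assert sha1_udf("abc") == "a9993e364706816aba3e25717850c26c9cd0d89d"
```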





[jira] [Updated] (HIVE-10640) Vectorized query with NULL constant throws Unsuported vector output type: void error

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10640:

Fix Version/s: (was: 1.2.0)

 Vectorized query with NULL constant  throws Unsuported vector output type: 
 void error
 ---

 Key: HIVE-10640
 URL: https://issues.apache.org/jira/browse/HIVE-10640
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0


 This query from join_nullsafe.q when vectorized throws Unsuported vector 
 output type: void during execution...
 {noformat}
 select * from myinput1 a join myinput1 b on a.key=b.value AND a.key is NULL;
 {noformat}





[jira] [Commented] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535750#comment-14535750
 ] 

Sushanth Sowmyan commented on HIVE-10463:
-

Removing fix version of 1.2.0 in preparation for the release, since this is not 
a blocker for 1.2.0.

 CBO (Calcite Return Path): Insert overwrite... select * from... queries 
 failing for bucketed tables
 ---

 Key: HIVE-10463
 URL: https://issues.apache.org/jira/browse/HIVE-10463
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Laljo John Pullokkaran

 When return path is on. To reproduce the Exception, take the following 
 excerpt from auto_sortmerge_join_10.q:
 {noformat}
 set hive.enforce.bucketing = true;
 set hive.enforce.sorting = true;
 set hive.exec.reducers.max = 1;
 CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
 INTO 2 BUCKETS;
 insert overwrite table tbl1
 select * from src where key < 10;
 {noformat}
 It produces the following Exception:
 {noformat}
 java.lang.Exception: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
 Caused by: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
 ... 10 more
 Caused by: java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157)
 ... 14 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1]
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150)
 ... 14 more
 Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 
 1:_col1]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 at 
 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383)
 ... 22 more
 {noformat}





[jira] [Updated] (HIVE-10412) CBO : Calculate join selectivity when computing HiveJoin cost

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10412:

Fix Version/s: (was: 1.2.0)

 CBO : Calculate join selectivity when computing HiveJoin cost
 -

 Key: HIVE-10412
 URL: https://issues.apache.org/jira/browse/HIVE-10412
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran

 This is from TPC-DS Q7
 Because we don't compute the selectivity of sub-expressions in a HiveJoin, we 
 assume that selective and non-selective joins have similar cost.
 {code}
 select  i_item_id, 
 avg(ss_quantity) agg1,
 avg(ss_list_price) agg2,
 avg(ss_coupon_amt) agg3,
 avg(ss_sales_price) agg4 
  from store_sales, customer_demographics, item
  where store_sales.ss_item_sk = item.i_item_sk and
store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and
cd_gender = 'F' and 
cd_marital_status = 'W' and
cd_education_status = 'Primary'
  group by i_item_id
  order by i_item_id
  limit 100
 {code}
 Cardinality 
 {code}
 item 462,000
 customer_demographics 1,920,800
 store_sales 82,510,879,939
 {code}
 NDVs
 {code}
 item.i_item_sk 439501
 customer_demographics.cd_demo_sk 1835839
 store_sales.ss_cdemo_sk 1835839
 {code}
 From the logs 
 {code}
 2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
 HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], 
 cost=[not available])
   HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], 
 cost=[{8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 
 io}])
 HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], 
 ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
   HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]])
 HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], 
 cd_education_status=[$3])
   HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
 
 HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]])
   HiveProject(i_item_sk=[$0], i_item_id=[$1])
 HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]])
 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 
 rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io}
 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 
 rows, 2.1362E11 cpu, 1.07207098E7 io}
 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(78)) - MapJoin selected
 2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
 HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not 
 available])
   HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], 
 cost=[{8.2511341939E10 rows, 2.1362E11 cpu, 1.07207098E7 io}])
 HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], 
 ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
   HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]])
 HiveProject(i_item_sk=[$0], i_item_id=[$1])
   HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]])
   HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], 
 cd_education_status=[$3])
 HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
   
 HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]])
 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.25108951834E10 
 rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io}
 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.25108951834E10 
 rows, 2.324083308641975E8 cpu, 275417.56 io}
 {code}
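 A common textbook estimate for equi-join cardinality is
 |A| * |B| / max(NDV(left key), NDV(right key)). Applying it to the numbers
 above sketches why the two join orders should not get the same cost. This is
 illustrative only, not Calcite's or Hive's actual cost model:

```python
# Textbook equi-join cardinality estimate.
def estimated_join_rows(rows_a, rows_b, ndv_a, ndv_b):
    return rows_a * rows_b / max(ndv_a, ndv_b)

# Cardinalities and NDVs from the issue description.
store_sales = 82_510_879_939
customer_demographics = 1_920_800

# store_sales JOIN customer_demographics on ss_cdemo_sk = cd_demo_sk
rows = estimated_join_rows(store_sales, customer_demographics,
                           1_835_839, 1_835_839)

# The filters on cd_gender/cd_marital_status/cd_education_status would scale
# this down further; that is exactly the selectivity the issue says is ignored.
assert 8.6e10 < rows < 8.7e10
```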





[jira] [Commented] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535752#comment-14535752
 ] 

Sushanth Sowmyan commented on HIVE-10415:
-

Removing fix version of 1.2.0 in preparation for the release, since this is not 
a blocker for 1.2.0.

 hive.start.cleanup.scratchdir configuration is not taking effect
 

 Key: HIVE-10415
 URL: https://issues.apache.org/jira/browse/HIVE-10415
 Project: Hive
  Issue Type: Bug
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-10415.patch


 This configuration hive.start.cleanup.scratchdir is not taking effect





[jira] [Commented] (HIVE-10412) CBO : Calculate join selectivity when computing HiveJoin cost

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535756#comment-14535756
 ] 

Sushanth Sowmyan commented on HIVE-10412:
-

Removing fix version of 1.2.0 in preparation for the release, since this is not 
a blocker for 1.2.0.

 CBO : Calculate join selectivity when computing HiveJoin cost
 -

 Key: HIVE-10412
 URL: https://issues.apache.org/jira/browse/HIVE-10412
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran

 This is from TPC-DS Q7
 Because we don't compute the selectivity of sub-expressions in a HiveJoin, we 
 assume that selective and non-selective joins have similar cost.
 {code}
 select  i_item_id, 
 avg(ss_quantity) agg1,
 avg(ss_list_price) agg2,
 avg(ss_coupon_amt) agg3,
 avg(ss_sales_price) agg4 
  from store_sales, customer_demographics, item
  where store_sales.ss_item_sk = item.i_item_sk and
store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and
cd_gender = 'F' and 
cd_marital_status = 'W' and
cd_education_status = 'Primary'
  group by i_item_id
  order by i_item_id
  limit 100
 {code}
 Cardinality 
 {code}
 item 462,000
 customer_demographics 1,920,800
 store_sales 82,510,879,939
 {code}
 NDVs
 {code}
 item.i_item_sk 439501
 customer_demographics.cd_demo_sk 1835839
 store_sales.ss_cdemo_sk 1835839
 {code}
 From the logs 
 {code}
 2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
 HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], 
 cost=[not available])
   HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], 
 cost=[{8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 
 io}])
 HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], 
 ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
   HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]])
 HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], 
 cd_education_status=[$3])
   HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
 
 HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]])
   HiveProject(i_item_sk=[$0], i_item_id=[$1])
 HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]])
 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 
 rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io}
 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 
 rows, 2.1362E11 cpu, 1.07207098E7 io}
 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(78)) - MapJoin selected
 2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
 HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not 
 available])
   HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], 
 cost=[{8.2511341939E10 rows, 2.1362E11 cpu, 1.07207098E7 io}])
 HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], 
 ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
   HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]])
 HiveProject(i_item_sk=[$0], i_item_id=[$1])
   HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]])
   HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], 
 cd_education_status=[$3])
 HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
   
 HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]])
 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.25108951834E10 
 rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io}
 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel 
 (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.25108951834E10 
 rows, 2.324083308641975E8 cpu, 275417.56 io}
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10304) Add deprecation message to HiveCLI

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10304:

Fix Version/s: (was: 1.2.0)

 Add deprecation message to HiveCLI
 --

 Key: HIVE-10304
 URL: https://issues.apache.org/jira/browse/HIVE-10304
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC1.2
 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch


 As Beeline is now the recommended command-line tool for Hive, we should add a 
 message to HiveCLI indicating that it is deprecated and redirecting users to 
 Beeline.
 This is not a suggestion to remove HiveCLI for now, just a helpful pointer so 
 users know to focus their attention on Beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9012) Not able to move and populate the data fully on to the table when the scratch directory is on S3

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9012:
---
Fix Version/s: (was: 0.13.1)

 Not able to move and populate the data fully on to the table when the scratch 
 directory is on S3
 

 Key: HIVE-9012
 URL: https://issues.apache.org/jira/browse/HIVE-9012
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
 Environment: Amazon AMI and S3 as storage service
Reporter: Kolluru Som Shekhar Sharma
Priority: Blocker
   Original Estimate: 504h
  Remaining Estimate: 504h

 I have set hive.exec.scratchDir to point to a directory on S3, and the 
 external table is on S3 as well.
 I ran a simple query which extracts key/value pairs from a JSON string 
 without any WHERE clause; the amount of data is ~500GB. The query ran fine, 
 but when it tries to move the data out of the scratch directory it never 
 completes, so I need to kill the process and move the data manually.
 The data size in the scratch directory was nearly ~550GB.
 I tried the same scenario with less data and a WHERE clause; it completed 
 successfully and the data was populated in the table. I checked the sizes in 
 the table and in the scratch directory: the table showed 2MB while the 
 scratch directory held 48.6GB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9842) Enable session/operation timeout by default in HiveServer2

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9842:
---
Fix Version/s: (was: 1.2.0)

 Enable session/operation timeout by default in HiveServer2
 --

 Key: HIVE-9842
 URL: https://issues.apache.org/jira/browse/HIVE-9842
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Attachments: HIVE-9842.1.patch


 HIVE-5799 introduced a session/operation timeout which cleans up abandoned 
 session and operation handles. Currently, the default is a no-op. We should 
 set it to some reasonable value.
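 For illustration, the timeouts could be enabled in hive-site.xml; the property names below are the ones introduced by HIVE-5799, but the values are only illustrative — the actual defaults this issue proposes are not stated in the thread:

```xml
<!-- hive-site.xml: illustrative values only, not the defaults chosen by this patch. -->
<property>
  <!-- Close a client session after this long with no activity. -->
  <name>hive.server2.idle.session.timeout</name>
  <value>7d</value>
</property>
<property>
  <!-- Clean up an abandoned operation handle after this long. -->
  <name>hive.server2.idle.operation.timeout</name>
  <value>5d</value>
</property>
```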



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9842) Enable session/operation timeout by default in HiveServer2

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535778#comment-14535778
 ] 

Sushanth Sowmyan commented on HIVE-9842:


Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 Enable session/operation timeout by default in HiveServer2
 --

 Key: HIVE-9842
 URL: https://issues.apache.org/jira/browse/HIVE-9842
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 1.2.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Attachments: HIVE-9842.1.patch


 HIVE-5799 introduced a session/operation timeout which cleans up abandoned 
 session and operation handles. Currently, the default is a no-op. We should 
 set it to some reasonable value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8218) function registry shared across sessions in HiveServer2

2015-05-08 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8218:
-
Release Note:   (was: This should be fixed by HIVE-2573)

 function registry shared across sessions in HiveServer2
 ---

 Key: HIVE-8218
 URL: https://issues.apache.org/jira/browse/HIVE-8218
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, UDF
Reporter: Thejas M Nair

 FunctionRegistry.mFunctions is static. That means that in the HS2 case, all 
 users share the same set of valid UDFs, i.e. adding or deleting a temporary 
 function as one user would affect the namespace of other users.
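 A minimal sketch of the problem (hypothetical names, not Hive's actual FunctionRegistry API): because the map is static, every session reads and writes one shared namespace.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HIVE-8218 issue: a static map mirrors
// FunctionRegistry.mFunctions being shared by all HS2 sessions.
public class RegistryDemo {
    // One map for the whole server process, not one per session.
    static final Map<String, String> mFunctions = new HashMap<>();

    static void registerTemporary(String session, String name) {
        mFunctions.put(name, session); // no per-session scoping
    }

    static void dropTemporary(String name) {
        mFunctions.remove(name); // removes it for every session
    }

    static boolean isVisible(String name) {
        return mFunctions.containsKey(name);
    }

    public static void main(String[] args) {
        registerTemporary("userA", "my_udf");
        // userB's session sees userA's temporary function:
        System.out.println("visible to userB: " + isVisible("my_udf"));
        dropTemporary("my_udf"); // userB drops it...
        // ...and userA has lost it too:
        System.out.println("visible to userA: " + isVisible("my_udf"));
    }
}
```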



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8218) function registry shared across sessions in HiveServer2

2015-05-08 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535786#comment-14535786
 ] 

Jason Dere commented on HIVE-8218:
--

This should be fixed by HIVE-2573

 function registry shared across sessions in HiveServer2
 ---

 Key: HIVE-8218
 URL: https://issues.apache.org/jira/browse/HIVE-8218
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, UDF
Reporter: Thejas M Nair

 FunctionRegistry.mFunctions is static. That means that in the HS2 case, all 
 users share the same set of valid UDFs, i.e. adding or deleting a temporary 
 function as one user would affect the namespace of other users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10107) Union All : Vertex missing stats resulting in OOM and in-efficient plans

2015-05-08 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10107:
--
Assignee: Pengcheng Xiong

 Union All : Vertex missing stats resulting in OOM and in-efficient plans
 

 Key: HIVE-10107
 URL: https://issues.apache.org/jira/browse/HIVE-10107
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong

 Reducer vertices sending data to a UNION ALL edge are missing statistics; as 
 a result we either use very few reducers in the UNION ALL edge or decide 
 to broadcast the results of the UNION ALL.
 Query
 {code}
 select 
 count(*) rowcount
 from
 (select 
 ss_item_sk, ss_ticket_number, ss_store_sk
 from
 store_sales a, store_returns b
 where
 a.ss_item_sk = b.sr_item_sk
 and a.ss_ticket_number = b.sr_ticket_number union all select 
 ss_item_sk, ss_ticket_number, ss_store_sk
 from
 store_sales c, store_returns d
 where
 c.ss_item_sk = d.sr_item_sk
 and c.ss_ticket_number = d.sr_ticket_number) t
 group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
 having rowcount > 1;
 {code}
 Plan snippet 
 {code}
  Edges:
 Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 
 (CONTAINS)
 Reducer 4 <- Union 3 (SIMPLE_EDGE)
 Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 
 (CONTAINS)
   Reducer 4
 Reduce Operator Tree:
   Group By Operator
 aggregations: count(VALUE._col0)
 keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 
 (type: int)
 mode: mergepartial
 outputColumnNames: _col0, _col1, _col2, _col3
 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
 Column stats: COMPLETE
 Filter Operator
   predicate: (_col3 > 1) (type: boolean)
   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
 Column stats: COMPLETE
   Select Operator
 expressions: _col3 (type: bigint)
 outputColumnNames: _col0
 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
 Column stats: COMPLETE
 File Output Operator
   compressed: false
   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
 Column stats: COMPLETE
   table:
   input format: 
 org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   serde: 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 Reducer 7
 Reduce Operator Tree:
   Merge Join Operator
 condition map:
  Inner Join 0 to 1
 keys:
   0 ss_item_sk (type: int), ss_ticket_number (type: int)
   1 sr_item_sk (type: int), sr_ticket_number (type: int)
 outputColumnNames: _col1, _col6, _col8, _col27, _col34
 Filter Operator
   predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: 
 boolean)
   Select Operator
 expressions: _col1 (type: int), _col8 (type: int), _col6 
 (type: int)
 outputColumnNames: _col0, _col1, _col2
 Group By Operator
   aggregations: count()
   keys: _col2 (type: int), _col0 (type: int), _col1 
 (type: int)
   mode: hash
   outputColumnNames: _col0, _col1, _col2, _col3
   Reduce Output Operator
 key expressions: _col0 (type: int), _col1 (type: 
 int), _col2 (type: int)
 sort order: +++
 Map-reduce partition columns: _col0 (type: int), 
 _col1 (type: int), _col2 (type: int)
 value expressions: _col3 (type: bigint)
 {code}
 The full explain plan 
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 
 (CONTAINS)
 Reducer 4 <- Union 3 (SIMPLE_EDGE)
 Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 
 (CONTAINS)
   DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   

[jira] [Commented] (HIVE-10629) Dropping table in an encrypted zone does not drop warehouse directory

2015-05-08 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535858#comment-14535858
 ] 

Eugene Koifman commented on HIVE-10629:
---

The commit for HIVE-9264 includes a number of DROP TABLE statements in .q files.
The default value for fs.trash.interval is 0, i.e. trash is disabled, and we 
don't seem to override it in unit tests.
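For reference, trash is controlled by fs.trash.interval in Hadoop's core-site.xml; a sketch of enabling it (the value shown is purely illustrative):

```xml
<!-- core-site.xml: a non-zero interval enables the trash, so deleted
     warehouse directories are moved rather than removed outright.
     1440 minutes (1 day) is an illustrative value, not a recommendation. -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
```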



 Dropping table in an encrypted zone does not drop warehouse directory
 -

 Key: HIVE-10629
 URL: https://issues.apache.org/jira/browse/HIVE-10629
 Project: Hive
  Issue Type: Sub-task
  Components: Security
Reporter: Deepesh Khandelwal
Assignee: Eugene Koifman

 Drop table in an encrypted zone removes the table but not its data. The 
 client sees the following on Hive CLI:
 {noformat}
 hive> drop table testtbl;
 OK
 Time taken: 0.158 seconds
 {noformat}
 On the Hive Metastore log following error is thrown:
 {noformat}
 2015-05-05 08:55:27,665 ERROR [pool-6-thread-142]: hive.log 
 (MetaStoreUtils.java:logAndThrowMetaException(1200)) - Got exception: 
 java.io.IOException Failed to move to trash: 
 hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl
 java.io.IOException: Failed to move to trash: 
 hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl
 at 
 org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160)
 at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:114)
 at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:95)
 at 
 org.apache.hadoop.hive.shims.Hadoop23Shims.moveToAppropriateTrash(Hadoop23Shims.java:270)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreFsImpl.deleteDir(HiveMetaStoreFsImpl.java:47)
 at 
 org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:229)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.deleteTableData(HiveMetaStore.java:1584)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1552)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1705)
 at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
 at com.sun.proxy.$Proxy13.drop_table_with_environment_context(Unknown 
 Source)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:9256)
 
 {noformat}
 The client should throw the error and maybe fail the drop table call. To 
 delete the table data one currently has to use {{drop table testtbl purge}}, 
 which removes the table data permanently, skipping the trash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10325) Remove ExprNodeNullEvaluator

2015-05-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536126#comment-14536126
 ] 

Hive QA commented on HIVE-10325:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731244/HIVE-10325.2.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8919 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
org.apache.hive.service.cli.thrift.TestThriftHttpCLIService.testAdditionalHttpHeaders
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3818/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3818/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3818/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731244 - PreCommit-HIVE-TRUNK-Build

 Remove ExprNodeNullEvaluator
 

 Key: HIVE-10325
 URL: https://issues.apache.org/jira/browse/HIVE-10325
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10325.1.patch, HIVE-10325.2.patch, HIVE-10325.patch


 since its purpose can instead be served by ExprNodeConstantEvaluator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10565:

Fix Version/s: (was: 1.2.0)

 LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT 
 OUTER JOIN repeated key correctly
 

 Key: HIVE-10565
 URL: https://issues.apache.org/jira/browse/HIVE-10565
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0

 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, 
 HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, 
 HIVE-10565.06.patch, HIVE-10565.07.patch, HIVE-10565.08.patch, 
 HIVE-10565.09.patch, HIVE-10565.091.patch


 Filtering can knock out some of the rows for a repeated key, but those 
 knocked-out rows still need to be included in the LEFT OUTER JOIN result; 
 currently they are not when only some rows are filtered out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535723#comment-14535723
 ] 

Sushanth Sowmyan commented on HIVE-10565:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT 
 OUTER JOIN repeated key correctly
 

 Key: HIVE-10565
 URL: https://issues.apache.org/jira/browse/HIVE-10565
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0

 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, 
 HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, 
 HIVE-10565.06.patch, HIVE-10565.07.patch, HIVE-10565.08.patch, 
 HIVE-10565.09.patch, HIVE-10565.091.patch


 Filtering can knock out some of the rows for a repeated key, but those 
 knocked-out rows still need to be included in the LEFT OUTER JOIN result; 
 currently they are not when only some rows are filtered out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10463:

Fix Version/s: (was: 1.2.0)

 CBO (Calcite Return Path): Insert overwrite... select * from... queries 
 failing for bucketed tables
 ---

 Key: HIVE-10463
 URL: https://issues.apache.org/jira/browse/HIVE-10463
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Laljo John Pullokkaran

 When return path is on. To reproduce the Exception, take the following 
 excerpt from auto_sortmerge_join_10.q:
 {noformat}
 set hive.enforce.bucketing = true;
 set hive.enforce.sorting = true;
 set hive.exec.reducers.max = 1;
 CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
 INTO 2 BUCKETS;
 insert overwrite table tbl1
 select * from src where key  10;
 {noformat}
 It produces the following Exception:
 {noformat}
 java.lang.Exception: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
 Caused by: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
 ... 10 more
 Caused by: java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157)
 ... 14 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1]
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150)
 ... 14 more
 Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 
 1:_col1]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 at 
 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383)
 ... 22 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10415:

Fix Version/s: (was: 1.2.0)

 hive.start.cleanup.scratchdir configuration is not taking effect
 

 Key: HIVE-10415
 URL: https://issues.apache.org/jira/browse/HIVE-10415
 Project: Hive
  Issue Type: Bug
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-10415.patch


 The configuration hive.start.cleanup.scratchdir is not taking effect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535759#comment-14535759
 ] 

Sushanth Sowmyan commented on HIVE-10304:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 Add deprecation message to HiveCLI
 --

 Key: HIVE-10304
 URL: https://issues.apache.org/jira/browse/HIVE-10304
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC1.2
 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch


 As Beeline is now the recommended command-line tool for Hive, we should add a 
 message to HiveCLI indicating that it is deprecated and redirecting users to 
 Beeline.
 This is not a suggestion to remove HiveCLI for now, just a helpful pointer so 
 users know to focus their attention on Beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10194) CBO (Calcite Return Path): Equi join followed by theta join produces a cross product

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10194:

Fix Version/s: (was: 1.2.0)

 CBO (Calcite Return Path): Equi join followed by theta join produces a cross 
 product
 

 Key: HIVE-10194
 URL: https://issues.apache.org/jira/browse/HIVE-10194
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran

 Query 
 {code}
 SELECT count(distinct ws_order_number) as order_count,
sum(ws_ext_ship_cost) as total_shipping_cost,
sum(ws_net_profit) as total_net_profit
 FROM web_sales ws1
 JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
 JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
 JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
 LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
FROM web_sales ws2 JOIN web_sales ws3
ON (ws2.ws_order_number = ws3.ws_order_number)
WHERE ws2.ws_warehouse_sk <> 
 ws3.ws_warehouse_sk
   ) ws_wh1
 ON (ws1.ws_order_number = ws_wh1.ws_order_number)
 LEFT OUTER JOIN web_returns wr1 ON (ws1.ws_order_number = wr1.wr_order_number)
 WHERE d.d_date between '1999-05-01' and '1999-07-01' and
ca.ca_state = 'TX' and
s.web_company_name = 'pri' and
wr1.wr_order_number is null
 limit 100
 {code}
 Plan
 {code}
 OK
 Time taken: 0.23 seconds
 Warning: Map Join MAPJOIN[83][bigTable=ws1] in task 'Map 2' is a cross product
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 2 <- Map 1 (BROADCAST_EDGE)
 Map 8 <- Reducer 4 (BROADCAST_EDGE)
 Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 5 (BROADCAST_EDGE), Map 6 
 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE)
 Reducer 4 <- Map 10 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
 Reducer 9 <- Map 8 (SIMPLE_EDGE)
   DagName: mmokhtar_20150402132417_1bc8688b-59a0-4909-82a4-b9d386065bbd:3
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: ws1
   filterExpr: (((ws_ship_addr_sk = ws_order_number) and 
 (ws_ship_date_sk  ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: 
 boolean)
   Statistics: Num rows: 143966864 Data size: 33110363004 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((ws_ship_addr_sk = ws_order_number) and 
 (ws_ship_date_sk  ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: 
 boolean)
 Statistics: Num rows: 71974471 Data size: 1151483592 
 Basic stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: ws_ship_addr_sk (type: int)
   outputColumnNames: _col1
   Statistics: Num rows: 71974471 Data size: 287862044 
 Basic stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 sort order:
 Statistics: Num rows: 71974471 Data size: 287862044 
 Basic stats: COMPLETE Column stats: COMPLETE
 value expressions: _col1 (type: int)
 Execution mode: vectorized
 Map 10
 Map Operator Tree:
 TableScan
   alias: wr1
   Statistics: Num rows: 13749816 Data size: 2585240312 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: wr_order_number (type: int)
 sort order: +
 Map-reduce partition columns: wr_order_number (type: int)
 Statistics: Num rows: 13749816 Data size: 2585240312 
 Basic stats: COMPLETE Column stats: COMPLETE
 Execution mode: vectorized
 Map 2
 Map Operator Tree:
 TableScan
   alias: ws1
   Statistics: Num rows: 143966864 Data size: 33110363004 
 Basic stats: COMPLETE Column stats: COMPLETE
   Map Join Operator
 condition map:
  Inner Join 0 to 1
 keys:
   0
   1
 outputColumnNames: _col1
 input vertices:
   0 Map 1
 Statistics: Num rows: 5180969438964472 Data size: 
 20723877755857888 Basic stats: COMPLETE Column stats: COMPLETE
 Select 

[jira] [Commented] (HIVE-10194) CBO (Calcite Return Path): Equi join followed by theta join produces a cross product

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535764#comment-14535764
 ] 

Sushanth Sowmyan commented on HIVE-10194:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 CBO (Calcite Return Path): Equi join followed by theta join produces a cross 
 product
 

 Key: HIVE-10194
 URL: https://issues.apache.org/jira/browse/HIVE-10194
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran

 Query 
 {code}
 SELECT count(distinct ws_order_number) as order_count,
sum(ws_ext_ship_cost) as total_shipping_cost,
sum(ws_net_profit) as total_net_profit
 FROM web_sales ws1
 JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
 JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
 JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
 LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
FROM web_sales ws2 JOIN web_sales ws3
ON (ws2.ws_order_number = ws3.ws_order_number)
WHERE ws2.ws_warehouse_sk <> 
 ws3.ws_warehouse_sk
   ) ws_wh1
 ON (ws1.ws_order_number = ws_wh1.ws_order_number)
 LEFT OUTER JOIN web_returns wr1 ON (ws1.ws_order_number = wr1.wr_order_number)
 WHERE d.d_date between '1999-05-01' and '1999-07-01' and
ca.ca_state = 'TX' and
s.web_company_name = 'pri' and
wr1.wr_order_number is null
 limit 100
 {code}
 Plan
 {code}
 OK
 Time taken: 0.23 seconds
 Warning: Map Join MAPJOIN[83][bigTable=ws1] in task 'Map 2' is a cross product
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 2 <- Map 1 (BROADCAST_EDGE)
 Map 8 <- Reducer 4 (BROADCAST_EDGE)
 Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 5 (BROADCAST_EDGE), Map 6 
 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE)
 Reducer 4 <- Map 10 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
 Reducer 9 <- Map 8 (SIMPLE_EDGE)
   DagName: mmokhtar_20150402132417_1bc8688b-59a0-4909-82a4-b9d386065bbd:3
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: ws1
   filterExpr: (((ws_ship_addr_sk = ws_order_number) and 
 (ws_ship_date_sk  ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: 
 boolean)
   Statistics: Num rows: 143966864 Data size: 33110363004 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((ws_ship_addr_sk = ws_order_number) and 
 (ws_ship_date_sk  ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: 
 boolean)
 Statistics: Num rows: 71974471 Data size: 1151483592 
 Basic stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: ws_ship_addr_sk (type: int)
   outputColumnNames: _col1
   Statistics: Num rows: 71974471 Data size: 287862044 
 Basic stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 sort order:
 Statistics: Num rows: 71974471 Data size: 287862044 
 Basic stats: COMPLETE Column stats: COMPLETE
 value expressions: _col1 (type: int)
 Execution mode: vectorized
 Map 10
 Map Operator Tree:
 TableScan
   alias: wr1
   Statistics: Num rows: 13749816 Data size: 2585240312 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: wr_order_number (type: int)
 sort order: +
 Map-reduce partition columns: wr_order_number (type: int)
 Statistics: Num rows: 13749816 Data size: 2585240312 
 Basic stats: COMPLETE Column stats: COMPLETE
 Execution mode: vectorized
 Map 2
 Map Operator Tree:
 TableScan
   alias: ws1
   Statistics: Num rows: 143966864 Data size: 33110363004 
 Basic stats: COMPLETE Column stats: COMPLETE
   Map Join Operator
 condition map:
  Inner Join 0 to 1
 keys:
   0
   1
 outputColumnNames: _col1
 input vertices:
   0 Map 1
 Statistics: Num rows: 

[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10165:

Fix Version/s: (was: 1.2.0)

 Improve hive-hcatalog-streaming extensibility and support updates and deletes.
 --

 Key: HIVE-10165
 URL: https://issues.apache.org/jira/browse/HIVE-10165
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Elliot West
Assignee: Elliot West
  Labels: streaming_api
 Attachments: HIVE-10165.0.patch


 h3. Overview
 I'd like to extend the 
 [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
  API so that it also supports the writing of record updates and deletes in 
 addition to the already supported inserts.
 h3. Motivation
 We have many Hadoop processes outside of Hive that merge changed facts into 
 existing datasets. Traditionally we achieve this by: reading in a 
 ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
 sequence and then applying a function to determine inserted, updated, and 
 deleted rows. However, in our current scheme we must rewrite all partitions 
 that may potentially contain changes. In practice the number of mutated 
 records is very small when compared with the records contained in a 
 partition. This approach results in a number of operational issues:
 * Excessive amount of write activity required for small data changes.
 * Downstream applications cannot robustly read these datasets while they are 
 being updated.
 * Due to the scale of the updates (hundreds of partitions) the scope for 
 contention is high. 
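The grouped-and-sorted merge process described above can be sketched as follows. This is an illustrative sketch only (in Python, not Hive code); the function and record shapes are hypothetical, standing in for the key-grouped, sequence-sorted comparison the issue describes:

```python
# Classify mutations by comparing a ground-truth dataset with a modified
# dataset keyed by record, as in the merge process described above.
def classify_mutations(base, modified):
    """base/modified: dicts mapping record key -> record value."""
    inserts, updates, deletes = {}, {}, {}
    for key, value in modified.items():
        if key not in base:
            inserts[key] = value        # new key: inserted row
        elif base[key] != value:
            updates[key] = value        # existing key, new value: updated row
    for key, value in base.items():
        if key not in modified:
            deletes[key] = value        # key gone: deleted row
    return inserts, updates, deletes
```

The point of the proposal is that only these (typically small) insert/update/delete sets would be written to a transactional table, instead of rewriting every partition that might contain a change.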
 I believe we can address this problem by instead writing only the changed 
 records to a Hive transactional table. This should drastically reduce the 
 amount of data that we need to write and also provide a means for managing 
 concurrent access to the data. Our existing merge processes can read and 
 retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
 an updated form of the hive-hcatalog-streaming API which will then have the 
 required data to perform an update or insert in a transactional manner. 
 h3. Benefits
 * Enables the creation of large-scale dataset merge processes  
 * Opens up Hive transactional functionality in an accessible manner to 
 processes that operate outside of Hive.
 h3. Implementation
 Our changes do not break the existing API contracts. Instead our approach has 
 been to consider the functionality offered by the existing API and our 
 proposed API as fulfilling separate and distinct use-cases. The existing API 
 is primarily focused on the task of continuously writing large volumes of new 
 data into a Hive table for near-immediate analysis. Our use-case however, is 
 concerned more with the frequent but not continuous ingestion of mutations to 
 a Hive table from some ETL merge process. Consequently we feel it is 
 justifiable to add our new functionality via an alternative set of public 
 interfaces and leave the existing API as is. This keeps both APIs clean and 
 focused at the expense of presenting additional options to potential users. 
 Wherever possible, shared implementation concerns have been factored out into 
 abstract base classes that are open to third-party extension. A detailed 
 breakdown of the changes is as follows:
 * We've introduced a public {{RecordMutator}} interface whose purpose is to 
 expose insert/update/delete operations to the user. This is a counterpart to 
 the write-only {{RecordWriter}}. We've also factored out life-cycle methods 
 common to these two interfaces into a super {{RecordOperationWriter}} 
 interface.  Note that the row representation has been changed from {{byte[]}} 
 to {{Object}}. Within our data processing jobs our records are often 
 available in a strongly typed and decoded form such as a POJO or a Tuple 
 object. Therefore it seems to make sense that we are able to pass this 
 through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} 
 encoding step. This of course still allows users to use {{byte[]}} if they 
 wish.
 * The introduction of {{RecordMutator}} requires that insert/update/delete 
 operations are then also exposed on a {{TransactionBatch}} type. We've done 
 this with the introduction of a public {{MutatorTransactionBatch}} interface 
 which is a counterpart to the write-only {{TransactionBatch}}. We've also 
 factored out life-cycle methods common to these two interfaces into a super 
 {{BaseTransactionBatch}} interface. 
 * Functionality that would be shared by implementations of both 
 {{RecordWriters}} and {{RecordMutators}} has been factored out of 
 {{AbstractRecordWriter}} into a new abstract base class 
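A rough rendering of the interface split described above, as a sketch only: the actual proposal is in Java, and the method names and signatures here are illustrative, not the patch's API.

```python
from abc import ABC, abstractmethod

class RecordOperationWriter(ABC):
    """Shared life-cycle methods factored out of both interfaces."""
    @abstractmethod
    def flush(self): ...
    @abstractmethod
    def close(self): ...

class RecordWriter(RecordOperationWriter):
    """Existing write-only contract: inserts only."""
    @abstractmethod
    def write(self, transaction_id, record): ...

class RecordMutator(RecordOperationWriter):
    """Proposed counterpart exposing insert/update/delete operations.
    Records are plain objects rather than byte[], per the description."""
    @abstractmethod
    def insert(self, transaction_id, record): ...
    @abstractmethod
    def update(self, transaction_id, record_identifier, record): ...
    @abstractmethod
    def delete(self, transaction_id, record_identifier): ...
```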
 

[jira] [Resolved] (HIVE-8218) function registry shared across sessions in HiveServer2

2015-05-08 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere resolved HIVE-8218.
--
  Resolution: Duplicate
Release Note: This should be fixed by HIVE-2573

 function registry shared across sessions in HiveServer2
 ---

 Key: HIVE-8218
 URL: https://issues.apache.org/jira/browse/HIVE-8218
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, UDF
Reporter: Thejas M Nair

 FunctionRegistry.mFunctions is static. That means that in the HS2 case, all 
 users will have the same set of valid UDFs, i.e., adding or deleting a 
 temporary function by one user would affect the namespace of other users.
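The problem shape can be sketched as follows. This is an illustrative model, not Hive code; the class and method names are hypothetical:

```python
class SharedFunctionRegistry:
    # Static (class-level) map: one namespace for the whole server,
    # mirroring the static mFunctions field described above.
    m_functions = {}

    def add_temporary(self, name, fn):
        SharedFunctionRegistry.m_functions[name] = fn

class SessionFunctionRegistry:
    # Per-session map: each session gets an isolated namespace.
    def __init__(self):
        self._functions = {}

    def add_temporary(self, name, fn):
        self._functions[name] = fn
```

With the static map, a temporary function added by one session is immediately visible to every other session; a per-session registry avoids that.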



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10659) Beeline commands which contains semi-colon as a non-command terminator will fail

2015-05-08 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10659:
-
Attachment: HIVE-10659.1.patch

cc-ing [~thejas] / [~sushanth] for review. I tested this with the initially 
committed patch for HIVE-7018 and it seems to resolve the issue that we found 
in HIVE-10614. Once this patch goes in, we should be able to get HIVE-7018 in 
as well.

Thanks
Hari

 Beeline commands which contains semi-colon as a non-command terminator will 
 fail
 

 Key: HIVE-10659
 URL: https://issues.apache.org/jira/browse/HIVE-10659
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10659.1.patch


 Consider using beeline to connect to MySQL and issue commands involving 
 stored procedures. MySQL stored procedures have the semi-colon as the 
 statement terminator. Since this coincides with beeline's only available 
 command terminator, the semi-colon, beeline will not be able to execute the 
 original command successfully. 
 The above scenario can happen when Hive SchemaTool is used to upgrade a MySQL 
 metastore db which contains a stored procedure in the script (as the one 
 introduced initially by HIVE-7018). As of now, we cannot have any stored 
 procedures as part of MySQL scripts because SchemaTool uses beeline as the 
 jdbc client to connect to MySQL. This is a serious limitation and needs to be 
 fixed by providing an option for beeline to not use ';' as the command 
 delimiter and to process the entire line sent to it as a single command.
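The requested behaviour can be sketched as below. This is an illustrative model of the proposed option, not beeline's actual code; the flag name is hypothetical:

```python
def split_commands(line, use_semicolon_delimiter=True):
    """Model of beeline's command splitting.

    With the (hypothetical) option disabled, the whole line is one command,
    so a stored-procedure body containing ';' survives intact."""
    if not use_semicolon_delimiter:
        return [line]
    return [part for part in line.split(';') if part.strip()]
```

With the default behaviour, a statement such as `CREATE PROCEDURE p() BEGIN SELECT 1; END` is split at the embedded semi-colon into two fragments, neither of which is valid on its own; with the option disabled it reaches the server as a single command.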





[jira] [Assigned] (HIVE-10629) Dropping table in an encrypted zone does not drop warehouse directory

2015-05-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-10629:
-

Assignee: Eugene Koifman

 Dropping table in an encrypted zone does not drop warehouse directory
 -

 Key: HIVE-10629
 URL: https://issues.apache.org/jira/browse/HIVE-10629
 Project: Hive
  Issue Type: Sub-task
  Components: Security
Reporter: Deepesh Khandelwal
Assignee: Eugene Koifman

 Drop table in an encrypted zone removes the table but not its data. The 
 client sees the following on Hive CLI:
 {noformat}
 hive> drop table testtbl;
 OK
 Time taken: 0.158 seconds
 {noformat}
 On the Hive Metastore log following error is thrown:
 {noformat}
 2015-05-05 08:55:27,665 ERROR [pool-6-thread-142]: hive.log 
 (MetaStoreUtils.java:logAndThrowMetaException(1200)) - Got exception: 
 java.io.IOException Failed to move to trash: 
 hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl
 java.io.IOException: Failed to move to trash: 
 hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl
 at 
 org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160)
 at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:114)
 at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:95)
 at 
 org.apache.hadoop.hive.shims.Hadoop23Shims.moveToAppropriateTrash(Hadoop23Shims.java:270)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreFsImpl.deleteDir(HiveMetaStoreFsImpl.java:47)
 at 
 org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:229)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.deleteTableData(HiveMetaStore.java:1584)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1552)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1705)
 at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
 at com.sun.proxy.$Proxy13.drop_table_with_environment_context(Unknown 
 Source)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:9256)
 
 {noformat}
 The client should surface the error and perhaps fail the drop table call. To 
 delete the table data one currently has to use {{drop table testtbl purge}}, 
 which removes the table data permanently, skipping the trash.





[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535870#comment-14535870
 ] 

Sushanth Sowmyan commented on HIVE-10609:
-

Per discussion with Mostafa, I'll add this to the tentative list for 1.2 - 
i.e., it will not be considered a release blocker for 1.2.0, but if it gets 
done before the RC process ends, we will include it in the next RC being built, 
and include it for 1.2.0. Otherwise, it will make it in a stabilization 1.2.1 
release.

I'll update the fix version of this bug with the appropriate version at that 
time.

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline

 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year 
 ,d3.d_year
 ) cs1
 JOIN
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 

[jira] [Commented] (HIVE-9730) make sure logging is never called when not needed in perf-sensitive places

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535885#comment-14535885
 ] 

Sushanth Sowmyan commented on HIVE-9730:


Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 make sure logging is never called when not needed in perf-sensitive places
 --

 Key: HIVE-9730
 URL: https://issues.apache.org/jira/browse/HIVE-9730
 Project: Hive
  Issue Type: Improvement
  Components: Logging
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-9730.patch, log4j-llap.png


 log4j logging has really inefficient serialization
 !log4j-llap.png!





[jira] [Updated] (HIVE-9730) make sure logging is never called when not needed in perf-sensitive places

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9730:
---
Fix Version/s: (was: 1.2.0)

 make sure logging is never called when not needed in perf-sensitive places
 --

 Key: HIVE-9730
 URL: https://issues.apache.org/jira/browse/HIVE-9730
 Project: Hive
  Issue Type: Improvement
  Components: Logging
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-9730.patch, log4j-llap.png


 log4j logging has really inefficient serialization
 !log4j-llap.png!





[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9736:
---
Affects Version/s: 1.2.0

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Affects Versions: 1.2.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
  Labels: TODOC1.2
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have > 1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.
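The batching idea can be sketched as follows. This is an illustrative sketch only (the real change is in Java against the HDFS client API); it shows how grouping partition directories by parent lets one listing call cover many sibling partitions:

```python
from collections import defaultdict

def batched_list_calls(partition_dirs):
    """Group partition directories by parent so one listStatus-style call
    per parent replaces one call per partition directory."""
    by_parent = defaultdict(list)
    for path in partition_dirs:
        parent, _, name = path.rpartition('/')
        by_parent[parent].append(name)
    return by_parent  # one namenode call per key instead of per path
```

For the {{DROP PARTITION (dt='20150101')}} example above, every region partition shares the same dt parent, so the per-partition calls collapse into a single listing whose FileStatus results can be authorized in one pass.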





[jira] [Updated] (HIVE-10566) LLAP: Vector row extraction allocates new extractors per process method call instead of just once

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10566:

Fix Version/s: (was: 1.2.0)

 LLAP: Vector row extraction allocates new extractors per process method call 
 instead of just once
 -

 Key: HIVE-10566
 URL: https://issues.apache.org/jira/browse/HIVE-10566
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0


 Extractors for unused columns (common for tables with many columns) are 
 created for each batch instead of just once.





[jira] [Updated] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10609:

Fix Version/s: (was: 1.2.0)

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline

 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year 
 ,d3.d_year
 ) cs1
 JOIN
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = 

[jira] [Commented] (HIVE-10628) Incorrect result when vectorized native mapjoin is enabled using null safe operators <=>

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535719#comment-14535719
 ] 

Sushanth Sowmyan commented on HIVE-10628:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 Incorrect result when vectorized native mapjoin is enabled using null safe 
 operators <=>
 

 Key: HIVE-10628
 URL: https://issues.apache.org/jira/browse/HIVE-10628
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0

 Attachments: HIVE-10628.01.patch


 Incorrect results for this query:
 {noformat}
 select count(*) from store_sales ss join store_returns sr on (sr.sr_item_sk 
 <=> ss.ss_item_sk and sr.sr_customer_sk <=> ss.ss_customer_sk and 
 sr.sr_item_sk <=> ss.ss_item_sk) where ss.ss_net_paid > 1000;
 {noformat}
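The null-safe equality operator `<=>` involved here differs from plain `=` in how NULL operands are handled. A minimal model of the two semantics, with `None` standing in for SQL NULL (illustrative only, not Hive's vectorized implementation):

```python
def null_safe_equals(a, b):
    """SQL <=>: NULL <=> NULL is true; NULL <=> x is false."""
    if a is None or b is None:
        return a is b          # true only when both operands are NULL
    return a == b

def plain_equals(a, b):
    """SQL =: any NULL operand yields NULL (never true)."""
    if a is None or b is None:
        return None
    return a == b
```

This difference is exactly what makes the join keys above tricky for a native mapjoin: rows with NULL keys must match each other under `<=>`, whereas an equi-join hash lookup would normally drop them.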





[jira] [Updated] (HIVE-10628) Incorrect result when vectorized native mapjoin is enabled using null safe operators <=>

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10628:

Fix Version/s: (was: 1.2.0)

 Incorrect result when vectorized native mapjoin is enabled using null safe 
 operators <=>
 

 Key: HIVE-10628
 URL: https://issues.apache.org/jira/browse/HIVE-10628
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0

 Attachments: HIVE-10628.01.patch


 Incorrect results for this query:
 {noformat}
 select count(*) from store_sales ss join store_returns sr on (sr.sr_item_sk 
 <=> ss.ss_item_sk and sr.sr_customer_sk <=> ss.ss_customer_sk and 
 sr.sr_item_sk <=> ss.ss_item_sk) where ss.ss_net_paid > 1000;
 {noformat}





[jira] [Commented] (HIVE-10640) Vectorized query with NULL constant throws Unsuported vector output type: void error

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535718#comment-14535718
 ] 

Sushanth Sowmyan commented on HIVE-10640:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 Vectorized query with NULL constant  throws Unsuported vector output type: 
 void error
 ---

 Key: HIVE-10640
 URL: https://issues.apache.org/jira/browse/HIVE-10640
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0


 This query from join_nullsafe.q when vectorized throws Unsuported vector 
 output type: void during execution...
 {noformat}
 select * from myinput1 a join myinput1 b on a.key<=>b.value AND a.key is NULL;
 {noformat}





[jira] [Commented] (HIVE-10566) LLAP: Vector row extraction allocates new extractors per process method call instead of just once

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535722#comment-14535722
 ] 

Sushanth Sowmyan commented on HIVE-10566:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 LLAP: Vector row extraction allocates new extractors per process method call 
 instead of just once
 -

 Key: HIVE-10566
 URL: https://issues.apache.org/jira/browse/HIVE-10566
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0


 Extractors for unused columns (common for tables with many columns) are 
 created for each batch instead of just once.





[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535768#comment-14535768
 ] 

Sushanth Sowmyan commented on HIVE-10165:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

(On an added note, this is almost a perfect example of how developers should 
file jiras, I think. [~leftylev], do you suppose we can link to this jira from 
the HowToContribute page?)

 Improve hive-hcatalog-streaming extensibility and support updates and deletes.
 --

 Key: HIVE-10165
 URL: https://issues.apache.org/jira/browse/HIVE-10165
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Elliot West
Assignee: Elliot West
  Labels: streaming_api
 Attachments: HIVE-10165.0.patch


 h3. Overview
 I'd like to extend the 
 [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
  API so that it also supports the writing of record updates and deletes in 
 addition to the already supported inserts.
 h3. Motivation
 We have many Hadoop processes outside of Hive that merge changed facts into 
 existing datasets. Traditionally we achieve this by: reading in a 
 ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
 sequence and then applying a function to determine inserted, updated, and 
 deleted rows. However, in our current scheme we must rewrite all partitions 
 that may potentially contain changes. In practice the number of mutated 
 records is very small when compared with the records contained in a 
 partition. This approach results in a number of operational issues:
 * Excessive amount of write activity required for small data changes.
 * Downstream applications cannot robustly read these datasets while they are 
 being updated.
 * Due to the scale of the updates (hundreds of partitions) the scope for 
 contention is high. 
 I believe we can address this problem by instead writing only the changed 
 records to a Hive transactional table. This should drastically reduce the 
 amount of data that we need to write and also provide a means for managing 
 concurrent access to the data. Our existing merge processes can read and 
 retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
 an updated form of the hive-hcatalog-streaming API which will then have the 
 required data to perform an update or insert in a transactional manner. 
 h3. Benefits
 * Enables the creation of large-scale dataset merge processes  
 * Opens up Hive transactional functionality in an accessible manner to 
 processes that operate outside of Hive.
 h3. Implementation
 Our changes do not break the existing API contracts. Instead our approach has 
 been to consider the functionality offered by the existing API and our 
 proposed API as fulfilling separate and distinct use-cases. The existing API 
 is primarily focused on the task of continuously writing large volumes of new 
 data into a Hive table for near-immediate analysis. Our use-case however, is 
 concerned more with the frequent but not continuous ingestion of mutations to 
 a Hive table from some ETL merge process. Consequently we feel it is 
 justifiable to add our new functionality via an alternative set of public 
 interfaces and leave the existing API as is. This keeps both APIs clean and 
 focused at the expense of presenting additional options to potential users. 
 Wherever possible, shared implementation concerns have been factored out into 
 abstract base classes that are open to third-party extension. A detailed 
 breakdown of the changes is as follows:
 * We've introduced a public {{RecordMutator}} interface whose purpose is to 
 expose insert/update/delete operations to the user. This is a counterpart to 
 the write-only {{RecordWriter}}. We've also factored out life-cycle methods 
 common to these two interfaces into a super {{RecordOperationWriter}} 
 interface. Note that the row representation has been changed from {{byte[]}} 
 to {{Object}}. Within our data processing jobs our records are often 
 available in a strongly typed and decoded form such as a POJO or a Tuple 
 object. Therefore it seems to make sense that we are able to pass this 
 through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} 
 encoding step. This of course still allows users to use {{byte[]}} if they 
 wish.
 * The introduction of {{RecordMutator}} requires that insert/update/delete 
 operations are then also exposed on a {{TransactionBatch}} type. We've done 
 this with the introduction of a public {{MutatorTransactionBatch}} interface 
 which is a counterpart to the write-only {{TransactionBatch}}. We've also 
 factored out life-cycle 
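
A minimal sketch of the insert/update/delete surface described above, under loud assumptions: the interface and class names here are illustrative only, not the actual hive-hcatalog-streaming API, and the ORC ACID file is stood in for by an in-memory map keyed by a {{ROW_ID}}-like identifier.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a RecordMutator-style interface as described above.
// The real proposal targets ORC ACID files via OrcRecordUpdater; this toy
// version keys an in-memory map by a ROW_ID-like long identifier.
interface RecordMutator {
    void insert(Object record);
    void update(Object record);
    void delete(Object record);
}

class Row {
    final long id;       // stands in for RecordIdentifier
    final String value;  // the record payload (a String here, any Object in general)
    Row(long id, String value) { this.id = id; this.value = value; }
}

class InMemoryMutator implements RecordMutator {
    final Map<Long, String> rows = new HashMap<>();
    public void insert(Object r) { Row row = (Row) r; rows.put(row.id, row.value); }
    public void update(Object r) { Row row = (Row) r; rows.replace(row.id, row.value); }
    public void delete(Object r) { rows.remove(((Row) r).id); }
}

public class MutatorDemo {
    public static void main(String[] args) {
        InMemoryMutator m = new InMemoryMutator();
        m.insert(new Row(1, "a"));   // ETL merge emits an insert
        m.update(new Row(1, "b"));   // later run mutates the same ROW_ID
        m.insert(new Row(2, "c"));
        m.delete(new Row(2, null));  // and deletes another
        System.out.println(m.rows);  // {1=b}
    }
}
```

Note that the record parameter is {{Object}}, mirroring the point above about passing strongly typed records (POJOs, Tuples) through without a {{byte[]}} encoding step.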

[jira] [Updated] (HIVE-10115) HS2 running on a Kerberized cluster should offer Kerberos(GSSAPI) and Delegation token(DIGEST) when alternate authentication is enabled

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10115:

Fix Version/s: (was: 1.2.0)

 HS2 running on a Kerberized cluster should offer Kerberos(GSSAPI) and 
 Delegation token(DIGEST) when alternate authentication is enabled
 ---

 Key: HIVE-10115
 URL: https://issues.apache.org/jira/browse/HIVE-10115
 Project: Hive
  Issue Type: Improvement
  Components: Authentication
Affects Versions: 1.1.0
Reporter: Mubashir Kazia
Assignee: Mubashir Kazia
  Labels: patch
 Attachments: HIVE-10115.0.patch


 In a Kerberized cluster when alternate authentication is enabled on HS2, it 
 should also accept Kerberos authentication. The reason this is important is 
 that when we enable LDAP authentication, HS2 stops accepting delegation token 
 authentication, so we are forced to enter usernames and passwords in the 
 Oozie configuration.
 The whole idea of SASL is that multiple authentication mechanisms can be 
 offered. If we disable Kerberos (GSSAPI) and delegation token (DIGEST) 
 authentication when we enable LDAP authentication, this defeats the purpose 
 of SASL.
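
The SASL point can be checked directly against the JDK: the registered SaslServerFactory providers enumerate which server mechanisms are available, and a server can in principle advertise several at once and let the client pick one during negotiation. This snippet only lists what the local JDK offers; it is not HS2 code, and which mechanisms appear depends on the JVM's security providers.

```java
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslServerFactory;

public class SaslMechs {
    // Collect every SASL server mechanism name offered by registered providers.
    static Set<String> serverMechanisms() {
        Set<String> mechs = new TreeSet<>();
        Enumeration<SaslServerFactory> factories = Sasl.getSaslServerFactories();
        while (factories.hasMoreElements()) {
            for (String m : factories.nextElement().getMechanismNames(null)) {
                mechs.add(m);
            }
        }
        return mechs;
    }

    public static void main(String[] args) {
        // Typically includes DIGEST-MD5 and GSSAPI on a stock JDK.
        System.out.println(serverMechanisms());
    }
}
```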



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10121) Implement a hive --service udflint command to check UDF jars for common shading mistakes

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10121:

Fix Version/s: (was: 1.2.0)

 Implement a hive --service udflint command to check UDF jars for common 
 shading mistakes
 

 Key: HIVE-10121
 URL: https://issues.apache.org/jira/browse/HIVE-10121
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Gopal V
Assignee: Abdelrahman Shettia
 Attachments: HIVE-10121.1.patch, HIVE-10121.2.patch, bad_udfs.out, 
 bad_udfs_verbose.out, good_udfs.out, good_udfs_verbose.out


 Several SerDe and UDF jars tend to shade in various parts of the dependencies 
 including hadoop-common or guava without relocation.
 Implement a simple udflint tool which automates some part of the class path 
 and shaded resources audit process required when upgrading a hive install 
 from an old version to a new one.





[jira] [Updated] (HIVE-10157) Make use of the timed version of getDagStatus in TezJobMonitor

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10157:

Fix Version/s: (was: 1.2.0)

 Make use of the timed version of getDagStatus in TezJobMonitor
 --

 Key: HIVE-10157
 URL: https://issues.apache.org/jira/browse/HIVE-10157
 Project: Hive
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth







[jira] [Commented] (HIVE-10157) Make use of the timed version of getDagStatus in TezJobMonitor

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535770#comment-14535770
 ] 

Sushanth Sowmyan commented on HIVE-10157:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 Make use of the timed version of getDagStatus in TezJobMonitor
 --

 Key: HIVE-10157
 URL: https://issues.apache.org/jira/browse/HIVE-10157
 Project: Hive
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth







[jira] [Updated] (HIVE-10661) LLAP: investigate why GC with IO elevator disabled is so bad

2015-05-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10661:

Description: 
Examples of running same query (Q1) on experimental setup, with Parallel GC, 12 
times. 
Time, DAG name, DAG time, GC time counter.
GC time counter on LLAP seems relatively reliable.
Note that non-IO jobs are also much slower some of the time. This may not be 
explained entirely by GC; I am investigating it now.
Running IO and non-IO on the same cluster without restarting also produces 
these problems, but only on non-IO runs.

I may look at this later, after main GC tuning, but for now I decided to give 
up on this since elevator will be on by default when using LLAP.


{noformat}
$ cat io-dag.csv 
2015-05-08 12:10:57,695,dag_1429683757595_0843_1,71142,953216
2015-05-08 12:11:41,769,dag_1429683757595_0843_2,43144,844430
2015-05-08 12:12:22,335,dag_1429683757595_0843_3,39828,866538
2015-05-08 12:13:01,327,dag_1429683757595_0843_4,38213,822179
2015-05-08 12:13:39,610,dag_1429683757595_0843_5,37513,863968
2015-05-08 12:14:19,293,dag_1429683757595_0843_6,38320,913591
2015-05-08 12:14:58,500,dag_1429683757595_0843_7,38587,972450
2015-05-08 12:15:39,017,dag_1429683757595_0843_8,39845,1085598
2015-05-08 12:16:19,708,dag_1429683757595_0843_9,39979,1165559
2015-05-08 12:17:03,174,dag_1429683757595_0843_10,42713,1447033
2015-05-08 12:17:47,557,dag_1429683757595_0843_11,43670,1454114
2015-05-08 12:18:31,440,dag_1429683757595_0843_12,43178,1380477

$ cat noio-dag.csv 
2015-05-08 11:44:05,846,dag_1429683757595_0841_1,60740,1643276
2015-05-08 11:44:55,761,dag_1429683757595_0841_2,48984,1590546
2015-05-08 11:45:48,978,dag_1429683757595_0841_3,52353,1765823
2015-05-08 11:46:44,810,dag_1429683757595_0841_4,54930,1831224
2015-05-08 11:47:47,368,dag_1429683757595_0841_5,61677,2068089
2015-05-08 11:49:05,235,dag_1429683757595_0841_6,76725,2416709
2015-05-08 11:51:56,998,dag_1429683757595_0841_7,170575,3250698
2015-05-08 11:58:16,728,dag_1429683757595_0841_8,377732,5541900
2015-05-08 12:03:17,344,dag_1429683757595_0841_9,298682,1844769
2015-05-08 12:05:23,267,dag_1429683757595_0841_10,124954,1331763
2015-05-08 12:06:35,650,dag_1429683757595_0841_11,71350,1703387
2015-05-08 12:07:42,599,dag_1429683757595_0841_12,66143,1724482
{noformat}

  was:
Examples of running same query (Q1) on experimental setup, with Parallel GC, 12 
times. 
Time, DAG name, DAG time, GC time counter.
GC time counter on LLAP seems relatively reliable.
Note that non-IO jobs are also much slower some of the time. This may not be 
explained entirely by GC; I am investigating it now.

I may look at this later, after main GC tuning, but for now I decided to give 
up on this since elevator will be on by default when using LLAP.


{noformat}
$ cat io-dag.csv 
2015-05-08 12:10:57,695,dag_1429683757595_0843_1,71142,953216
2015-05-08 12:11:41,769,dag_1429683757595_0843_2,43144,844430
2015-05-08 12:12:22,335,dag_1429683757595_0843_3,39828,866538
2015-05-08 12:13:01,327,dag_1429683757595_0843_4,38213,822179
2015-05-08 12:13:39,610,dag_1429683757595_0843_5,37513,863968
2015-05-08 12:14:19,293,dag_1429683757595_0843_6,38320,913591
2015-05-08 12:14:58,500,dag_1429683757595_0843_7,38587,972450
2015-05-08 12:15:39,017,dag_1429683757595_0843_8,39845,1085598
2015-05-08 12:16:19,708,dag_1429683757595_0843_9,39979,1165559
2015-05-08 12:17:03,174,dag_1429683757595_0843_10,42713,1447033
2015-05-08 12:17:47,557,dag_1429683757595_0843_11,43670,1454114
2015-05-08 12:18:31,440,dag_1429683757595_0843_12,43178,1380477

$ cat noio-dag.csv 
2015-05-08 11:44:05,846,dag_1429683757595_0841_1,60740,1643276
2015-05-08 11:44:55,761,dag_1429683757595_0841_2,48984,1590546
2015-05-08 11:45:48,978,dag_1429683757595_0841_3,52353,1765823
2015-05-08 11:46:44,810,dag_1429683757595_0841_4,54930,1831224
2015-05-08 11:47:47,368,dag_1429683757595_0841_5,61677,2068089
2015-05-08 11:49:05,235,dag_1429683757595_0841_6,76725,2416709
2015-05-08 11:51:56,998,dag_1429683757595_0841_7,170575,3250698
2015-05-08 11:58:16,728,dag_1429683757595_0841_8,377732,5541900
2015-05-08 12:03:17,344,dag_1429683757595_0841_9,298682,1844769
2015-05-08 12:05:23,267,dag_1429683757595_0841_10,124954,1331763
2015-05-08 12:06:35,650,dag_1429683757595_0841_11,71350,1703387
2015-05-08 12:07:42,599,dag_1429683757595_0841_12,66143,1724482
{noformat}


 LLAP: investigate why GC with IO elevator disabled is so bad
 

 Key: HIVE-10661
 URL: https://issues.apache.org/jira/browse/HIVE-10661
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Prasanth Jayachandran

 Examples of running same query (Q1) on experimental setup, with Parallel GC, 
 12 times. 
 Time, DAG name, DAG time, GC time counter.
 GC time counter on LLAP seems relatively reliable.
 Note that non-IO jobs are also 

[jira] [Updated] (HIVE-9828) Semantic analyzer does not capture view parent entity for tables referred in view with union all

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9828:
---
Fix Version/s: (was: 1.2.0)

 Semantic analyzer does not capture view parent entity for tables referred in 
 view with union all 
 -

 Key: HIVE-9828
 URL: https://issues.apache.org/jira/browse/HIVE-9828
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 1.1.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-9828.1-npf.patch, HIVE-9828.1-npf.patch, 
 HIVE-9828.2.patch


 The Hive compiler adds tables used in a view definition to the input entity 
 list, with the view as the parent entity for each table.
 In the case of a view with a union all query, this is not being done properly. 
 For example,
 {noformat}
 create view view1 as select t.id from (select tab1.id from db.tab1 union all 
 select tab2.id from db.tab2 ) t;
 {noformat}
 This query will capture tab1 and tab2 as read entities without view1 as parent.





[jira] [Commented] (HIVE-9828) Semantic analyzer does not capture view parent entity for tables referred in view with union all

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535880#comment-14535880
 ] 

Sushanth Sowmyan commented on HIVE-9828:


Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 Semantic analyzer does not capture view parent entity for tables referred in 
 view with union all 
 -

 Key: HIVE-9828
 URL: https://issues.apache.org/jira/browse/HIVE-9828
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 1.1.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-9828.1-npf.patch, HIVE-9828.1-npf.patch, 
 HIVE-9828.2.patch


 The Hive compiler adds tables used in a view definition to the input entity 
 list, with the view as the parent entity for each table.
 In the case of a view with a union all query, this is not being done properly. 
 For example,
 {noformat}
 create view view1 as select t.id from (select tab1.id from db.tab1 union all 
 select tab2.id from db.tab2 ) t;
 {noformat}
 This query will capture tab1 and tab2 as read entities without view1 as parent.





[jira] [Updated] (HIVE-9736) StorageBasedAuthProvider should batch namenode-calls where possible.

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9736:
---
Fix Version/s: (was: 1.2.0)

 StorageBasedAuthProvider should batch namenode-calls where possible.
 

 Key: HIVE-9736
 URL: https://issues.apache.org/jira/browse/HIVE-9736
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Security
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
  Labels: TODOC1.2
 Attachments: HIVE-9736.1.patch, HIVE-9736.2.patch, HIVE-9736.3.patch, 
 HIVE-9736.4.patch, HIVE-9736.5.patch, HIVE-9736.6.patch, HIVE-9736.7.patch


 Consider a table partitioned by 2 keys (dt, region). Say a dt partition could 
 have >1 associated regions. Consider that the user does:
 {code:sql}
 ALTER TABLE my_table DROP PARTITION (dt='20150101');
 {code}
 As things stand now, {{StorageBasedAuthProvider}} will make individual 
 {{DistributedFileSystem.listStatus()}} calls for each partition-directory, 
 and authorize each one separately. It'd be faster to batch the calls, and 
 examine multiple FileStatus objects at once.
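
The batching idea is independent of HDFS itself and can be sketched generically: group the per-partition paths into fixed-size chunks and issue one call per chunk (the real change would go through an array-taking API such as FileSystem.listStatus(Path[]); the batch size and names below are assumptions, and the RPC is not mocked here at all).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchedLister {
    // Split items into consecutive batches of at most batchSize elements.
    // Each batch would become a single namenode RPC instead of one per item.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            out.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical partition directories for the DROP PARTITION example.
        List<String> partitionDirs = Arrays.asList(
            "dt=20150101/region=us", "dt=20150101/region=eu",
            "dt=20150101/region=ap", "dt=20150101/region=sa");
        // 4 directories with batch size 2 -> 2 RPCs instead of 4.
        System.out.println(batches(partitionDirs, 2));
    }
}
```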





[jira] [Commented] (HIVE-9713) CBO : inefficient join order created for left join outer condition

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535887#comment-14535887
 ] 

Sushanth Sowmyan commented on HIVE-9713:


Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

(Is this another candidate for 1.2.1?)

 CBO : inefficient join order created for left join outer condition
 --

 Key: HIVE-9713
 URL: https://issues.apache.org/jira/browse/HIVE-9713
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran

 For the query below which is a subset of TPC-DS Query 80, CBO joins 
 catalog_sales with catalog_returns first although the CE of the join is 
 relatively high.
 catalog_sales should be joined with the selective dimension tables first.
 {code}
 select  cp_catalog_page_id as catalog_page_id,
   sum(cs_ext_sales_price) as sales,
   sum(coalesce(cr_return_amount, 0)) as returns,
   sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
   from catalog_sales left outer join catalog_returns on
  (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
  date_dim,
  catalog_page,
  item,
  promotion
  where cs_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and cs_catalog_page_sk = cp_catalog_page_sk
and cs_item_sk = i_item_sk
and i_current_price > 50
and cs_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by cp_catalog_page_id
 {code}
 Logical plan from CBO debug logs 
 {code}
 2015-02-17 22:34:04,577 DEBUG [main]: parse.CalcitePlanner 
 (CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
 HiveProject(catalog_page_id=[$0], sales=[$1], returns=[$2], profit=[$3]): 
 rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 
 io}, id = 1395
   HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[sum($2)], 
 agg#2=[sum($3)]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 
 rows, 0.0 cpu, 0.0 io}, id = 1393
 HiveProject($f0=[$14], $f1=[$5], $f2=[coalesce($9, 0)], $f3=[-($6, 
 coalesce($10, 0))]): rowcount = 1.368586152225262E8, cumulative cost = 
 {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1391
   HiveJoin(condition=[=($3, $17)], joinType=[inner]): rowcount = 
 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 
 0.0 io}, id = 1508
 HiveJoin(condition=[=($2, $15)], joinType=[inner]): rowcount = 
 2.737172304450524E8, cumulative cost = {8.252425594517495E15 rows, 0.0 cpu, 
 0.0 io}, id = 1506
   HiveJoin(condition=[=($1, $13)], joinType=[inner]): rowcount = 
 8.211516913351573E8, cumulative cost = {8.252424773349804E15 rows, 0.0 cpu, 
 0.0 io}, id = 1504
 HiveJoin(condition=[=($0, $11)], joinType=[inner]): rowcount = 
 1.1296953399027347E11, cumulative cost = {8.252311803804096E15 rows, 0.0 cpu, 
 0.0 io}, id = 1418
   HiveJoin(condition=[AND(=($2, $7), =($4, $8))], 
 joinType=[left]): rowcount = 8.252311488455487E15, cumulative cost = 
 {3.15348608E8 rows, 0.0 cpu, 0.0 io}, id = 1413
 HiveProject(cs_sold_date_sk=[$0], cs_catalog_page_sk=[$12], 
 cs_item_sk=[$15], cs_promo_sk=[$16], cs_order_number=[$17], 
 cs_ext_sales_price=[$23], cs_net_profit=[$33]): rowcount = 2.86549727E8, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1324
   HiveTableScan(table=[[tpcds_bin_orc_200.catalog_sales]]): 
 rowcount = 2.86549727E8, cumulative cost = {0}, id = 1136
 HiveProject(cr_item_sk=[$2], cr_order_number=[$16], 
 cr_return_amount=[$18], cr_net_loss=[$26]): rowcount = 2.8798881E7, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1327
   HiveTableScan(table=[[tpcds_bin_orc_200.catalog_returns]]): 
 rowcount = 2.8798881E7, cumulative cost = {0}, id = 1137
   HiveProject(d_date_sk=[$0], d_date=[$2]): rowcount = 1.0, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1371
 HiveFilter(condition=[between(false, $2, 
 CAST('1998-08-04'):DATE, CAST('1998-09-04'):DATE)]): rowcount = 1.0, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1369
   HiveTableScan(table=[[tpcds_bin_orc_200.date_dim]]): 
 rowcount = 73049.0, cumulative cost = {0}, id = 1138
 HiveProject(cp_catalog_page_sk=[$0], cp_catalog_page_id=[$1]): 
 rowcount = 11718.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1375
   HiveTableScan(table=[[tpcds_bin_orc_200.catalog_page]]): 
 rowcount = 11718.0, cumulative cost = {0}, id = 1139
   HiveProject(i_item_sk=[$0], i_current_price=[$5]): rowcount = 
 
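
The heuristic argued for here, joining the fact table with the most selective dimension tables first, can be illustrated with a toy greedy orderer over estimated post-filter cardinalities. The numbers and names are illustrative only, loosely inspired by the plan above; this is not Hive's actual cost model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GreedyJoinOrder {
    // Greedy heuristic: starting from the fact table, join dimension tables in
    // ascending order of their estimated (post-filter) row counts, so the most
    // selective dimensions shrink the intermediate result earliest.
    static List<String> order(String fact, Map<String, Double> dimRows) {
        List<Map.Entry<String, Double>> dims = new ArrayList<>(dimRows.entrySet());
        dims.sort((a, b) -> Double.compare(a.getValue(), b.getValue()));
        List<String> out = new ArrayList<>();
        out.add(fact);
        for (Map.Entry<String, Double> d : dims) out.add(d.getKey());
        return out;
    }

    public static void main(String[] args) {
        // Illustrative cardinalities; date_dim is tiny after the BETWEEN filter.
        Map<String, Double> dims = new LinkedHashMap<>();
        dims.put("catalog_returns", 2.9e7);
        dims.put("date_dim", 1.0);
        dims.put("catalog_page", 11718.0);
        dims.put("item", 48000.0);
        // date_dim comes first, catalog_returns last.
        System.out.println(order("catalog_sales", dims));
    }
}
```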

[jira] [Commented] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535890#comment-14535890
 ] 

Sushanth Sowmyan commented on HIVE-9695:


Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 Redundant filter operator in reducer Vertex when CBO is disabled
 

 Key: HIVE-9695
 URL: https://issues.apache.org/jira/browse/HIVE-9695
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran

 There is a redundant filter operator in reducer Vertex when CBO is disabled.
 Query 
 {code}
 select 
 ss_item_sk, ss_ticket_number, ss_store_sk
 from
 store_sales a, store_returns b, store
 where
 a.ss_item_sk = b.sr_item_sk
 and a.ss_ticket_number = b.sr_ticket_number 
 and ss_sold_date_sk between 2450816 and 2451500
   and sr_returned_date_sk between 2450816 and 2451500
   and s_store_sk = ss_store_sk;
 {code}
 Plan snippet 
 {code}
   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE 
 Column stats: COMPLETE
   Filter Operator
  predicate: ((((_col1 = _col27) and (_col8 = _col34)) and 
 _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) 
 and (_col49 = _col6)) (type: boolean)
 {code}
 Full plan with CBO disabled
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Reducer 2 - Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 
 (SIMPLE_EDGE)
   DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: b
   filterExpr: ((sr_item_sk is not null and sr_ticket_number 
 is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: 
 boolean)
   Statistics: Num rows: 2370038095 Data size: 170506118656 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (sr_item_sk is not null and sr_ticket_number 
 is not null) (type: boolean)
 Statistics: Num rows: 706893063 Data size: 6498502768 
 Basic stats: COMPLETE Column stats: COMPLETE
 Reduce Output Operator
   key expressions: sr_item_sk (type: int), 
 sr_ticket_number (type: int)
   sort order: ++
   Map-reduce partition columns: sr_item_sk (type: int), 
 sr_ticket_number (type: int)
   Statistics: Num rows: 706893063 Data size: 6498502768 
 Basic stats: COMPLETE Column stats: COMPLETE
   value expressions: sr_returned_date_sk (type: int)
 Execution mode: vectorized
 Map 3
 Map Operator Tree:
 TableScan
   alias: store
   filterExpr: s_store_sk is not null (type: boolean)
   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: 
 COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: s_store_sk is not null (type: boolean)
 Statistics: Num rows: 1704 Data size: 6816 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Reduce Output Operator
   key expressions: s_store_sk (type: int)
   sort order: +
   Map-reduce partition columns: s_store_sk (type: int)
   Statistics: Num rows: 1704 Data size: 6816 Basic stats: 
 COMPLETE Column stats: COMPLETE
 Execution mode: vectorized
 Map 4
 Map Operator Tree:
 TableScan
   alias: a
   filterExpr: (((ss_item_sk is not null and ss_ticket_number 
 is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 
 AND 2451500) (type: boolean)
   Statistics: Num rows: 28878719387 Data size: 2405805439460 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ((ss_item_sk is not null and ss_ticket_number 
 is not null) and ss_store_sk is not null) (type: boolean)
 Statistics: Num rows: 8405840828 Data size: 110101408700 
 Basic stats: COMPLETE Column stats: COMPLETE
 Reduce Output Operator
   key expressions: ss_item_sk (type: int), 
 ss_ticket_number (type: int)
   sort order: ++
   

[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-08 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535732#comment-14535732
 ] 

Mostafa Mokhtar commented on HIVE-10609:


[~sushanth]

This fix is needed to keep queries from crashing.
Please include in 1.2.0

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline

 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
 cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year 
 ,d3.d_year
 ) cs1
 JOIN
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN 

[jira] [Updated] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10533:

Fix Version/s: (was: 1.2.0)

 CBO (Calcite Return Path): Join to MultiJoin support for outer joins
 

 Key: HIVE-10533
 URL: https://issues.apache.org/jira/browse/HIVE-10533
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez

 CBO return path: auto_join7.q can be used to reproduce the problem.





[jira] [Updated] (HIVE-10557) CBO : Support reference to alias in queries

2015-05-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10557:

Fix Version/s: (was: 1.2.0)

 CBO : Support reference to alias in queries 
 

 Key: HIVE-10557
 URL: https://issues.apache.org/jira/browse/HIVE-10557
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
Priority: Minor

 Query 
 {code:sql}
 explain
 select
 count(*) rowcount
 from
 (select
 ss_item_sk, ss_ticket_number, ss_store_sk
 from
 store_sales a, store_returns b
 where
 a.ss_item_sk = b.sr_item_sk
 and a.ss_ticket_number = b.sr_ticket_number
 and ss_sold_date_sk between 2450816 and 2451500
 and sr_returned_date_sk between 2450816 and 2451500
 union all
 select
 ss_item_sk, ss_ticket_number, ss_store_sk
 from
 store_sales c, store_returns d
 where
 c.ss_item_sk = d.sr_item_sk
 and c.ss_ticket_number = d.sr_ticket_number
 and ss_sold_date_sk between 2450816 and 2451500
 and sr_returned_date_sk between 2450816 and 2451500) t
 group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
 having rowcount > 1
 {code}
 Exception 
 {code}
 15/04/30 04:44:21 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping 
 CBO.
 org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: 
 Encountered Select alias 'rowcount' in having clause 'rowcount > 1' 
 This non standard behavior is not supported with cbo on. Turn off cbo for 
 these queries.
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.validateNoHavingReferenceToAlias(CalcitePlanner.java:2888)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBHavingLogicalPlan(CalcitePlanner.java:2828)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:2738)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:804)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:765)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
 at 
 org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
 at 
 org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
 at 
 org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:604)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:242)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10015)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:205)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
 {code}





[jira] [Commented] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535730#comment-14535730
 ] 

Sushanth Sowmyan commented on HIVE-10533:
-

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 CBO (Calcite Return Path): Join to MultiJoin support for outer joins
 

 Key: HIVE-10533
 URL: https://issues.apache.org/jira/browse/HIVE-10533
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez

 CBO return path: auto_join7.q can be used to reproduce the problem.





[jira] [Commented] (HIVE-10557) CBO : Support reference to alias in queries

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535726#comment-14535726 ]

Sushanth Sowmyan commented on HIVE-10557:
-----------------------------------------

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 CBO : Support reference to alias in queries 
 

 Key: HIVE-10557
 URL: https://issues.apache.org/jira/browse/HIVE-10557
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
Priority: Minor

 Query 
 {code:sql}
 explain
 select count(*) rowcount
 from
   (select ss_item_sk, ss_ticket_number, ss_store_sk
    from store_sales a, store_returns b
    where a.ss_item_sk = b.sr_item_sk
      and a.ss_ticket_number = b.sr_ticket_number
      and ss_sold_date_sk between 2450816 and 2451500
      and sr_returned_date_sk between 2450816 and 2451500
    union all
    select ss_item_sk, ss_ticket_number, ss_store_sk
    from store_sales c, store_returns d
    where c.ss_item_sk = d.sr_item_sk
      and c.ss_ticket_number = d.sr_ticket_number
      and ss_sold_date_sk between 2450816 and 2451500
      and sr_returned_date_sk between 2450816 and 2451500) t
 group by t.ss_store_sk, t.ss_item_sk, t.ss_ticket_number
 having rowcount > 1
 {code}
 Exception 
 {code}
 15/04/30 04:44:21 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO.
 org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Encountered Select alias 'rowcount' in having clause 'rowcount > 1' This non standard behavior is not supported with cbo on. Turn off cbo for these queries.
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.validateNoHavingReferenceToAlias(CalcitePlanner.java:2888)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBHavingLogicalPlan(CalcitePlanner.java:2828)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:2738)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:804)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:765)
 at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
 at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730)
 at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
 at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:604)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:242)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10015)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:205)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
 at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
 {code}
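The standard-compliant workaround is to repeat the aggregate expression in the HAVING clause instead of referencing the select alias. A minimal sketch of that rewrite, run through Python's sqlite3 on an illustrative in-memory table (the schema below is a stand-in, not Hive's actual TPC-DS tables):

```python
import sqlite3

# Illustrative stand-in table; column names mimic the query above, but this
# is not Hive's TPC-DS schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ss_store_sk INT, ss_item_sk INT, ss_ticket_number INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, 10, 100), (1, 10, 100), (2, 20, 200)])

# Standard-compliant form: repeat COUNT(*) in HAVING rather than
# referencing the select alias 'rowcount', which CBO rejects.
rows = conn.execute("""
    SELECT ss_store_sk, ss_item_sk, ss_ticket_number, COUNT(*) AS rowcount
    FROM t
    GROUP BY ss_store_sk, ss_item_sk, ss_ticket_number
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)  # [(1, 10, 100, 2)]
```

The alias-referencing form is non-standard SQL; some engines accept it, but rewriting HAVING to use the aggregate directly keeps the query portable and CBO-friendly.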





[jira] [Commented] (HIVE-10479) Empty tabAlias in columnInfo which triggers PPD

2015-05-08 Thread Sushanth Sowmyan (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535745#comment-14535745 ]

Sushanth Sowmyan commented on HIVE-10479:
-----------------------------------------

Removing fix version of 1.2.0 in preparation of release, since this is not a 
blocker for 1.2.0.

 Empty tabAlias in columnInfo which triggers PPD
 -----------------------------------------------

 Key: HIVE-10479
 URL: https://issues.apache.org/jira/browse/HIVE-10479
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-10479.patch


 In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, 
 when aliases contains the empty string "" and key is the empty string "" too, 
 the code assumes that aliases contains key. This triggers incorrect PPD. To 
 reproduce it, apply HIVE-10455 and run cbo_subq_notin.q.
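The pitfall described above can be sketched in a few lines (hypothetical names, not the actual OpProcFactory code): when an empty-string alias slips into the alias set, a membership test with an empty key succeeds spuriously, so a predicate looks pushable when it is not.

```python
# Hypothetical sketch of the alias-containment check (not Hive's actual code).
aliases = {"a", ""}   # an empty-string alias has slipped into the alias set
key = ""              # columnInfo whose table alias is empty

# The membership test succeeds even though no real alias matches,
# so predicate pushdown would be triggered incorrectly.
pushable = key in aliases
print(pushable)  # True

# A guard that treats the empty string as "no alias" avoids the false match.
pushable_guarded = bool(key) and key in aliases
print(pushable_guarded)  # False
```

The fix direction is simply to refuse to treat an empty key as matching any alias before doing the containment check.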




