[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional

2014-10-19 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176228#comment-14176228
 ] 

Lefty Leverenz commented on HIVE-8290:
--

[~alangates], in the Hive Transactions doc I moved *hive.support.concurrency* 
from the table of new transaction parameters to the next section, and revised 
various parameter descriptions there and in Configuration Properties.

* [Hive Transactions -- Configuration | 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40509723#HiveTransactions-Configuration]
** [New Configuration Parameters for Transactions | 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40509723#HiveTransactions-NewConfigurationParametersforTransactions]
** [Configuration Values to Set for INSERT, UPDATE, DELETE | 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40509723#HiveTransactions-ConfigurationValuestoSetforINSERT,UPDATE,DELETE]
* [Configuration Properties -- Transactions and Compactor | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor]
 (list of parameters to turn on)
** [hive.txn.manager | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.manager]
** [hive.compactor.initiator.on | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.compactor.initiator.on]
** [hive.compactor.worker.threads | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.compactor.worker.threads]
* Configuration Properties -- other parameters
** [hive.support.concurrency | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.support.concurrency]
** [hive.enforce.bucketing | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.enforce.bucketing]
** [hive.exec.dynamic.partition.mode | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.dynamic.partition.mode]

If these changes pass muster, equivalent changes can be made in HiveConf.java 
(HIVE-6586).
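
For reference, the settings those sections describe add up to roughly the 
following (a sketch; the values are illustrative, and the hive.compactor.* 
parameters belong in the metastore/server configuration rather than a client 
session):

{code}
-- illustrative client-side settings for INSERT/UPDATE/DELETE
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- metastore/server side, to enable compactions
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
{code}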

 With DbTxnManager configured, all ORC tables forced to be transactional
 ---

 Key: HIVE-8290
 URL: https://issues.apache.org/jira/browse/HIVE-8290
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8290.2.patch, HIVE-8290.patch


 Currently, once a user configures DbTxnManager to be the transaction manager, 
 all tables that use ORC are expected to be transactional.  This means they 
 all have to have buckets.  This most likely won't be what users want.
 We need to add a specific mark to a table so that users can indicate it 
 should be treated in a transactional way.
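 A plausible shape for such a mark (a sketch only; the 'transactional' property 
 name is an illustrative assumption, matching the usage in the HIVE-8474 example 
 quoted later in this digest, not something taken from this patch) would be:
 {code}
 -- hypothetical sketch: explicitly flag an ORC table as transactional
 create table acid_tab (name varchar(50), age int)
   clustered by (age) into 2 buckets
   stored as orc
   tblproperties ('transactional'='true');
 {code}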



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: contents of hive/lib in hive tar file

2014-10-19 Thread Eugene Koifman
because that doesn't work

On Sat, Oct 18, 2014 at 9:24 PM, Hari Subramaniyan 
hsubramani...@hortonworks.com wrote:

 Why can't you try making this change in pom.xml in the source root
 directory? Am I missing something here?

 Thanks
 Hari

 On Sat, Oct 18, 2014 at 8:58 PM, Eugene Koifman ekoif...@hortonworks.com
 wrote:

  Does anyone know how to ensure that a particular jar (and those it depends
 on)
  is added to the hive/lib dir?
  Specifically:
  <dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-framework</artifactId>
    <version>${curator.version}</version>
  </dependency>
 
 
  I looked at bin.xml under packaging/ but not sure what to do.
 
  --
 
  Thanks,
  Eugene
 





-- 

Thanks,
Eugene



[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8501:
--
Attachment: HIVE-8501.3.patch

address [~julianhyde]'s comments about equals

 Fix CBO to use indexes when GenericUDFBridge is applied 
 

 Key: HIVE-8501
 URL: https://issues.apache.org/jira/browse/HIVE-8501
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch


 previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
 we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and 
 (UDFToDouble(key) < UDFToDouble(100))) for example.
 This does not work for the case when we have predicate: ((UDFToDouble(key) > 
 80.0) and (UDFToDouble(key) < 100.0))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26891: Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26891/
---

(Updated Oct. 19, 2014, 5:40 p.m.)


Review request for hive and Sergey Shelukhin.


Changes
---

address [~julianhyde]'s comments about equals


Repository: hive-git


Description
---

previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) 
< UDFToDouble(100))) for example.
This does not work for the case when we have predicate: ((UDFToDouble(key) > 
80.0) and (UDFToDouble(key) < 100.0))
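
For example (a sketch, not taken from the patch or its tests, assuming the usual 
src table where key is a string column so the comparisons are bridged through 
UDFToDouble), a filter like the one below can surface in either of the two forms 
above, depending on whether the literals get folded to doubles:

{code}
-- illustrative query that yields a UDFToDouble-bridged range predicate
select key, value from src where key > 80 and key < 100;
{code}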


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java 
4987f7a 

Diff: https://reviews.apache.org/r/26891/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8501:
--
Status: Open  (was: Patch Available)

 Fix CBO to use indexes when GenericUDFBridge is applied 
 

 Key: HIVE-8501
 URL: https://issues.apache.org/jira/browse/HIVE-8501
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch


 previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
 we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and 
 (UDFToDouble(key) < UDFToDouble(100))) for example.
 This does not work for the case when we have predicate: ((UDFToDouble(key) > 
 80.0) and (UDFToDouble(key) < 100.0))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8501:
--
Status: Patch Available  (was: Open)

 Fix CBO to use indexes when GenericUDFBridge is applied 
 

 Key: HIVE-8501
 URL: https://issues.apache.org/jira/browse/HIVE-8501
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch


 previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
 we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and 
 (UDFToDouble(key) < UDFToDouble(100))) for example.
 This does not work for the case when we have predicate: ((UDFToDouble(key) > 
 80.0) and (UDFToDouble(key) < 100.0))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 26891: Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26891/
---

(Updated Oct. 19, 2014, 5:58 p.m.)


Review request for hive and Sergey Shelukhin.


Repository: hive-git


Description
---

previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) 
< UDFToDouble(100))) for example.
This does not work for the case when we have predicate: ((UDFToDouble(key) > 
80.0) and (UDFToDouble(key) < 100.0))


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java 
4987f7a 

Diff: https://reviews.apache.org/r/26891/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8501:
--
Status: Open  (was: Patch Available)

 Fix CBO to use indexes when GenericUDFBridge is applied 
 

 Key: HIVE-8501
 URL: https://issues.apache.org/jira/browse/HIVE-8501
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch


 previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
 we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and 
 (UDFToDouble(key) < UDFToDouble(100))) for example.
 This does not work for the case when we have predicate: ((UDFToDouble(key) > 
 80.0) and (UDFToDouble(key) < 100.0))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8501:
--
Attachment: HIVE-8501.4.patch

 Fix CBO to use indexes when GenericUDFBridge is applied 
 

 Key: HIVE-8501
 URL: https://issues.apache.org/jira/browse/HIVE-8501
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch, 
 HIVE-8501.4.patch


 previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
 we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and 
 (UDFToDouble(key) < UDFToDouble(100))) for example.
 This does not work for the case when we have predicate: ((UDFToDouble(key) > 
 80.0) and (UDFToDouble(key) < 100.0))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8501:
--
Status: Patch Available  (was: Open)

 Fix CBO to use indexes when GenericUDFBridge is applied 
 

 Key: HIVE-8501
 URL: https://issues.apache.org/jira/browse/HIVE-8501
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch, 
 HIVE-8501.4.patch


 previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
 we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and 
 (UDFToDouble(key) < UDFToDouble(100))) for example.
 This does not work for the case when we have predicate: ((UDFToDouble(key) > 
 80.0) and (UDFToDouble(key) < 100.0))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-8517:
-

 Summary: When joining on partition column NDV gets overridden by 
StatsUtils.getColStatisticsFromExpression
 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
 Fix For: 0.14.0


When joining on a partition column, the number of partitions is used as the NDV, 
but StatsUtils.getColStatisticsFromExpression overrides it and replaces it with 
the number of rows, which results in the same behavior as explained in 
https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns 
with fetch column stats enabled results in a very small cardinality estimate (CE), 
which negatively affects query performance.

This is the call stack.
{code}
StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
line: 1001
StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, 
NodeProcessorCtx, Object...) line: 1479
DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78
PreOrderWalker.walk(Node) line: 54
PreOrderWalker.walk(Node) line: 59
PreOrderWalker.walk(Node) line: 59
PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, 
HashMap<Node,Object>) line: 109
AnnotateWithStatistics.transform(ParseContext) line: 78
TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248
TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, 
Set<WriteEntity>) line: 120
TezCompiler(TaskCompiler).compile(ParseContext, List<Task<Serializable>>, 
HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99
SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037   
SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221  
ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74   
ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
221   
Driver.compile(String, boolean) line: 415   
{code}

Query
{code}

select
  ss_item_sk item_sk, d_date, sum(ss_sales_price),
  sum(sum(ss_sales_price))
  over (partition by ss_item_sk order by d_date rows between unbounded 
preceding and current row) cume_sales
from store_sales
,date_dim
where ss_sold_date_sk=d_date_sk
  and d_month_seq between 1193 and 1193+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date
{code}

Plan 
Notice that in the Map Join operator the number of rows drops from 82,510,879,939 
to 36,524 after the join.
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  Edges:
Map 1 <- Map 4 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
  DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: ss_item_sk is not null (type: boolean)
  Statistics: Num rows: 82510879939 Data size: 6873789738208 
Basic stats: COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: ss_item_sk is not null (type: boolean)
Statistics: Num rows: 82510879939 Data size: 652315818272 
Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
  condition map:
   Inner Join 0 to 1
  condition expressions:
0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
1 {d_date_sk} {d_date} {d_month_seq}
  keys:
0 ss_sold_date_sk (type: int)
1 d_date_sk (type: int)
  outputColumnNames: _col1, _col12, _col22, _col26, _col28, 
_col29
  input vertices:
1 Map 4
  Statistics: Num rows: 36524 Data size: 4163736 Basic 
stats: COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: (((_col22 = _col26) and _col29 BETWEEN 1193 
AND 1204) and _col1 is not null) (type: boolean)
Statistics: Num rows: 9131 Data size: 1040934 Basic 
stats: COMPLETE Column stats: COMPLETE
Select Operator
  expressions: _col1 (type: int), _col28 (type: 
string), _col12 (type: float)
  outputColumnNames: _col1, _col28, _col12
  Statistics: Num rows: 9131 Data size: 1040934 Basic 
stats: COMPLETE 

[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8517:
--
Assignee: Mostafa Mokhtar

 When joining on partition column NDV gets overridden by 
 StatsUtils.getColStatisticsFromExpression
 -

 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0


 When joining on a partition column, the number of partitions is used as the NDV, 
 but StatsUtils.getColStatisticsFromExpression overrides it and replaces it with 
 the number of rows, which results in the same behavior as explained in 
 https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns 
 with fetch column stats enabled results in a very small cardinality estimate (CE), 
 which negatively affects query performance.
 This is the call stack.
 {code}
 StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
 line: 1001  
 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, StackNode, 
 NodeProcessorCtx, Object...) line: 1479  
 DefaultRuleDispatcher.dispatch(Node, StackNode, Object...) line: 90 
 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, StackNode) line: 
 94  
 PreOrderWalker(DefaultGraphWalker).dispatch(Node, StackNode) line: 78   
 PreOrderWalker.walk(Node) line: 54
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker(DefaultGraphWalker).startWalking(CollectionNode, 
 HashMapNode,Object) line: 109 
 AnnotateWithStatistics.transform(ParseContext) line: 78   
 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248  
 TezCompiler.optimizeOperatorPlan(ParseContext, SetReadEntity, 
 SetWriteEntity) line: 120   
 TezCompiler(TaskCompiler).compile(ParseContext, ListTaskSerializable, 
 HashSetReadEntity, HashSetWriteEntity) line: 99 
 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 
 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 
 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
 221 
 Driver.compile(String, boolean) line: 415 
 {code}
 Query
 {code}
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date
 {code}
 Plan 
 Notice that in the Map Join operator the number of rows drops from 82,510,879,939 
 to 36,524 after the join.
 {code}
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 4 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_item_sk is not null (type: boolean)
   Statistics: Num rows: 82510879939 Data size: 6873789738208 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ss_item_sk is not null (type: boolean)
 Statistics: Num rows: 82510879939 Data size: 652315818272 
 Basic stats: COMPLETE Column stats: COMPLETE
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
 1 {d_date_sk} {d_date} {d_month_seq}
   keys:
 0 ss_sold_date_sk (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col12, _col22, _col26, 
 _col28, _col29
   input vertices:
 1 Map 4
   Statistics: Num rows: 36524 Data size: 4163736 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((_col22 = _col26) and _col29 BETWEEN 
 1193 AND 1204) and _col1 is not null) (type: boolean)
 Statistics: Num rows: 9131 Data size: 1040934 Basic 
 

[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8517:
--
Status: Open  (was: Patch Available)

 When joining on partition column NDV gets overridden by 
 StatsUtils.getColStatisticsFromExpression
 -

 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-8517.1.patch


 When joining on a partition column, the number of partitions is used as the NDV, 
 but StatsUtils.getColStatisticsFromExpression overrides it and replaces it with 
 the number of rows, which results in the same behavior as explained in 
 https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns 
 with fetch column stats enabled results in a very small cardinality estimate (CE), 
 which negatively affects query performance.
 This is the call stack.
 {code}
 StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
 line: 1001  
 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, StackNode, 
 NodeProcessorCtx, Object...) line: 1479  
 DefaultRuleDispatcher.dispatch(Node, StackNode, Object...) line: 90 
 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, StackNode) line: 
 94  
 PreOrderWalker(DefaultGraphWalker).dispatch(Node, StackNode) line: 78   
 PreOrderWalker.walk(Node) line: 54
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker(DefaultGraphWalker).startWalking(CollectionNode, 
 HashMapNode,Object) line: 109 
 AnnotateWithStatistics.transform(ParseContext) line: 78   
 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248  
 TezCompiler.optimizeOperatorPlan(ParseContext, SetReadEntity, 
 SetWriteEntity) line: 120   
 TezCompiler(TaskCompiler).compile(ParseContext, ListTaskSerializable, 
 HashSetReadEntity, HashSetWriteEntity) line: 99 
 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 
 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 
 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
 221 
 Driver.compile(String, boolean) line: 415 
 {code}
 Query
 {code}
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date
 {code}
 Plan 
 Notice that in the Map Join operator the number of rows drops from 82,510,879,939 
 to 36,524 after the join.
 {code}
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 4 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_item_sk is not null (type: boolean)
   Statistics: Num rows: 82510879939 Data size: 6873789738208 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ss_item_sk is not null (type: boolean)
 Statistics: Num rows: 82510879939 Data size: 652315818272 
 Basic stats: COMPLETE Column stats: COMPLETE
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
 1 {d_date_sk} {d_date} {d_month_seq}
   keys:
 0 ss_sold_date_sk (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col12, _col22, _col26, 
 _col28, _col29
   input vertices:
 1 Map 4
   Statistics: Num rows: 36524 Data size: 4163736 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((_col22 = _col26) and _col29 BETWEEN 
 1193 AND 1204) and _col1 is not null) (type: boolean)
 

[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8517:
--
Priority: Critical  (was: Major)

 When joining on partition column NDV gets overridden by 
 StatsUtils.getColStatisticsFromExpression
 -

 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8517.1.patch


 When joining on a partition column, the number of partitions is used as the NDV, 
 but StatsUtils.getColStatisticsFromExpression overrides it and replaces it with 
 the number of rows, which results in the same behavior as explained in 
 https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns 
 with fetch column stats enabled results in a very small cardinality estimate (CE), 
 which negatively affects query performance.
 This is the call stack.
 {code}
 StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
 line: 1001  
 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, StackNode, 
 NodeProcessorCtx, Object...) line: 1479  
 DefaultRuleDispatcher.dispatch(Node, StackNode, Object...) line: 90 
 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, StackNode) line: 
 94  
 PreOrderWalker(DefaultGraphWalker).dispatch(Node, StackNode) line: 78   
 PreOrderWalker.walk(Node) line: 54
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker(DefaultGraphWalker).startWalking(CollectionNode, 
 HashMapNode,Object) line: 109 
 AnnotateWithStatistics.transform(ParseContext) line: 78   
 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248  
 TezCompiler.optimizeOperatorPlan(ParseContext, SetReadEntity, 
 SetWriteEntity) line: 120   
 TezCompiler(TaskCompiler).compile(ParseContext, ListTaskSerializable, 
 HashSetReadEntity, HashSetWriteEntity) line: 99 
 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 
 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 
 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
 221 
 Driver.compile(String, boolean) line: 415 
 {code}
 Query
 {code}
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date
 {code}
 Plan 
 Notice that in the Map Join operator the number of rows drops from 82,510,879,939 
 to 36,524 after the join.
 {code}
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 4 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_item_sk is not null (type: boolean)
   Statistics: Num rows: 82510879939 Data size: 6873789738208 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ss_item_sk is not null (type: boolean)
 Statistics: Num rows: 82510879939 Data size: 652315818272 
 Basic stats: COMPLETE Column stats: COMPLETE
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
 1 {d_date_sk} {d_date} {d_month_seq}
   keys:
 0 ss_sold_date_sk (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col12, _col22, _col26, 
 _col28, _col29
   input vertices:
 1 Map 4
   Statistics: Num rows: 36524 Data size: 4163736 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((_col22 = _col26) and _col29 BETWEEN 
 1193 AND 1204) and _col1 is not null) (type: boolean)
 

[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8517:
--
Status: Patch Available  (was: Open)

 When joining on partition column NDV gets overridden by 
 StatsUtils.getColStatisticsFromExpression
 -

 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-8517.1.patch


 When joining on a partition column, the number of partitions is used as the NDV, 
 but StatsUtils.getColStatisticsFromExpression overrides it and replaces it with 
 the number of rows, which results in the same behavior as explained in 
 https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns 
 with fetch column stats enabled results in a very small cardinality estimate (CE), 
 which negatively affects query performance.
 This is the call stack.
 {code}
 StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
 line: 1001  
 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, StackNode, 
 NodeProcessorCtx, Object...) line: 1479  
 DefaultRuleDispatcher.dispatch(Node, StackNode, Object...) line: 90 
 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, StackNode) line: 
 94  
 PreOrderWalker(DefaultGraphWalker).dispatch(Node, StackNode) line: 78   
 PreOrderWalker.walk(Node) line: 54
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker(DefaultGraphWalker).startWalking(CollectionNode, 
 HashMapNode,Object) line: 109 
 AnnotateWithStatistics.transform(ParseContext) line: 78   
 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248  
 TezCompiler.optimizeOperatorPlan(ParseContext, SetReadEntity, 
 SetWriteEntity) line: 120   
 TezCompiler(TaskCompiler).compile(ParseContext, ListTaskSerializable, 
 HashSetReadEntity, HashSetWriteEntity) line: 99 
 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 
 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 
 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
 221 
 Driver.compile(String, boolean) line: 415 
 {code}
 Query
 {code}
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date
 {code}
 Plan 
 Notice that in the Map Join operator the number of rows drops from 82,510,879,939 
 to 36,524 after the join.
 {code}
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 4 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_item_sk is not null (type: boolean)
   Statistics: Num rows: 82510879939 Data size: 6873789738208 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ss_item_sk is not null (type: boolean)
 Statistics: Num rows: 82510879939 Data size: 652315818272 
 Basic stats: COMPLETE Column stats: COMPLETE
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
 1 {d_date_sk} {d_date} {d_month_seq}
   keys:
 0 ss_sold_date_sk (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col12, _col22, _col26, 
 _col28, _col29
   input vertices:
 1 Map 4
   Statistics: Num rows: 36524 Data size: 4163736 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((_col22 = _col26) and _col29 BETWEEN 
 1193 AND 1204) and _col1 is not null) (type: boolean)
 

[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8517:
--
Attachment: HIVE-8517.1.patch

 When joining on partition column NDV gets overridden by 
 StatsUtils.getColStatisticsFromExpression
 -

 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-8517.1.patch


 When joining on a partition column, the number of partitions is used as the NDV, 
 but StatsUtils.getColStatisticsFromExpression overrides it and replaces it with 
 the number of rows, which results in the same behavior as explained in 
 https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns 
 with fetch column stats enabled results in a very small cardinality estimate (CE), 
 which negatively affects query performance.
 This is the call stack.
 {code}
 StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
 line: 1001  
 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, StackNode, 
 NodeProcessorCtx, Object...) line: 1479  
 DefaultRuleDispatcher.dispatch(Node, StackNode, Object...) line: 90 
 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, StackNode) line: 
 94  
 PreOrderWalker(DefaultGraphWalker).dispatch(Node, StackNode) line: 78   
 PreOrderWalker.walk(Node) line: 54
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker(DefaultGraphWalker).startWalking(CollectionNode, 
 HashMapNode,Object) line: 109 
 AnnotateWithStatistics.transform(ParseContext) line: 78   
 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248  
 TezCompiler.optimizeOperatorPlan(ParseContext, SetReadEntity, 
 SetWriteEntity) line: 120   
 TezCompiler(TaskCompiler).compile(ParseContext, ListTaskSerializable, 
 HashSetReadEntity, HashSetWriteEntity) line: 99 
 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 
 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 
 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
 221 
 Driver.compile(String, boolean) line: 415 
 {code}
 Query
 {code}
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date
 {code}
 Plan 
 Notice that in the Map Join operator the number of rows drops from 82,510,879,939 
 to 36,524 after the join.
 {code}
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 4 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_item_sk is not null (type: boolean)
   Statistics: Num rows: 82510879939 Data size: 6873789738208 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ss_item_sk is not null (type: boolean)
 Statistics: Num rows: 82510879939 Data size: 652315818272 
 Basic stats: COMPLETE Column stats: COMPLETE
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
 1 {d_date_sk} {d_date} {d_month_seq}
   keys:
 0 ss_sold_date_sk (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col12, _col22, _col26, 
 _col28, _col29
   input vertices:
 1 Map 4
   Statistics: Num rows: 36524 Data size: 4163736 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((_col22 = _col26) and _col29 BETWEEN 
 1193 AND 1204) and _col1 is not null) (type: boolean)
 

[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8517:
--
Tags: hive

 When joining on partition column NDV gets overridden by 
 StatsUtils.getColStatisticsFromExpression
 -

 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8517.1.patch


 When joining on a partition column, the number of partitions is used as the NDV, 
 but StatsUtils.getColStatisticsFromExpression overrides it and replaces it with 
 the number of rows, which results in the same behavior as explained in 
 https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns 
 with fetch column stats enabled results in a very small cardinality estimate (CE), 
 which negatively affects query performance.
 This is the call stack.
 {code}
 StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
 line: 1001  
 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, StackNode, 
 NodeProcessorCtx, Object...) line: 1479  
 DefaultRuleDispatcher.dispatch(Node, StackNode, Object...) line: 90 
 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, StackNode) line: 
 94  
 PreOrderWalker(DefaultGraphWalker).dispatch(Node, StackNode) line: 78   
 PreOrderWalker.walk(Node) line: 54
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker(DefaultGraphWalker).startWalking(CollectionNode, 
 HashMapNode,Object) line: 109 
 AnnotateWithStatistics.transform(ParseContext) line: 78   
 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248  
 TezCompiler.optimizeOperatorPlan(ParseContext, SetReadEntity, 
 SetWriteEntity) line: 120   
 TezCompiler(TaskCompiler).compile(ParseContext, ListTaskSerializable, 
 HashSetReadEntity, HashSetWriteEntity) line: 99 
 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 
 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 
 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
 221 
 Driver.compile(String, boolean) line: 415 
 {code}
 Query
 {code}
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date
 {code}
 Plan 
 Notice that in the Map Join operator the number of rows drops from 82,510,879,939 
 to 36,524 after the join.
 {code}
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 4 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_item_sk is not null (type: boolean)
   Statistics: Num rows: 82510879939 Data size: 6873789738208 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ss_item_sk is not null (type: boolean)
 Statistics: Num rows: 82510879939 Data size: 652315818272 
 Basic stats: COMPLETE Column stats: COMPLETE
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
 1 {d_date_sk} {d_date} {d_month_seq}
   keys:
 0 ss_sold_date_sk (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col12, _col22, _col26, 
 _col28, _col29
   input vertices:
 1 Map 4
   Statistics: Num rows: 36524 Data size: 4163736 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((_col22 = _col26) and _col29 BETWEEN 
 1193 AND 1204) and _col1 is not null) (type: boolean)
   

[jira] [Work started] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-8517 started by Mostafa Mokhtar.
-
 When joining on partition column NDV gets overridden by 
 StatsUtils.getColStatisticsFromExpression
 -

 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8517.1.patch


 When joining on a partition column, the number of partitions is used as the NDV, 
 but StatsUtils.getColStatisticsFromExpression overrides it and replaces it with 
 the number of rows, which results in the same behavior as explained in 
 https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns 
 with fetch column stats enabled results in a very small cardinality estimate (CE), 
 which negatively affects query performance.
 This is the call stack.
 {code}
 StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
 line: 1001  
 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, StackNode, 
 NodeProcessorCtx, Object...) line: 1479  
 DefaultRuleDispatcher.dispatch(Node, StackNode, Object...) line: 90 
 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, StackNode) line: 
 94  
 PreOrderWalker(DefaultGraphWalker).dispatch(Node, StackNode) line: 78   
 PreOrderWalker.walk(Node) line: 54
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker(DefaultGraphWalker).startWalking(CollectionNode, 
 HashMapNode,Object) line: 109 
 AnnotateWithStatistics.transform(ParseContext) line: 78   
 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248  
 TezCompiler.optimizeOperatorPlan(ParseContext, SetReadEntity, 
 SetWriteEntity) line: 120   
 TezCompiler(TaskCompiler).compile(ParseContext, ListTaskSerializable, 
 HashSetReadEntity, HashSetWriteEntity) line: 99 
 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 
 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 
 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
 221 
 Driver.compile(String, boolean) line: 415 
 {code}
 Query
 {code}
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date
 {code}
 Plan 
 Notice that in the Map Join operator the number of rows drops from 82,510,879,939 
 to 36,524 after the join.
 {code}
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 1 <- Map 4 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)
 Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_item_sk is not null (type: boolean)
   Statistics: Num rows: 82510879939 Data size: 6873789738208 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ss_item_sk is not null (type: boolean)
 Statistics: Num rows: 82510879939 Data size: 652315818272 
 Basic stats: COMPLETE Column stats: COMPLETE
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
 1 {d_date_sk} {d_date} {d_month_seq}
   keys:
 0 ss_sold_date_sk (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col12, _col22, _col26, 
 _col28, _col29
   input vertices:
 1 Map 4
   Statistics: Num rows: 36524 Data size: 4163736 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (((_col22 = _col26) and _col29 BETWEEN 
 1193 AND 1204) and _col1 is not null) (type: boolean)
 

[jira] [Commented] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176455#comment-14176455
 ] 

Hive QA commented on HIVE-8501:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12675730/HIVE-8501.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6564 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_in_db
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1340/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1340/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1340/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12675730
 - PreCommit-HIVE-TRUNK-Build

 Fix CBO to use indexes when GenericUDFBridge is applied 
 

 Key: HIVE-8501
 URL: https://issues.apache.org/jira/browse/HIVE-8501
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch, 
 HIVE-8501.4.patch


 previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
 we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and 
 (UDFToDouble(key) < UDFToDouble(100))) for example.
 This does not work for the case when we have predicate: ((UDFToDouble(key) > 
 80.0) and (UDFToDouble(key) < 100.0))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled

2014-10-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176471#comment-14176471
 ] 

Gunther Hagleitner commented on HIVE-8498:
--

Multiple children happen in:

- Multi insert queries
- Dynamic partition pruning
- Correlation optimizer

For the first two it would really be better to have vectorization work. Can we 
fix the actual issues here instead of disabling this stuff wholesale?


 Insert into table misses some rows when vectorization is enabled
 

 Key: HIVE-8498
 URL: https://issues.apache.org/jira/browse/HIVE-8498
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Matt McCline
Priority: Critical
  Labels: vectorization
 Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch


  Following is a small reproducible case for the issue
 create table orc1
   stored as orc
   tblproperties("orc.compress"="ZLIB")
   as
 select rn
 from
 (
   select cast(1 as int) as rn from src limit 1
   union all
   select cast(100 as int) as rn from src limit 1
   union all
   select cast(1 as int) as rn from src limit 1
 ) t;
 create table orc_rn1 (rn int);
 create table orc_rn2 (rn int);
 create table orc_rn3 (rn int);
 // These inserts should produce 3 rows but only 1 row is produced
 from orc1 a
 insert overwrite table orc_rn1 select a.* where a.rn < 100
 insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000
 insert overwrite table orc_rn3 select a.* where a.rn >= 1000;
 select * from orc_rn1
 union all
 select * from orc_rn2
 union all
 select * from orc_rn3;
 The expected output of the query is
 1
 100
 1
 But with vectorization enabled we get
 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled

2014-10-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176475#comment-14176475
 ] 

Gunther Hagleitner commented on HIVE-8498:
--

When I did the vectorized dynamic pruning stuff there was no problem with 
vectorization. It seems that the multi-child case is at least partially 
working. Do we know why the multi insert case is failing? The fix might not be 
that difficult. Is it?

I can see how correlation optimizer might be more tricky. That one produces 
diamond shapes in the plan as far as I remember.

 Insert into table misses some rows when vectorization is enabled
 

 Key: HIVE-8498
 URL: https://issues.apache.org/jira/browse/HIVE-8498
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Matt McCline
Priority: Critical
  Labels: vectorization
 Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch


  Following is a small reproducible case for the issue
 create table orc1
   stored as orc
   tblproperties("orc.compress"="ZLIB")
   as
 select rn
 from
 (
   select cast(1 as int) as rn from src limit 1
   union all
   select cast(100 as int) as rn from src limit 1
   union all
   select cast(1 as int) as rn from src limit 1
 ) t;
 create table orc_rn1 (rn int);
 create table orc_rn2 (rn int);
 create table orc_rn3 (rn int);
 // These inserts should produce 3 rows but only 1 row is produced
 from orc1 a
 insert overwrite table orc_rn1 select a.* where a.rn < 100
 insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000
 insert overwrite table orc_rn3 select a.* where a.rn >= 1000;
 select * from orc_rn1
 union all
 select * from orc_rn2
 union all
 select * from orc_rn3;
 The expected output of the query is
 1
 100
 1
 But with vectorization enabled we get
 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied

2014-10-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176478#comment-14176478
 ] 

Hive QA commented on HIVE-8501:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12675733/HIVE-8501.4.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6565 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_in_db
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1341/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1341/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1341/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12675733
 - PreCommit-HIVE-TRUNK-Build

 Fix CBO to use indexes when GenericUDFBridge is applied 
 

 Key: HIVE-8501
 URL: https://issues.apache.org/jira/browse/HIVE-8501
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch, 
 HIVE-8501.4.patch


 previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that
 we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and 
 (UDFToDouble(key) < UDFToDouble(100))) for example.
 This does not work for the case when we have predicate: ((UDFToDouble(key) > 
 80.0) and (UDFToDouble(key) < 100.0))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled

2014-10-19 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176491#comment-14176491
 ] 

Matt McCline commented on HIVE-8498:


Jitendra [~jnp] told me a while ago that the vectorization logic doesn't support / 
wasn't architected for tagging/multiple children.  Part of this may have to do 
with the shadow VectorizationContext data structures that track which columns of 
the vectorized row batches are used by each vectorized operator.

This JIRA is about basic multi-insert functionality not working -- only 
rows from the first insert are being processed.  I don't know if the solution is 
difficult or not.


 Insert into table misses some rows when vectorization is enabled
 

 Key: HIVE-8498
 URL: https://issues.apache.org/jira/browse/HIVE-8498
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Matt McCline
Priority: Critical
  Labels: vectorization
 Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch


  Following is a small reproducible case for the issue
 create table orc1
   stored as orc
   tblproperties("orc.compress"="ZLIB")
   as
 select rn
 from
 (
   select cast(1 as int) as rn from src limit 1
   union all
   select cast(100 as int) as rn from src limit 1
   union all
   select cast(1 as int) as rn from src limit 1
 ) t;
 create table orc_rn1 (rn int);
 create table orc_rn2 (rn int);
 create table orc_rn3 (rn int);
 // These inserts should produce 3 rows but only 1 row is produced
 from orc1 a
 insert overwrite table orc_rn1 select a.* where a.rn < 100
 insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000
 insert overwrite table orc_rn3 select a.* where a.rn >= 1000;
 select * from orc_rn1
 union all
 select * from orc_rn2
 union all
 select * from orc_rn3;
 The expected output of the query is
 1
 100
 1
 But with vectorization enabled we get
 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-19 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176496#comment-14176496
 ] 

Matt McCline commented on HIVE-8474:



Some background on vectorization.

There are shadow VectorizationContext data structures that track which 
columns of the vectorized row batches are used by each vectorized operator.

In row-by-row mode an operator can easily form a new row Object array to 
correspond to the outputObjInspector.

However, in vectorization we mask or project away columns in a 
VectorizedRowBatch (e.g. VectorFilterOperator) so the same batch can travel 
down the operators without being copied.  Or, in the case of computing new 
columns, VectorSelectOperator will compute new scratch columns.

So the VectorizationContext starts as all the table columns for Map, or the 
keys and values for Reduce, and then as we go down the operators new 
VectorizationContext objects are cloned and their column and scratch column 
maps are modified.

So some operators do not use inputObjInspectors or outputObjInspector.  
Others do use them when the vector operator unpacks batches into rows to call 
a row-mode operator.
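
A minimal, self-contained sketch of the column-masking idea described above (toy classes, not the actual org.apache.hadoop.hive.ql.exec.vector implementation): the batch carries all physical columns, and an operator narrows a projection array instead of copying data.

{code}
import java.util.Arrays;

/** Toy model of a vectorized row batch with a projection mask (illustrative only). */
public class ProjectionDemo {

    static class ToyRowBatch {
        final long[][] columns;     // all physical columns; the data is never copied
        int[] projectedColumns;     // indices of the columns visible to downstream operators

        ToyRowBatch(long[][] columns) {
            this.columns = columns;
            this.projectedColumns = new int[columns.length];
            for (int i = 0; i < columns.length; i++) {
                projectedColumns[i] = i;    // initially every column is visible
            }
        }
    }

    /** "Project away" columns by shrinking the mask, the way a vectorized operator would. */
    static void project(ToyRowBatch batch, int... keep) {
        batch.projectedColumns = keep.clone();
    }

    public static void main(String[] args) {
        long[][] data = { {1, 2, 3}, {10, 20, 30}, {100, 200, 300} };
        ToyRowBatch batch = new ToyRowBatch(data);

        project(batch, 0, 2);   // downstream operators now only "see" columns 0 and 2
        System.out.println("visible columns: " + Arrays.toString(batch.projectedColumns));
    }
}
{code}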

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected

2014-10-19 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176500#comment-14176500
 ] 

Matt McCline commented on HIVE-8474:



I would prefer that addToBatchFrom be cloned and the clone changed.  I know this 
duplicates code -- but it provides a cleaner place to comment on the 
differences, and it avoids executing extra code that doesn't provide value to the 
main path.

After 0.14.0 we should look at getting the vector reader(s) and the Vectorizer 
class to create the top-level Map VectorizationContext with just the columns 
that are needed.
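
As a rough illustration of that direction (hypothetical helper, not part of this patch), the reader would hand the top-level context only the projected column names instead of every column in the file:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: keep only the columns a query actually reads. */
public class NeededColumnsSketch {

    static List<String> neededColumns(List<String> allFileColumns, boolean[] included) {
        List<String> needed = new ArrayList<>();
        for (int i = 0; i < allFileColumns.size(); i++) {
            if (included[i]) {              // 'included' would come from the read projection
                needed.add(allFileColumns.get(i));
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        List<String> fileColumns = Arrays.asList("name", "age", "gpa");
        boolean[] projected = { true, true, false };      // the query selects only name, age
        System.out.println(neededColumns(fileColumns, projected));   // [name, age]
    }
}
{code}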

Adding [~jnp]

 Vectorized reads of transactional tables fail when not all columns are 
 selected
 ---

 Key: HIVE-8474
 URL: https://issues.apache.org/jira/browse/HIVE-8474
 Project: Hive
  Issue Type: Bug
  Components: Transactions, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8474.patch


 {code}
 create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) 
 clustered by (age) into 2 buckets stored as orc TBLPROPERTIES 
 ('transactional'='true');
 select name, age from concur_orc_tab order by name;
 {code}
 results in
 {code}
 Diagnostic Messages for this Task:
 Error: java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
 at 
 org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
 at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95)
 at 
 org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
 ... 13 more
 {code}
 The issue is that the object inspector passed to VectorizedOrcAcidRowReader 
 has all of the columns in the file rather than only the projected columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled

2014-10-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176506#comment-14176506
 ] 

Gunther Hagleitner commented on HIVE-8498:
--

Tagging afaik only comes into play for demux/mux. It might be easier to 
fix the multi-insert case, especially since I know the event broadcast is 
already working (and you would disable this). The plan for this multi-insert 
query should be something like:

ts -> fil[1] -> fs[1]
   -> fil[2] -> fs[2]
   -> fil[3] -> fs[3]

The problem might be as simple as making sure the TS forwards to all its 
children.
It might, however, also be a case of the vectorization code not converting 
operators correctly.
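
A minimal sketch of what "forward to all children" means for the table scan in this plan (toy classes, not Hive's Operator API, and example row values of my own choosing): every scanned row must be handed to each child filter, not just the first one.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy operator tree for the multi-insert plan: ts -> fil[i] -> fs[i] (illustrative only). */
public class MultiInsertForwardDemo {

    interface Op { void process(long row); }

    /** A filter that passes matching rows on to a single child. */
    static Op filter(LongPredicate pred, Op child) {
        return row -> { if (pred.test(row)) child.process(row); };
    }

    public static void main(String[] args) {
        List<Long> rn1 = new ArrayList<>(), rn2 = new ArrayList<>(), rn3 = new ArrayList<>();
        List<Op> children = Arrays.asList(
            filter(r -> r < 100, rn1::add),
            filter(r -> r >= 100 && r < 1000, rn2::add),
            filter(r -> r >= 1000, rn3::add));

        long[] scannedRows = {1, 100, 1000};   // example rows, standing in for the scanned table
        for (long row : scannedRows) {
            // the crucial part: the table scan forwards every row to *all* children,
            // not only to the first filter
            for (Op child : children) {
                child.process(row);
            }
        }
        System.out.println(rn1 + " " + rn2 + " " + rn3);   // [1] [100] [1000]
    }
}
{code}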

If it's simple, the best approach might be to put in a fix for the multi-insert 
case and disable the correlation optimizer (tagging) when vectorization is on.

[~jnp] do you have any insights?

 Insert into table misses some rows when vectorization is enabled
 

 Key: HIVE-8498
 URL: https://issues.apache.org/jira/browse/HIVE-8498
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.14.0, 0.13.1
Reporter: Prasanth J
Assignee: Matt McCline
Priority: Critical
  Labels: vectorization
 Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch


  Following is a small reproducible case for the issue
 create table orc1
   stored as orc
   tblproperties("orc.compress"="ZLIB")
   as
 select rn
 from
 (
   select cast(1 as int) as rn from src limit 1
   union all
   select cast(100 as int) as rn from src limit 1
   union all
   select cast(1 as int) as rn from src limit 1
 ) t;
 create table orc_rn1 (rn int);
 create table orc_rn2 (rn int);
 create table orc_rn3 (rn int);
 // These inserts should produce 3 rows but only 1 row is produced
 from orc1 a
 insert overwrite table orc_rn1 select a.* where a.rn < 100
 insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000
 insert overwrite table orc_rn3 select a.* where a.rn >= 1000;
 select * from orc_rn1
 union all
 select * from orc_rn2
 union all
 select * from orc_rn3;
 The expected output of the query is
 1
 100
 1
 But with vectorization enabled we get
 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8514) TestCliDriver.testCliDriver_index_in_db fails in trunk

2014-10-19 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176526#comment-14176526
 ] 

Navis commented on HIVE-8514:
-

+1

 TestCliDriver.testCliDriver_index_in_db fails in trunk
 --

 Key: HIVE-8514
 URL: https://issues.apache.org/jira/browse/HIVE-8514
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8514.1.patch


 I thought I had tested it on trunk, but apparently not.
 .q.out file needs update for trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-7893) Find a way to get a job identifier when submitting a spark job [Spark Branch]

2014-10-19 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li resolved HIVE-7893.
--
Resolution: Fixed

Fixed via HIVE-7439

 Find a way to get a job identifier when submitting a spark job [Spark Branch]
 -

 Key: HIVE-7893
 URL: https://issues.apache.org/jira/browse/HIVE-7893
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor
  Labels: Spark-M3

 Currently we use the {{foreach}} RDD action to submit a spark job. In order 
 to implement job monitoring functionality (HIVE-7438), we need to get a job 
 identifier when submitting the job, so that we can later register some 
 listener for that specific job.
 This task requires facilitation from the Spark side (SPARK-2636). I've tried to 
 use {{AsyncRDDActions}} instead of the traditional actions, and it proved to 
 be a possible way to get the job ID we need.
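
A rough sketch of the approach being described, assuming a Spark version in which the Java async actions (JavaRDDLike.foreachAsync returning a JavaFutureAction that exposes jobIds()) are available; this is an illustration, not the Hive-on-Spark code.

{code}
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaFutureAction;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AsyncJobIdDemo {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("async-job-id-demo").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4));

            // Unlike the blocking foreach(), the async action returns immediately with a
            // handle that exposes the id(s) of the submitted job.
            JavaFutureAction<Void> action = rdd.foreachAsync(x -> { /* per-element side effect */ });

            System.out.println("submitted job ids: " + action.jobIds());
            // a monitoring listener could now be registered for those job ids
            action.get();   // wait for the job to finish
        } finally {
            sc.stop();
        }
    }
}
{code}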



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8514) TestCliDriver.testCliDriver_index_in_db fails in trunk

2014-10-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176544#comment-14176544
 ] 

Thejas M Nair commented on HIVE-8514:
-

I will commit this in another hour. It is a simple test update, I don't think 
it needs to wait long. That will reduce confusion among people looking into 
test results.



 TestCliDriver.testCliDriver_index_in_db fails in trunk
 --

 Key: HIVE-8514
 URL: https://issues.apache.org/jira/browse/HIVE-8514
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8514.1.patch


 I thought I had tested it on trunk, but apparently not.
 .q.out file needs update for trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8514) TestCliDriver.testCliDriver_index_in_db fails in trunk

2014-10-19 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-8514:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk and 0.14 branch.
Thanks for the review Navis!


 TestCliDriver.testCliDriver_index_in_db fails in trunk
 --

 Key: HIVE-8514
 URL: https://issues.apache.org/jira/browse/HIVE-8514
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8514.1.patch


 I thought I had tested it on trunk, but apparently not.
 .q.out file needs update for trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8186) Self join may fail if one side has VCs and other doesn't

2014-10-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-8186:

Attachment: HIVE-8186.6.patch.txt

 Self join may fail if one side has VCs and other doesn't
 

 Key: HIVE-8186
 URL: https://issues.apache.org/jira/browse/HIVE-8186
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Navis
 Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt, 
 HIVE-8186.3.patch.txt, HIVE-8186.4.patch.txt, HIVE-8186.5.patch.txt, 
 HIVE-8186.6.patch.txt


 See comments. This also fails on trunk, although not on original join_vc query



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8186) Self join may fail if one side have virtual column(s) and other doesn't

2014-10-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176607#comment-14176607
 ] 

Hive QA commented on HIVE-8186:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12675765/HIVE-8186.6.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6565 tests executed
*Failed tests:*
{noformat}
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1344/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1344/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1344/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12675765
 - PreCommit-HIVE-TRUNK-Build

 Self join may fail if one side have virtual column(s) and other doesn't
 ---

 Key: HIVE-8186
 URL: https://issues.apache.org/jira/browse/HIVE-8186
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Navis
 Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt, 
 HIVE-8186.3.patch.txt, HIVE-8186.4.patch.txt, HIVE-8186.5.patch.txt, 
 HIVE-8186.6.patch.txt


 See comments. This also fails on trunk, although not on original join_vc query



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression

2014-10-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176611#comment-14176611
 ] 

Prasanth J commented on HIVE-8517:
--

My bad. 

Very minor nit: Can you change the if condition as per the coding convention? 
http://www.oracle.com/technetwork/java/codeconventions-150003.pdf (pages 12 & 13)

Otherwise, +1. Pending unit tests.
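
For reference, the if/else layout those pages of the Sun/Oracle Java Code Conventions prescribe (always braced, with else on the closing-brace line) looks roughly like this; the method and values below are made up for illustration.

{code}
public class IfConventionExample {
    // Illustrative only: always use braces, and keep 'else' on the same
    // line as the closing brace of the previous block.
    static String classify(long numRows) {
        if (numRows > 1000) {
            return "large";
        } else if (numRows > 0) {
            return "small";
        } else {
            return "empty";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(36524));   // prints "large"
    }
}
{code}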

 When joining on partition column NDV gets overridden by 
 StatsUtils.getColStatisticsFromExpression
 -

 Key: HIVE-8517
 URL: https://issues.apache.org/jira/browse/HIVE-8517
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8517.1.patch


 When joining on a partition column, the number of partitions is used as the NDV, which 
 gets overridden by StatsUtils.getColStatisticsFromExpression: the number of 
 partitions used as the NDV is replaced by the number of rows, which results in the 
 same behavior as explained in 
 https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition 
 columns with fetch column stats enabled results in a very small cardinality estimate, 
 which negatively affects query performance.
 This is the call stack.
 {code}
 StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) 
 line: 1001  
 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, 
 NodeProcessorCtx, Object...) line: 1479  
 DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90 
 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 
 94  
 PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78   
 PreOrderWalker.walk(Node) line: 54
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker.walk(Node) line: 59
 PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, 
 HashMap<Node,Object>) line: 109 
 AnnotateWithStatistics.transform(ParseContext) line: 78   
 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248  
 TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, 
 Set<WriteEntity>) line: 120   
 TezCompiler(TaskCompiler).compile(ParseContext, List<Task<Serializable>>, 
 HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99 
 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 
 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 
 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 
 221 
 Driver.compile(String, boolean) line: 415 
 {code}
 Query
 {code}
 select
   ss_item_sk item_sk, d_date, sum(ss_sales_price),
   sum(sum(ss_sales_price))
   over (partition by ss_item_sk order by d_date rows between unbounded 
 preceding and current row) cume_sales
 from store_sales
 ,date_dim
 where ss_sold_date_sk=d_date_sk
   and d_month_seq between 1193 and 1193+11
   and ss_item_sk is not NULL
 group by ss_item_sk, d_date
 {code}
 Plan 
 Notice in the Map Join operator that the number of rows drops from 82,510,879,939 
 to 36,524 after the join.
 {code}
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
  Map 1 <- Map 4 (BROADCAST_EDGE)
  Reducer 2 <- Map 1 (SIMPLE_EDGE)
  Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
   DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
   Vertices:
 Map 1
 Map Operator Tree:
 TableScan
   alias: store_sales
   filterExpr: ss_item_sk is not null (type: boolean)
   Statistics: Num rows: 82510879939 Data size: 6873789738208 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ss_item_sk is not null (type: boolean)
 Statistics: Num rows: 82510879939 Data size: 652315818272 
 Basic stats: COMPLETE Column stats: COMPLETE
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
 1 {d_date_sk} {d_date} {d_month_seq}
   keys:
 0 ss_sold_date_sk (type: int)
 1 d_date_sk (type: int)
   outputColumnNames: _col1, _col12, _col22, _col26, 
 _col28, _col29
   input vertices:
 1 Map 4
   Statistics: Num rows: 36524 Data size: 4163736 Basic 

[jira] [Commented] (HIVE-8186) Self join may fail if one side has VCs and other doesn't

2014-10-19 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176595#comment-14176595
 ] 

Lefty Leverenz commented on HIVE-8186:
--

bq. I really hope we publish a DICT for these ABBRs. Before we use any, we put 
it in the DICT first.

How would you enforce that rule, by eDICT?

But seriously, for the sake of JIRA readers everywhere the description should 
spell out uncommon abbreviations that appear in the title.

 Self join may fail if one side has VCs and other doesn't
 

 Key: HIVE-8186
 URL: https://issues.apache.org/jira/browse/HIVE-8186
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Navis
 Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt, 
 HIVE-8186.3.patch.txt, HIVE-8186.4.patch.txt, HIVE-8186.5.patch.txt, 
 HIVE-8186.6.patch.txt


 See comments. This also fails on trunk, although not on original join_vc query



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8357) Path type entities should use qualified path rather than string

2014-10-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-8357:

Attachment: HIVE-8357.4.patch.txt

Rerun test

 Path type entities should use qualified path rather than string
 ---

 Key: HIVE-8357
 URL: https://issues.apache.org/jira/browse/HIVE-8357
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-8357.1.patch.txt, HIVE-8357.2.patch.txt, 
 HIVE-8357.3.patch.txt, HIVE-8357.4.patch.txt






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8466) nonReserved keywords can not be used as table alias

2014-10-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176591#comment-14176591
 ] 

Hive QA commented on HIVE-8466:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12675756/HIVE-8466.3.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6565 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_in_db
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.testStatsAfterCompactionPartTbl
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1343/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1343/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1343/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12675756
 - PreCommit-HIVE-TRUNK-Build

 nonReserved keywords can not be used as table alias
 ---

 Key: HIVE-8466
 URL: https://issues.apache.org/jira/browse/HIVE-8466
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: cw
Priority: Minor
 Attachments: HIVE-8466.1.patch, HIVE-8466.2.patch.txt, 
 HIVE-8466.3.patch.txt


 There is a small mistake in the patch of issue HIVE-2906. See the change in 
 FromClauseParser.g:
 -: tabname=tableName (ts=tableSample)? (KW_AS? alias=identifier)?
 -  -> ^(TOK_TABREF $tabname $ts? $alias?)
 +: tabname=tableName (props=tableProperties)? (ts=tableSample)? (KW_AS? 
 alias=Identifier)?
 +  -> ^(TOK_TABREF $tabname $props? $ts? $alias?)
 With 'identifier' changed to 'Identifier', we cannot use nonReserved 
 keywords as table aliases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8319) Add configuration for custom services in hiveserver2

2014-10-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-8319:

Attachment: HIVE-8319.3.patch.txt

Rerun test

 Add configuration for custom services in hiveserver2
 

 Key: HIVE-8319
 URL: https://issues.apache.org/jira/browse/HIVE-8319
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-8319.1.patch.txt, HIVE-8319.2.patch.txt, 
 HIVE-8319.3.patch.txt


 NO PRECOMMIT TESTS
 Register services to hiveserver2, for example, 
 {noformat}
 <property>
   <name>hive.server2.service.classes</name>
   <value>com.nexr.hive.service.HiveStatus,com.nexr.hive.service.AzkabanService</value>
 </property>
 <property>
   <name>azkaban.ssl.port</name>
   <name>...</name>
 </property>
 {noformat}
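
A hedged sketch of the general idea (hypothetical code, not the actual HiveServer2 patch): read a comma-separated class list from the configuration and instantiate each service by reflection.

{code}
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: load service implementations named in a comma-separated config value. */
public class ServiceLoaderSketch {

    static List<Object> loadServices(String classList) throws Exception {
        List<Object> services = new ArrayList<>();
        if (classList == null || classList.trim().isEmpty()) {
            return services;
        }
        for (String className : classList.split(",")) {
            // each entry is expected to be a class with a no-arg constructor
            Class<?> clazz = Class.forName(className.trim());
            services.add(clazz.getDeclaredConstructor().newInstance());
        }
        return services;
    }

    public static void main(String[] args) throws Exception {
        // e.g. the value of a property like hive.server2.service.classes
        List<Object> services = loadServices("java.util.ArrayList, java.util.HashMap");
        System.out.println("started " + services.size() + " services");
    }
}
{code}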



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8397) Approximated cardinality with HyperLogLog UDAF

2014-10-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-8397:

Attachment: HIVE-8397.2.patch.txt

 Approximated cardinality with HyperLogLog UDAF
 --

 Key: HIVE-8397
 URL: https://issues.apache.org/jira/browse/HIVE-8397
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-8397.1.patch.txt, HIVE-8397.2.patch.txt


 Useful sometimes for quick estimation of bulk data. 
 {noformat}
 select hll(key), hll(value) from src;
 {noformat}
 Also can be used with hive.fetch.task.aggr=true;
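
For readers unfamiliar with the technique, here is a toy HyperLogLog counter (illustrative only, not this UDAF's implementation): hash each value, use the first bits to pick a register, record the longest run of leading zeros seen in the rest, and combine the registers into a cardinality estimate.

{code}
import java.util.stream.LongStream;

/** Toy HyperLogLog sketch for long values (illustrative only). */
public class ToyHyperLogLog {
    private static final int P = 14;           // 2^14 = 16384 registers
    private static final int M = 1 << P;
    private final byte[] registers = new byte[M];

    // splitmix64 finalizer as a cheap 64-bit hash for the demo
    private static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    public void add(long value) {
        long h = hash(value);
        int idx = (int) (h >>> (64 - P));                 // first P bits pick the register
        long rest = h << P;                               // remaining bits
        int rank = Long.numberOfLeadingZeros(rest) + 1;   // position of the first 1-bit
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    public double estimate() {
        double alpha = 0.7213 / (1 + 1.079 / M);
        double sum = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
        }
        return alpha * M * M / sum;   // raw estimate; real implementations add range corrections
    }

    public static void main(String[] args) {
        ToyHyperLogLog hll = new ToyHyperLogLog();
        LongStream.range(0, 1_000_000).forEach(hll::add);
        System.out.printf("estimated distinct values: %.0f%n", hll.estimate());
    }
}
{code}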



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8466) nonReserved keywords can not be used as table alias

2014-10-19 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176592#comment-14176592
 ] 

Navis commented on HIVE-8466:
-

The failures seem unrelated to this. [~cwsteinbach] Could you review this?

 nonReserved keywords can not be used as table alias
 ---

 Key: HIVE-8466
 URL: https://issues.apache.org/jira/browse/HIVE-8466
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: cw
Priority: Minor
 Attachments: HIVE-8466.1.patch, HIVE-8466.2.patch.txt, 
 HIVE-8466.3.patch.txt


 There is a small mistake in the patch of issue HIVE-2906. See the change in 
 FromClauseParser.g:
 -: tabname=tableName (ts=tableSample)? (KW_AS? alias=identifier)?
 -  -> ^(TOK_TABREF $tabname $ts? $alias?)
 +: tabname=tableName (props=tableProperties)? (ts=tableSample)? (KW_AS? 
 alias=Identifier)?
 +  -> ^(TOK_TABREF $tabname $props? $ts? $alias?)
 With 'identifier' changed to 'Identifier', we cannot use nonReserved 
 keywords as table aliases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)