[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176228#comment-14176228 ]

Lefty Leverenz commented on HIVE-8290:
--------------------------------------

[~alangates], in the Hive Transactions doc I moved *hive.support.concurrency* from the table of new transaction parameters to the next section, and revised various parameter descriptions there and in Configuration Properties.

* [Hive Transactions -- Configuration | https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40509723#HiveTransactions-Configuration]
** [New Configuration Parameters for Transactions | https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40509723#HiveTransactions-NewConfigurationParametersforTransactions]
** [Configuration Values to Set for INSERT, UPDATE, DELETE | https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=40509723#HiveTransactions-ConfigurationValuestoSetforINSERT,UPDATE,DELETE]
* [Configuration Properties -- Transactions and Compactor | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor] (list of turn-ons)
** [hive.txn.manager | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.manager]
** [hive.compactor.initiator.on | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.compactor.initiator.on]
** [hive.compactor.worker.threads | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.compactor.worker.threads]
* Configuration Properties -- other parameters
** [hive.support.concurrency | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.support.concurrency]
** [hive.enforce.bucketing | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.enforce.bucketing]
** [hive.exec.dynamic.partition.mode | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.dynamic.partition.mode]

If these changes pass muster, equivalent changes can be made in HiveConf.java (HIVE-6586).

With DbTxnManager configured, all ORC tables forced to be transactional
-----------------------------------------------------------------------

Key: HIVE-8290
URL: https://issues.apache.org/jira/browse/HIVE-8290
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
Fix For: 0.14.0
Attachments: HIVE-8290.2.patch, HIVE-8290.patch

Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
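For context, the per-table opt-in marker this issue asks for is most naturally a table property, combined with the global settings listed in the wiki pages above. The sketch below is illustrative only: the `DbTxnManager` class path and the `'transactional'='true'` property name reflect how Hive ended up exposing this, but the exact details for the 0.14 patch should be taken from the patch itself.

```sql
-- Global configuration (per the Hive Transactions wiki pages linked above):
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Per-table opt-in marker, so that ORC tables are NOT transactional by
-- default; bucketing is still required for transactional tables:
CREATE TABLE t (a INT, b STRING)
CLUSTERED BY (a) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```

With such a marker, DbTxnManager can restrict transactional treatment to tables that explicitly ask for it, instead of assuming it for every ORC table.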
Re: contents of hive/lib in hive tar file
because that doesn't work

On Sat, Oct 18, 2014 at 9:24 PM, Hari Subramaniyan hsubramani...@hortonworks.com wrote:

Why can't you try making this change in pom.xml in the source root directory? Am I missing something here?

Thanks
Hari

On Sat, Oct 18, 2014 at 8:58 PM, Eugene Koifman ekoif...@hortonworks.com wrote:

Does anyone know how to ensure that a particular jar (and those it depends on) is added to the hive/lib dir? Specifically:

<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-framework</artifactId>
  <version>${curator.version}</version>
</dependency>

I looked at bin.xml under packaging/ but not sure what to do.

--
Thanks,
Eugene

-- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
--
Thanks,
Eugene
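For the archives: with the Maven assembly plugin, this usually takes two pieces. The jar must be a dependency of the packaging module itself, and the assembly descriptor then copies matching runtime dependencies into lib/. A hypothetical sketch, assuming the usual Hive layout (the file locations and include pattern are assumptions, not verified against Hive's actual packaging/):

```xml
<!-- 1) In packaging/pom.xml (sketch): make the jar a dependency of the
     packaging module so the assembly can see it. -->
<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-framework</artifactId>
  <version>${curator.version}</version>
</dependency>

<!-- 2) In the assembly descriptor (e.g. packaging/src/main/assembly/bin.xml,
     sketch): a dependencySet that copies the matching dependency and its
     transitive dependencies into the lib/ directory of the tarball. -->
<dependencySet>
  <outputDirectory>lib</outputDirectory>
  <useTransitiveDependencies>true</useTransitiveDependencies>
  <includes>
    <include>org.apache.curator:curator-framework</include>
  </includes>
</dependencySet>
```

Declaring the dependency in the source-root pom.xml alone (as suggested above) doesn't work because the assembly only packages dependencies visible to the packaging module's own dependency graph.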
[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied
[ https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-8501:
----------------------------------
    Attachment: HIVE-8501.3.patch

Addresses [~julianhyde]'s comments about equals.

Fix CBO to use indexes when GenericUDFBridge is applied
-------------------------------------------------------

Key: HIVE-8501
URL: https://issues.apache.org/jira/browse/HIVE-8501
Project: Hive
Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch

The previous fix, https://issues.apache.org/jira/browse/HIVE-8389, assumes that we have a predicate of the form ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))), for example. This does not work for the case when we have the predicate ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
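The shape mismatch described above can be sketched outside Hive with a toy expression model. This is an illustration only: the class and method names below are hypothetical and do not reflect Hive's actual IndexPredicateAnalyzer API; the point is that a matcher requiring the bridge UDF on both sides of the comparison misses the constant-folded form.

```java
import java.util.List;

public class PredicateShapes {
    // Tiny stand-in for an expression tree: a column, a constant, or a UDF call.
    interface Expr {}
    record Col(String name) implements Expr {}
    record Const(Object value) implements Expr {}
    record Call(String udf, List<Expr> args) implements Expr {}

    // Old behavior (sketch): the comparison is index-usable only when BOTH
    // sides are wrapped in the same bridge UDF, e.g.
    // UDFToDouble(key) > UDFToDouble(80).
    static boolean oldAnalyzer(Call cmp) {
        return cmp.args().get(0) instanceof Call c0
            && cmp.args().get(1) instanceof Call c1
            && c0.udf().equals(c1.udf());
    }

    // Fixed behavior (sketch): also accept a bridge UDF on one side and a
    // plain constant on the other, e.g. UDFToDouble(key) > 80.0.
    static boolean newAnalyzer(Call cmp) {
        if (oldAnalyzer(cmp)) return true;
        Expr l = cmp.args().get(0), r = cmp.args().get(1);
        return (l instanceof Call && r instanceof Const)
            || (l instanceof Const && r instanceof Call);
    }

    // The predicate shape from the issue with a constant right-hand side.
    static Call sampleConstSide() {
        return new Call(">", List.of(
            new Call("UDFToDouble", List.of(new Col("key"))),
            new Const(80.0)));
    }

    public static void main(String[] args) {
        Call bothWrapped = new Call(">", List.of(
            new Call("UDFToDouble", List.of(new Col("key"))),
            new Call("UDFToDouble", List.of(new Const(80)))));
        System.out.println(oldAnalyzer(bothWrapped));      // prints true
        System.out.println(oldAnalyzer(sampleConstSide())); // prints false: the miss described above
        System.out.println(newAnalyzer(sampleConstSide())); // prints true
    }
}
```

In the real patch the equality check lives in the analyzer's traversal of ExprNodeDesc trees, but the asymmetry is the same: constant folding removes the UDFToDouble wrapper from the literal side.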
Re: Review Request 26891: Fix CBO to use indexes when GenericUDFBridge is applied
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26891/
-----------------------------------------------------------

(Updated Oct. 19, 2014, 5:40 p.m.)

Review request for hive and Sergey Shelukhin.

Changes
-------

Addresses [~julianhyde]'s comments about equals.

Repository: hive-git

Description
-------

The previous fix, https://issues.apache.org/jira/browse/HIVE-8389, assumes that we have a predicate of the form ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))), for example. This does not work for the case when we have the predicate ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)).

Diffs (updated)
-------

  ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java 4987f7a

Diff: https://reviews.apache.org/r/26891/diff/

Testing
-------

Thanks,

pengcheng xiong
[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied
[ https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-8501:
----------------------------------
    Status: Open (was: Patch Available)

Fix CBO to use indexes when GenericUDFBridge is applied
-------------------------------------------------------

Key: HIVE-8501
URL: https://issues.apache.org/jira/browse/HIVE-8501
Project: Hive
Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch

The previous fix, https://issues.apache.org/jira/browse/HIVE-8389, assumes that we have a predicate of the form ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))), for example. This does not work for the case when we have the predicate ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied
[ https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-8501:
----------------------------------
    Status: Patch Available (was: Open)

Fix CBO to use indexes when GenericUDFBridge is applied
-------------------------------------------------------

Key: HIVE-8501
URL: https://issues.apache.org/jira/browse/HIVE-8501
Project: Hive
Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch

The previous fix, https://issues.apache.org/jira/browse/HIVE-8389, assumes that we have a predicate of the form ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))), for example. This does not work for the case when we have the predicate ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26891: Fix CBO to use indexes when GenericUDFBridge is applied
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26891/
-----------------------------------------------------------

(Updated Oct. 19, 2014, 5:58 p.m.)

Review request for hive and Sergey Shelukhin.

Repository: hive-git

Description
-------

The previous fix, https://issues.apache.org/jira/browse/HIVE-8389, assumes that we have a predicate of the form ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))), for example. This does not work for the case when we have the predicate ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)).

Diffs (updated)
-------

  ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java 4987f7a

Diff: https://reviews.apache.org/r/26891/diff/

Testing
-------

Thanks,

pengcheng xiong
[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied
[ https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-8501:
----------------------------------
    Status: Open (was: Patch Available)

Fix CBO to use indexes when GenericUDFBridge is applied
-------------------------------------------------------

Key: HIVE-8501
URL: https://issues.apache.org/jira/browse/HIVE-8501
Project: Hive
Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch

The previous fix, https://issues.apache.org/jira/browse/HIVE-8389, assumes that we have a predicate of the form ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))), for example. This does not work for the case when we have the predicate ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied
[ https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-8501:
----------------------------------
    Attachment: HIVE-8501.4.patch

Fix CBO to use indexes when GenericUDFBridge is applied
-------------------------------------------------------

Key: HIVE-8501
URL: https://issues.apache.org/jira/browse/HIVE-8501
Project: Hive
Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch, HIVE-8501.4.patch

The previous fix, https://issues.apache.org/jira/browse/HIVE-8389, assumes that we have a predicate of the form ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))), for example. This does not work for the case when we have the predicate ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied
[ https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-8501:
----------------------------------
    Status: Patch Available (was: Open)

Fix CBO to use indexes when GenericUDFBridge is applied
-------------------------------------------------------

Key: HIVE-8501
URL: https://issues.apache.org/jira/browse/HIVE-8501
Project: Hive
Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch, HIVE-8501.4.patch

The previous fix, https://issues.apache.org/jira/browse/HIVE-8389, assumes that we have a predicate of the form ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))), for example. This does not work for the case when we have the predicate ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
Mostafa Mokhtar created HIVE-8517:
----------------------------------

Summary: When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
Key: HIVE-8517
URL: https://issues.apache.org/jira/browse/HIVE-8517
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Fix For: 0.14.0

When joining on a partition column, the number of partitions is used as the NDV, but StatsUtils.getColStatisticsFromExpression overrides it: the number of partitions used as the NDV is replaced by the number of rows, which results in the same behavior as explained in https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns with fetch column stats enabled results in very small cardinality estimates (CE), which negatively affects query performance.

This is the call stack:

{code}
StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) line: 1001
StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, NodeProcessorCtx, Object...) line: 1479
DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78
PreOrderWalker.walk(Node) line: 54
PreOrderWalker.walk(Node) line: 59
PreOrderWalker.walk(Node) line: 59
PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, HashMap<Node, Object>) line: 109
AnnotateWithStatistics.transform(ParseContext) line: 78
TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248
TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, Set<WriteEntity>) line: 120
TezCompiler(TaskCompiler).compile(ParseContext, List<Task<? extends Serializable>>, HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99
SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037
SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74
ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
Driver.compile(String, boolean) line: 415
{code}

Query:

{code}
select ss_item_sk item_sk, d_date, sum(ss_sales_price),
       sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date
           rows between unbounded preceding and current row) cume_sales
from store_sales, date_dim
where ss_sold_date_sk = d_date_sk
  and d_month_seq between 1193 and 1193+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date
{code}

Plan (notice in the Map Join operator that the number of rows drops from 82,510,879,939 to 36,524 after the join):

{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: store_sales
              filterExpr: ss_item_sk is not null (type: boolean)
              Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: ss_item_sk is not null (type: boolean)
                Statistics: Num rows: 82510879939 Data size: 652315818272 Basic stats: COMPLETE Column stats: COMPLETE
                Map Join Operator
                  condition map:
                    Inner Join 0 to 1
                  condition expressions:
                    0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
                    1 {d_date_sk} {d_date} {d_month_seq}
                  keys:
                    0 ss_sold_date_sk (type: int)
                    1 d_date_sk (type: int)
                  outputColumnNames: _col1, _col12, _col22, _col26, _col28, _col29
                  input vertices:
                    1 Map 4
                  Statistics: Num rows: 36524 Data size: 4163736 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((_col22 = _col26) and _col29 BETWEEN 1193 AND 1204) and _col1 is not null) (type: boolean)
                    Statistics: Num rows: 9131 Data size: 1040934 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: _col1 (type: int), _col28 (type: string), _col12 (type: float)
                      outputColumnNames: _col1, _col28, _col12
                      Statistics: Num rows: 9131 Data size: 1040934 Basic stats: COMPLETE
{code}
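The collapse described above follows from the standard inner-join cardinality formula, |R ⋈ S| ≈ |R| · |S| / max(ndv(R.key), ndv(S.key)). A toy sketch of why replacing a partition column's NDV with the table's row count destroys the estimate; the store_sales and date_dim row counts come from the plan above, while the partition count is a hypothetical stand-in:

```java
public class JoinCardinality {
    // Textbook inner-join cardinality estimate used by cost-based optimizers:
    // |R join S| ~= |R| * |S| / max(ndv(R.key), ndv(S.key))
    static long estimate(long rowsR, long rowsS, long ndvR, long ndvS) {
        return (long) (((double) rowsR / Math.max(ndvR, ndvS)) * rowsS);
    }

    public static void main(String[] args) {
        long storeSalesRows = 82_510_879_939L; // from the plan above
        long dateDimRows = 73_049L;            // assumed size of date_dim
        long numPartitions = 1_823L;           // hypothetical partition count for ss_sold_date_sk

        // Correct: NDV of a partition column is the number of partitions,
        // so the join keeps a large fraction of store_sales rows.
        System.out.println(estimate(storeSalesRows, dateDimRows, numPartitions, dateDimRows));

        // Bug described above (sketch): the partition column's NDV is
        // overridden by the row count, so the denominator explodes and the
        // estimate collapses to roughly the dimension table's size.
        System.out.println(estimate(storeSalesRows, dateDimRows, storeSalesRows, dateDimRows));
    }
}
```

This matches the symptom in the plan, where the Map Join output drops from tens of billions of rows to a few tens of thousands.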
[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8517:
----------------------------------
    Assignee: Mostafa Mokhtar

When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
-------------------------------------------------------------------------------------------------

Key: HIVE-8517
URL: https://issues.apache.org/jira/browse/HIVE-8517
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
Fix For: 0.14.0

When joining on a partition column, the number of partitions is used as the NDV, but StatsUtils.getColStatisticsFromExpression overrides it: the number of partitions used as the NDV is replaced by the number of rows, which results in the same behavior as explained in https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns with fetch column stats enabled results in very small cardinality estimates (CE), which negatively affects query performance.

This is the call stack:

{code}
StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) line: 1001
StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, NodeProcessorCtx, Object...) line: 1479
DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78
PreOrderWalker.walk(Node) line: 54
PreOrderWalker.walk(Node) line: 59
PreOrderWalker.walk(Node) line: 59
PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, HashMap<Node, Object>) line: 109
AnnotateWithStatistics.transform(ParseContext) line: 78
TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248
TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, Set<WriteEntity>) line: 120
TezCompiler(TaskCompiler).compile(ParseContext, List<Task<? extends Serializable>>, HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99
SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037
SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74
ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
Driver.compile(String, boolean) line: 415
{code}

Query:

{code}
select ss_item_sk item_sk, d_date, sum(ss_sales_price),
       sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date
           rows between unbounded preceding and current row) cume_sales
from store_sales, date_dim
where ss_sold_date_sk = d_date_sk
  and d_month_seq between 1193 and 1193+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date
{code}

Plan (notice in the Map Join operator that the number of rows drops from 82,510,879,939 to 36,524 after the join):

{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: store_sales
              filterExpr: ss_item_sk is not null (type: boolean)
              Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: ss_item_sk is not null (type: boolean)
                Statistics: Num rows: 82510879939 Data size: 652315818272 Basic stats: COMPLETE Column stats: COMPLETE
                Map Join Operator
                  condition map:
                    Inner Join 0 to 1
                  condition expressions:
                    0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
                    1 {d_date_sk} {d_date} {d_month_seq}
                  keys:
                    0 ss_sold_date_sk (type: int)
                    1 d_date_sk (type: int)
                  outputColumnNames: _col1, _col12, _col22, _col26, _col28, _col29
                  input vertices:
                    1 Map 4
                  Statistics: Num rows: 36524 Data size: 4163736 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((_col22 = _col26) and _col29 BETWEEN 1193 AND 1204) and _col1 is not null) (type: boolean)
                    Statistics: Num rows: 9131 Data size: 1040934 Basic
{code}
[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8517:
----------------------------------
    Status: Open (was: Patch Available)

When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
-------------------------------------------------------------------------------------------------

Key: HIVE-8517
URL: https://issues.apache.org/jira/browse/HIVE-8517
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
Fix For: 0.14.0
Attachments: HIVE-8517.1.patch

When joining on a partition column, the number of partitions is used as the NDV, but StatsUtils.getColStatisticsFromExpression overrides it: the number of partitions used as the NDV is replaced by the number of rows, which results in the same behavior as explained in https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns with fetch column stats enabled results in very small cardinality estimates (CE), which negatively affects query performance.

This is the call stack:

{code}
StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) line: 1001
StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, NodeProcessorCtx, Object...) line: 1479
DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78
PreOrderWalker.walk(Node) line: 54
PreOrderWalker.walk(Node) line: 59
PreOrderWalker.walk(Node) line: 59
PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, HashMap<Node, Object>) line: 109
AnnotateWithStatistics.transform(ParseContext) line: 78
TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248
TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, Set<WriteEntity>) line: 120
TezCompiler(TaskCompiler).compile(ParseContext, List<Task<? extends Serializable>>, HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99
SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037
SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74
ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
Driver.compile(String, boolean) line: 415
{code}

Query:

{code}
select ss_item_sk item_sk, d_date, sum(ss_sales_price),
       sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date
           rows between unbounded preceding and current row) cume_sales
from store_sales, date_dim
where ss_sold_date_sk = d_date_sk
  and d_month_seq between 1193 and 1193+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date
{code}

Plan (notice in the Map Join operator that the number of rows drops from 82,510,879,939 to 36,524 after the join):

{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: store_sales
              filterExpr: ss_item_sk is not null (type: boolean)
              Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: ss_item_sk is not null (type: boolean)
                Statistics: Num rows: 82510879939 Data size: 652315818272 Basic stats: COMPLETE Column stats: COMPLETE
                Map Join Operator
                  condition map:
                    Inner Join 0 to 1
                  condition expressions:
                    0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
                    1 {d_date_sk} {d_date} {d_month_seq}
                  keys:
                    0 ss_sold_date_sk (type: int)
                    1 d_date_sk (type: int)
                  outputColumnNames: _col1, _col12, _col22, _col26, _col28, _col29
                  input vertices:
                    1 Map 4
                  Statistics: Num rows: 36524 Data size: 4163736 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((_col22 = _col26) and _col29 BETWEEN 1193 AND 1204) and _col1 is not null) (type: boolean)
{code}
[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8517:
----------------------------------
    Priority: Critical (was: Major)

When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
-------------------------------------------------------------------------------------------------

Key: HIVE-8517
URL: https://issues.apache.org/jira/browse/HIVE-8517
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
Priority: Critical
Fix For: 0.14.0
Attachments: HIVE-8517.1.patch

When joining on a partition column, the number of partitions is used as the NDV, but StatsUtils.getColStatisticsFromExpression overrides it: the number of partitions used as the NDV is replaced by the number of rows, which results in the same behavior as explained in https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns with fetch column stats enabled results in very small cardinality estimates (CE), which negatively affects query performance.

This is the call stack:

{code}
StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) line: 1001
StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, NodeProcessorCtx, Object...) line: 1479
DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78
PreOrderWalker.walk(Node) line: 54
PreOrderWalker.walk(Node) line: 59
PreOrderWalker.walk(Node) line: 59
PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, HashMap<Node, Object>) line: 109
AnnotateWithStatistics.transform(ParseContext) line: 78
TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248
TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, Set<WriteEntity>) line: 120
TezCompiler(TaskCompiler).compile(ParseContext, List<Task<? extends Serializable>>, HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99
SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037
SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74
ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
Driver.compile(String, boolean) line: 415
{code}

Query:

{code}
select ss_item_sk item_sk, d_date, sum(ss_sales_price),
       sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date
           rows between unbounded preceding and current row) cume_sales
from store_sales, date_dim
where ss_sold_date_sk = d_date_sk
  and d_month_seq between 1193 and 1193+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date
{code}

Plan (notice in the Map Join operator that the number of rows drops from 82,510,879,939 to 36,524 after the join):

{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
      Vertices:
        Map 1
          Map Operator Tree:
            TableScan
              alias: store_sales
              filterExpr: ss_item_sk is not null (type: boolean)
              Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: ss_item_sk is not null (type: boolean)
                Statistics: Num rows: 82510879939 Data size: 652315818272 Basic stats: COMPLETE Column stats: COMPLETE
                Map Join Operator
                  condition map:
                    Inner Join 0 to 1
                  condition expressions:
                    0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
                    1 {d_date_sk} {d_date} {d_month_seq}
                  keys:
                    0 ss_sold_date_sk (type: int)
                    1 d_date_sk (type: int)
                  outputColumnNames: _col1, _col12, _col22, _col26, _col28, _col29
                  input vertices:
                    1 Map 4
                  Statistics: Num rows: 36524 Data size: 4163736 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: (((_col22 = _col26) and _col29 BETWEEN 1193 AND 1204) and _col1 is not null) (type: boolean)
{code}
[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8517: -- Status: Patch Available (was: Open) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression - Key: HIVE-8517 URL: https://issues.apache.org/jira/browse/HIVE-8517 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-8517.1.patch When joining on partition column number of partitions is used as NDV which gets overridden by StatsUtils.getColStatisticsFromExpression and the number of partitions used as NDV is replaced by number of rows which results in the same behavior as explained in https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns with fetch column stats enabled results it very small CE which negatively affects query performance This is the call stack. {code} StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) line: 1001 StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, StackNode, NodeProcessorCtx, Object...) line: 1479 DefaultRuleDispatcher.dispatch(Node, StackNode, Object...) 
line: 90 PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, StackNode) line: 94 PreOrderWalker(DefaultGraphWalker).dispatch(Node, StackNode) line: 78 PreOrderWalker.walk(Node) line: 54 PreOrderWalker.walk(Node) line: 59 PreOrderWalker.walk(Node) line: 59 PreOrderWalker(DefaultGraphWalker).startWalking(CollectionNode, HashMapNode,Object) line: 109 AnnotateWithStatistics.transform(ParseContext) line: 78 TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248 TezCompiler.optimizeOperatorPlan(ParseContext, SetReadEntity, SetWriteEntity) line: 120 TezCompiler(TaskCompiler).compile(ParseContext, ListTaskSerializable, HashSetReadEntity, HashSetWriteEntity) line: 99 SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037 SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221 ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74 ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221 Driver.compile(String, boolean) line: 415 {code} Query {code} select ss_item_sk item_sk, d_date, sum(ss_sales_price), sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1193 and 1193+11 and ss_item_sk is not NULL group by ss_item_sk, d_date {code} Plan Notice in the Map join operator the number of rows drop from 82,510,879,939 to 36524 after the join. 
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: ss_item_sk is not null (type: boolean)
                  Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ss_item_sk is not null (type: boolean)
                    Statistics: Num rows: 82510879939 Data size: 652315818272 Basic stats: COMPLETE Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
                        1 {d_date_sk} {d_date} {d_month_seq}
                      keys:
                        0 ss_sold_date_sk (type: int)
                        1 d_date_sk (type: int)
                      outputColumnNames: _col1, _col12, _col22, _col26, _col28, _col29
                      input vertices:
                        1 Map 4
                      Statistics: Num rows: 36524 Data size: 4163736 Basic stats: COMPLETE Column stats: COMPLETE
                      Filter Operator
                        predicate: (((_col22 = _col26) and _col29 BETWEEN 1193 AND 1204) and _col1 is not null) (type: boolean)
{code}
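The collapse from 82.5 billion rows to 36,524 in the plan above is exactly what the textbook equi-join cardinality formula predicts once the NDV is overridden. A minimal Python sketch, illustrative only: the formula and the partition count are assumptions for the example, not Hive's actual StatsUtils code.

```python
# Illustrative sketch of inner-join cardinality estimation (not Hive's code).
# Textbook estimate for an equi-join on one key:
#   rows_out ~= (rows_left * rows_right) / max(ndv_left, ndv_right)

def join_cardinality(rows_left, rows_right, ndv_left, ndv_right):
    """Estimate output rows of an equi-join given per-side NDVs."""
    return (rows_left * rows_right) // max(ndv_left, ndv_right, 1)

rows_ss = 82_510_879_939   # store_sales rows (from the plan above)
rows_dd = 36_524           # date_dim rows
num_partitions = 1_823     # hypothetical partition count for ss_sold_date_sk

# Correct: NDV of the partition column is the number of partitions.
good = join_cardinality(rows_ss, rows_dd, num_partitions, rows_dd)

# Bug: NDV of the partition column overridden by the big table's row count.
bad = join_cardinality(rows_ss, rows_dd, rows_ss, rows_dd)

print(good)  # 82510879939, a sane estimate
print(bad)   # 36524, the tiny estimate seen in the Map Join above
```

With the bad NDV, the estimate degenerates to roughly the small table's row count, which is why the plan shows 36524 rows after the join.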
[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8517: -- Attachment: HIVE-8517.1.patch
[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8517: -- Tags: hive
[jira] [Work started] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8517 started by Mostafa Mokhtar.
[jira] [Commented] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied
[ https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176455#comment-14176455 ] Hive QA commented on HIVE-8501: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12675730/HIVE-8501.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6564 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_in_db {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1340/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1340/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1340/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12675730 - PreCommit-HIVE-TRUNK-Build Fix CBO to use indexes when GenericUDFBridge is applied Key: HIVE-8501 URL: https://issues.apache.org/jira/browse/HIVE-8501 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Minor Attachments: HIVE-8501.1.patch, HIVE-8501.2.patch, HIVE-8501.3.patch, HIVE-8501.4.patch previous https://issues.apache.org/jira/browse/HIVE-8389 assumes that we have predicate: ((UDFToDouble(key) > UDFToDouble(80)) and (UDFToDouble(key) < UDFToDouble(100))) for example. This does not work for the case when we have predicate: ((UDFToDouble(key) > 80.0) and (UDFToDouble(key) < 100.0)) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
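The two predicate shapes above differ only in whether the comparison constant is itself wrapped in UDFToDouble. A toy Python sketch of why a matcher written for the wrapped form misses the constant-folded form; the tuple encoding and function names are hypothetical, not Hive's ExprNodeDesc API.

```python
# Hypothetical predicate ASTs encoded as nested tuples (op, lhs, rhs).
# These are illustrative stand-ins, not Hive's ExprNodeDesc objects.
wrapped = (">", ("UDFToDouble", "key"), ("UDFToDouble", 80))  # shape HIVE-8389 handled
folded = (">", ("UDFToDouble", "key"), 80.0)                  # constant already folded

def matches_wrapped_only(pred):
    """Matcher that insists both sides are UDFToDouble calls."""
    op, lhs, rhs = pred
    return (isinstance(lhs, tuple) and lhs[0] == "UDFToDouble"
            and isinstance(rhs, tuple) and rhs[0] == "UDFToDouble")

def matches_either(pred):
    """Also accept a bare numeric literal on the right-hand side."""
    op, lhs, rhs = pred
    lhs_ok = isinstance(lhs, tuple) and lhs[0] == "UDFToDouble"
    rhs_ok = ((isinstance(rhs, tuple) and rhs[0] == "UDFToDouble")
              or isinstance(rhs, (int, float)))
    return lhs_ok and rhs_ok

print(matches_wrapped_only(folded))  # False: the index rewrite is skipped
print(matches_either(folded))        # True
```

The folded shape arises because the optimizer evaluates UDFToDouble(80) down to the literal 80.0 before the index matcher runs, so the matcher must accept both forms.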
[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled
[ https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176471#comment-14176471 ] Gunther Hagleitner commented on HIVE-8498: -- Multi children happens in: - Multi insert queries - Dynamic partition pruning - Correlation optimizer For the first two it would really be better to have vectorization work. Can we fix the actual issues here instead of disabling this stuff wholesale? Insert into table misses some rows when vectorization is enabled Key: HIVE-8498 URL: https://issues.apache.org/jira/browse/HIVE-8498 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0, 0.13.1 Reporter: Prasanth J Assignee: Matt McCline Priority: Critical Labels: vectorization Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch Following is a small reproducible case for the issue create table orc1 stored as orc tblproperties('orc.compress'='ZLIB') as select rn from ( select cast(1 as int) as rn from src limit 1 union all select cast(100 as int) as rn from src limit 1 union all select cast(1 as int) as rn from src limit 1 ) t; create table orc_rn1 (rn int); create table orc_rn2 (rn int); create table orc_rn3 (rn int); // These inserts should produce 3 rows but only 1 row is produced from orc1 a insert overwrite table orc_rn1 select a.* where a.rn < 100 insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000 insert overwrite table orc_rn3 select a.* where a.rn >= 1000; select * from orc_rn1 union all select * from orc_rn2 union all select * from orc_rn3; The expected output of the query is 1 100 1 But with vectorization enabled we get 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled
[ https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176475#comment-14176475 ] Gunther Hagleitner commented on HIVE-8498: -- When I did the vectorized dynamic pruning stuff there was no problem with vectorization. It seems that the multi-child case is at least partially working. Do we know why the multi insert case is failing? The fix might not be that difficult. Is it? I can see how correlation optimizer might be more tricky. That one produces diamond shapes in the plan as far as I remember. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8501) Fix CBO to use indexes when GenericUDFBridge is applied
[ https://issues.apache.org/jira/browse/HIVE-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176478#comment-14176478 ] Hive QA commented on HIVE-8501: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12675733/HIVE-8501.4.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6565 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_in_db {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1341/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1341/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1341/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12675733 - PreCommit-HIVE-TRUNK-Build -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled
[ https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176491#comment-14176491 ] Matt McCline commented on HIVE-8498: Jitendra [~jnp] told me a while ago that the vectorization logic doesn't support / wasn't architected for tagging/multiple children. Part of this may have to do with the shadow VectorizationContext data structures that track which columns of the vectorized row batches are used by each vectorized operator. This JIRA is about basic multi-insert functionality not working -- only rows from the first insert are being processed. I don't know if the solution is difficult or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176496#comment-14176496 ] Matt McCline commented on HIVE-8474: Some background on vectorization. There are shadow VectorizationContext data structures that track which columns of the vectorized row batches are used by each vectorized operator. In row-by-row mode an operator can easily form a new row Object array to correspond to the outputObjInspector. However, in vectorization we mask or project away columns in a VectorizedRowBatch (e.g. VectorFilterOperator) so the same batch can travel down the operators without being copied. Or, in the case of computing new columns, VectorSelectOperator will compute new scratch columns. So, the VectorizationContext starts as all the table columns for Map, or the keys and values for Reduce, and then as we go down the operators new VectorizationContext objects are cloned and their column and scratch column maps are modified. So, some operators do not use inputObjInspectors or outputObjInspector. Others do use them, when the vector operator unpacks batches into rows to call a row-mode operator. 
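The masking/projection idea described above can be sketched with a toy batch model. The classes here are hypothetical stand-ins, not Hive's VectorizedRowBatch or VectorizationContext: the point is that the projection list shrinks while the underlying column arrays are never copied.

```python
# Toy model of column projection on a row batch (illustrative only,
# not Hive's VectorizedRowBatch / VectorizationContext).

class Batch:
    def __init__(self, columns):
        self.columns = columns          # dict: column name -> list of values
        self.projected = list(columns)  # which columns are visible downstream

    def project(self, keep):
        # "Mask away" columns by shrinking the projection list; the
        # underlying arrays are untouched, so no data is copied.
        self.projected = [c for c in self.projected if c in keep]

    def visible(self):
        return {c: self.columns[c] for c in self.projected}

batch = Batch({"name": ["a", "b"], "age": [30, 40], "gpa": [3.1, 3.9]})
batch.project({"name", "age"})   # like "select name, age" from the repro below
print(sorted(batch.visible()))   # ['age', 'name']; gpa's array still exists
```

An object inspector built from all file columns while the batch only carries the projected ones is exactly the kind of mismatch that surfaces as the NullPointerException in this issue.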
Vectorized reads of transactional tables fail when not all columns are selected --- Key: HIVE-8474 URL: https://issues.apache.org/jira/browse/HIVE-8474 Project: Hive Issue Type: Bug Components: Transactions, Vectorization Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8474.patch {code} create table concur_orc_tab(name varchar(50), age int, gpa decimal(3, 2)) clustered by (age) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); select name, age from concur_orc_tab order by name; {code} results in {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setNullColIsNullValue(VectorizedBatchUtil.java:63) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatchFrom(VectorizedBatchUtil.java:443) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addRowToBatch(VectorizedBatchUtil.java:214) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowReader.next(VectorizedOrcAcidRowReader.java:43) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 13 more {code} The issue is that the object inspector passed to VectorizedOrcAcidRowReader has all of the columns in the file rather than only the projected columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8474) Vectorized reads of transactional tables fail when not all columns are selected
[ https://issues.apache.org/jira/browse/HIVE-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176500#comment-14176500 ] Matt McCline commented on HIVE-8474: I would prefer that addRowToBatchFrom be cloned and changed. I know this duplicates code -- but it provides a cleaner place to have comments on the differences and not have extra code execute that doesn't provide value to the main path. After 0.14.0 we should look at getting the vector reader(s) and the Vectorizer class to create the Map top-level VectorizationContext with just the columns that are needed. Adding [~jnp] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled
[ https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176506#comment-14176506 ] Gunther Hagleitner commented on HIVE-8498: -- Tagging afaik only comes into play only for demux/mux. It might be easier to fix the multi insert case, especially since I know the event broadcast is already working (and you would disable this). The plan for this multi-insert query should be something like: ts - fil[1] - fs[1] - fil[2] - fs[2] - fil[3] - fs[3] The problem might be as simple as making sure the TS fowards to all it's children. It might, however, also be a case of the vectorization code not converting operators correctly. If it's simple, the best approach might be to put a fix for the multi-insert case, and disable correlation optimizer (tagging) when vectorization is on. [~jnp] do you have any insights? Insert into table misses some rows when vectorization is enabled Key: HIVE-8498 URL: https://issues.apache.org/jira/browse/HIVE-8498 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0, 0.13.1 Reporter: Prasanth J Assignee: Matt McCline Priority: Critical Labels: vectorization Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch Following is a small reproducible case for the issue create table orc1 stored as orc tblproperties(orc.compress=ZLIB) as select rn from ( select cast(1 as int) as rn from src limit 1 union all select cast(100 as int) as rn from src limit 1 union all select cast(1 as int) as rn from src limit 1 ) t; create table orc_rn1 (rn int); create table orc_rn2 (rn int); create table orc_rn3 (rn int); // These inserts should produce 3 rows but only 1 row is produced from orc1 a insert overwrite table orc_rn1 select a.* where a.rn 100 insert overwrite table orc_rn2 select a.* where a.rn = 100 and a.rn 1000 insert overwrite table orc_rn3 select a.* where a.rn = 1000; select * from orc_rn1 union all select * from orc_rn2 union all select * from orc_rn3; 
The expected output of the query is: 1 100 1. But with vectorization enabled we get: 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
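The multi-insert semantics the repro exercises can be modeled as a toy in plain Python (this is an illustrative model, not Hive code; the table names mirror the repro, and the three comparison predicates are assumed from context): one scan of the source feeds every filter branch, so every row should reach each branch's filter.

```python
# Toy model of Hive's multi-table insert: a single table scan ("TS")
# forwards each row to ALL of its filter/select children, and each
# branch writes into its own target table.
rows = [1, 100, 1]  # contents of orc1 from the repro above

branches = {
    "orc_rn1": lambda rn: rn < 100,              # assumed predicate
    "orc_rn2": lambda rn: 100 <= rn < 1000,      # assumed predicate
    "orc_rn3": lambda rn: rn >= 1000,            # assumed predicate
}

tables = {name: [] for name in branches}
for rn in rows:                              # one pass over the source
    for name, pred in branches.items():      # TS forwards to every child
        if pred(rn):
            tables[name].append(rn)

union_all = tables["orc_rn1"] + tables["orc_rn2"] + tables["orc_rn3"]
print(union_all)  # -> [1, 1, 100]: all 3 source rows survive
```

If the scan forwarded to only one child (the suspected bug), only a single branch's rows would appear in the final union.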
[jira] [Commented] (HIVE-8514) TestCliDriver.testCliDriver_index_in_db fails in trunk
[ https://issues.apache.org/jira/browse/HIVE-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176526#comment-14176526 ] Navis commented on HIVE-8514: - +1 TestCliDriver.testCliDriver_index_in_db fails in trunk -- Key: HIVE-8514 URL: https://issues.apache.org/jira/browse/HIVE-8514 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8514.1.patch I thought I had tested it on trunk, but apparently not. .q.out file needs update for trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7893) Find a way to get a job identifier when submitting a spark job [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li resolved HIVE-7893. -- Resolution: Fixed Fixed via HIVE-7439 Find a way to get a job identifier when submitting a spark job [Spark Branch] - Key: HIVE-7893 URL: https://issues.apache.org/jira/browse/HIVE-7893 Project: Hive Issue Type: Task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Labels: Spark-M3 Currently we use the {{foreach}} RDD action to submit a Spark job. In order to implement job monitoring functionality (HIVE-7438), we need to get a job identifier when submitting the job, so that we can later register a listener for that specific job. This task requires facilitation from the Spark side (SPARK-2636). I've tried to use {{AsyncRDDActions}} instead of the traditional actions, and it proved to be a possible way to get the job ID we need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8514) TestCliDriver.testCliDriver_index_in_db fails in trunk
[ https://issues.apache.org/jira/browse/HIVE-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176544#comment-14176544 ] Thejas M Nair commented on HIVE-8514: - I will commit this in another hour. It is a simple test update, I don't think it needs to wait long. That will reduce confusion among people looking into test results. TestCliDriver.testCliDriver_index_in_db fails in trunk -- Key: HIVE-8514 URL: https://issues.apache.org/jira/browse/HIVE-8514 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8514.1.patch I thought I had tested it on trunk, but apparently not. .q.out file needs update for trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8514) TestCliDriver.testCliDriver_index_in_db fails in trunk
[ https://issues.apache.org/jira/browse/HIVE-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8514: Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk and 0.14 branch. Thanks for the review Navis! TestCliDriver.testCliDriver_index_in_db fails in trunk -- Key: HIVE-8514 URL: https://issues.apache.org/jira/browse/HIVE-8514 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8514.1.patch I thought I had tested it on trunk, but apparently not. .q.out file needs update for trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8186: Attachment: HIVE-8186.6.patch.txt Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt, HIVE-8186.3.patch.txt, HIVE-8186.4.patch.txt, HIVE-8186.5.patch.txt, HIVE-8186.6.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8186) Self join may fail if one side have virtual column(s) and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176607#comment-14176607 ] Hive QA commented on HIVE-8186: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12675765/HIVE-8186.6.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6565 tests executed *Failed tests:* {noformat} org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1344/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1344/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1344/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12675765 - PreCommit-HIVE-TRUNK-Build Self join may fail if one side have virtual column(s) and other doesn't --- Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt, HIVE-8186.3.patch.txt, HIVE-8186.4.patch.txt, HIVE-8186.5.patch.txt, HIVE-8186.6.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176611#comment-14176611 ] Prasanth J commented on HIVE-8517: -- My bad. Very minor nit: Can you change the if condition as per coding convention? http://www.oracle.com/technetwork/java/codeconventions-150003.pdf (pages 12-13) Otherwise, +1. Pending unit tests. When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression - Key: HIVE-8517 URL: https://issues.apache.org/jira/browse/HIVE-8517 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8517.1.patch When joining on a partition column, the number of partitions is used as the NDV, but this gets overridden by StatsUtils.getColStatisticsFromExpression: the number of partitions used as NDV is replaced by the number of rows, which results in the same behavior as explained in https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns with fetch column stats enabled results in a very small CE, which negatively affects query performance. This is the call stack.
{code}
StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) line: 1001
StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, NodeProcessorCtx, Object...) line: 1479
DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78
PreOrderWalker.walk(Node) line: 54
PreOrderWalker.walk(Node) line: 59
PreOrderWalker.walk(Node) line: 59
PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, HashMap<Node,Object>) line: 109
AnnotateWithStatistics.transform(ParseContext) line: 78
TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248
TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, Set<WriteEntity>) line: 120
TezCompiler(TaskCompiler).compile(ParseContext, List<Task<? extends Serializable>>, HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99
SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037
SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74
ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
Driver.compile(String, boolean) line: 415
{code}
Query
{code}
select ss_item_sk item_sk, d_date,
       sum(ss_sales_price),
       sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date
                                      rows between unbounded preceding and current row) cume_sales
from store_sales, date_dim
where ss_sold_date_sk = d_date_sk
  and d_month_seq between 1193 and 1193+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date
{code}
Plan Notice in the Map Join operator the number of rows drops from 82,510,879,939 to 36524 after the join.
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: ss_item_sk is not null (type: boolean)
                  Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ss_item_sk is not null (type: boolean)
                    Statistics: Num rows: 82510879939 Data size: 652315818272 Basic stats: COMPLETE Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
                        1 {d_date_sk} {d_date} {d_month_seq}
                      keys:
                        0 ss_sold_date_sk (type: int)
                        1 d_date_sk (type: int)
                      outputColumnNames: _col1, _col12, _col22, _col26, _col28, _col29
                      input vertices:
                        1 Map 4
                      Statistics: Num rows: 36524 Data size: 4163736 Basic
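The collapse from 82,510,879,939 rows to 36,524 is what the textbook join-cardinality formula produces once the join key's NDV is overridden by a row count. A rough Python sketch (the formula is the standard estimate; the partition count is a hypothetical number for illustration, not taken from the Hive source):

```python
# Standard join cardinality estimate:
#   |R join S on k| ~= |R| * |S| / max(ndv_R(k), ndv_S(k))
def join_cardinality(rows_left, rows_right, ndv_left, ndv_right):
    return (rows_left * rows_right) // max(ndv_left, ndv_right)

rows_ss, rows_dd = 82_510_879_939, 36_524  # store_sales, date_dim (from the plan)
ndv_partitions = 1_823                     # hypothetical partition count of ss_sold_date_sk

# Correct: partition-column NDV = number of partitions; d_date_sk is
# date_dim's key, so its NDV equals date_dim's row count.
good = join_cardinality(rows_ss, rows_dd, ndv_partitions, rows_dd)

# Buggy: the partition column's NDV is overridden by the big side's
# row count, so the denominator explodes and the estimate collapses.
bad = join_cardinality(rows_ss, rows_dd, rows_ss, rows_dd)

print(good)  # -> 82510879939 (every store_sales row keeps a match)
print(bad)   # -> 36524 (the tiny estimate seen in the plan)
```

The buggy estimate (36,524) matches the Map Join output row count in the plan above, while the correct one preserves the scan cardinality.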
[jira] [Commented] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176595#comment-14176595 ] Lefty Leverenz commented on HIVE-8186: -- bq. I really hope we publish a DICT for these ABBRs. Before we use any, we put it in the DICT first. How would you enforce that rule, by eDICT? But seriously, for the sake of JIRA readers everywhere the description should spell out uncommon abbreviations that appear in the title. Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt, HIVE-8186.3.patch.txt, HIVE-8186.4.patch.txt, HIVE-8186.5.patch.txt, HIVE-8186.6.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8357) Path type entities should use qualified path rather than string
[ https://issues.apache.org/jira/browse/HIVE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8357: Attachment: HIVE-8357.4.patch.txt Rerun test Path type entities should use qualified path rather than string --- Key: HIVE-8357 URL: https://issues.apache.org/jira/browse/HIVE-8357 Project: Hive Issue Type: Improvement Components: Authorization Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-8357.1.patch.txt, HIVE-8357.2.patch.txt, HIVE-8357.3.patch.txt, HIVE-8357.4.patch.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8466) nonReserved keywords can not be used as table alias
[ https://issues.apache.org/jira/browse/HIVE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176591#comment-14176591 ] Hive QA commented on HIVE-8466: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12675756/HIVE-8466.3.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6565 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_in_db org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.testStatsAfterCompactionPartTbl {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1343/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1343/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1343/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12675756 - PreCommit-HIVE-TRUNK-Build nonReserved keywords can not be used as table alias --- Key: HIVE-8466 URL: https://issues.apache.org/jira/browse/HIVE-8466 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: cw Priority: Minor Attachments: HIVE-8466.1.patch, HIVE-8466.2.patch.txt, HIVE-8466.3.patch.txt There is a small mistake in the patch of issue HIVE-2906. See the change of FromClauseParser.g
{noformat}
-  : tabname=tableName (ts=tableSample)? (KW_AS? alias=identifier)? -> ^(TOK_TABREF $tabname $ts? $alias?)
+  : tabname=tableName (props=tableProperties)? (ts=tableSample)? (KW_AS? alias=Identifier)? -> ^(TOK_TABREF $tabname $props? $ts? $alias?)
{noformat}
With the 'identifier' changed to 'Identifier' we can not use nonReserved keywords as table alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
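The distinction in the HIVE-8466 diff is between the raw lexer token `Identifier` and the parser rule `identifier`, which also accepts nonReserved keywords. A toy Python sketch (not ANTLR; the nonReserved word list here is a hypothetical subset for illustration) shows why referencing the token directly rejects keyword aliases:

```python
# Toy lexer/parser model: the lexer classifies keywords before
# identifiers, so a word like "sort" never becomes an Identifier token.
NON_RESERVED = {"sort", "cluster", "rewrite"}  # hypothetical subset

def lex(word):
    """Classify a word the way a keyword-first lexer would."""
    return "KW_" + word.upper() if word in NON_RESERVED else "Identifier"

def alias_via_Identifier(word):
    # Buggy rule: (KW_AS? alias=Identifier)? -- matches only the raw token,
    # so nonReserved keywords can never be an alias.
    return lex(word) == "Identifier"

def alias_via_identifier(word):
    # Fixed rule: identifier : Identifier | nonReserved ;
    return lex(word) == "Identifier" or word in NON_RESERVED

print(alias_via_Identifier("sort"))   # False: keyword alias rejected
print(alias_via_identifier("sort"))   # True: nonReserved keyword allowed
```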
[jira] [Updated] (HIVE-8319) Add configuration for custom services in hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8319: Attachment: HIVE-8319.3.patch.txt Rerun test Add configuration for custom services in hiveserver2 Key: HIVE-8319 URL: https://issues.apache.org/jira/browse/HIVE-8319 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-8319.1.patch.txt, HIVE-8319.2.patch.txt, HIVE-8319.3.patch.txt NO PRECOMMIT TESTS Register services to hiveserver2, for example,
{noformat}
<property>
  <name>hive.server2.service.classes</name>
  <value>com.nexr.hive.service.HiveStatus,com.nexr.hive.service.AzkabanService</value>
</property>
<property>
  <name>azkaban.ssl.port</name>
  <name>...</name>
</property>
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8397) Approximated cardinality with HyperLogLog UDAF
[ https://issues.apache.org/jira/browse/HIVE-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8397: Attachment: HIVE-8397.2.patch.txt Approximated cardinality with HyperLogLog UDAF -- Key: HIVE-8397 URL: https://issues.apache.org/jira/browse/HIVE-8397 Project: Hive Issue Type: Improvement Components: UDF Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-8397.1.patch.txt, HIVE-8397.2.patch.txt Useful sometimes for quick estimation of bulk data. {noformat} select hll(key), hll(value) from src; {noformat} Also can be used with hive.fetch.task.aggr=true; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
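As background on the HIVE-8397 {{hll()}} UDAF, the HyperLogLog algorithm it is based on can be sketched in a few lines of Python (illustrative only: the register count, hash function, and lack of small-range correction are choices for this sketch, not Hive's implementation):

```python
import hashlib

# Minimal HyperLogLog: hash each value, use the top P bits to pick a
# register, and record the longest run of leading zeros (plus one) seen
# in the remaining bits for that register.
P = 10                              # 2^10 = 1024 registers
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)    # standard bias-correction constant

def hll_estimate(values):
    regs = [0] * M
    for v in values:
        h = int.from_bytes(hashlib.md5(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - P)                      # top P bits: register index
        rest = h & ((1 << (64 - P)) - 1)         # remaining 54 bits
        rank = (64 - P) - rest.bit_length() + 1  # leading zeros + 1
        regs[idx] = max(regs[idx], rank)
    # Harmonic mean of 2^registers, scaled by alpha * m^2
    return ALPHA * M * M / sum(2.0 ** -r for r in regs)

est = hll_estimate(range(100_000))
print(est)  # close to 100000; expected error ~1.04/sqrt(1024) ~= 3%
```

The fixed-size register array is why such a UDAF is cheap for quick estimation over bulk data: memory stays constant regardless of how many distinct values stream through.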
[jira] [Commented] (HIVE-8466) nonReserved keywords can not be used as table alias
[ https://issues.apache.org/jira/browse/HIVE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176592#comment-14176592 ] Navis commented on HIVE-8466: - Fails seemed not related to this. [~cwsteinbach] Could you review this? nonReserved keywords can not be used as table alias --- Key: HIVE-8466 URL: https://issues.apache.org/jira/browse/HIVE-8466 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: cw Priority: Minor Attachments: HIVE-8466.1.patch, HIVE-8466.2.patch.txt, HIVE-8466.3.patch.txt There is a small mistake in the patch of issue HIVE-2906. See the change of FromClauseParser.g
{noformat}
-  : tabname=tableName (ts=tableSample)? (KW_AS? alias=identifier)? -> ^(TOK_TABREF $tabname $ts? $alias?)
+  : tabname=tableName (props=tableProperties)? (ts=tableSample)? (KW_AS? alias=Identifier)? -> ^(TOK_TABREF $tabname $props? $ts? $alias?)
{noformat}
With the 'identifier' changed to 'Identifier' we can not use nonReserved keywords as table alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)