[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()
[ https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8458: - Description: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE was: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE Potential null dereference in Utilities#clearWork() --- Key: HIVE-8458 URL: https://issues.apache.org/jira/browse/HIVE-8458 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-8458_001.patch {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
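The guard above only returns early when both paths are null, yet the next statement dereferences mapPath unconditionally. A minimal standalone sketch of the safer pattern (plain strings stand in for Hadoop Path objects; chooseNonNull is a hypothetical helper, not Hive code):

```java
public class ClearWorkSketch {
    // Hypothetical helper: pick whichever plan path is non-null so that a
    // subsequent getFileSystem()-style dereference cannot NPE.
    static String chooseNonNull(String mapPath, String reducePath) {
        if (mapPath == null && reducePath == null) {
            return null; // nothing to clean, mirrors the early return
        }
        // mapPath may be null while reducePath is not; never dereference
        // mapPath blindly.
        return (mapPath != null) ? mapPath : reducePath;
    }

    public static void main(String[] args) {
        System.out.println(chooseNonNull(null, "reduce_plan")); // reduce_plan
    }
}
```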
[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8343: - Description: In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html was: In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner Key: HIVE-8343 URL: https://issues.apache.org/jira/browse/HIVE-8343 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: JongWon Park Priority: Minor Attachments: HIVE-8343.patch In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
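For context, offer() on a bounded LinkedBlockingQueue returns false instead of blocking when the queue is at capacity, which is exactly the silent-drop case the report worries about. A small standalone demonstration (not Hive code; DynamicPartitionPruner's actual queue may be unbounded, in which case checking the boolean simply makes the intent explicit):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferDemo {
    // Fill a capacity-1 queue, then try one more offer; returns both boolean
    // results so the dropped second insert is visible to the caller.
    static boolean[] offerTwice() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        return new boolean[] { queue.offer("event-1"), queue.offer("event-2") };
    }

    public static void main(String[] args) {
        boolean[] results = offerTwice();
        System.out.println(results[0]); // true: enqueued
        System.out.println(results[1]); // false: queue full, event lost if unchecked
    }
}
```

Using put() instead of offer() is the usual alternative when blocking until space frees up is acceptable.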
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635379#comment-14635379 ] Pengcheng Xiong commented on HIVE-11113: [~tfriedr], thanks for your efforts. So, I am going to close this jira, [~shiroy] and [~libing] please feel free to reopen it if the problem remains. Thanks. ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. --- Key: HIVE-11113 URL: https://issues.apache.org/jira/browse/HIVE-11113 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 1.2.1 Environment: Reporter: Shiroy Pigarez Assignee: Pengcheng Xiong Priority: Critical I was trying to perform some column statistics using hive as per the documentation https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive and was encountering the following errors: Seems like a bug. Can you look into this? Thanks in advance. -- HIVE table {noformat} hive> create table people_part( name string, address string) PARTITIONED BY (dob string, nationality varchar(2)) row format delimited fields terminated by '\t'; {noformat} --Analyze table with partition dob and nationality with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:95 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in column name {noformat} --Analyze table with partition dob and nationality values specified with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at 
org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at
[jira] [Updated] (HIVE-11310) Avoid expensive AST tree conversion to String in RowResolver
[ https://issues.apache.org/jira/browse/HIVE-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11310: --- Attachment: HIVE-11310.3.patch Avoid expensive AST tree conversion to String in RowResolver Key: HIVE-11310 URL: https://issues.apache.org/jira/browse/HIVE-11310 Project: Hive Issue Type: Bug Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11310.1.patch, HIVE-11310.2.patch, HIVE-11310.3.patch, HIVE-11310.patch We use the AST tree String representation of a condition in the WHERE clause to identify its column in the RowResolver. This can lead to OOM Exceptions when the condition is very large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11328) Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary
[ https://issues.apache.org/jira/browse/HIVE-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635344#comment-14635344 ] Hive QA commented on HIVE-11328: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746339/HIVE-11328.patch {color:green}SUCCESS:{color} +1 9229 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4681/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4681/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4681/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746339 - PreCommit-HIVE-TRUNK-Build Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary Key: HIVE-11328 URL: https://issues.apache.org/jira/browse/HIVE-11328 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11328.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11327) HiveQL to HBase - Predicate Pushdown for composite key not working
[ https://issues.apache.org/jira/browse/HIVE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yannik Zuehlke updated HIVE-11327: -- Tags: hive, predicatepushdown, hbase (was: hive predicatepushdown) HiveQL to HBase - Predicate Pushdown for composite key not working -- Key: HIVE-11327 URL: https://issues.apache.org/jira/browse/HIVE-11327 Project: Hive Issue Type: Bug Components: HBase Handler, Hive Affects Versions: 0.14.0 Reporter: Yannik Zuehlke Priority: Blocker I am using Hive 0.14 and HBase 0.98.8. I would like to use HiveQL for accessing a HBase table. I created a table with a complex composite rowkey: {quote} CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string, p3:string>, column1 string, column2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ';' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:c1,cf:c2") TBLPROPERTIES("hbase.table.name"="hbase_table"); {quote} The table is getting successfully created, but the HiveQL query is taking forever: {quote} SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz'; {quote} I am working with 1 TB of data (around 1.5 bn records) and this query takes forever (it ran over night, but did not finish in the morning). I changed the log4j properties to 'DEBUG' and found some interesting information: {quote} 2015-07-15 15:56:41,232 INFO ppd.OpProcFactory (OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias : hive_hbase 2015-07-15 15:56:41,232 INFO ppd.OpProcFactory (OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz') {quote} But some lines later: {quote} 2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory (OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible for predicate: (rowkey.p1 = 'xyz') {quote} So my guess is: HiveQL over HBase does not do any predicate pushdown but starts a MapReduce job. 
The normal HBase scan (via the HBase Shell) takes around 5 seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636165#comment-14636165 ] Hive QA commented on HIVE-7723: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746439/HIVE-7723.12.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9245 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4688/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4688/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4688/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12746439 - PreCommit-HIVE-TRUNK-Build Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, HIVE-7723.11.patch, HIVE-7723.11.patch, HIVE-7723.12.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match; a HashMap is a better option for this case as Set doesn't have a get method. Also, for ReadEntity, equals is case-insensitive while hash is not, which is an undesired behavior. {code} public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) { // If the input is already present, make sure the new parent is added to the input. 
if (inputs.contains(newInput)) { for (ReadEntity input : inputs) { if (input.equals(newInput)) { if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) { input.getParents().addAll(newInput.getParents()); input.setDirect(input.isDirect() || newInput.isDirect()); } return input; } } assert false; } else { inputs.add(newInput); return newInput; } // make compile happy return null; } {code} This is the query used: {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN
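The report's suggested fix — replace the Set scan with a map lookup — can be sketched as follows (generic placeholder types, not the actual ReadEntity class; the Hive-specific parent-merging logic is reduced to a comment):

```java
import java.util.HashMap;
import java.util.Map;

public class AddInputSketch {
    // Sketch of the proposed fix: a Map keyed by the entity itself gives an
    // O(1) lookup of the canonical instance, instead of iterating the whole
    // collection on every addInput() call.
    static <E> E addInput(Map<E, E> inputs, E newInput) {
        E existing = inputs.get(newInput);
        if (existing != null) {
            // In Hive, this is where newInput's parents would be merged into
            // the existing (canonical) entry.
            return existing;
        }
        inputs.put(newInput, newInput);
        return newInput;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new HashMap<>();
        System.out.println(addInput(inputs, "t1")); // first call inserts
        System.out.println(addInput(inputs, "t1")); // repeat returns the canonical entry
    }
}
```

Note that this only works if equals and hashCode agree on case-sensitivity, which is the second problem the report calls out.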
[jira] [Commented] (HIVE-11335) Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636164#comment-14636164 ] fatkun commented on HIVE-11335: --- [~jcamachorodriguez] Could you take a look? Multi-Join Inner Query producing incorrect results -- Key: HIVE-11335 URL: https://issues.apache.org/jira/browse/HIVE-11335 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.1.0 Environment: CDH5.4.0 Reporter: fatkun Test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. Both columns should return test1. While trying to find the error, I noticed that this query returns the right result (the join key is different): {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain is different; query 1 only selects one column: {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot find it. It may be related to HIVE-10996. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11196) Utilities.getPartitionDesc() should try to reuse TableDesc object
[ https://issues.apache.org/jira/browse/HIVE-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636128#comment-14636128 ] Hive QA commented on HIVE-11196: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746431/HIVE-11196.3.patch {color:green}SUCCESS:{color} +1 9245 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4687/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4687/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4687/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746431 - PreCommit-HIVE-TRUNK-Build Utilities.getPartitionDesc() should try to reuse TableDesc object -- Key: HIVE-11196 URL: https://issues.apache.org/jira/browse/HIVE-11196 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11196.1.patch, HIVE-11196.2.patch, HIVE-11196.3.patch Currently, Utilities.getPartitionDesc() creates a new PartitionDesc object which in turn creates a new TableDesc object via Utilities.getTableDesc(part.getTable()) for every call. This value needs to be reused so that we can avoid the expense of creating a new descriptor object wherever possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
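The reuse the issue asks for amounts to memoizing the descriptor per table. A hedged sketch of that idea (hypothetical names; the real TableDesc/PartitionDesc classes are replaced by a plain Object, and this is only safe if the shared descriptor is not mutated per partition):

```java
import java.util.HashMap;
import java.util.Map;

public class TableDescCacheSketch {
    static final Map<String, Object> CACHE = new HashMap<>();

    // Hypothetical stand-in for Utilities.getTableDesc(part.getTable()):
    // build the descriptor once per table name and hand back the shared copy
    // on every subsequent call.
    static Object getTableDesc(String tableName) {
        return CACHE.computeIfAbsent(tableName, name -> new Object());
    }

    public static void main(String[] args) {
        // Two partitions of the same table now share one descriptor object.
        System.out.println(getTableDesc("t") == getTableDesc("t")); // true
    }
}
```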
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636202#comment-14636202 ] wangchangchun commented on HIVE-11055: -- My hplsql is OK now. Now I want to use permanent stored procedures, so I should use the .hplsqlrc file. Where should the .hplsqlrc file be placed, and where should the stored procedure bodies go? Can you give me an example? HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under the HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11334) Incorrect answer when facing multiple chars delim and negative count for substring_index
[ https://issues.apache.org/jira/browse/HIVE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636217#comment-14636217 ] zhichao-li commented on HIVE-11334: --- https://patch-diff.githubusercontent.com/raw/apache/hive/pull/47.patch Incorrect answer when facing multiple chars delim and negative count for substring_index - Key: HIVE-11334 URL: https://issues.apache.org/jira/browse/HIVE-11334 Project: Hive Issue Type: Bug Reporter: zhichao-li Priority: Minor substring_index("www||apache||org", "||", -2) would return "|apache||org" instead of "apache||org" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
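MySQL-style semantics for substring_index with a negative count keep everything to the right of the |count|-th delimiter from the end; the reported off-by-one is what you get when a multi-character delimiter is stepped over as if it were one character long. A reference implementation of the expected behavior (a sketch, not the actual Hive UDF):

```java
public class SubstringIndexSketch {
    static String substringIndex(String str, String delim, int count) {
        if (count == 0 || delim.isEmpty()) {
            return "";
        }
        if (count > 0) {
            // Find the count-th occurrence from the left, advancing by the
            // full delimiter length each time (the crux for multi-char delims).
            int idx = -delim.length();
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + delim.length());
                if (idx < 0) {
                    return str; // fewer than count delimiters: whole string
                }
            }
            return str.substring(0, idx);
        }
        // Negative count: find the |count|-th occurrence from the right, then
        // skip the entire delimiter, not just one character.
        int idx = str.length();
        for (int i = 0; i < -count; i++) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) {
                return str;
            }
        }
        return str.substring(idx + delim.length());
    }

    public static void main(String[] args) {
        System.out.println(substringIndex("www||apache||org", "||", -2)); // apache||org
    }
}
```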
[jira] [Commented] (HIVE-11320) ACID enable predicate pushdown for insert-only delta file
[ https://issues.apache.org/jira/browse/HIVE-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635223#comment-14635223 ] Eugene Koifman commented on HIVE-11320: --- [~alangates], could you review please ACID enable predicate pushdown for insert-only delta file - Key: HIVE-11320 URL: https://issues.apache.org/jira/browse/HIVE-11320 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11320.patch Given ACID table T against which some Insert/Update/Delete has been executed but not Major Compaction. This table will have some number of delta files (and possibly base files). Given a query: select * from T where c1 = 5; the OrcRawRecordMerger() c'tor currently disables predicate pushdown in ORC to the delta file via eventOptions.searchArgument(null, null); When a delta file is known to only have Insert events we can safely push the predicate. ORC maintains stats in a footer which have counts of insert/update/delete events in the file - this can be used to determine that a given delta file only has Insert events. See OrcRecordUpdater.parseAcidStats() This will enable PPD for Streaming Ingest (HIVE-5687) use cases which by definition only generate Insert events. PPD for deltas with arbitrary types of events can be achieved but it is more complicated and will be addressed separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11320) ACID enable predicate pushdown for insert-only delta file
[ https://issues.apache.org/jira/browse/HIVE-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635403#comment-14635403 ] Alan Gates commented on HIVE-11320: --- +1 ACID enable predicate pushdown for insert-only delta file - Key: HIVE-11320 URL: https://issues.apache.org/jira/browse/HIVE-11320 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11320.patch Given ACID table T against which some Insert/Update/Delete has been executed but not Major Compaction. This table will have some number of delta files (and possibly base files). Given a query: select * from T where c1 = 5; the OrcRawRecordMerger() c'tor currently disables predicate pushdown in ORC to the delta file via eventOptions.searchArgument(null, null); When a delta file is known to only have Insert events we can safely push the predicate. ORC maintains stats in a footer which have counts of insert/update/delete events in the file - this can be used to determine that a given delta file only has Insert events. See OrcRecordUpdater.parseAcidStats() This will enable PPD for Streaming Ingest (HIVE-5687) use cases which by definition only generate Insert events. PPD for deltas with arbitrary types of events can be achieved but it is more complicated and will be addressed separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8176) Close of FSDataOutputStream in OrcRecordUpdater ctor should be in finally clause
[ https://issues.apache.org/jira/browse/HIVE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8176: - Description: {code} try { FSDataOutputStream strm = fs.create(new Path(path, ACID_FORMAT), false); strm.writeInt(ORC_ACID_VERSION); strm.close(); } catch (IOException ioe) { {code} If strm.writeInt() throws IOE, strm would be left unclosed. was: {code} try { FSDataOutputStream strm = fs.create(new Path(path, ACID_FORMAT), false); strm.writeInt(ORC_ACID_VERSION); strm.close(); } catch (IOException ioe) { {code} If strm.writeInt() throws IOE, strm would be left unclosed. Close of FSDataOutputStream in OrcRecordUpdater ctor should be in finally clause Key: HIVE-8176 URL: https://issues.apache.org/jira/browse/HIVE-8176 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-8176.patch {code} try { FSDataOutputStream strm = fs.create(new Path(path, ACID_FORMAT), false); strm.writeInt(ORC_ACID_VERSION); strm.close(); } catch (IOException ioe) { {code} If strm.writeInt() throws IOE, strm would be left unclosed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
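The standard fix for the leak described above is try-with-resources, which closes the stream even when writeInt throws. Sketched here with plain java.io streams standing in for Hadoop's FSDataOutputStream, so the snippet stays self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class AcidVersionWriteSketch {
    // Stand-in for the OrcRecordUpdater ctor snippet: the stream is closed by
    // try-with-resources on both the success and the exception path, so it
    // cannot be left unclosed.
    static byte[] writeVersion(int version) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream strm = new DataOutputStream(buf)) {
            strm.writeInt(version);
        } catch (IOException impossible) {
            // An in-memory stream never throws; a real FSDataOutputStream
            // would handle (or propagate) the IOException here instead.
            throw new AssertionError(impossible);
        } // implicit strm.close() has already run at this point
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeVersion(0).length); // 4: one int written
    }
}
```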
[jira] [Updated] (HIVE-11301) thrift metastore issue when getting stats results in disconnect
[ https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11301: --- Attachment: HIVE-11301.02.patch It seems that the QA run did not complete. Resubmitting the patch for a complete QA run. thrift metastore issue when getting stats results in disconnect --- Key: HIVE-11301 URL: https://issues.apache.org/jira/browse/HIVE-11301 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Pengcheng Xiong Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch On metastore side it looks like this: {noformat} 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} and then {noformat} 2015-07-17 20:32:27,796 WARN [pool-3-thread-150]: transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error closing output stream. java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) at org.apache.thrift.transport.TSocket.close(TSocket.java:196) at org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} Which on client manifests as {noformat} 2015-07-17 20:32:27,796 WARN [main()]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost connection. Attempting to reconnect. 
org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_aggr_stats_for(ThriftHiveMetastore.java:3029) at
[jira] [Resolved] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-11113. Resolution: Cannot Reproduce ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. --- Key: HIVE-11113 URL: https://issues.apache.org/jira/browse/HIVE-11113 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 1.2.1 Environment: Reporter: Shiroy Pigarez Assignee: Pengcheng Xiong Priority: Critical I was trying to perform some column statistics using hive as per the documentation https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive and was encountering the following errors: Seems like a bug. Can you look into this? Thanks in advance. -- HIVE table {noformat} hive> create table people_part( name string, address string) PARTITIONED BY (dob string, nationality varchar(2)) row format delimited fields terminated by '\t'; {noformat} --Analyze table with partition dob and nationality with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:95 cannot recognize input near 'EOF' 'EOF' 'EOF' in column name {noformat} --Analyze table with partition dob and nationality values specified with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at
[jira] [Commented] (HIVE-11310) Avoid expensive AST tree conversion to String in RowResolver
[ https://issues.apache.org/jira/browse/HIVE-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635515#comment-14635515 ] Hive QA commented on HIVE-11310: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746374/HIVE-11310.3.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9229 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_gby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_lateral_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_lateral_view org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_move_tasks_share_dependencies {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4682/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4682/console Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4682/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12746374 - PreCommit-HIVE-TRUNK-Build Avoid expensive AST tree conversion to String in RowResolver Key: HIVE-11310 URL: https://issues.apache.org/jira/browse/HIVE-11310 Project: Hive Issue Type: Bug Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11310.1.patch, HIVE-11310.2.patch, HIVE-11310.3.patch, HIVE-11310.patch We use the AST tree String representation of a condition in the WHERE clause to identify its column in the RowResolver. This can lead to OOM Exceptions when the condition is very large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11254) Process result sets returned by a stored procedure
[ https://issues.apache.org/jira/browse/HIVE-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635397#comment-14635397 ] Alan Gates commented on HIVE-11254: --- Why are you including hive-jdbc version 1.2.1? It seems like you want to set it to ${project.version} so that you get whatever was just built. Process result sets returned by a stored procedure -- Key: HIVE-11254 URL: https://issues.apache.org/jira/browse/HIVE-11254 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11254.1.patch, HIVE-11254.2.patch, HIVE-11254.3.patch, HIVE-11254.4.patch Stored procedure can return one or more result sets. A caller should be able to process them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11330) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression
[ https://issues.apache.org/jira/browse/HIVE-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-11330: --- Attachment: HIVE-11330.patch Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression --- Key: HIVE-11330 URL: https://issues.apache.org/jira/browse/HIVE-11330 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Attachments: HIVE-11330.patch Queries with heavily nested filters can cause a StackOverflowError {code} Exception in thread main java.lang.StackOverflowError at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:301) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
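A depth cutoff is one straightforward way to guarantee termination here. The sketch below is a simplified, hypothetical model — the Expr class, the MAX_DEPTH value, and the -1 "unknown" fallback are all assumptions for illustration, not Hive's actual StatsRulesProcFactory code — showing how bailing out past a fixed nesting depth avoids unbounded recursion on heavily nested filters:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a nested filter expression tree; the real
// Hive code lives in StatsRulesProcFactory$FilterStatsRule.evaluateExpression.
class Expr {
    final List<Expr> children = new ArrayList<>();
    Expr addChild(Expr c) { children.add(c); return this; }
}

class DepthLimitedEvaluator {
    // Assumed cutoff for illustration; a real implementation would tune this.
    static final int MAX_DEPTH = 64;

    // Counts expression nodes exactly up to MAX_DEPTH, then stops recursing
    // and returns -1 ("unknown") so the caller can fall back to a default
    // estimate instead of overflowing the stack on deeply nested filters.
    static long evaluate(Expr e, int depth) {
        if (depth >= MAX_DEPTH) {
            return -1; // early termination: give up on an exact answer
        }
        long total = 1; // count this node
        for (Expr child : e.children) {
            long sub = evaluate(child, depth + 1);
            if (sub < 0) {
                return -1; // propagate the fallback upward
            }
            total += sub;
        }
        return total;
    }
}
```

The key property: recursion depth is bounded by MAX_DEPTH regardless of input shape, so a 1000-level filter chain returns the fallback instead of a StackOverflowError.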
[jira] [Commented] (HIVE-11229) Mutation API: Coordinator communication with meta store should be optional
[ https://issues.apache.org/jira/browse/HIVE-11229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635470#comment-14635470 ] Alan Gates commented on HIVE-11229: --- +1 Mutation API: Coordinator communication with meta store should be optional -- Key: HIVE-11229 URL: https://issues.apache.org/jira/browse/HIVE-11229 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 2.0.0 Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-11229.1.patch [~ekoifman] raised a theoretical issue with the streaming mutation API (HIVE-10165) where worker nodes operating in a distributed cluster might overwhelm a meta store while trying to obtain partition locks. Although this does not happen in practice (see HIVE-11228), the API does communicate with the meta store in this manner to obtain partition paths and create new partitions. Therefore the issue described does in fact exist in the current implementation, albeit in a different code path. I’d like to make such communication optional like so: * When the user chooses not to create partitions on demand, no meta store connection will be created in the {{MutationCoordinators}}. Additionally, partition paths will be resolved using {{org.apache.hadoop.hive.metastore.Warehouse.getPartitionPath(Path, LinkedHashMap<String, String>)}} which should be suitable so long as standard Hive partition layouts are followed. * If the user does choose to create partitions on demand then the system will operate as it does currently; using the meta store to both issue {{add_partition}} events and look up partition meta data. * The documentation will be updated to describe these behaviours and outline alternative approaches to collecting affected partition names and creating partitions in a less intensive manner. 
Side note for follow up: The parameter names {{tblName}} and {{dbName}} seem to be the wrong way around on the method {{org.apache.hadoop.hive.metastore.IMetaStoreClient.getPartition(String, String, List<String>)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
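For reference, the "standard Hive partition layout" that makes meta-store-free path resolution possible is just table-directory/key1=value1/key2=value2/…, with partition columns in declaration order. A minimal sketch of that layout — ignoring the character escaping that the real Warehouse.getPartitionPath performs, and with illustrative names throughout:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the standard Hive partition directory layout. Real
// resolution should go through Warehouse.getPartitionPath, which also escapes
// special characters in keys and values; this omits that for clarity.
class PartitionPathSketch {
    static String partitionPath(String tableDir, LinkedHashMap<String, String> spec) {
        StringBuilder path = new StringBuilder(tableDir);
        for (Map.Entry<String, String> part : spec.entrySet()) {
            // One directory level per partition column: key=value.
            path.append('/').append(part.getKey()).append('=').append(part.getValue());
        }
        return path.toString();
    }
}
```

A LinkedHashMap is used (as in the getPartitionPath signature quoted above) precisely because the key order determines the directory nesting order.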
[jira] [Commented] (HIVE-11334) Incorrect answer when facing multiple chars delim and negative count for substring_index
[ https://issues.apache.org/jira/browse/HIVE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636126#comment-14636126 ] ASF GitHub Bot commented on HIVE-11334: --- GitHub user zhichao-li opened a pull request: https://github.com/apache/hive/pull/47 HIVE-11334-fix substring_index for multiple chars delim You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhichao-li/hive substringindex Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/47.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #47 commit 2acedeef1ca4ff1211410e9ffe9c437f2902de0d Author: zhichao.li zhichao...@intel.com Date: 2015-07-22T01:50:03Z fix substring_index for multiple chars delim Incorrect answer when facing multiple chars delim and negative count for substring_index - Key: HIVE-11334 URL: https://issues.apache.org/jira/browse/HIVE-11334 Project: Hive Issue Type: Bug Reporter: zhichao-li Priority: Minor substring_index("www||apache||org", "||", -2) would return "|apache||org" instead of "apache||org" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
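For context, substring_index follows MySQL semantics: a positive count returns everything before the count-th occurrence of the delimiter from the left, and a negative count returns everything after the count-th occurrence from the right — counting whole multi-character delimiters, not individual characters. A standalone sketch of those semantics (an illustration of the expected behavior, not Hive's actual UDF code):

```java
// Standalone sketch of MySQL-style substring_index semantics; this is an
// illustration of the expected behavior, not Hive's implementation.
class SubstringIndexSketch {
    static String substringIndex(String str, String delim, int count) {
        if (count == 0 || delim.isEmpty()) {
            return "";
        }
        if (count > 0) {
            // Scan left-to-right for the count-th occurrence of the delimiter.
            int idx = -1;
            while (count-- > 0) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) {
                    return str; // fewer occurrences than count: whole string
                }
            }
            return str.substring(0, idx);
        }
        // Negative count: scan right-to-left, stepping over the delimiter as
        // a unit -- treating it character-by-character is what produces the
        // stray leading "|" reported above.
        int idx = str.length();
        while (count++ < 0) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) {
                return str;
            }
        }
        return str.substring(idx + delim.length());
    }
}
```

With this, substringIndex("www||apache||org", "||", -2) skips back over two full "||" delimiters and returns "apache||org", matching the expected answer in the report.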
[jira] [Commented] (HIVE-11328) Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary
[ https://issues.apache.org/jira/browse/HIVE-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636155#comment-14636155 ] Ashutosh Chauhan commented on HIVE-11328: - +1 Avoid String representation of expression nodes in ConstantPropagateProcFactory unless necessary Key: HIVE-11328 URL: https://issues.apache.org/jira/browse/HIVE-11328 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11328.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636347#comment-14636347 ] Lefty Leverenz commented on HIVE-10673: --- Doc note: *hive.optimize.dynamic.partition.hashjoin* should be documented in the wiki. Does it belong in the Tez section of Configuration Properties, or should it go in the general query execution section and just be added to the list of related parameters at the beginning of the Tez section? * [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez] * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Commit error: The commit to master was mislabeled HIVE-11303: Getting Tez LimitExceededException after dag execution on large query (commit ID 04d54f61c9f56906160936751e772080c079498c). The actual HIVE-11303 has commit ID 72f97fc7760134465333983fc40766e9e864e643. Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Labels: TODOC1.3 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 of the CPU was spent during sorting/merging. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. 
To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
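The reduce-side hash join described above boils down to a classic build/probe loop. A toy sketch with string-keyed rows (the names and row shapes are illustrative only — the real Tez operator works on serialized row containers and has spilling and vectorization concerns):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy build/probe hash join over unsorted inputs: each row is a {key, value}
// pair. The small side must fit in memory; no sorting or merging is needed,
// which is the CPU saving motivating this feature.
class HashJoinSketch {
    static List<String> innerJoin(List<String[]> smallSide, List<String[]> bigSide) {
        // Build phase: hash the small side by join key.
        Map<String, List<String>> built = new HashMap<>();
        for (String[] row : smallSide) {
            built.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Probe phase: stream the large side in whatever order it arrives.
        List<String> out = new ArrayList<>();
        for (String[] row : bigSide) {
            for (String match : built.getOrDefault(row[0], Collections.emptyList())) {
                out.add(row[0] + "," + match + "," + row[1]);
            }
        }
        return out;
    }
}
```

Because only the build side is materialized, the probe side can arrive unsorted from the shuffle, eliminating the sort/merge step that dominated CPU in the profiled queries.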
[jira] [Commented] (HIVE-11310) Avoid expensive AST tree conversion to String in RowResolver
[ https://issues.apache.org/jira/browse/HIVE-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636063#comment-14636063 ] Hive QA commented on HIVE-11310: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746399/HIVE-11310.4.patch {color:green}SUCCESS:{color} +1 9245 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4686/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4686/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4686/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746399 - PreCommit-HIVE-TRUNK-Build Avoid expensive AST tree conversion to String in RowResolver Key: HIVE-11310 URL: https://issues.apache.org/jira/browse/HIVE-11310 Project: Hive Issue Type: Bug Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11310.1.patch, HIVE-11310.2.patch, HIVE-11310.3.patch, HIVE-11310.4.patch, HIVE-11310.patch We use the AST tree String representation of a condition in the WHERE clause to identify its column in the RowResolver. This can lead to OOM Exceptions when the condition is very large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10799) Refactor the SearchArgumentFactory to remove the dependence on ExprNodeGenericFuncDesc
[ https://issues.apache.org/jira/browse/HIVE-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636064#comment-14636064 ] Prasanth Jayachandran commented on HIVE-10799: -- PPD on Char types is broken for bloom filters. Char object is not trimmed before inserting into bloom filter. So when we convert the stats object to string the hashcodes will not match. We are hitting this here https://issues.apache.org/jira/browse/HIVE-11312?focusedCommentId=14634349page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634349 Would it make sense to add CHAR to predicate types? Refactor the SearchArgumentFactory to remove the dependence on ExprNodeGenericFuncDesc -- Key: HIVE-10799 URL: https://issues.apache.org/jira/browse/HIVE-10799 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-10799.patch, HIVE-10799.patch, HIVE-10799.patch, HIVE-10799.patch, HIVE-10799.patch SearchArgumentFactory and SearchArgumentImpl are high level and shouldn't depend on the internals of Hive's AST model. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
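The mismatch described in the comment above comes from CHAR(n) values being blank-padded: hashing the padded form on the insert path and the trimmed form on the lookup path yields different hash codes, so the bloom filter never matches. A minimal illustration of the normalization needed on both paths (the helper below is hypothetical, not Hive's code):

```java
// CHAR(n) values are blank-padded to n characters. If one code path hashes
// the padded form and another hashes the trimmed form, bloom filter lookups
// silently miss. Stripping trailing spaces on both the insert and lookup
// paths keeps the hash codes consistent. Illustrative helper only.
class CharNormalize {
    static String stripTrailingSpaces(String charValue) {
        int end = charValue.length();
        while (end > 0 && charValue.charAt(end - 1) == ' ') {
            end--; // drop CHAR padding, but keep interior/leading spaces
        }
        return charValue.substring(0, end);
    }
}
```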
[jira] [Updated] (HIVE-11333) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ColumnPruner prunes columns of UnionOperator that should be kept
[ https://issues.apache.org/jira/browse/HIVE-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11333: --- Attachment: HIVE-11333.01.patch CBO: Calcite Operator To Hive Operator (Calcite Return Path): ColumnPruner prunes columns of UnionOperator that should be kept -- Key: HIVE-11333 URL: https://issues.apache.org/jira/browse/HIVE-11333 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11333.01.patch unionOperator will have the schema following the operator in the first branch. Because ColumnPruner prunes columns based on the internal name, the column in other branches may be pruned due to a different internal name from the first branch. To repro, run rcfile_union.q with return path turned on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11334) Incorrect answer when facing multiple chars delim and negative count for substring_index
[ https://issues.apache.org/jira/browse/HIVE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhichao-li updated HIVE-11334: -- Priority: Major (was: Minor) Incorrect answer when facing multiple chars delim and negative count for substring_index - Key: HIVE-11334 URL: https://issues.apache.org/jira/browse/HIVE-11334 Project: Hive Issue Type: Bug Reporter: zhichao-li substring_index("www||apache||org", "||", -2) would return "|apache||org" instead of "apache||org" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-10673: -- Labels: TODOC1.3 (was: ) Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Labels: TODOC1.3 Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 of the CPU was spent during sorting/merging. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8128: Attachment: testParquetFile Uploaded the Parquet file for the qfile test; it should be put at ./data/files/ Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch, HIVE-8128.6-parquet.patch, testParquetFile NO PRECOMMIT TESTS What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after it is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11335) Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fatkun updated HIVE-11335: -- Description: test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. It should return test1 for both columns. While trying to find the error, I noticed that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different; Query 1 only selects one column. It should select both uid and name. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot pin it down. 
It may be related to HIVE-10996 was: test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. It should return test1 for both columns. While trying to find the error, I noticed that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different; Query 1 only selects one column. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot pin it down. It may be related to HIVE-10996 Multi-Join Inner Query producing incorrect results -- Key: HIVE-11335 URL: https://issues.apache.org/jira/browse/HIVE-11335 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.1.0 Environment: CDH5.4.0 Reporter: fatkun test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. It should return test1 for both columns. While trying to find the error, I noticed that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different; Query 1 only selects one column. It should select both uid and name. 
{code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot pin it down. It may be related to HIVE-10996 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function
[ https://issues.apache.org/jira/browse/HIVE-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636169#comment-14636169 ] Ashutosh Chauhan commented on HIVE-11271: - [~ychena] Can you check if this issue is fixed by HIVE-11333? If so, I think that is a better fix, since such issues should be resolved at compile time, not run time (i.e., Operators should not participate in this) cc: [~pxiong] java.lang.IndexOutOfBoundsException when union all with if function --- Key: HIVE-11271 URL: https://issues.apache.org/jira/browse/HIVE-11271 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11271.1.patch Some queries with Union all as subquery fail in MapReduce task with stacktrace: {noformat} 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing operator UNION[104] 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor complete. 
15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: job_local826862759_0005 java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 10 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 
14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140) ... 21 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at
[jira] [Commented] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function
[ https://issues.apache.org/jira/browse/HIVE-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636085#comment-14636085 ] Yongzhi Chen commented on HIVE-11271: - Thanks [~szehon] for reviewing it. java.lang.IndexOutOfBoundsException when union all with if function --- Key: HIVE-11271 URL: https://issues.apache.org/jira/browse/HIVE-11271 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11271.1.patch Some queries with Union all as subquery fail in MapReduce task with stacktrace: {noformat} 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing operator UNION[104] 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor complete. 15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: job_local826862759_0005 java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at 
sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 10 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140) ... 
21 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119) ... 21 more {noformat} Reproduce: {noformat}
[jira] [Updated] (HIVE-11335) Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fatkun updated HIVE-11335: -- Attachment: query1.txt query2.txt Attached the query explain plans. Multi-Join Inner Query producing incorrect results -- Key: HIVE-11335 URL: https://issues.apache.org/jira/browse/HIVE-11335 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.1.0 Environment: CDH5.4.0 Reporter: fatkun Attachments: query1.txt, query2.txt Test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. Both columns should return test1. While looking for the error I found that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different: Query 1 selects only one column, but it should select both uid and name. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot find it. It may be related to HIVE-10996 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11335) Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fatkun updated HIVE-11335: -- Description: Test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. Both columns should return test1. While looking for the error I found that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different: Query 1 selects only one column. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot find it.
It may be related to HIVE-10996. was: test step ``` create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', test1); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); ``` return wrong result: 1 test1 It should be both return test1 I try to find error, if I use this query, return right result.(join key different) ``` select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); ``` The explain is different,Query1 only select one colum ``` b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 ``` I think there is something wrong in ColumnPruner.But i cannot find it out. It may relate HIVE-10996 Multi-Join Inner Query producing incorrect results -- Key: HIVE-11335 URL: https://issues.apache.org/jira/browse/HIVE-11335 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.1.0 Environment: CDH5.4.0 Reporter: fatkun Test steps: {code} create table log (uid string, uid2 string); insert into log values ('1', '1'); create table user (uid string, name string); insert into user values ('1', 'test1'); select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid2=c.uid); {code} This returns the wrong result: 1 test1. Both columns should return test1. While looking for the error I found that this query (with a different join key) returns the right result: {code} select b.name, c.name from log a left outer join (select uid, name from user) b on (a.uid=b.uid) left outer join user c on (a.uid=c.uid); {code} The explain output is different: Query 1 selects only one column. {code} b:user TableScan alias: user Statistics: Num rows: 1 Data size: 7 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: uid (type: string) outputColumnNames: _col0 {code} I think there is something wrong in ColumnPruner, but I cannot find it. It may be related to HIVE-10996. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11077) Add support in parser and wire up to txn manager
[ https://issues.apache.org/jira/browse/HIVE-11077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635611#comment-14635611 ] Alan Gates commented on HIVE-11077: --- Comments posted to review board. Add support in parser and wire up to txn manager Key: HIVE-11077 URL: https://issues.apache.org/jira/browse/HIVE-11077 Project: Hive Issue Type: Sub-task Components: SQL, Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11077.3.patch, HIVE-11077.5.patch, HIVE-11077.6.patch, HIVE-11077.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11310) Avoid expensive AST tree conversion to String in RowResolver
[ https://issues.apache.org/jira/browse/HIVE-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11310: --- Attachment: HIVE-11310.4.patch Avoid expensive AST tree conversion to String in RowResolver Key: HIVE-11310 URL: https://issues.apache.org/jira/browse/HIVE-11310 Project: Hive Issue Type: Bug Components: Parser Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11310.1.patch, HIVE-11310.2.patch, HIVE-11310.3.patch, HIVE-11310.4.patch, HIVE-11310.patch We use the AST tree String representation of a condition in the WHERE clause to identify its column in the RowResolver. This can lead to OOM Exceptions when the condition is very large. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11271) java.lang.IndexOutOfBoundsException when union all with if function
[ https://issues.apache.org/jira/browse/HIVE-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635581#comment-14635581 ] Szehon Ho commented on HIVE-11271: -- Sorry for late reply. The overall idea makes sense (keeping track of corresponding columns in parent filter condition), so +1 from me. java.lang.IndexOutOfBoundsException when union all with if function --- Key: HIVE-11271 URL: https://issues.apache.org/jira/browse/HIVE-11271 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11271.1.patch Some queries with Union all as subquery fail in MapReduce task with stacktrace: {noformat} 15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing operator UNION[104] 15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor complete. 15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: job_local826862759_0005 java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 10 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140) ... 
21 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442) at
[jira] [Commented] (HIVE-11301) thrift metastore issue when getting stats results in disconnect
[ https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635694#comment-14635694 ] Hive QA commented on HIVE-11301: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746384/HIVE-11301.02.patch {color:green}SUCCESS:{color} +1 9230 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4683/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4683/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4683/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746384 - PreCommit-HIVE-TRUNK-Build thrift metastore issue when getting stats results in disconnect --- Key: HIVE-11301 URL: https://issues.apache.org/jira/browse/HIVE-11301 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Pengcheng Xiong Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch On metastore side it looks like this: {noformat} 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! 
Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} and then {noformat} 2015-07-17 20:32:27,796 WARN [pool-3-thread-150]: transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error closing output stream. 
java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) at org.apache.thrift.transport.TSocket.close(TSocket.java:196) at org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} Which on client manifests as {noformat} 2015-07-17 20:32:27,796 WARN [main()]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost
[jira] [Commented] (HIVE-11301) thrift metastore issue when getting stats results in disconnect
[ https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635698#comment-14635698 ] Pengcheng Xiong commented on HIVE-11301: [~sershe], could you please take a look? IMHO, I think the problem of invalid thrift can be handled in future, in a separate JIRA. Thanks. thrift metastore issue when getting stats results in disconnect --- Key: HIVE-11301 URL: https://issues.apache.org/jira/browse/HIVE-11301 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Pengcheng Xiong Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch On metastore side it looks like this: {noformat} 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} and then {noformat} 2015-07-17 20:32:27,796 WARN [pool-3-thread-150]: transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error closing output stream. java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) at org.apache.thrift.transport.TSocket.close(TSocket.java:196) at org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} Which on client manifests as {noformat} 2015-07-17 20:32:27,796 WARN [main()]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost connection. Attempting to reconnect. 
org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at
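The chain of errors quoted above follows from how generated Thrift code validates required fields: the server-side handler returns an `AggrStats` with a null required `colStats`, validation throws during the write path, and the server closes the socket, so the client only ever sees a bare `TTransportException`. A minimal Java sketch of that pattern, with hypothetical class names standing in for the generated Thrift code (not Hive's actual classes):

```java
// Hypothetical miniature of the Thrift "validate on write" pattern behind this
// failure. Generated structs check required fields while serializing, so a
// struct with a null required field blows up only after the handler has
// already returned it -- the client just sees the connection drop.
class ProtocolException extends RuntimeException {
    ProtocolException(String msg) { super(msg); }
}

class AggrStatsLike {
    Object colStats;   // declared "required" in the IDL
    long partsFound;

    void validate() {
        if (colStats == null) {
            throw new ProtocolException("Required field 'colStats' is unset! "
                + "Struct:AggrStats(colStats:null, partsFound:" + partsFound + ")");
        }
    }

    // Serialization validates first, as the generated write() methods do.
    String write() {
        validate();
        return "AggrStats(colStats:" + colStats + ", partsFound:" + partsFound + ")";
    }
}
```

Under this reading, the fix discussed in the JIRA is to stop the metastore from ever handing a struct with an unset required field to the serializer.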
[jira] [Commented] (HIVE-11296) Merge from master to spark branch [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635578#comment-14635578 ] Hive QA commented on HIVE-11296: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746164/HIVE-11296.1-spark.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7689 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_count_distinct org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/937/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/937/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-937/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12746164 - PreCommit-HIVE-SPARK-Build Merge from master to spark branch [Spark Branch] Key: HIVE-11296 URL: https://issues.apache.org/jira/browse/HIVE-11296 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Sun Assignee: Chao Sun Attachments: HIVE-11296.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635548#comment-14635548 ] Sushanth Sowmyan commented on HIVE-8678: What storage format are you using for the table in question? (i.e. is it Text, RCFile, ORC, something else?) Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occurring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
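The reporter's proposed fix can be illustrated in isolation. Assuming (as the stack trace suggests) that the deserialized field arrives as an `Object` that actually holds a `String` such as `"2014-10-30"`, a blind cast fails while `java.sql.Date.valueOf` parses it; the helper names below are hypothetical, not Hive's actual code:

```java
import java.sql.Date;

// Hypothetical illustration of the fix proposed above: the value read back
// is a String payload, so casting it to java.sql.Date throws, while
// Date.valueOf parses the "yyyy-[m]m-[d]d" form.
class DateConvertDemo {
    static Date asSqlDate(Object o) {
        if (o instanceof Date) {
            return (Date) o;                 // already the right type
        }
        return Date.valueOf(o.toString());   // parse "yyyy-[m]m-[d]d"
    }

    static boolean castFails(Object o) {
        try {
            Date d = (Date) o;               // the code path from the stack trace
            return d == null;                // unreachable for a String payload
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Note that `Date.valueOf` only accepts the JDBC date-escape format, which is likely why Sushanth asks about the table's storage format: whether the value arrives as a `String` at all depends on the SerDe in use.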
[jira] [Commented] (HIVE-11254) Process result sets returned by a stored procedure
[ https://issues.apache.org/jira/browse/HIVE-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635560#comment-14635560 ] Dmitry Tolpeko commented on HIVE-11254: --- This was the latest version available in the Maven repo. Actually, I do not use hive-jdbc in tests yet; is it OK if I modify the pom later (when tests requiring a Hive connection are added)? Process result sets returned by a stored procedure -- Key: HIVE-11254 URL: https://issues.apache.org/jira/browse/HIVE-11254 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11254.1.patch, HIVE-11254.2.patch, HIVE-11254.3.patch, HIVE-11254.4.patch A stored procedure can return one or more result sets. A caller should be able to process them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11316: - Attachment: HIVE-11316.3.patch [~jcamachorodriguez] Can you please review this patch? Patch #3 also addresses the issue raised by [~eugene.koifman] in HIVE-11281. Thanks Hari Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316.1.patch, HIVE-11316.2.patch, HIVE-11316.3.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is supposed to alter the string memoization to use a different data structure that doesn't duplicate any part of the string, so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
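The OOM risk described in this JIRA comes from each node memoizing its own copy of its subtree's string, so every subtree's text is duplicated in every ancestor. One way to avoid the duplication, sketched below with hypothetical names (this is an illustration of the idea, not the actual HIVE-11316 patch): render the root's string once into a single shared buffer and have each node remember only its [start, end) offsets into it, so any node's toStringTree() is a substring view rather than a fresh concatenation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one shared buffer per tree, per-node offsets into it.
// No subtree's text is ever stored twice.
class Node {
    final String token;
    final List<Node> children = new ArrayList<>();
    int start = -1, end = -1;   // this node's slice of the shared buffer
    StringBuilder shared;       // buffer filled by the first render of the tree

    Node(String token) { this.token = token; }
    Node add(Node c) { children.add(c); return this; }

    String toStringTree() {
        if (shared == null) {   // render once; memoizes offsets for all nodes
            render(new StringBuilder());
        }
        return shared.substring(start, end);
    }

    private void render(StringBuilder sb) {
        shared = sb;
        start = sb.length();
        if (children.isEmpty()) {
            sb.append(token);
        } else {
            sb.append('(').append(token);
            for (Node c : children) { sb.append(' '); c.render(sb); }
            sb.append(')');
        }
        end = sb.length();
    }
}
```

With per-node memoized strings, a tree of depth d stores each leaf's text O(d) times; with shared offsets it is stored exactly once, which is the kind of saving the JIRA is after.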
[jira] [Commented] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635576#comment-14635576 ] Sergey Shelukhin commented on HIVE-11305: - also requires hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled to be true LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11301) thrift metastore issue when getting stats results in disconnect
[ https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635710#comment-14635710 ] Sergey Shelukhin commented on HIVE-11301: - +1 thrift metastore issue when getting stats results in disconnect --- Key: HIVE-11301 URL: https://issues.apache.org/jira/browse/HIVE-11301 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Pengcheng Xiong Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch On metastore side it looks like this: {noformat} 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0) at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} and then {noformat} 2015-07-17 20:32:27,796 WARN [pool-3-thread-150]: transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error closing output stream. java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) at org.apache.thrift.transport.TSocket.close(TSocket.java:196) at org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} Which on client manifests as {noformat} 2015-07-17 20:32:27,796 WARN [main()]: metastore.RetryingMetaStoreClient (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost connection. Attempting to reconnect. 
org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_aggr_stats_for(ThriftHiveMetastore.java:3029) at
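The server-side failure above happens because the handler hands Thrift an AggrStats whose required colStats field is null; validate() throws during serialization, the socket is torn down, and the client only sees a bare TTransportException. A minimal guard, sketched with a stand-in struct (this is illustrative, not the actual HIVE-11301 patch):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the generated Thrift struct, just to show the invariant:
// a field marked "required" in the IDL must be non-null before write().
class AggrStats {
    List<String> colStats;   // required in the real IDL
    long partsFound;
}

class StatsHandler {
    // Hypothetical helper: whatever the metastore lookup returns, normalize
    // null to an empty list so Thrift's validate() cannot fail on the way out.
    static AggrStats getAggrStatsFor(List<String> lookedUp, long partsFound) {
        AggrStats r = new AggrStats();
        r.colStats = (lookedUp != null) ? lookedUp : new ArrayList<>();
        r.partsFound = partsFound;
        return r;
    }
}
```

Returning an empty list instead of null lets the client distinguish "no stats available" from a transport failure.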
[jira] [Commented] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635835#comment-14635835 ] Sergey Shelukhin commented on HIVE-11305: - hmm, no, it's a different cache and queryId LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635839#comment-14635839 ] Sergey Shelukhin commented on HIVE-11305: - ah nm, stupid LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11294) Use HBase to cache aggregated stats
[ https://issues.apache.org/jira/browse/HIVE-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635866#comment-14635866 ] Alan Gates commented on HIVE-11294: --- Incorporated most of the feedback. Responses to a couple of the comments where I disagreed. Use HBase to cache aggregated stats --- Key: HIVE-11294 URL: https://issues.apache.org/jira/browse/HIVE-11294 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Fix For: hbase-metastore-branch Attachments: HIVE-11294.patch Currently stats are cached only in the memory of the client. Given that HBase can easily manage the scale of caching aggregated stats we should be using it to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
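The shape of the cache the description proposes can be sketched as a get-or-compute lookup keyed by the aggregation request, with the backing map standing in for an HBase table. Everything here is illustrative (no real HBase client calls, and the key format is an assumption):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative only: check the shared store for a previously aggregated
// result before recomputing, keyed by something like
// (db, table, columns, partition-list digest).
class AggrStatsCache {
    // stand-in for an HBase table of serialized aggregates
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private int recomputes = 0;

    String getOrCompute(String key, Supplier<String> aggregate) {
        return store.computeIfAbsent(key, k -> { recomputes++; return aggregate.get(); });
    }

    int recomputes() { return recomputes; }
}
```

The point of moving this out of client memory is that every client then shares one cache, so repeated aggregations over the same partition set are computed once cluster-wide rather than once per client.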
[jira] [Updated] (HIVE-11196) Utilities.getPartitionDesc() should try to reuse TableDesc object
[ https://issues.apache.org/jira/browse/HIVE-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11196: - Attachment: HIVE-11196.3.patch [~jcamachorodriguez] Can you please review this patch #3. Thanks Hari Utilities.getPartitionDesc() should try to reuse TableDesc object -- Key: HIVE-11196 URL: https://issues.apache.org/jira/browse/HIVE-11196 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11196.1.patch, HIVE-11196.2.patch, HIVE-11196.3.patch Currently, Utilities.getPartitionDesc() creates a new PartitionDesc object which inturn creates new TableDesc object via Utilities.getTableDesc(part.getTable()) for every call. This value needs to be reused so that we can avoid the expense of creating new Descriptor object wherever possible -- This message was sent by Atlassian JIRA (v6.3.4#6332)
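The reuse the description asks for is simple memoization: partitions of the same table share one descriptor instead of building a fresh one per call. A hypothetical sketch (the class names are illustrative, not Hive's Utilities API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the reuse Utilities.getPartitionDesc() is after:
// all partitions of one table share a single TableDesc instance.
class TableDescCache {
    static final class TableDesc {
        final String tableName;
        TableDesc(String n) { tableName = n; }
    }

    private final Map<String, TableDesc> cache = new HashMap<>();
    private int built = 0;

    TableDesc forTable(String tableName) {
        // construct only on first request; later calls return the cached instance
        return cache.computeIfAbsent(tableName, n -> { built++; return new TableDesc(n); });
    }

    int descriptorsBuilt() { return built; }
}
```

For a query touching thousands of partitions of a handful of tables, this turns thousands of descriptor constructions into a handful.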
[jira] [Commented] (HIVE-11330) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression
[ https://issues.apache.org/jira/browse/HIVE-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635830#comment-14635830 ] Hive QA commented on HIVE-11330: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746390/HIVE-11330.patch {color:green}SUCCESS:{color} +1 9241 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4684/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4684/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4684/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746390 - PreCommit-HIVE-TRUNK-Build Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression --- Key: HIVE-11330 URL: https://issues.apache.org/jira/browse/HIVE-11330 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Attachments: HIVE-11330.patch Queries with heavily nested filters can cause a StackOverflowError {code} Exception in thread main java.lang.StackOverflowError at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:301) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
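The evaluateExpression/evaluateChildExpr pair above recurses once per nesting level, so a deeply nested filter exhausts the stack. One form of the early termination the title describes can be sketched as: when estimating rows for a conjunction, stop descending as soon as the running estimate reaches zero, since deeper children cannot raise it. This is an illustration of the idea, not the actual Hive patch:

```java
// Illustrative sketch: row estimation over a filter expression tree with
// AND semantics at inner nodes; recursion is cut off once the estimate
// cannot change any further.
class Expr {
    final double selectivity;   // leaf selectivity in [0,1]
    final Expr[] children;      // null for leaves; AND semantics otherwise
    Expr(double s) { this.selectivity = s; this.children = null; }
    Expr(Expr... cs) { this.selectivity = 1.0; this.children = cs; }
}

class FilterStats {
    static long evaluate(Expr e, long numRows) {
        if (numRows == 0) return 0;                  // early termination
        if (e.children == null) {
            return (long) (numRows * e.selectivity);
        }
        long rows = numRows;
        for (Expr child : e.children) {
            rows = evaluate(child, rows);
            if (rows == 0) return 0;                 // prune remaining children
        }
        return rows;
    }
}
```

The pruning does not change the result, only the number of stack frames spent reaching it; a complementary fix is to raise or cap the recursion depth, but short-circuiting avoids the frames outright for the common zero-estimate case.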
[jira] [Commented] (HIVE-11330) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression
[ https://issues.apache.org/jira/browse/HIVE-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635799#comment-14635799 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11330: -- +1 pending tests. Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression --- Key: HIVE-11330 URL: https://issues.apache.org/jira/browse/HIVE-11330 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran Attachments: HIVE-11330.patch Queries with heavily nested filters can cause a StackOverflowError {code} Exception in thread main java.lang.StackOverflowError at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:301) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateExpression(StatsRulesProcFactory.java:326) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$FilterStatsRule.evaluateChildExpr(StatsRulesProcFactory.java:525) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635822#comment-14635822 ] Michael McLellan commented on HIVE-8678: ORC Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: 
java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occurring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
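The failure and the proposed fix can be shown in isolation. One caveat to the suggestion above: java.sql.Date.valueOf takes a String, so the Object still needs a String cast first. This sketch is an illustration of that point, not the HCatalog code itself:

```java
import java.sql.Date;

class PigHCatDateDemo {
    // The reader hands back the DATE column's value as a java.lang.String,
    // so the blind cast in PigHCatUtil blows up:
    static Date unsafe(Object o) {
        return (Date) o;   // ClassCastException when o is a String
    }

    // Date.valueOf parses "yyyy-[m]m-[d]d", which is exactly what the
    // string holds; keep the cast path for values that already are Dates.
    static Date safe(Object o) {
        return (o instanceof Date) ? (Date) o : Date.valueOf((String) o);
    }
}
```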
[jira] [Commented] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635781#comment-14635781 ] Sergey Shelukhin commented on HIVE-11305: - looks like mapjoin cache is being reused between queries. There's either a bug in recent patch, or some fundamental issue (i.e. if query ID is duplicated between queries, I am not sure how it ever worked) LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
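If the cache really is leaking entries across queries, the usual remedy is to scope entries to the query id, so a table container built for one query (e.g. a HybridHashTableContainer) can never be handed to an operator from a different query that expects a different container type. A hypothetical sketch, not the LLAP cache implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: entries are keyed by (queryId, key) and are invisible
// to any other query; evicting the query drops all of its entries at once.
class QueryScopedCache<V> {
    private final Map<String, Map<String, V>> byQuery = new ConcurrentHashMap<>();

    V get(String queryId, String key) {
        Map<String, V> m = byQuery.get(queryId);
        return (m == null) ? null : m.get(key);
    }

    void put(String queryId, String key, V value) {
        byQuery.computeIfAbsent(queryId, q -> new ConcurrentHashMap<>()).put(key, value);
    }

    // must be called when the query finishes, or entries accumulate
    void evictQuery(String queryId) {
        byQuery.remove(queryId);
    }
}
```

As the later comments note, this only helps if query ids are actually unique across submissions; with duplicated ids, scoping by id reintroduces the same collision.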
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10673: -- Release Note: This adds configuration parameter hive.optimize.dynamic.partition.hashjoin, which enables selection of the dynamically partitioned hash join with the Tez execution engine Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Fix For: 1.3.0, 2.0.0 Attachments: HIVE-10673.1.patch, HIVE-10673.10.patch, HIVE-10673.11.patch, HIVE-10673.12, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 of the CPU was spent during sorting/merging. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
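The core claim in the description — that unsorted inputs suffice once the reducer joins by hashing — is the classic hash join: load the small side into a hash table, then stream the big side against it, with no sort or merge of either input. A toy illustration (names and row layout are invented for the sketch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinDemo {
    // rows are String[]{key, payload}; small side must fit in memory,
    // which mirrors the constraint stated in the description
    static List<String> hashJoin(List<String[]> small, List<String[]> big) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : small) {                       // build phase
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<String> out = new ArrayList<>();
        for (String[] row : big) {                         // probe phase, unsorted
            for (String[] match : table.getOrDefault(row[0], List.of())) {
                out.add(row[0] + ":" + match[1] + "," + row[1]);
            }
        }
        return out;
    }
}
```

This is why eliminating the sort can win: the shuffle still partitions rows by key, but the ~2/3 of CPU measured in sorting/merging is replaced by O(1) hash probes.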
[jira] [Commented] (HIVE-11254) Process result sets returned by a stored procedure
[ https://issues.apache.org/jira/browse/HIVE-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635932#comment-14635932 ] Alan Gates commented on HIVE-11254: --- I think changing the pom entry to: {code}
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${project.version}</version>
  <scope>test</scope>
</dependency>
{code} will do what you want. But if you want to drop the pom.xml changes that's fine too. Other than this I'm +1 on the patch. Process result sets returned by a stored procedure -- Key: HIVE-11254 URL: https://issues.apache.org/jira/browse/HIVE-11254 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11254.1.patch, HIVE-11254.2.patch, HIVE-11254.3.patch, HIVE-11254.4.patch Stored procedure can return one or more result sets. A caller should be able to process them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11253) Move SearchArgument and VectorizedRowBatch classes to storage-api.
[ https://issues.apache.org/jira/browse/HIVE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635951#comment-14635951 ] Prasanth Jayachandran commented on HIVE-11253: -- LGTM, +1 Move SearchArgument and VectorizedRowBatch classes to storage-api. -- Key: HIVE-11253 URL: https://issues.apache.org/jira/browse/HIVE-11253 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11253.patch, HIVE-11253.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11294) Use HBase to cache aggregated stats
[ https://issues.apache.org/jira/browse/HIVE-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-11294: -- Attachment: HIVE-11294.2.patch New patch that incorporates changes based on Thejas' feedback. Use HBase to cache aggregated stats --- Key: HIVE-11294 URL: https://issues.apache.org/jira/browse/HIVE-11294 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Fix For: hbase-metastore-branch Attachments: HIVE-11294.2.patch, HIVE-11294.patch Currently stats are cached only in the memory of the client. Given that HBase can easily manage the scale of caching aggregated stats we should be using it to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11318) Move ORC table properties from OrcFile to OrcOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-11318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HIVE-11318. -- Resolution: Duplicate Move ORC table properties from OrcFile to OrcOutputFormat - Key: HIVE-11318 URL: https://issues.apache.org/jira/browse/HIVE-11318 Project: Hive Issue Type: Sub-task Reporter: Prasanth Jayachandran Fix For: 2.0.0 OrcFile contains TableProperties which can be moved to OrcOutputFormat. Also remove deprecated configs that are no longer used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635983#comment-14635983 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11316: -- The failures on patch#3 are unrelated to the changes. Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316.1.patch, HIVE-11316.2.patch, HIVE-11316.3.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is supposed to alter the string memoization to use a different data structure that doesn't duplicate any part of the string so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
[ https://issues.apache.org/jira/browse/HIVE-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635988#comment-14635988 ] Owen O'Malley commented on HIVE-11321: -- So the path looks like:
* table properties using the orc.* name
* config using the orc.* name
* config using the current hive.exec.ql.orc.* name
* default
Move OrcFile.OrcTableProperties from OrcFile into OrcConf. -- Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11321.patch We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
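The four-step precedence Owen lists can be sketched as a single resolution function. The property names below (orc.compress and its hive.exec.ql.orc.* counterpart) are illustrative stand-ins, and this is not the OrcConf implementation:

```java
import java.util.Map;
import java.util.Properties;

class OrcConfDemo {
    // Lookup order: table property (orc.*) -> config (orc.*)
    //            -> config (legacy hive.exec.ql.orc.*) -> default.
    static String resolve(String orcName, String legacyName, String dflt,
                          Properties tableProps, Map<String, String> conf) {
        if (tableProps != null && tableProps.getProperty(orcName) != null) {
            return tableProps.getProperty(orcName);
        }
        if (conf.containsKey(orcName)) return conf.get(orcName);
        if (conf.containsKey(legacyName)) return conf.get(legacyName);
        return dflt;
    }
}
```

Table properties winning over the config matches the usual expectation that per-table settings override cluster-wide ones, and keeping the legacy name in the chain preserves old configurations.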
[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7723: Attachment: HIVE-7723.12.patch Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, HIVE-7723.11.patch, HIVE-7723.11.patch, HIVE-7723.12.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch Explain on TPC-DS query 64 took 11 seconds, when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match, a HashMap is a better option for this case as Set doesn't have a Get method. Also for ReadEntity equals is case-insensitive while hash is case-sensitive, which is an undesired behavior. {code} public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) { // If the input is already present, make sure the new parent is added to the input. 
if (inputs.contains(newInput)) { for (ReadEntity input : inputs) { if (input.equals(newInput)) { if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) { input.getParents().addAll(newInput.getParents()); input.setDirect(input.isDirect() || newInput.isDirect()); } return input; } } assert false; } else { inputs.add(newInput); return newInput; } // make compile happy return null; } {code} This is the query used : {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk 
JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON
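The HashMap-based lookup suggested in the report can be sketched as follows. This is a minimal illustration, not Hive's actual code: `Entity` is a hypothetical stand-in for `ReadEntity` with only the fields needed to show the pattern, and its `hashCode` is made consistent with the case-insensitive `equals` (the mismatch the report also calls out).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Hive's ReadEntity.
class Entity {
    final String name;          // compared case-insensitively, like ReadEntity.equals
    boolean direct;

    Entity(String name, boolean direct) { this.name = name; this.direct = direct; }

    @Override public boolean equals(Object o) {
        return o instanceof Entity && name.equalsIgnoreCase(((Entity) o).name);
    }
    // Keep hashCode consistent with the case-insensitive equals.
    @Override public int hashCode() { return name.toLowerCase().hashCode(); }
}

public class AddInputSketch {
    // O(1) variant of addInput: a Map gives direct lookup of the stored
    // instance, instead of scanning a Set to find the equal element.
    static Entity addInput(Map<Entity, Entity> inputs, Entity newInput) {
        Entity existing = inputs.get(newInput);
        if (existing == null) {
            inputs.put(newInput, newInput);
            return newInput;
        }
        existing.direct = existing.direct || newInput.direct;
        return existing;
    }

    public static void main(String[] args) {
        Map<Entity, Entity> inputs = new HashMap<>();
        Entity a = addInput(inputs, new Entity("db.Tbl", false));
        Entity b = addInput(inputs, new Entity("db.tbl", true)); // same entity, different case
        System.out.println(a == b);        // true: merged into the stored instance
        System.out.println(a.direct);      // true: direct flag propagated
        System.out.println(inputs.size()); // 1
    }
}
```

With a map keyed by the entity itself, each `addInput` call is a single hash lookup rather than a full iteration of the input set.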
[jira] [Commented] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636001#comment-14636001 ] Mostafa Mokhtar commented on HIVE-11172: [~hsubramaniyan] Is this getting backported? Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11259) LLAP: clean up ORC dependencies part 1
[ https://issues.apache.org/jira/browse/HIVE-11259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635943#comment-14635943 ] Prasanth Jayachandran commented on HIVE-11259: -- Moving interfaces to common looks good to me. Some comments related to simplifying the code: 1) There are many Chunk classes that inherit from DiskRange: BufferChunk, CacheChunk, ProcCacheChunk, TrackedCacheChunk, UncompressedCacheChunk, DiskRange, DiskRangeList. I am not sure if we need all of them. I see the purpose of BufferChunk, CacheChunk and DiskRange, but the others seem to be overkill. Can you move the flags/variables (isCompressed, isReleased etc.) to CacheChunk? 2) Can DiskRangeList be replaced by a TreeMap, using the offset as the key and the lowerKey()/higherKey() methods for merging in case of any overlap? 3) Replace BooleanRef with Boolean? LLAP: clean up ORC dependencies part 1 -- Key: HIVE-11259 URL: https://issues.apache.org/jira/browse/HIVE-11259 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11259.patch Before there's storage handler module, we can clean some things up -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11259) LLAP: clean up ORC dependencies part 1
[ https://issues.apache.org/jira/browse/HIVE-11259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635948#comment-14635948 ] Sergey Shelukhin commented on HIVE-11259: - 1) These classes existed before, I will see if they can be merged. 2) I don't think removing the intrusive linked list is a good idea; there are problems with java LinkedList - it's hard to keep a pointer to an element, and as soon as the list is modified all the iterators are invalidated. So, for example, reading multiple columns RG by RG, keeping the pointer to the end of the last RG (where the next RG, that may be in a separate buffer due to SARG filtering, starts) becomes impossible - as soon as one column read replaces a buffer chunk with a cache chunk, pointers for all other columns become invalid. TreeMap will have the same problem as far as I can tell. It's really not that complicated to have a linked list, if Java was a real programming language we could even make it an aspect sort of thing via multiple inheritance :) 3) Ok LLAP: clean up ORC dependencies part 1 -- Key: HIVE-11259 URL: https://issues.apache.org/jira/browse/HIVE-11259 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11259.patch Before there's storage handler module, we can clean some things up -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11259) LLAP: clean up ORC dependencies part 1
[ https://issues.apache.org/jira/browse/HIVE-11259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635948#comment-14635948 ] Sergey Shelukhin edited comment on HIVE-11259 at 7/21/15 10:33 PM: --- 1) These classes existed before, I will see if they can be merged. 2) I don't think removing the intrusive linked list is a good idea; there are problems with java LinkedList - it's hard to keep a pointer to an element and modify it in place in general; also, as soon as the list is modified all the iterators are invalidated; so, for example, reading multiple columns RG by RG, keeping the pointer to the end of the last RG (where the next RG, that may be in a separate buffer due to SARG filtering, starts) becomes impossible - as soon as one column read replaces a buffer chunk with a cache chunk, iterators for all other columns become invalid. TreeMap will have the same problem as far as I can tell. It's really not that complicated to have a linked list, if Java was a real programming language we could even make it an aspect sort of thing via multiple inheritance :) 3) Ok was (Author: sershe): 1) These classes existed before, I will see if they can be merged. 2) I don't think removing the intrusive linked list is a good idea; there are problems with java LinkedList - it's hard to keep a pointer to an element, and as soon as the list is modified all the iterators are invalidated. So, for example, reading multiple columns RG by RG, keeping the pointer to the end of the last RG (where the next RG, that may be in a separate buffer due to SARG filtering, starts) becomes impossible - as soon as one column read replaces a buffer chunk with a cache chunk, pointers for all other columns become invalid. TreeMap will have the same problem as far as I can tell.
It's really not that complicated to have a linked list, if Java was a real programming language we could even make it an aspect sort of thing via multiple inheritance :) 3) Ok LLAP: clean up ORC dependencies part 1 -- Key: HIVE-11259 URL: https://issues.apache.org/jira/browse/HIVE-11259 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11259.patch Before there's storage handler module, we can clean some things up -- This message was sent by Atlassian JIRA (v6.3.4#6332)
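The intrusive-list argument above can be sketched in a few lines. This is a simplified stand-in for DiskRangeList, not Hive's actual class: each element owns its prev/next links, so a reference held to one node stays valid when a neighbouring node is replaced in place, which is exactly what breaks with a `java.util.LinkedList` iterator.

```java
// Minimal intrusive doubly-linked node, in the spirit of DiskRangeList:
// each element carries its own prev/next pointers, so replacing one node
// does not invalidate references held to other nodes.
class Node {
    final String tag;
    Node prev, next;

    Node(String tag) { this.tag = tag; }

    // Splice 'replacement' into this node's position in the chain.
    Node replaceWith(Node replacement) {
        replacement.prev = prev;
        replacement.next = next;
        if (prev != null) prev.next = replacement;
        if (next != null) next.prev = replacement;
        prev = next = null; // detach the old node
        return replacement;
    }
}

public class IntrusiveListSketch {
    public static void main(String[] args) {
        Node a = new Node("buffer-1");
        Node b = new Node("buffer-2");
        Node c = new Node("buffer-3");
        a.next = b; b.prev = a; b.next = c; c.prev = b;

        // One "column read" replaces b with a cached chunk...
        b.replaceWith(new Node("cache-2"));

        // ...and pointers held by other readers (a, c) are still valid:
        System.out.println(c.prev.tag); // cache-2
        System.out.println(a.next.tag); // cache-2
    }
}
```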
[jira] [Comment Edited] (HIVE-11253) Move SearchArgument and VectorizedRowBatch classes to storage-api.
[ https://issues.apache.org/jira/browse/HIVE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635951#comment-14635951 ] Prasanth Jayachandran edited comment on HIVE-11253 at 7/21/15 10:35 PM: LGTM, +1. Pending tests. was (Author: prasanth_j): LGTM, +1 Move SearchArgument and VectorizedRowBatch classes to storage-api. -- Key: HIVE-11253 URL: https://issues.apache.org/jira/browse/HIVE-11253 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11253.patch, HIVE-11253.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
[ https://issues.apache.org/jira/browse/HIVE-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11321: - Attachment: HIVE-11321.patch This patch pulls all of the configuration knobs into OrcConf including all of the table properties. Move OrcFile.OrcTableProperties from OrcFile into OrcConf. -- Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11321.patch We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
[ https://issues.apache.org/jira/browse/HIVE-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11321: - Attachment: (was: HIVE-11321.patch) Move OrcFile.OrcTableProperties from OrcFile into OrcConf. -- Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11321) Move OrcFile.OrcTableProperties from OrcFile into OrcConf.
[ https://issues.apache.org/jira/browse/HIVE-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11321: - Attachment: HIVE-11321.patch Sorry, I uploaded the wrong version of the patch. Move OrcFile.OrcTableProperties from OrcFile into OrcConf. -- Key: HIVE-11321 URL: https://issues.apache.org/jira/browse/HIVE-11321 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11321.patch We should pull all of the configuration/table property knobs into a single list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11305) LLAP: Hybrid Map-join cache returns invalid data
[ https://issues.apache.org/jira/browse/HIVE-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11305: Attachment: HIVE-11305.patch LLAP: Hybrid Map-join cache returns invalid data - Key: HIVE-11305 URL: https://issues.apache.org/jira/browse/HIVE-11305 Project: Hive Issue Type: Sub-task Affects Versions: llap Environment: TPC-DS 200 scale data Reporter: Gopal V Assignee: Sergey Shelukhin Priority: Critical Fix For: llap Attachments: HIVE-11305.patch, q55-test.sql Start a 1-node LLAP cluster with 16 executors and run attached test-case on the single node instance. {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.loadHashTable(VectorMapJoinCommonOperator.java:648) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:314) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1104) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1108) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635982#comment-14635982 ] Hive QA commented on HIVE-11316: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746398/HIVE-11316.3.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9245 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4685/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4685/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4685/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12746398 - PreCommit-HIVE-TRUNK-Build Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316.1.patch, HIVE-11316.2.patch, HIVE-11316.3.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is suppose to alter the string memoization to use a different data structure that doesn't duplicate any part of the string so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
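The general idea behind HIVE-11316 - memoizing a tree's string form without duplicating substrings at every level - can be illustrated with a small sketch. This is not the actual patch; `Ast` is a hypothetical stand-in for ASTNode, and the point is that all nodes append into one shared StringBuilder instead of each node materializing (and re-copying) its subtree's string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified AST node, used only to show the buffer-sharing pattern.
class Ast {
    final String token;
    final List<Ast> children = new ArrayList<>();

    Ast(String token) { this.token = token; }
    Ast add(Ast child) { children.add(child); return this; }

    // One buffer for the whole tree: O(total output) characters written,
    // versus O(depth * output) if each level concatenated its children's strings.
    void appendTo(StringBuilder sb) {
        if (children.isEmpty()) { sb.append(token); return; }
        sb.append('(').append(token);
        for (Ast c : children) { sb.append(' '); c.appendTo(sb); }
        sb.append(')');
    }
}

public class ToStringTreeSketch {
    public static void main(String[] args) {
        Ast root = new Ast("select").add(new Ast("col")).add(new Ast("tbl"));
        StringBuilder sb = new StringBuilder();
        root.appendTo(sb);
        System.out.println(sb); // (select col tbl)
    }
}
```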
[jira] [Updated] (HIVE-11332) Unicode table comments do not work
[ https://issues.apache.org/jira/browse/HIVE-11332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11332: Description: Noticed by accident. {noformat} select ' ', count(*) from moo; Query ID = sershe_20150721190413_979e1b6f-86d6-436f-b8e6-d6785b9d3b83 Total jobs = 1 Launching Job 1 out of 1 [snip] OK 0 Time taken: 13.347 seconds, Fetched: 1 row(s) hive> ALTER TABLE moo SET TBLPROPERTIES ('comment' = ' '); OK Time taken: 0.292 seconds hive> desc extended moo; OK i int Detailed Table Information Table(tableName:moo, dbName:default, owner:sershe, createTime:1437519787, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)], location:hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/moo, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{last_modified_time=1437519883, totalSize=0, numRows=-1, rawDataSize=-1, COLUMN_STATS_ACCURATE=false, numFiles=0, transient_lastDdlTime=1437519883, comment=?? , last_modified_by=sershe}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) Time taken: 0.347 seconds, Fetched: 3 row(s) {noformat} was: Noticed by accident.
{noformat} select ' ', count(*) from moo; Query ID = sershe_20150721190413_979e1b6f-86d6-436f-b8e6-d6785b9d3b83 Total jobs = 1 Launching Job 1 out of 1 [snip] OK 0 Time taken: 13.347 seconds, Fetched: 1 row(s) hive> ALTER TABLE moo SET TBLPROPERTIES ('comment' = ' '); OK Time taken: 0.292 seconds hive> desc extended moo; OK i int Detailed Table Information Table(tableName:moo, dbName:default, owner:sershe, createTime:1437519787, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)], location:hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/moo, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{last_modified_time=1437519883, totalSize=0, numRows=-1, rawDataSize=-1, COLUMN_STATS_ACCURATE=false, numFiles=0, transient_lastDdlTime=1437519883, comment=?? , last_modified_by=sershe}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) Time taken: 0.347 seconds, Fetched: 3 row(s) {noformat} Unicode table comments do not work -- Key: HIVE-11332 URL: https://issues.apache.org/jira/browse/HIVE-11332 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Noticed by accident. 
{noformat} select ' ', count(*) from moo; Query ID = sershe_20150721190413_979e1b6f-86d6-436f-b8e6-d6785b9d3b83 Total jobs = 1 Launching Job 1 out of 1 [snip] OK 0 Time taken: 13.347 seconds, Fetched: 1 row(s) hive> ALTER TABLE moo SET TBLPROPERTIES ('comment' = ' '); OK Time taken: 0.292 seconds hive> desc extended moo; OK i int Detailed Table Information Table(tableName:moo, dbName:default, owner:sershe, createTime:1437519787, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:i, type:int, comment:null)], location:hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/moo, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{last_modified_time=1437519883, totalSize=0, numRows=-1, rawDataSize=-1, COLUMN_STATS_ACCURATE=false, numFiles=0,
[jira] [Commented] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636005#comment-14636005 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11172: -- Yes, this will be a good candidate for backporting to 1.2.1. [~sushanth], what do you think? Thanks, Hari Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636010#comment-14636010 ] Sushanth Sowmyan commented on HIVE-11172: - Incorrect results make it a good candidate for a backport to branch-1.2. Pedantic note: 1.2.1 has already shipped. This would go in 1.2.2; please set the fix version appropriately after committing. Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11172: Fix Version/s: 1.3.0 Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11209) Clean up dependencies in HiveDecimalWritable
[ https://issues.apache.org/jira/browse/HIVE-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636017#comment-14636017 ] Prasanth Jayachandran commented on HIVE-11209: -- The Vint object is no longer reusable with this patch. Vint allocation in an inner loop will hit performance, right? Clean up dependencies in HiveDecimalWritable Key: HIVE-11209 URL: https://issues.apache.org/jira/browse/HIVE-11209 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 2.0.0 Attachments: HIVE-11209.patch, HIVE-11209.patch, HIVE-11209.patch, HIVE-11209.patch Currently HiveDecimalWritable depends on: * org.apache.hadoop.hive.serde2.ByteStream * org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils * org.apache.hadoop.hive.serde2.typeinfo.HiveDecimalUtils since we need HiveDecimalWritable for the decimal VectorizedColumnBatch, breaking these dependencies will improve things. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
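The scratch-object reuse pattern that review comment is defending can be sketched as follows. This is an illustration only: `ScratchVInt` is a hypothetical, toy stand-in for the variable-length-int holder (in Hive, `LazyBinaryUtils.VInt`), and its decoding is deliberately trivial; the point is that one mutable holder is allocated once and refilled per row, instead of allocating a new object on every loop iteration.

```java
// A mutable holder, allocated once and refilled per row.
class ScratchVInt {
    int value;
    int length;

    // Decode a value from buf at offset into this holder (toy single-byte decoding).
    void readFrom(byte[] buf, int offset) {
        value = buf[offset];
        length = 1;
    }
}

public class VIntReuseSketch {
    public static void main(String[] args) {
        byte[] rows = {1, 2, 3, 4};
        ScratchVInt scratch = new ScratchVInt(); // one allocation, reused for every row
        int sum = 0;
        for (int off = 0; off < rows.length; off += scratch.length) {
            scratch.readFrom(rows, off); // refill in place: no garbage in the inner loop
            sum += scratch.value;
        }
        System.out.println(sum); // 10
    }
}
```

Losing this reuse means one short-lived allocation per decoded value, which is the GC pressure the comment is asking about.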
[jira] [Commented] (HIVE-11152) Swapping join inputs in ASTConverter
[ https://issues.apache.org/jira/browse/HIVE-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636039#comment-14636039 ] Jason Dere commented on HIVE-11152: --- FYI, this was listed as being fixed on 1.2.2, but I do not see any such commit on either branch-1 or branch-1.2. Swapping join inputs in ASTConverter Key: HIVE-11152 URL: https://issues.apache.org/jira/browse/HIVE-11152 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 2.0.0 Attachments: HIVE-11152.02.patch, HIVE-11152.patch We want that multijoin optimization in SemanticAnalyzer always kicks in when CBO is enabled (if possible). For that, we may need to swap the join inputs when we return from CBO through the Hive AST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11316) Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree()
[ https://issues.apache.org/jira/browse/HIVE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11316: - Attachment: HIVE-11316.2.patch Use datastructure that doesnt duplicate any part of string for ASTNode::toStringTree() -- Key: HIVE-11316 URL: https://issues.apache.org/jira/browse/HIVE-11316 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11316.1.patch, HIVE-11316.2.patch HIVE-11281 uses an approach to memoize toStringTree() for ASTNode. This jira is suppose to alter the string memoization to use a different data structure that doesn't duplicate any part of the string so that we do not run into OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634593#comment-14634593 ] wangchangchun commented on HIVE-11055: -- Hello, I downloaded the latest Hive source code on July 17. I want to test the PL/HQL function. I built the Hive package, and beeline is OK now. But hplsql cannot be used. The ERROR is like this: ./hplsql -e select * from hive_tables Unhandled exception in PL/HQL java.lang.Exception: Unknown connection profile: null at org.apache.hive.hplsql.Conn.getConnection(Conn.java:127) at org.apache.hive.hplsql.Conn.executeQuery(Conn.java:55) at org.apache.hive.hplsql.Exec.executeQuery(Exec.java:412) at org.apache.hive.hplsql.Exec.executeQuery(Exec.java:421) at org.apache.hive.hplsql.Select.select(Select.java:73) And I cannot find hplsql-site.xml in the package. Can you tell me where the problem is? HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x
[ https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11304: - Attachment: HIVE-11304.patch This is WIP patch to trigger test run with new log4j2 properties. Migrate to Log4j2 from Log4j 1.x Key: HIVE-11304 URL: https://issues.apache.org/jira/browse/HIVE-11304 Project: Hive Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11304.patch Log4J2 has some great benefits and can benefit hive significantly. Some notable features include 1) Performance (parametrized logging, performance when logging is disabled etc.) More details can be found here https://logging.apache.org/log4j/2.x/performance.html 2) RoutingAppender - Route logs to different log files based on MDC context (useful for HS2, LLAP etc.) 3) Asynchronous logging This is an umbrella jira to track changes related to Log4j2 migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
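The "parametrized logging" benefit the HIVE-11304 description cites is about not paying for message construction when the level is disabled. Log4j2's `{}` placeholder style is not in the JDK, so this sketch illustrates the same idea with `java.util.logging`, which offers a `Supplier`-based overload that defers evaluation entirely; the `expensive()` helper is a hypothetical stand-in for any costly argument.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ParamLoggingSketch {
    private static final Logger LOG = Logger.getLogger("demo");

    // Stand-in for an expensive value that should not be computed when logging is off.
    static int calls = 0;
    static String expensive() { calls++; return "details"; }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE messages are disabled

        // Plain string concatenation always evaluates its arguments,
        // so an explicit level guard is needed:
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("state=" + expensive());
        }

        // Deferred form: the Supplier is only invoked if FINE is enabled
        // (log4j2's "{}" parameters and lambda methods serve the same purpose).
        LOG.log(Level.FINE, () -> "state=" + expensive());

        System.out.println(calls); // 0: expensive() never ran
    }
}
```

With log4j2 the equivalent call is `logger.debug("state={}", value)`, which skips message formatting when debug is disabled without any guard at the call site.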
[jira] [Updated] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11303: --- Affects Version/s: 2.0.0 Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.2.0, 1.3.0, 2.0.0 Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634905#comment-14634905 ] Dmitry Tolpeko commented on HIVE-11055: --- Note that the hive/hplsql/src/main/resources/hplsql-site.xml file appears after you apply the HIVE-11254 patch. HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634624#comment-14634624 ] Hive QA commented on HIVE-11303: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746194/HIVE-11303.1.patch {color:green}SUCCESS:{color} +1 9228 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4675/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4675/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4675/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12746194 - PreCommit-HIVE-TRUNK-Build Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. 
org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634988#comment-14634988 ] wangchangchun commented on HIVE-11055: -- Sorry, I cannot find it in hive/hplsql/src/main/resources HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x
[ https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634997#comment-14634997 ] Hive QA commented on HIVE-11304: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12746277/HIVE-11304.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9231 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testNegativeCliDriver_case_with_row_sequence org.apache.hive.hplsql.TestHplsqlLocal.testException org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithVerboseMode org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithTez.testFetchResultsOfLogWithVerboseMode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4678/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4678/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4678/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12746277 - PreCommit-HIVE-TRUNK-Build Migrate to Log4j2 from Log4j 1.x Key: HIVE-11304 URL: https://issues.apache.org/jira/browse/HIVE-11304 Project: Hive Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11304.patch Log4j2 has some great features and can benefit Hive significantly. Some notable features include 1) Performance (parametrized logging, performance when logging is disabled etc.) 
More details can be found here https://logging.apache.org/log4j/2.x/performance.html 2) RoutingAppender - Route logs to different log files based on MDC context (useful for HS2, LLAP etc.) 3) Asynchronous logging This is an umbrella jira to track changes related to Log4j2 migration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
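The "performance when logging is disabled" point above comes down to deferring message construction behind the level check. A minimal self-contained sketch of that idea, with no dependency on the Log4j2 jars (the class and method names here are illustrative, not the Log4j2 API):

```java
// Illustrates why parameterized logging is cheap when a level is disabled:
// the message is only assembled if the level guard passes. This mimics the
// shape of logger.debug("rows={}", n) without using Log4j2 itself.
public class ParamLogDemo {
    public static boolean debugEnabled = false;
    public static int formatCalls = 0; // counts how often we actually format

    // Stands in for an expensive message-formatting step.
    public static String expensiveFormat(Object arg) {
        formatCalls++;
        return "rows=" + arg;
    }

    // Parameterized style: formatting is deferred behind the level check,
    // so a disabled level costs only a boolean test.
    public static void debug(String template, Object arg) {
        if (debugEnabled) {
            System.out.println(expensiveFormat(arg));
        }
    }

    public static void main(String[] args) {
        debug("rows={}", 42);            // level off: no formatting happens
        System.out.println(formatCalls); // prints 0
    }
}
```

The contrast is with eager string concatenation (`log.debug("rows=" + n)`), which pays the formatting cost even when debug logging is off.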
[jira] [Updated] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-11055: -- Attachment: hplsql-site.xml HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635014#comment-14635014 ] wangchangchun commented on HIVE-11055: -- I put an hplsql-site.xml in the conf dir. The content is from HIVE-11254.4.patch ./hplsql -e select * from hive_tables; Unhandled exception in PL/HQL java.lang.Exception: Unknown connection profile: hiveconn at org.apache.hive.hplsql.Conn.getConnection(Conn.java:127) at org.apache.hive.hplsql.Conn.executeQuery(Conn.java:55) at org.apache.hive.hplsql.Exec.executeQuery(Exec.java:412) at org.apache.hive.hplsql.Exec.executeQuery(Exec.java:421) at org.apache.hive.hplsql.Select.select(Select.java:73) ```
<configuration>
  <property>
    <name>hplsql.conn.default</name>
    <value>hiveconn</value>
    <description>The default connection profile</description>
  </property>
  <property>
    <name>hiveconn</name>
    <value>org.apache.hive.jdbc.HiveDriver;jdbc:hive2://</value>
    <description>HiveServer2 JDBC connection (embedded mode)</description>
  </property>
  <property>
    <name>hplsql.conn.init.hiveconn</name>
    <value>
      set mapred.job.queue.name=default;
      set hive.execution.engine=mr;
      use default;
    </value>
    <description>Statements for execute after connection to the database</description>
  </property>
```
HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
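The "Unknown connection profile: hiveconn" error above suggests the profile named by hplsql.conn.default did not resolve to a driver;url connection string. A hedged sketch of what such a lookup plausibly does, using a plain Map in place of the parsed hplsql-site.xml (an illustration only, not the actual Conn.java logic):

```java
import java.util.Map;

// Sketch of a two-step profile lookup: hplsql.conn.default names a profile,
// and that profile key must itself resolve to a "driver;jdbc-url" value.
// If the second lookup misses, the user-visible error is exactly the one
// in the comment above. Hypothetical code, not HPL/SQL's implementation.
public class ProfileLookup {
    public static String connectionString(Map<String, String> conf) throws Exception {
        String profile = conf.get("hplsql.conn.default"); // e.g. "hiveconn"
        String conn = conf.get(profile);                  // driver;url string
        if (conn == null) {
            throw new Exception("Unknown connection profile: " + profile);
        }
        return conn;
    }
}
```

Under this reading, the fix is to make sure the configuration actually defines a value for the profile name that hplsql.conn.default points at.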
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635013#comment-14635013 ] Dmitry Tolpeko commented on HIVE-11055: --- I attached it to the JIRA. HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch, hplsql-site.xml There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634790#comment-14634790 ] Dong Chen commented on HIVE-8128: - Patch V6 updated. Review board: https://reviews.apache.org/r/36540/ The patch depends on the new Parquet vector API at https://github.com/nezihyigitbasi-nflx/parquet-mr/commits/vector In this POC, the general workflow was done, two tests passed, and INT type was supported. The idea is that we create a VectorizedParquetRecordReader, which wraps the ParquetRecordReader provided by Parquet. Then in its next() method, we convert the Parquet RowBatch to a Hive VectorizedRowBatch. This is the first patch. To complete the vectorization feature, we still have work to do in follow-up: 1) support all data types 2) support partition column 3) add more test cases 4) evaluate performance on a real cluster. Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch NO PRECOMMIT TESTS What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after it is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
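The POC workflow described above (wrap the Parquet reader, convert each batch inside next()) can be sketched with simplified stand-in types. IntBatch and VectorBatch below are illustrative only, not Parquet's RowBatch or Hive's VectorizedRowBatch, and the INT-only support matches what the comment says the POC covers:

```java
import java.util.Iterator;

// Shape of a vectorized wrapper reader: pull a batch from the wrapped
// source, convert it into the consumer's batch format, report end-of-input.
public class VectorizedWrapperSketch {
    // Stand-in for a Parquet row batch of INT values.
    public static class IntBatch {
        public final int[] values;
        public IntBatch(int[] v) { values = v; }
    }

    // Stand-in for Hive's VectorizedRowBatch with one long column vector
    // (Hive represents integer types as long[] in its column vectors).
    public static class VectorBatch {
        public long[] col;
        public int size;
    }

    private final Iterator<IntBatch> inner; // wrapped "ParquetRecordReader"

    public VectorizedWrapperSketch(Iterator<IntBatch> inner) { this.inner = inner; }

    // next(): fetch one source batch and convert it in place.
    public boolean next(VectorBatch out) {
        if (!inner.hasNext()) return false;
        IntBatch in = inner.next();
        out.size = in.values.length;
        out.col = new long[out.size];
        for (int i = 0; i < out.size; i++) {
            out.col[i] = in.values[i]; // widen int -> long per column-vector layout
        }
        return true;
    }

    public static void main(String[] args) {
        IntBatch in = new IntBatch(new int[]{7, 8});
        VectorizedWrapperSketch r =
            new VectorizedWrapperSketch(java.util.List.of(in).iterator());
        VectorBatch out = new VectorBatch();
        r.next(out);
        System.out.println(out.size); // prints 2
    }
}
```

The follow-up items in the comment (all types, partition columns) would extend the conversion step, not this overall wrapping structure.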
[jira] [Commented] (HIVE-11311) Avoid dumping AST tree String in Explain unless necessary
[ https://issues.apache.org/jira/browse/HIVE-11311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634827#comment-14634827 ] Jesus Camacho Rodriguez commented on HIVE-11311: [~hsubramaniyan], could you review this one? Thanks Avoid dumping AST tree String in Explain unless necessary - Key: HIVE-11311 URL: https://issues.apache.org/jira/browse/HIVE-11311 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11311.patch Currently, the AST tree String representation is created even if it is not used; we should dump it only if we are going to use it (explain extended). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
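The change described above amounts to deferring the expensive dump behind the explain-extended check. A minimal sketch of that deferral using a Supplier; the names are illustrative, not Hive's actual ExplainTask code:

```java
import java.util.function.Supplier;

// Sketch of the idea in HIVE-11311: build the AST tree string only when
// EXPLAIN EXTENDED actually needs it, instead of eagerly on every explain.
public class LazyAstDump {
    public static int builds = 0;

    // Stands in for the costly ASTNode-to-String dump.
    public static String buildAstString() {
        builds++;
        return "(TOK_QUERY ...)";
    }

    // Only materialize the dump for the extended case; a plain EXPLAIN
    // never pays the construction cost.
    public static String astForExplain(boolean extended, Supplier<String> dump) {
        return extended ? dump.get() : null;
    }

    public static void main(String[] args) {
        astForExplain(false, LazyAstDump::buildAstString); // no build happens
        astForExplain(true, LazyAstDump::buildAstString);  // builds once
    }
}
```

Passing a Supplier (or equivalently checking the flag before calling the dump) keeps the call site unchanged while making the cost conditional.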
[jira] [Commented] (HIVE-11325) Infinite loop in HiveHFileOutputFormat
[ https://issues.apache.org/jira/browse/HIVE-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634774#comment-14634774 ] Harsh J commented on HIVE-11325: I missed the srcDir declaration, which would explain the loop (we're walking). I'm checking why it doesn't abort at the family directory. Infinite loop in HiveHFileOutputFormat -- Key: HIVE-11325 URL: https://issues.apache.org/jira/browse/HIVE-11325 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 1.0.0 Reporter: Harsh J No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run regularly in Hive builds, but here's the gist of the issue: The condition at https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164 indicates that we will loop until we find a file whose last path component (the name) is equal to the column family name. In execution, however, the iteration enters an actual infinite loop because the file we end up considering as the srcDir name is actually the region file, whose name will never match the family name. 
This is an example of the IPC call that the listing loop of a 100%-progress task gets stuck in: {code} 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Call - cdh54.vm/172.16.29.132:8020: getListing {src: /user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c startAfter: needLocation: false} 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive sending #510346 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got value #510346 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getListing took 0ms 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Response - cdh54.vm/172.16.29.132:8020: getListing {dirList { partialListing { fileType: IS_FILE path: length: 863 permission { perm: 4600 } owner: hive group: hive modification_time: 1437454718130 access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }} {code} The path we are getting out of the listing results is {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c}}, but instead of checking the path's parent {{family}} we loop infinitely over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} because it does not match {{family}}. It therefore stays in the infinite loop until the MR framework kills it away due to an idle task timeout (and then, since the subsequent task attempts fail outright, the job fails). While doing a {{getPath().getParent()}} will resolve that, is that infinite loop even necessary? 
Especially given the fact that we throw exceptions if there are no entries or there is more than one entry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
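The {{getPath().getParent()}} fix suggested above can be sketched as follows, using java.nio.Path as a stand-in for org.apache.hadoop.fs.Path (illustrative only, not the HiveHFileOutputFormat patch itself):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// The listing returns a region FILE like .../family/97112ac1..., so the
// name to compare against the column family is the file's PARENT directory
// name, not the hashed file name itself.
public class FamilyDirCheck {
    public static boolean matchesFamily(Path file, String family) {
        Path parent = file.getParent();
        return parent != null && parent.getFileName().toString().equals(family);
    }

    // Convenience overload for a raw path string.
    public static boolean matchesFamily(String filePath, String family) {
        return matchesFamily(Paths.get(filePath), family);
    }

    public static void main(String[] args) {
        String p = "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/"
                 + "attempt_x/family/97112ac1c09548ae87bd85af072d2e8c";
        System.out.println(matchesFamily(p, "family")); // prints true
    }
}
```

Comparing the parent name terminates the walk at the family directory; comparing the hashed file name can never succeed, which is exactly the infinite loop described.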
[jira] [Updated] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8128: Attachment: HIVE-8128.6-parquet.patch Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch, HIVE-8128.6-parquet.patch NO PRECOMMIT TESTS What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after it is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634801#comment-14634801 ] Bing Li commented on HIVE-11113: Thank you, [~tfriedr] With your fix in HIVE-11326, all the queries could work now. ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. --- Key: HIVE-11113 URL: https://issues.apache.org/jira/browse/HIVE-11113 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 1.2.1 Environment: Reporter: Shiroy Pigarez Assignee: Pengcheng Xiong Priority: Critical I was trying to perform some column statistics using Hive as per the documentation https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive and was encountering the following errors: Seems like a bug. Can you look into this? Thanks in advance. -- HIVE table {noformat} hive> create table people_part( name string, address string) PARTITIONED BY (dob string, nationality varchar(2)) row format delimited fields terminated by '\t'; {noformat} --Analyze table with partition dob and nationality with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:95 cannot recognize input near 'EOF' 'EOF' 'EOF' in column name {noformat} --Analyze table with partition dob and nationality values specified with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at 
org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634835#comment-14634835 ] wangchangchun commented on HIVE-11055: -- I used the git apply command and patched my source code OK. After building a package, I installed it, but I still have the problem. The first problem: the bin dir of apache-hive-2.0.0-SNAPSHOT-bin.tar.gz does not have hplsql, so I copied the bin dir from the master branch. The second problem: the lib dir of apache-hive-2.0.0-SNAPSHOT-bin.tar.gz does not have hive-hplsql-2.0.0-SNAPSHOT.jar or antlr-runtime-4.5.jar. The problems above I have solved. The last problem: it cannot find hplsql-site.xml and hive-site.xml. Can you tell me how to solve this problem? HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Fix For: 2.0.0 Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch, HIVE-11055.3.patch, HIVE-11055.4.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634627#comment-14634627 ] Gopal V commented on HIVE-11303: [~jdere]: +1 LGTM. Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11303: --- Affects Version/s: 1.3.0 Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.2.0, 1.3.0 Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11303) Getting Tez LimitExceededException after dag execution on large query
[ https://issues.apache.org/jira/browse/HIVE-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11303: --- Affects Version/s: 1.2.0 Getting Tez LimitExceededException after dag execution on large query - Key: HIVE-11303 URL: https://issues.apache.org/jira/browse/HIVE-11303 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.2.0, 1.3.0 Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11303.1.patch {noformat} 2015-07-17 18:18:11,830 INFO [main]: counters.Limits (Limits.java:ensureInitialized(59)) - Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200 2015-07-17 18:18:11,841 ERROR [main]: exec.Task (TezTask.java:execute(189)) - Failed to execute tez graph. org.apache.tez.common.counters.LimitExceededException: Too many counters: 1201 max=1200 at org.apache.tez.common.counters.Limits.checkCounters(Limits.java:87) at org.apache.tez.common.counters.Limits.incrCounters(Limits.java:94) at org.apache.tez.common.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:76) at org.apache.tez.common.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:93) at org.apache.tez.common.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:104) at org.apache.tez.dag.api.DagTypeConverters.convertTezCountersFromProto(DagTypeConverters.java:567) at org.apache.tez.dag.api.client.DAGStatus.getDAGCounters(DAGStatus.java:148) at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:175) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1673) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1432) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1213) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1064) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat}
[jira] [Commented] (HIVE-11113) ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work.
[ https://issues.apache.org/jira/browse/HIVE-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634646#comment-14634646 ] Thomas Friedrich commented on HIVE-11113: - [~libing], [~pxiong], the error message "Column [ds] was not found in schema!" is a different problem than the one originally reported in this JIRA; it is specific to Parquet tables with partitions and is not limited to ANALYZE. I opened HIVE-11326 for the Parquet problem and added steps to reproduce it over there. ANALYZE TABLE .. COMPUTE STATISTICS FOR COLUMNS does not work. --- Key: HIVE-11113 URL: https://issues.apache.org/jira/browse/HIVE-11113 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 1.2.1 Environment: Reporter: Shiroy Pigarez Assignee: Pengcheng Xiong Priority: Critical I was trying to perform some column statistics using Hive as per the documentation https://cwiki.apache.org/confluence/display/Hive/Column+Statistics+in+Hive and was encountering the following errors. Seems like a bug. Can you look into this? Thanks in advance. 
-- HIVE table {noformat} hive> create table people_part( name string, address string) PARTITIONED BY (dob string, nationality varchar(2)) row format delimited fields terminated by '\t'; {noformat} --Analyze table with partition dob and nationality with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality) COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:803) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697) at 
org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 1:95 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in column name {noformat} --Analyze table with partition dob and nationality values specified with FOR COLUMNS {noformat} hive> ANALYZE TABLE people_part PARTITION(dob='2015-10-2',nationality='IE') COMPUTE STATISTICS FOR COLUMNS; NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11627) at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:40215) at org.apache.hadoop.hive.ql.parse.HiveParser.columnName(HiveParser.java:33351) at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameList(HiveParser.java:33219) at org.apache.hadoop.hive.ql.parse.HiveParser.analyzeStatement(HiveParser.java:17764) at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2369) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
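Both failing statements above omit the column list after FOR COLUMNS, and the parser dies at the column-name rule (columnNameList → columnName → identifier sees EOF). On older Hive releases the grammar required an explicit column list there, which is consistent with this trace; the list-free form documented on the wiki is newer. A hedged sketch of the explicit form, reusing the people_part table from the report (the column names are the ones its DDL defines):

```sql
-- Sketch, assuming this Hive version requires the explicit column list after FOR COLUMNS.
-- Fully specified partition spec, with the nationality literal taken from the report:
ANALYZE TABLE people_part PARTITION(dob='2015-10-2', nationality='IE')
  COMPUTE STATISTICS FOR COLUMNS name, address;
```

If this form parses while the list-free form does not, the failure is the grammar version rather than the statistics machinery itself.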