[jira] [Reopened] (HIVE-9605) Remove parquet nested objects from wrapper writable objects
[ https://issues.apache.org/jira/browse/HIVE-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu reopened HIVE-9605:

Sorry [~spena], it seems master has a lot of failing test cases. I will commit after it is back to normal.

Remove parquet nested objects from wrapper writable objects
---
Key: HIVE-9605
URL: https://issues.apache.org/jira/browse/HIVE-9605
Project: Hive
Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Sergio Peña
Assignee: Sergio Peña
Fix For: parquet-branch
Attachments: HIVE-9605.3.patch, HIVE-9605.4.patch, HIVE-9605.5.patch, HIVE-9605.6.patch

Parquet nested types use an extra wrapper object (ArrayWritable) around map and list elements. This extra object is not needed and causes unnecessary memory allocations. An example is in HiveCollectionConverter.java:
{noformat}
public void end() {
  parent.set(index, wrapList(new ArrayWritable(
      Writable.class, list.toArray(new Writable[list.size()]))));
}
{noformat}
This object is later unwrapped in AbstractParquetMapInspector, i.e.:
{noformat}
final Writable[] mapContainer = ((ArrayWritable) data).get();
final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get();
for (final Writable obj : mapArray) {
  ...
}
{noformat}
We should get rid of this wrapper object to save time and memory.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()
[ https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HIVE-8458:

Description:
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}
If mapPath is null but reducePath is not null, the getFileSystem() call would produce an NPE.

Potential null dereference in Utilities#clearWork()
---
Key: HIVE-8458
URL: https://issues.apache.org/jira/browse/HIVE-8458
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: skrho
Priority: Minor
Attachments: HIVE-8458_001.patch
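The hazard described in HIVE-8458 can be illustrated with a minimal sketch. This is not the attached patch: `PlanPathGuard` and `pathsToClean` are hypothetical names, the path string is made up, and plain strings stand in for Hadoop `Path` objects so that the example runs without Hadoop on the classpath. The idea is simply to check each path on its own before using it.

```java
import java.util.ArrayList;
import java.util.List;

final class PlanPathGuard {
    // Returns only the plan paths that are actually set, so a caller never
    // dereferences a null entry. Strings stand in for Hadoop Path objects.
    static List<String> pathsToClean(String mapPath, String reducePath) {
        List<String> result = new ArrayList<>();
        if (mapPath != null) {
            result.add(mapPath);
        }
        if (reducePath != null) {
            result.add(reducePath);
        }
        return result;
    }

    public static void main(String[] args) {
        // The case from the report: mapPath is null, reducePath is not.
        System.out.println(pathsToClean(null, "/tmp/plan/reduce.xml"));
        // Nothing initialized: empty list, nothing to clean.
        System.out.println(pathsToClean(null, null));
    }
}
```

A caller would then obtain the file system from whichever entry is present instead of assuming that mapPath is set.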
[jira] [Assigned] (HIVE-10802) Table join query with some constant field in select fails
[ https://issues.apache.org/jira/browse/HIVE-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu reassigned HIVE-10802:

Assignee: Aihua Xu

Table join query with some constant field in select fails
-
Key: HIVE-10802
URL: https://issues.apache.org/jira/browse/HIVE-10802
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu

The following query fails:
{noformat}
create table tb1 (year string, month string);
create table tb2(month string);
select unix_timestamp(a.year)
from (select * from tb1 where year='2001') a join tb2 b on (a.month=b.month);
{noformat}
with the exception
{noformat}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:118)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
{noformat}
The issue seems to be that, during query compilation, the field in the select should be replaced with the constant when some UDFs are used.
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559231#comment-14559231 ]

Dayue Gao commented on HIVE-10792:

I don't think the failed tests are related to this patch. [~gopalv] [~thejas] [~sershe] Could you have a look at this? Should it be backported to old releases?

PPD leads to wrong answer when mapper scans the same table with multiple aliases
Key: HIVE-10792
URL: https://issues.apache.org/jira/browse/HIVE-10792
Project: Hive
Issue Type: Bug
Components: File Formats, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1
Reporter: Dayue Gao
Assignee: Dayue Gao
Priority: Critical
Fix For: 1.2.1
Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, HIVE-10792.test.sql

Here are the steps to reproduce the bug. First, prepare a simple ORC table with one row:
{code}
create table test_orc (c0 int, c1 int) stored as ORC;
{code}
Table: test_orc
||c0||c1||
|0|1|

The following SQL returns an empty result, which is not expected:
{code}
select * from test_orc t1
union all
select * from test_orc t2
where t2.c0 = 1
{code}
Self join is also broken:
{code}
set hive.auto.convert.join=false; -- force common join
select * from test_orc t1
left outer join test_orc t2
on (t1.c0=t2.c0 and t2.c1=0);
{code}
It returns an empty result, while the expected answer is
||t1.c0||t1.c1||t2.c0||t2.c1||
|0|1|NULL|NULL|

In these cases, we push down predicates into OrcInputFormat. As a result, the TableScanOperator for t1 can't receive its rows.
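The failure mode reported in HIVE-10792 can be modeled with a toy example. This is only a sketch under stated assumptions, not Hive code: `SharedScanModel` and `scan` are hypothetical names, an `int[]` stands in for a row (c0, c1), and combining the aliases' predicates with OR is one conceivable safe strategy, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SharedScanModel {
    // Rows the reader emits once a predicate has been pushed into it.
    static List<int[]> scan(List<int[]> table, Predicate<int[]> pushedDown) {
        List<int[]> out = new ArrayList<>();
        for (int[] row : table) {
            if (pushedDown.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> testOrc = new ArrayList<>();
        testOrc.add(new int[] {0, 1}); // the single row (c0=0, c1=1)

        Predicate<int[]> t1Filter = row -> true;        // alias t1 has no filter
        Predicate<int[]> t2Filter = row -> row[0] == 1; // alias t2: c0 = 1

        // Wrong: only t2's predicate reaches the shared reader, so t1 sees nothing.
        System.out.println(scan(testOrc, t2Filter).size()); // prints 0
        // Safer: the reader may only drop rows that no alias needs.
        System.out.println(scan(testOrc, t1Filter.or(t2Filter)).size()); // prints 1
    }
}
```

In the model, each alias would still apply its own filter to the rows the reader returns; the pushed-down predicate only decides what the reader may skip.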
[jira] [Commented] (HIVE-10702) COUNT(*) over windowing 'x preceding and y preceding' doesn't work properly
[ https://issues.apache.org/jira/browse/HIVE-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559118#comment-14559118 ]

Aihua Xu commented on HIVE-10702:

Thanks for reviewing and checking in, Ashutosh.

COUNT(*) over windowing 'x preceding and y preceding' doesn't work properly
---
Key: HIVE-10702
URL: https://issues.apache.org/jira/browse/HIVE-10702
Project: Hive
Issue Type: Sub-task
Components: PTF-Windowing
Reporter: Aihua Xu
Assignee: Aihua Xu
Fix For: 1.3.0
Attachments: HIVE-10702.patch

Given the following query:
{noformat}
select ts, f, count(*) over (partition by ts order by f rows between 2 preceding and 1 preceding) from over10k limit 100;
{noformat}
It returns the result
{noformat}
2013-03-01 09:11:58.70307 3.17 0
2013-03-01 09:11:58.70307 10.89 0
2013-03-01 09:11:58.70307 14.54 1
2013-03-01 09:11:58.70307 14.78 1
2013-03-01 09:11:58.70307 17.85 1
2013-03-01 09:11:58.70307 20.61 1
2013-03-01 09:11:58.70307 28.69 1
2013-03-01 09:11:58.70307 29.22 1
2013-03-01 09:11:58.70307 31.17 1
2013-03-01 09:11:58.70307 38.35 1
2013-03-01 09:11:58.70307 38.61 1
{noformat}
Mostly it should return count 2 rather than 1.
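For reference, the expected counts for the frame in HIVE-10702 can be worked out with a small sketch. `RowsFrameCount` and `count` are hypothetical names, and the code only models standard ROWS frame semantics for "start PRECEDING AND end PRECEDING"; it is not Hive's windowing implementation.

```java
final class RowsFrameCount {
    // Number of rows in the frame ROWS BETWEEN `start` PRECEDING AND `end` PRECEDING
    // for the row at 0-based position `row` in a partition of `partitionSize` rows.
    // Assumes start >= end >= 0.
    static int count(int row, int partitionSize, int start, int end) {
        int first = Math.max(row - start, 0);                  // clip at partition start
        int last = Math.min(row - end, partitionSize - 1);     // clip at partition end
        return Math.max(last - first + 1, 0);                  // empty frame gives 0
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < 5; row++) {
            sb.append(count(row, 5, 2, 1)).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints 0 1 2 2 2
    }
}
```

So with "2 preceding and 1 preceding", the first row of a partition has an empty frame, the second sees one row, and every later row sees two.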
[jira] [Updated] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases
[ https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10811: --- Attachment: HIVE-10811.02.patch RelFieldTrimmer throws NoSuchElementException in some cases --- Key: HIVE-10811 URL: https://issues.apache.org/jira/browse/HIVE-10811 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, HIVE-10811.patch RelFieldTrimmer runs into NoSuchElementException in some cases. Stack trace: {noformat} Exception in thread main java.lang.AssertionError: Internal error: While invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' at org.apache.calcite.util.Util.newInternal(Util.java:743) at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) at org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269) at org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768) at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536) ... 
32 more Caused by: java.lang.AssertionError: Internal error: While invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' at org.apache.calcite.util.Util.newInternal(Util.java:743) at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) at org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269) at
[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560010#comment-14560010 ]

Sergey Shelukhin commented on HIVE-10778:

I am clearing the map after the build, rather than just removing the cacheMapWork/etc. parts that pertain to the global map, in case I missed a place during the build where it could be used.

LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
Key: HIVE-10778
URL: https://issues.apache.org/jira/browse/HIVE-10778
Project: Hive
Issue Type: Sub-task
Components: HiveServer2
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
Fix For: llap
Attachments: HIVE-10778.01.patch, HIVE-10778.patch, llap-hs2-heap.png

95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
!llap-hs2-heap.png!
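The approach described in the comment on HIVE-10778 (let the build use the global map, then clear it) might look roughly like the sketch below. Everything here is a hypothetical stand-in: `PlanCacheScope`, `GLOBAL_WORK` and `buildThenClear` are invented names, and the map only imitates a process-wide cache such as the gWorkMap named in the report.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class PlanCacheScope {
    // Stand-in for a process-wide plan cache.
    static final Map<String, Object> GLOBAL_WORK = new ConcurrentHashMap<>();

    // Runs the build step, then empties the cache even if the build throws.
    static <T> T buildThenClear(Supplier<T> build) {
        try {
            return build.get();
        } finally {
            GLOBAL_WORK.clear();
        }
    }

    public static void main(String[] args) {
        String plan = buildThenClear(() -> {
            GLOBAL_WORK.put("map.xml", new Object()); // cached during the build
            return "plan built with " + GLOBAL_WORK.size() + " cached work item(s)";
        });
        System.out.println(plan);
        System.out.println(GLOBAL_WORK.isEmpty()); // prints true
    }
}
```

Clearing in a `finally` block keeps entries from accumulating on the heap of a long-running server even when a build fails. In this simplified form, clearing a shared map would also discard entries belonging to other concurrent builds, which a real implementation would need to account for.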
[jira] [Updated] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-9069: -- Attachment: HIVE-9069.14.patch Simplify filter predicates for CBO -- Key: HIVE-9069 URL: https://issues.apache.org/jira/browse/HIVE-9069 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Jesus Camacho Rodriguez Fix For: 0.14.1 Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.patch Simplify predicates for disjunctive predicates so that can get pushed down to the scan. Looks like this is still an issue, some of the filters can be pushed down to the scan. {code} set hive.cbo.enable=true set hive.stats.fetch.column.stats=true set hive.exec.dynamic.partition.mode=nonstrict set hive.tez.auto.reducer.parallelism=true set hive.auto.convert.join.noconditionaltask.size=32000 set hive.exec.reducers.bytes.per.reducer=1 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager set hive.support.concurrency=false set hive.tez.exec.print.summary=true explain select substr(r_reason_desc,1,20) as r ,avg(ws_quantity) wq ,avg(wr_refunded_cash) ref ,avg(wr_fee) fee from web_sales, web_returns, web_page, customer_demographics cd1, customer_demographics cd2, customer_address, date_dim, reason where web_sales.ws_web_page_sk = web_page.wp_web_page_sk and web_sales.ws_item_sk = web_returns.wr_item_sk and web_sales.ws_order_number = web_returns.wr_order_number and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998 and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk and customer_address.ca_address_sk = 
web_returns.wr_refunded_addr_sk and reason.r_reason_sk = web_returns.wr_reason_sk and ( ( cd1.cd_marital_status = 'M' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = '4 yr Degree' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 100.00 and 150.00 ) or ( cd1.cd_marital_status = 'D' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = 'Primary' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 50.00 and 100.00 ) or ( cd1.cd_marital_status = 'U' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = 'Advanced Degree' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 150.00 and 200.00 ) ) and ( ( ca_country = 'United States' and ca_state in ('KY', 'GA', 'NM') and ws_net_profit between 100 and 200 ) or ( ca_country = 'United States' and ca_state in ('MT', 'OR', 'IN') and ws_net_profit between 150 and 300 ) or ( ca_country = 'United States' and ca_state in ('WI', 'MO', 'WV') and ws_net_profit between 50 and 250 ) ) group by r_reason_desc order by r, wq, ref, fee limit 100 OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 9 - Map 1 (BROADCAST_EDGE) Reducer 3 - Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE) Reducer 4 - Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) Reducer 5 - Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE) Reducer 6 - Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE) Reducer 7 - Reducer 6 (SIMPLE_EDGE) Reducer 8 - Reducer 7 (SIMPLE_EDGE) DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1 Vertices: Map 1 Map Operator Tree: TableScan alias: web_page filterExpr: wp_web_page_sk is not null (type: boolean) Statistics: Num rows: 4602 Data size: 2696178 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: wp_web_page_sk is not null (type: 
boolean) Statistics: Num rows: 4602 Data size: 18408 Basic stats:
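One simplification that would help with the disjunctive predicates in the HIVE-9069 query above is to pull out conjuncts that appear in every branch of an OR, since those can then be evaluated at the scan. The sketch below shows only that one idea, using predicate text as plain strings. `CommonConjuncts` and `extract` are hypothetical names, and this is not the CBO rule from the attached patches, which may do more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CommonConjuncts {
    // Each inner list is one branch of an OR, given as the conjuncts ANDed in that branch.
    // Returns the conjuncts present in every branch: they hold whenever the OR holds,
    // so they can be applied as a separate filter closer to the scan.
    static Set<String> extract(List<List<String>> disjuncts) {
        Set<String> common = new LinkedHashSet<>();
        if (disjuncts.isEmpty()) {
            return common;
        }
        common.addAll(disjuncts.get(0));
        for (List<String> branch : disjuncts) {
            common.retainAll(branch);
        }
        return common;
    }

    public static void main(String[] args) {
        List<List<String>> addressFilter = new ArrayList<>();
        addressFilter.add(Arrays.asList("ca_country = 'United States'",
                "ca_state in ('KY', 'GA', 'NM')", "ws_net_profit between 100 and 200"));
        addressFilter.add(Arrays.asList("ca_country = 'United States'",
                "ca_state in ('MT', 'OR', 'IN')", "ws_net_profit between 150 and 300"));
        addressFilter.add(Arrays.asList("ca_country = 'United States'",
                "ca_state in ('WI', 'MO', 'WV')", "ws_net_profit between 50 and 250"));
        System.out.println(extract(addressFilter)); // prints [ca_country = 'United States']
    }
}
```

Here the country test is implied by the whole OR, so it could be pushed to the customer_address scan while the remaining per-branch conditions stay in the join filter.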
[jira] [Updated] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false
[ https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-10807:

Attachment: HIVE-10807.2.patch

Invalidate basic stats for insert queries if autogather=false
-
Key: HIVE-10807
URL: https://issues.apache.org/jira/browse/HIVE-10807
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Ashutosh Chauhan
Attachments: HIVE-10807.2.patch, HIVE-10807.patch

Setting stats.autogather=false leads to incorrect basic stats for insert statements.
[jira] [Updated] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286
[ https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated HIVE-10819:

Attachment: HIVE-10819.3.patch

The test failures don't seem related. Reattaching the patch to run the tests again.

SearchArgumentImpl for Timestamp is broken by HIVE-10286
Key: HIVE-10819
URL: https://issues.apache.org/jira/browse/HIVE-10819
Project: Hive
Issue Type: Bug
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 1.2.1
Attachments: HIVE-10819.1.patch, HIVE-10819.2.patch, HIVE-10819.3.patch

The workaround for the Kryo bug with Timestamp was accidentally removed by HIVE-10286. We need to bring it back.
[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560009#comment-14560009 ]

Sergey Shelukhin commented on HIVE-10778:

[~vikram.dixit] can you take a look?

LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
Key: HIVE-10778
URL: https://issues.apache.org/jira/browse/HIVE-10778
Project: Hive
Issue Type: Sub-task
Components: HiveServer2
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
Fix For: llap
Attachments: HIVE-10778.01.patch, HIVE-10778.patch, llap-hs2-heap.png

95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
!llap-hs2-heap.png!
[jira] [Commented] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
[ https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559982#comment-14559982 ]

Swarnim Kulkarni commented on HIVE-10809:

[~selinazh] Minor feedback:
1. Instead of listing so many exceptions in the throws clause, you could simply declare throws Exception to make the test simpler.
2. To make the test stronger, is there any way we can test that the directories actually existed before the query ran?

HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
--
Key: HIVE-10809
URL: https://issues.apache.org/jira/browse/HIVE-10809
Project: Hive
Issue Type: Bug
Components: HCatalog
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch

When a static partition is added through HCatStorer or HCatWriter
{code}
JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
STORE JoinedData INTO 'selina.joined_events_e' USING org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
{code}
The table directory looks like
{noformat}
drwx-- - selinaz users 0 2015-05-22 21:19 /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
drwx-- - selinaz users 0 2015-05-22 21:19 /user/selinaz/joined_events_e/author=selina
{noformat}
[jira] [Commented] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
[ https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560106#comment-14560106 ] Hive QA commented on HIVE-10809: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735409/HIVE-10809.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8974 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4047/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4047/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4047/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12735409 - PreCommit-HIVE-TRUNK-Build HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories -- Key: HIVE-10809 URL: https://issues.apache.org/jira/browse/HIVE-10809 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 1.2.0 Reporter: Selina Zhang Assignee: Selina Zhang Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch When static partition is added through HCatStorer or HCatWriter {code} JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader(); STORE JoinedData INTO 'selina.joined_events_e' USING org.apache.hive.hcatalog.pig.HCatStorer('author=selina'); {code} The table directory looks like {noformat} drwx-- - selinaz users 0 2015-05-22 21:19 /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798 drwx-- - selinaz users 0 2015-05-22 21:19 /user/selinaz/joined_events_e/author=selina {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10804) CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for limit 0 does not work
[ https://issues.apache.org/jira/browse/HIVE-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10804: --- Attachment: HIVE-10804.01.patch CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for limit 0 does not work - Key: HIVE-10804 URL: https://issues.apache.org/jira/browse/HIVE-10804 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10804.01.patch {code} explain select key,value from src order by key limit 0 POSTHOOK: type: QUERY STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: src Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: key (type: string), value (type: string) outputColumnNames: key, value Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: key (type: string) sort order: + Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE value expressions: value (type: string) Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey0 (type: string), VALUE.value (type: string) outputColumnNames: key, value Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Limit Number of rows: 0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases
[ https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559967#comment-14559967 ]

Jesus Camacho Rodriguez commented on HIVE-10811:

The method {{trimChild}} trims the child's columns, keeping 1) the columns needed by the parent (fieldsUsed), and 2) the columns on which collations were specified. Currently, the method takes the collations from the parent relation (rel), which seems incorrect, as we end up referencing column positions that do not exist in the child input. Thus, I changed the method to take the collations from the relation whose columns we are pruning, i.e. input.

RelFieldTrimmer throws NoSuchElementException in some cases
---
Key: HIVE-10811
URL: https://issues.apache.org/jira/browse/HIVE-10811
Project: Hive
Issue Type: Bug
Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, HIVE-10811.patch

RelFieldTrimmer runs into NoSuchElementException in some cases.
Stack trace: {noformat} Exception in thread main java.lang.AssertionError: Internal error: While invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' at org.apache.calcite.util.Util.newInternal(Util.java:743) at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) at org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269) at org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768) at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536) ... 32 more Caused by: java.lang.AssertionError: Internal error: While invoking method
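The change described in the comment on HIVE-10811 (take the collation keys from the input being pruned rather than from its parent) can be illustrated with a simplified model. This is a sketch under assumptions, not Calcite's `RelFieldTrimmer`: `TrimChildSketch` and `columnsToKeep` are hypothetical names, a `BitSet` stands in for the set of used fields, and collations are reduced to a list of column positions.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class TrimChildSketch {
    // Columns of the input to keep: those the parent uses plus the input's own sort keys.
    // A key outside the input's column range means the keys came from the wrong relation.
    static BitSet columnsToKeep(BitSet fieldsUsed, List<Integer> inputCollationKeys,
                                int inputFieldCount) {
        BitSet keep = (BitSet) fieldsUsed.clone();
        for (int key : inputCollationKeys) {
            if (key < 0 || key >= inputFieldCount) {
                throw new IllegalArgumentException(
                        "collation key " + key + " is not a column of the input");
            }
            keep.set(key);
        }
        return keep;
    }

    public static void main(String[] args) {
        BitSet fieldsUsed = new BitSet();
        fieldsUsed.set(0); // the parent only needs column 0

        // Keys taken from the 3-column input itself: column 2 is kept as well.
        System.out.println(columnsToKeep(fieldsUsed, Arrays.asList(2), 3)); // prints {0, 2}
    }
}
```

Passing keys that belong to a wider parent relation, for example position 5 against a 3-column input, is rejected in this model. That corresponds to the reference to a non-existent column position that the comment describes.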
[jira] [Updated] (HIVE-10788) Change sort_array to support non-primitive types
[ https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-10788:

Attachment: HIVE-10788.1.patch

As in HIVE-10427, the UNION type is a little tricky to support. I will handle that in a follow-up JIRA.

Change sort_array to support non-primitive types
Key: HIVE-10788
URL: https://issues.apache.org/jira/browse/HIVE-10788
Project: Hive
Issue Type: Bug
Components: UDF
Reporter: Chao Sun
Assignee: Chao Sun
Attachments: HIVE-10788.1.patch

Currently {{sort_array}} only supports primitive types. As we already support comparison between non-primitive types, it makes sense to remove this restriction.
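To illustrate what sorting an array of non-primitive elements involves, here is a minimal sketch that sorts a list of integer lists. `NestedSort`, `LIST_ORDER` and `sortArray` are hypothetical names, and the lexicographic ordering is an assumption made for the example. Hive's own comparison rules for complex types (for instance NULL handling) may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NestedSort {
    // Lexicographic order on lists of integers: compare element by element,
    // and let the shorter list sort first when one is a prefix of the other.
    static final Comparator<List<Integer>> LIST_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // Returns a sorted copy and leaves the input untouched.
    static List<List<Integer>> sortArray(List<List<Integer>> input) {
        List<List<Integer>> copy = new ArrayList<>(input);
        copy.sort(LIST_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<List<Integer>> input = Arrays.asList(
                Arrays.asList(2, 1), Arrays.asList(1, 9), Arrays.asList(1));
        System.out.println(sortArray(input)); // prints [[1], [1, 9], [2, 1]]
    }
}
```

Once an ordering exists for the element type, the sort itself needs nothing beyond that comparator.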
[jira] [Resolved] (HIVE-9105) Hive-0.13 select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-9105. --- Resolution: Fixed Hive-0.13 select constant in union all followed by group by gives wrong result -- Key: HIVE-9105 URL: https://issues.apache.org/jira/browse/HIVE-9105 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong select '1' as key from srcpart where ds=2008-04-09 UNION all SELECT key from srcpart where ds=2008-04-09 and hr=11 ) tab group by key will generate wrong results -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560083#comment-14560083 ] Mostafa Mokhtar commented on HIVE-10704: [~apivovarov] Ditto for this one. Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
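The two problems listed in the HIVE-10704 description above can be illustrated with a small Java sketch. This is not the code from the attached patches: the class name `HashTableMemorySketch`, both method names, and the 1 MB floor are assumptions made only for illustration. It shows one way to skip the big table when picking the largest small table, and to avoid both the divide-by-zero and the zero-byte allocation when sizes are reported as 0.

```java
/** Toy sketch of the two guards described in HIVE-10704; not Hive's HashTableLoader code. */
final class HashTableMemorySketch {
    /** Hypothetical per-table floor so that no table is ever given 0 bytes. */
    static final long MIN_MEMORY_BYTES = 1L << 20;

    /**
     * Returns the index of the largest small table, never bigTablePos.
     * Returns -1 only if there is no small table at all.
     */
    static int largestSmallTable(long[] sizes, int bigTablePos) {
        int best = -1;
        for (int i = 0; i < sizes.length; i++) {
            if (i == bigTablePos) {
                continue; // the big table must never be chosen, even if all sizes are 0
            }
            if (best == -1 || sizes[i] > sizes[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Splits totalMemory proportionally to tableSize. If the total estimated
     * size is 0, falls back to an even split instead of dividing by zero.
     */
    static long memoryShare(long tableSize, long totalSmallTableSize,
                            long totalMemory, int smallTableCount) {
        long share;
        if (totalSmallTableSize <= 0) {
            share = totalMemory / Math.max(1, smallTableCount);
        } else {
            share = (long) (totalMemory * ((double) tableSize / totalSmallTableSize));
        }
        // Never hand out 0 bytes, even when the estimated table size is 0.
        return Math.max(MIN_MEMORY_BYTES, share);
    }
}
```

With all sizes equal to 0 and the big table at position 0, `largestSmallTable` returns position 1 rather than the big table, and `memoryShare` returns an even split of the available memory. What counts as a "sensible default" is left open in the issue; the even split and the floor are only one possible choice.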
[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559313#comment-14559313 ] Hari Sekhon commented on HIVE-10304: If this is just recommending that users use Beeline instead of Hive CLI, that is fine, but if the Hive 1 CLI were ever removed, that would cause major headaches for users such as myself who have lots of scripts and programs that make calls to Hive CLI, and rewriting things that have worked fine for years is not cool. In fact it's the opposite of cool. Add deprecation message to HiveCLI -- Key: HIVE-10304 URL: https://issues.apache.org/jira/browse/HIVE-10304 Project: Hive Issue Type: Sub-task Components: CLI Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC1.2 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch As Beeline is now the recommended command line tool for Hive, we should add a message to HiveCLI to indicate that it is deprecated and redirect users to Beeline. This does not suggest removing HiveCLI for now; it is just a helpful pointer so that users know attention is focused on Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-10277) Unable to process Comment line '--' in HIVE-1.1.0
[ https://issues.apache.org/jira/browse/HIVE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reopened HIVE-10277: Patch is reverted because of test failures. Please resubmit patch if problem remains. Unable to process Comment line '--' in HIVE-1.1.0 - Key: HIVE-10277 URL: https://issues.apache.org/jira/browse/HIVE-10277 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0 Reporter: Kaveen Raajan Assignee: Chinna Rao Lalam Priority: Minor Labels: hive Fix For: 1.3.0 Attachments: HIVE-10277-1.patch, HIVE-10277.2.patch, HIVE-10277.patch I tried to use a comment line (*--*) in the HIVE-1.1.0 grunt shell, like ~hive>--this is comment line~ ~hive>show tables;~ and got an error like {quote} NoViableAltException(-1@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1020) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) FAILED: ParseException line 2:0 cannot recognize input near 'EOF' 'EOF' 'EOF' {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559395#comment-14559395 ] Gopal V commented on HIVE-10792: [~gaodayue]: not sure if the patch can retain PPD for map-joins. {{alias.size == 1}} might jump out of PPD cases even when the dummy operators are present. PPD leads to wrong answer when mapper scans the same table with multiple aliases Key: HIVE-10792 URL: https://issues.apache.org/jira/browse/HIVE-10792 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Dayue Gao Assignee: Dayue Gao Priority: Critical Fix For: 1.2.1 Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, HIVE-10792.test.sql Here are the steps to reproduce the bug. First of all, prepare a simple ORC table with one row {code} create table test_orc (c0 int, c1 int) stored as ORC; {code} Table: test_orc ||c0||c1|| |0|1| The following SQL returns an empty result, which is not expected {code} select * from test_orc t1 union all select * from test_orc t2 where t2.c0 = 1 {code} Self join is also broken {code} set hive.auto.convert.join=false; -- force common join select * from test_orc t1 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); {code} It returns an empty result while the expected answer is ||t1.c0||t1.c1||t2.c0||t2.c1|| |0|1|NULL|NULL| In these cases, we push down predicates into OrcInputFormat. As a result, TableScanOperator for t1 can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
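The failure mode in the HIVE-10792 description above can be modelled with a toy Java sketch. Everything here is hypothetical (`SharedScanPushdownSketch`, `scan`, `choosePushdown`) and none of it is Hive or ORC API. It assumes one reader serves every alias of the table, so a predicate pushed into that reader on behalf of t2 also hides rows from t1. The single-alias rule mirrors the {{alias.size == 1}} check mentioned in the comment and is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of one shared table scan feeding several aliases; not Hive code. */
final class SharedScanPushdownSketch {
    /** Reads the table once, applying the pushed-down predicate at the reader. */
    static List<int[]> scan(List<int[]> table, Predicate<int[]> pushedDown) {
        List<int[]> out = new ArrayList<>();
        for (int[] row : table) {
            if (pushedDown.test(row)) {
                out.add(row); // rows rejected here are lost for every alias
            }
        }
        return out;
    }

    /**
     * Conservative rule (an assumption): push a predicate into the reader only
     * when exactly one alias uses the scan; otherwise read all rows and let
     * each alias filter for itself.
     */
    static Predicate<int[]> choosePushdown(List<Predicate<int[]>> aliasPredicates) {
        if (aliasPredicates.size() == 1) {
            return aliasPredicates.get(0);
        }
        return row -> true;
    }
}
```

For the one-row table (0, 1), pushing the t2 predicate `c0 = 1` into the shared scan yields no rows, so t1 receives nothing, which matches the empty result reported. With the conservative rule the row is read and t1 sees it. As the comment points out, a rule this strict may give up predicate pushdown in cases such as map-joins where it would still be safe.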
[jira] [Comment Edited] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559325#comment-14559325 ] Xuefu Zhang edited comment on HIVE-10304 at 5/26/15 4:23 PM: - The final decision will be replacing Hive CLI's implementation with beeline (HIVE-10511). You still have the script file (hive.sh). Since you have so many scripts using Hive CLI, it would be great if you could test it with your scripts when HIVE-10511 is in place. Thanks. was (Author: xuefuz): The final decision will be replacing Hive CLI's implementation with beeline (HIVE-10511). You still have the script. Since you have so many scripts using Hive CLI. When HIVE-10511 is in place, it would be great if you can test it with your script. Thanks. Add deprecation message to HiveCLI -- Key: HIVE-10304 URL: https://issues.apache.org/jira/browse/HIVE-10304 Project: Hive Issue Type: Sub-task Components: CLI Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC1.2 Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch As Beeline is now the recommended command line tool for Hive, we should add a message to HiveCLI to indicate that it is deprecated and redirect users to Beeline. This does not suggest removing HiveCLI for now; it is just a helpful pointer so that users know attention is focused on Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)