[jira] [Resolved] (HIVE-9605) Remove parquet nested objects from wrapper writable objects
[ https://issues.apache.org/jira/browse/HIVE-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu resolved HIVE-9605.
--------------------------------
    Resolution: Fixed
    Fix Version/s:     (was: parquet-branch)
                   1.2.1

Committed to the master. Thanks [~spena]

> Remove parquet nested objects from wrapper writable objects
> -----------------------------------------------------------
>
>                 Key: HIVE-9605
>                 URL: https://issues.apache.org/jira/browse/HIVE-9605
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.14.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>             Fix For: 1.2.1
>
>         Attachments: HIVE-9605.3.patch, HIVE-9605.4.patch, HIVE-9605.5.patch, HIVE-9605.6.patch
>
>
> Parquet nested types are using an extra wrapper object (ArrayWritable) as a
> wrapper of map and list elements. This extra object is not needed and causes
> unnecessary memory allocations.
> An example of this code is in HiveCollectionConverter.java:
> {noformat}
> public void end() {
>   parent.set(index, wrapList(new ArrayWritable(
>       Writable.class, list.toArray(new Writable[list.size()]))));
> }
> {noformat}
> This object is later unwrapped in AbstractParquetMapInspector, i.e.:
> {noformat}
> final Writable[] mapContainer = ((ArrayWritable) data).get();
> final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get();
> for (final Writable obj : mapArray) {
>   ...
> }
> {noformat}
> We should get rid of this wrapper object to save time and memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
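The double unwrap described above can be illustrated with plain Java arrays standing in for Hadoop's ArrayWritable (an illustration only, with hypothetical names, not the actual Hive classes):

```java
public class WrapperSketch {
    // Wrapped layout: elements are nested one level deeper, so every read
    // pays an extra cast and array hop, and every write an extra allocation.
    static Object[] unwrapWrapped(Object data) {
        Object[] container = (Object[]) data; // outer wrapper
        return (Object[]) container[0];       // actual map/list elements
    }

    // Layout proposed by the issue: elements stored directly, nothing to peel off.
    static Object[] unwrapDirect(Object data) {
        return (Object[]) data;
    }
}
```

Dropping the wrapper removes one object allocation per list/map write and one cast per read, which is the memory saving the issue is after.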
[jira] [Commented] (HIVE-10684) Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files
[ https://issues.apache.org/jira/browse/HIVE-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560524#comment-14560524 ]

Hari Sankar Sivarama Subramaniyan commented on HIVE-10684:
----------------------------------------------------------

[~Ferd] Sorry for the delay, I have a few minor comments here.
1. For public int executeCmd(), can you make this a private function? Also, the return value from this function is not used; I think it should either be removed or logged for debugging purposes.
2.
{code}
+// Files.copy(new File("/tmp/" + clazzV2FileName.toString()), dist);
{code}
The above line can be removed.
Thanks
Hari

> Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary
> jar files
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-10684
>                 URL: https://issues.apache.org/jira/browse/HIVE-10684
>             Project: Hive
>          Issue Type: Bug
>          Components: Tests
>            Reporter: Ferdinand Xu
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-10684.1.patch, HIVE-10684.patch
>
[jira] [Commented] (HIVE-10777) LLAP: add pre-fragment and per-table cache details
[ https://issues.apache.org/jira/browse/HIVE-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560494#comment-14560494 ]

Lefty Leverenz commented on HIVE-10777:
---------------------------------------

Doc note: This adds *hive.llap.io.orc.time.counters* to HiveConf.java so I'm linking to HIVE-9850 for documentation.

> LLAP: add pre-fragment and per-table cache details
> --------------------------------------------------
>
>                 Key: HIVE-10777
>                 URL: https://issues.apache.org/jira/browse/HIVE-10777
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>             Fix For: llap
>
>         Attachments: HIVE-10777.01.patch, HIVE-10777.02.patch, HIVE-10777.WIP.patch, HIVE-10777.patch
>
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560489#comment-14560489 ]

Mostafa Mokhtar commented on HIVE-10704:
----------------------------------------

Done, the command-line RB tool was giving me some headaches.

> Errors in Tez HashTableLoader when estimated table size is 0
> ------------------------------------------------------------
>
>                 Key: HIVE-10704
>                 URL: https://issues.apache.org/jira/browse/HIVE-10704
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Jason Dere
>            Assignee: Mostafa Mokhtar
>             Fix For: 1.2.1
>
>         Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch
>
>
> A couple of issues:
> - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all
> tables, the largest-small-table selection is wrong and could select the large
> table (which results in an NPE)
> - The memory estimates can either divide by zero, or allocate 0 memory if the
> table size is 0. Try to come up with a sensible default for this.
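The second bullet in the description asks for a sensible default when the estimate is 0. A minimal sketch of that kind of guard (hypothetical names and fallback value, not the actual HIVE-10704 patch):

```java
public class TableSizeGuard {
    // Assumed 1 MB fallback when stats report 0 bytes; the real patch may
    // pick a different default or derive one from configuration.
    static final long DEFAULT_TABLE_SIZE = 1024L * 1024L;

    // Clamp a zero (or negative) size estimate so downstream memory math
    // never divides by zero or allocates a 0-byte hash table.
    static long safeTableSize(long estimatedSize) {
        return estimatedSize > 0 ? estimatedSize : DEFAULT_TABLE_SIZE;
    }
}
```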
[jira] [Updated] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-10793:
----------------------------------
    Labels: TODOC1.3  (was: )

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-10793
>                 URL: https://issues.apache.org/jira/browse/HIVE-10793
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>              Labels: TODOC1.3
>             Fix For: 1.3.0
>
>         Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>
> HybridHashTableContainer will allocate memory based on an estimate, which means
> if the actual is less than the estimate the allocated memory won't be used.
> The number of partitions is calculated based on estimated data size:
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize,
>     minNumParts, minWbSize, nwayConf);
> {code}
> Then, based on the number of partitions, writeBufferSize is set:
> {code}
> writeBufferSize = (int)(estimatedTableSize / numPartitions);
> {code}
> Each hash partition will allocate 1 WriteBuffer, with no further allocation
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that only X%
> of the memory is preallocated.
[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560459#comment-14560459 ]

Lefty Leverenz commented on HIVE-10793:
---------------------------------------

Doc note: This changes the default value of *hive.mapjoin.optimized.hashtable.wbsize*, so the wiki needs to be updated (with version information).

* [Configuration Properties -- hive.mapjoin.optimized.hashtable.wbsize | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapjoin.optimized.hashtable.wbsize]

The patch also makes minor changes to the definitions of *hive.mapjoin.hybridgrace.minwbsize* and *hive.mapjoin.hybridgrace.minnumpartitions*, which do not need any doc changes.

> Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-10793
>                 URL: https://issues.apache.org/jira/browse/HIVE-10793
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>             Fix For: 1.3.0
>
>         Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch
>
>
> HybridHashTableContainer will allocate memory based on an estimate, which means
> if the actual is less than the estimate the allocated memory won't be used.
> The number of partitions is calculated based on estimated data size:
> {code}
> numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize,
>     minNumParts, minWbSize, nwayConf);
> {code}
> Then, based on the number of partitions, writeBufferSize is set:
> {code}
> writeBufferSize = (int)(estimatedTableSize / numPartitions);
> {code}
> Each hash partition will allocate 1 WriteBuffer, with no further allocation
> if the estimated data size is correct.
> The suggested solution is to reduce writeBufferSize by a factor such that only X%
> of the memory is preallocated.
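The suggested fix in the description above — preallocating only a fraction of the per-partition estimate — can be sketched as follows. The `preallocFraction` parameter and the clamp bounds are hypothetical illustrations of the idea, not the committed change:

```java
public class WriteBufferSizing {
    // Instead of writeBufferSize = estimatedTableSize / numPartitions,
    // preallocate only a fraction of it, clamped to [minWbSize, maxWbSize]
    // so a bad estimate can neither starve nor exhaust the container.
    static int scaledWriteBufferSize(long estimatedTableSize, int numPartitions,
                                     double preallocFraction,
                                     int minWbSize, int maxWbSize) {
        long perPartition = estimatedTableSize / numPartitions;
        long scaled = (long) (perPartition * preallocFraction);
        return (int) Math.max(minWbSize, Math.min(maxWbSize, scaled));
    }
}
```

With a fraction of 0.5, only half of each partition's estimated share is allocated up front; further buffers would be allocated lazily as rows actually arrive.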
[jira] [Commented] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.
[ https://issues.apache.org/jira/browse/HIVE-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560457#comment-14560457 ]

Ashutosh Chauhan commented on HIVE-10716:
-----------------------------------------

[~gopalv] I need to verify, but my guess is https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java#L80 is coming into play here.

> Fold case/when udf for expression involving nulls in filter operator.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-10716
>                 URL: https://issues.apache.org/jira/browse/HIVE-10716
>             Project: Hive
>          Issue Type: New Feature
>          Components: Logical Optimizer
>    Affects Versions: 1.2.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 1.2.1
>
>         Attachments: HIVE-10716.patch
>
>
> From HIVE-10636 comments, more folding is possible.
[jira] [Updated] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.
[ https://issues.apache.org/jira/browse/HIVE-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-10716:
------------------------------------
    Affects Version/s:     (was: 1.3.0)
                       1.2.0

> Fold case/when udf for expression involving nulls in filter operator.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-10716
>                 URL: https://issues.apache.org/jira/browse/HIVE-10716
>             Project: Hive
>          Issue Type: New Feature
>          Components: Logical Optimizer
>    Affects Versions: 1.2.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 1.2.1
>
>         Attachments: HIVE-10716.patch
>
>
> From HIVE-10636 comments, more folding is possible.
[jira] [Commented] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286
[ https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560438#comment-14560438 ]

Hive QA commented on HIVE-10819:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735439/HIVE-10819.3.patch

{color:red}ERROR:{color} -1 due to 59 failed/errored test(s), 8974 tests executed

*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_null_element
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_multi_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_optional_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_required_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_structs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_groups
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_arrays_of_ints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_maps
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_nested_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_read_backward_compatible_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAmbiguousSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testHiveRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testMultiFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewOptionalGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testUnannotatedListOfGroups
org.apache.hadoop.hive.ql.io.parquet.TestDataWritableWriter.testSimpleType
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testDoubleMapWithStructValue
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testMapWithComplexKey
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testNestedMap
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalIntArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOptionalPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapRequiredPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testRegularMap
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testRegularMap
org.apache.hadoop.hive.ql.i
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560431#comment-14560431 ]

Alexander Pivovarov commented on HIVE-10704:
--------------------------------------------

Mostafa, can you check the RB link? I'm not sure it shows HIVE-10704.3.patch

> Errors in Tez HashTableLoader when estimated table size is 0
> ------------------------------------------------------------
>
>                 Key: HIVE-10704
>                 URL: https://issues.apache.org/jira/browse/HIVE-10704
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Jason Dere
>            Assignee: Mostafa Mokhtar
>             Fix For: 1.2.1
>
>         Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch
>
>
> A couple of issues:
> - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all
> tables, the largest-small-table selection is wrong and could select the large
> table (which results in an NPE)
> - The memory estimates can either divide by zero, or allocate 0 memory if the
> table size is 0. Try to come up with a sensible default for this.
[jira] [Updated] (HIVE-10830) First column of a Hive table created with LazyBinaryColumnarSerDe is not read properly
[ https://issues.apache.org/jira/browse/HIVE-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lovekesh bansal updated HIVE-10830:
-----------------------------------
    Description: 
1. create external table platdev.table_target ( id INT, message String, state string, date string ) partitioned by (country string) row format delimited fields terminated by ',' stored as RCFILE location '/user/nikgupta/table_target';
2. insert overwrite table platdev.table_target partition(country) select case when id=13 then 15 else id end, message, state, date, country from platdev.table_base2 where id between 13 and 16;

Say now my table is written by default using LazyBinaryColumnarSerDe and has the following data:
15 thirteen delhi   2-12-2014 india
14 fourteen delhi   1-1-2014  india
15 fifteen  florida 1-1-2014  us
16 sixteen  florida 2-12-2014 us

Now if I try to read the data with a MapReduce program, with the map function given below:
{code}
public void map(LongWritable key, BytesRefArrayWritable val, Context context)
    throws IOException, InterruptedException {
  for (int i = 0; i < val.size(); i++) {
    BytesRefWritable bytesRefread = val.get(i);
    byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(),
        bytesRefread.getStart(), bytesRefread.getStart() + bytesRefread.getLength());
    Text currentCellStr = new Text(currentCell);
    System.out.println("rowText=" + currentCellStr);
  }
  context.write(NullWritable.get(), val);
}
{code}
and set the following job configuration parameters:
{code}
job.setInputFormatClass(RCFileMapReduceInputFormat.class);
job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5);
{code}
the output shown is as follows (LazyBinaryColumnarSerDe):
rowText=
rowText=fifteen
rowText=goa
rowText=2-2-
rowText=us

But exactly the same case using ColumnarSerDe explicitly in the table definition gives the following output:
rowText=1
rowText=fifteen
rowText=goa
rowText=2-2-
rowText=us

The point is that the first column value is missing in the case of LazyBinaryColumnarSerDe.

  was: (previous revision of the same description, without the notes identifying LazyBinaryColumnarSerDe as the default writer)

> First column of a Hive table created with LazyBinaryColumnarSerDe is not read
> properly
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-10830
>                 URL: https://issues.apache.org/jira/browse/HIVE-10830
>             Project: Hive
>          Issue Type: Bug
>            Reporter: lovekesh bansal
>
> 1. create external table platdev.table_target ( id INT, message String, state
> string, date string ) partitioned by (country string) row format delimited
> fields terminated by ',' stored as RCFILE location
> '/user/nikgupta/table_target' ;
> 2. insert overwrite table platdev.table_target partition(country) select case
> when id=13 then 15 else id end,message,state,date,country from
> platdev.table_base2 where id between 13 and 16;
> say
[jira] [Commented] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.
[ https://issues.apache.org/jira/browse/HIVE-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560403#comment-14560403 ]

Gopal V commented on HIVE-10716:
--------------------------------

The easiest fix to the problem seems to be an additional filter expr to produce an AND()

{code}
hive> explain select avg(ss_sold_date_sk) from store_sales where (case ss_sold_date when '1998-01-02' then 1 else null end)=1;

        Map Operator Tree:
            TableScan
              alias: store_sales
              filterExpr: CASE (ss_sold_date) WHEN ('1998-01-02') THEN (true) ELSE (null) END (type: int)
              Statistics: Num rows: 2474913 Data size: 9899654 Basic stats: COMPLETE Column stats: COMPLETE
{code}

vs

{code}
hive> explain select avg(ss_sold_date_sk) from store_sales where (case ss_sold_date when '1998-01-02' then 1 else null end)=1 and ss_sold_time_Sk > 0;

        Map Operator Tree:
            TableScan
              alias: store_sales
              filterExpr: ((ss_sold_date = '1998-01-02') and (ss_sold_time_sk > 0)) (type: boolean)
              Statistics: Num rows: 1237456 Data size: 9899654 Basic stats: COMPLETE Column stats: COMPLETE
              Filter Operator
                predicate: (ss_sold_time_sk > 0) (type: boolean)
{code}

[~ashutoshc]: any idea why the extra filter helps in fixing the PPD case?

> Fold case/when udf for expression involving nulls in filter operator.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-10716
>                 URL: https://issues.apache.org/jira/browse/HIVE-10716
>             Project: Hive
>          Issue Type: New Feature
>          Components: Logical Optimizer
>    Affects Versions: 1.3.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-10716.patch
>
>
> From HIVE-10636 comments, more folding is possible.
[jira] [Commented] (HIVE-10716) Fold case/when udf for expression involving nulls in filter operator.
[ https://issues.apache.org/jira/browse/HIVE-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560400#comment-14560400 ]

Gopal V commented on HIVE-10716:
--------------------------------

[~ashutoshc]: LGTM - +1 for the count(1) case, but it looks really odd that the {{TableScan::filterExpr}} is not getting folded for this.

TableScan FilterExpr is populated before this folding happens, so it might just be an optimization ordering issue?

{code}
hive> explain select count(1) from store_sales where (case ss_sold_date when 'x' then 1 else null end)=1;

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: gopal_20150526214205_80c41d84-1694-47e9-ab24-144f8007b187:13
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: CASE (ss_sold_date) WHEN ('x') THEN (true) ELSE (null) END (type: int)
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: (ss_sold_date = 'x') (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Select Operator
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                      Group By Operator
                        aggregations: count(1)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          sort order:
                          Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col0 (type: bigint)
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
{code}

> Fold case/when udf for expression involving nulls in filter operator.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-10716
>                 URL: https://issues.apache.org/jira/browse/HIVE-10716
>             Project: Hive
>          Issue Type: New Feature
>          Components: Logical Optimizer
>    Affects Versions: 1.3.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-10716.patch
>
>
> From HIVE-10636 comments, more folding is possible.
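The folding discussed in this thread is only valid under filter semantics, where a NULL predicate result filters the row out just like FALSE. A small three-valued-logic sketch of why `(CASE x WHEN v THEN 1 ELSE NULL END) = 1` behaves like `x = v` inside a filter (illustrative Java with hypothetical names, not Hive's actual expression-folding code):

```java
public class CaseFoldSketch {
    // Boolean here is three-valued: TRUE, FALSE, or null (SQL NULL).
    // CASE x WHEN v THEN 1 ELSE NULL END, then compared with "= 1".
    static Boolean caseForm(String x, String v) {
        Integer caseResult = (x != null && x.equals(v)) ? 1 : null;
        return caseResult == null ? null : Boolean.valueOf(caseResult == 1);
    }

    // The folded predicate: x = v (NULL if x is NULL).
    static Boolean foldedForm(String x, String v) {
        return x == null ? null : Boolean.valueOf(x.equals(v));
    }

    // A filter keeps the row only when the predicate is definitely TRUE,
    // so NULL and FALSE are indistinguishable at the filter operator.
    static boolean passesFilter(Boolean predicate) {
        return predicate != null && predicate;
    }
}
```

The two forms differ as values (one yields NULL where the other yields FALSE), which is why the fold is safe in a Filter operator but not as a general expression rewrite.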
[jira] [Updated] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false
[ https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-10807:
------------------------------------
    Attachment: HIVE-10807.3.patch

> Invalidate basic stats for insert queries if autogather=false
> -------------------------------------------------------------
>
>                 Key: HIVE-10807
>                 URL: https://issues.apache.org/jira/browse/HIVE-10807
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 1.2.0
>            Reporter: Gopal V
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-10807.2.patch, HIVE-10807.3.patch, HIVE-10807.patch
>
>
> stats.autogather=false leads to incorrect basic stats in the case of insert
> statements.
[jira] [Resolved] (HIVE-10813) Fix current test failures after HIVE-8769
[ https://issues.apache.org/jira/browse/HIVE-10813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-10813.
-------------------------------------
    Resolution: Fixed
    Fix Version/s: 1.3.0

Fixed by HIVE-10812

> Fix current test failures after HIVE-8769
> -----------------------------------------
>
>                 Key: HIVE-10813
>                 URL: https://issues.apache.org/jira/browse/HIVE-10813
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>             Fix For: 1.3.0
>
>
> We fixed the stats annotation in HIVE-8769. However, there are some newly
> committed test cases (e.g., udf_sha1.q) that are not covered by the patch.
[jira] [Updated] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation
[ https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-10812:
------------------------------------
    Component/s: Statistics
                 Physical Optimizer

> Scaling PK/FK's selectivity for stats annotation
> ------------------------------------------------
>
>                 Key: HIVE-10812
>                 URL: https://issues.apache.org/jira/browse/HIVE-10812
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer, Statistics
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>             Fix For: 1.2.1
>
>         Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch, HIVE-10812.03.patch
>
>
> Right now, the computation of the selectivity of the FK side based on the PK
> side does not take into consideration the range of the FK and the range of
> the PK.
[jira] [Commented] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false
[ https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560370#comment-14560370 ]

Hive QA commented on HIVE-10807:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12735432/HIVE-10807.2.patch

{color:red}ERROR:{color} -1 due to 59 failed/errored test(s), 8974 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_null_element
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_multi_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_optional_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_required_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_structs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_groups
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_arrays_of_ints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_maps
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_nested_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_read_backward_compatible_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAmbiguousSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testHiveRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testMultiFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewOptionalGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testUnannotatedListOfGroups
org.apache.hadoop.hive.ql.io.parquet.TestDataWritableWriter.testSimpleType
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testDoubleMapWithStructValue
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testMapWithComplexKey
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testNestedMap
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalIntArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOptionalPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapRequiredPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testRegularMap
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testRegularMap
org.apache.hadoop.hi
[jira] [Updated] (HIVE-686) add UDF substring_index
[ https://issues.apache.org/jira/browse/HIVE-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-686: - Attachment: HIVE-686.1.patch patch #1 - derive substring_index from GenericUDF - add Junit and qtest tests > add UDF substring_index > --- > > Key: HIVE-686 > URL: https://issues.apache.org/jira/browse/HIVE-686 > Project: Hive > Issue Type: New Feature > Components: UDF >Reporter: Namit Jain >Assignee: Alexander Pivovarov > Attachments: HIVE-686.1.patch, HIVE-686.patch, HIVE-686.patch > > > SUBSTRING_INDEX(str,delim,count) > Returns the substring from string str before count occurrences of the > delimiter delim. If count is positive, everything to the left of the final > delimiter (counting from the left) is returned. If count is negative, > everything to the right of the final delimiter (counting from the right) is > returned. SUBSTRING_INDEX() performs a case-sensitive match when searching > for delim. > Examples: > {code} > SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3); > --www.mysql.com > SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2); > --www.mysql > SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1); > --www > SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0); > --'' > SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1); > --com > SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2); > --mysql.com > SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3); > --www.mysql.com > {code} > {code} > --#delim does not exist in str > SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1); > --www.mysql.com > --#delim is 2 chars > SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2); > --www||mysql > --#delim is empty string > SELECT SUBSTRING_INDEX('www.mysql.com', '', 2); > --'' > --#str is empty string > SELECT SUBSTRING_INDEX('', '.', 2); > --'' > {code} > {code} > --#null params > SELECT SUBSTRING_INDEX(null, '.', 1); > --null > SELECT SUBSTRING_INDEX('www.mysql.com', null, 1); > --null > SELECT SUBSTRING_INDEX('www.mysql.com', '.', 
null); > --null > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
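The semantics documented above translate directly into code. The following is an illustrative Java sketch of the described behavior; the class and method names are my own, not the ones used in the HIVE-686 patch (which derives the UDF from GenericUDF):

```java
public class SubstringIndex {
    // Returns null on any null input, "" when count == 0 or delim is empty,
    // and str unchanged when fewer than |count| delimiters are present --
    // matching the examples in the issue description.
    public static String substringIndex(String str, String delim, Integer count) {
        if (str == null || delim == null || count == null) {
            return null;
        }
        if (count == 0 || delim.isEmpty() || str.isEmpty()) {
            return "";
        }
        if (count > 0) {
            // Scan left to right for the count-th occurrence of delim.
            int idx = -1;
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) {
                    return str;          // fewer than count occurrences
                }
            }
            return str.substring(0, idx);
        } else {
            // Scan right to left for the |count|-th occurrence of delim.
            int idx = str.length();
            for (int i = 0; i < -count; i++) {
                idx = str.lastIndexOf(delim, idx - 1);
                if (idx < 0) {
                    return str;          // fewer than |count| occurrences
                }
            }
            return str.substring(idx + delim.length());
        }
    }
}
```

For example, `substringIndex("www.mysql.com", ".", 2)` yields `"www.mysql"` and `substringIndex("www.mysql.com", ".", -2)` yields `"mysql.com"`, mirroring the SELECT examples above.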
[jira] [Commented] (HIVE-10550) Dynamic RDD caching optimization for HoS.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560342#comment-14560342 ] Hive QA commented on HIVE-10550: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735497/HIVE-10550.5-spark.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8721 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/866/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/866/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-866/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12735497 - PreCommit-HIVE-SPARK-Build > Dynamic RDD caching optimization for HoS.[Spark Branch] > --- > > Key: HIVE-10550 > URL: https://issues.apache.org/jira/browse/HIVE-10550 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li > Attachments: HIVE-10550.1-spark.patch, HIVE-10550.1.patch, > HIVE-10550.2-spark.patch, HIVE-10550.3-spark.patch, HIVE-10550.4-spark.patch, > HIVE-10550.5-spark.patch > > > A Hive query may try to scan the same table multi times, like self-join, > self-union, or even share the same subquery, [TPC-DS > Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql] > is an example. 
As you may know, Spark supports caching RDD data: Spark keeps the computed RDD data in > memory and serves later reads from memory directly, which avoids the computation cost of that > RDD (and of all its dependencies) at the cost of higher memory usage. > By analyzing the query context, we should be able to identify which parts > of the query can be shared, so that we can reuse the cached RDD in the > generated Spark job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
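The compute-once behavior that makes RDD caching pay off can be illustrated without Spark. The class below is a hypothetical plain-Java analogy, not Hive or Spark code: the first lookup of a key pays the computation cost, and every later lookup is served from memory.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative stand-in for Spark's rdd.cache(): results are computed at most
// once per key and reused afterwards, trading memory for recomputation.
public class ComputeOnceCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;
    // Counts how many times the expensive computation actually ran.
    public final AtomicInteger computations = new AtomicInteger();

    public ComputeOnceCache(Function<K, V> compute) {
        this.compute = compute;
    }

    public V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            computations.incrementAndGet();
            return compute.apply(k);
        });
    }
}
```

In the query-planning setting described above, the "key" corresponds to a shared subquery or self-joined table scan, and the cached value to its materialized RDD.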
[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases
[ https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560335#comment-14560335 ] Laljo John Pullokkaran commented on HIVE-10811: --- Why do we need to keep the fields from input that is part of the collation but is not used by parent. If no operators from parent refer to that column then i don't see how preserving sort order is helpful. > RelFieldTrimmer throws NoSuchElementException in some cases > --- > > Key: HIVE-10811 > URL: https://issues.apache.org/jira/browse/HIVE-10811 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, > HIVE-10811.patch > > > RelFieldTrimmer runs into NoSuchElementException in some cases. > Stack trace: > {noformat} > Exception in thread "main" java.lang.AssertionError: Internal error: While > invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult > org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' > at org.apache.calcite.util.Util.newInternal(Util.java:743) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) > at 
org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536) > ... 32 more > Caused by: java.lang.AssertionError: Internal error: While invoking method > 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult > org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.ut
[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560332#comment-14560332 ] Laljo John Pullokkaran commented on HIVE-9069: -- [~jcamachorodriguez] In extractCommonOperands, for a disjunction, if any operand contains none of the reduction conditions, then we can short-circuit and bail out. > Simplify filter predicates for CBO > -- > > Key: HIVE-9069 > URL: https://issues.apache.org/jira/browse/HIVE-9069 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Jesus Camacho Rodriguez > Fix For: 0.14.1 > > Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, > HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, > HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, > HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, > HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, > HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.patch > > > Simplify predicates for disjunctive predicates so that can get pushed down to > the scan. > Looks like this is still an issue, some of the filters can be pushed down to > the scan. 
> {code} > set hive.cbo.enable=true > set hive.stats.fetch.column.stats=true > set hive.exec.dynamic.partition.mode=nonstrict > set hive.tez.auto.reducer.parallelism=true > set hive.auto.convert.join.noconditionaltask.size=32000 > set hive.exec.reducers.bytes.per.reducer=1 > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager > set hive.support.concurrency=false > set hive.tez.exec.print.summary=true > explain > select substr(r_reason_desc,1,20) as r >,avg(ws_quantity) wq >,avg(wr_refunded_cash) ref >,avg(wr_fee) fee > from web_sales, web_returns, web_page, customer_demographics cd1, > customer_demographics cd2, customer_address, date_dim, reason > where web_sales.ws_web_page_sk = web_page.wp_web_page_sk >and web_sales.ws_item_sk = web_returns.wr_item_sk >and web_sales.ws_order_number = web_returns.wr_order_number >and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998 >and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk >and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk >and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk >and reason.r_reason_sk = web_returns.wr_reason_sk >and >( > ( > cd1.cd_marital_status = 'M' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = '4 yr Degree' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 100.00 and 150.00 > ) >or > ( > cd1.cd_marital_status = 'D' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = 'Primary' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 50.00 and 100.00 > ) >or > ( > cd1.cd_marital_status = 'U' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = 'Advanced Degree' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 150.00 and 200.00 > ) >) >and >( > ( > ca_country = 'United States' > and > ca_state in ('KY', 'GA', 'NM') > and 
ws_net_profit between 100 and 200 > ) > or > ( > ca_country = 'United States' > and > ca_state in ('MT', 'OR', 'IN') > and ws_net_profit between 150 and 300 > ) > or > ( > ca_country = 'United States' > and > ca_state in ('WI', 'MO', 'WV') > and ws_net_profit between 50 and 250 > ) >) > group by r_reason_desc > order by r, wq, ref, fee > limit 100 > OK > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 9 <- Map 1 (BROADCAST_EDGE) > Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE) > Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) > Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE) > Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 > (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE) > Reducer 7 <- Reducer 6 (SIMPLE_EDGE) > Reducer 8 <- Reducer 7 (SIMPLE_EDGE) > DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: web_page > filterExpr: wp_web_page_sk is not n
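The short-circuit suggested in the review comment above can be sketched abstractly. Treating each disjunct as a set of conjunct labels, the toy method below (my own names, not Hive's actual extractCommonOperands) computes the operands common to every disjunct and bails out as soon as the running intersection becomes empty, since no factor can then be pulled in front of the OR:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CommonOperands {
    // Given a disjunction represented as a list of conjunct sets, e.g.
    // (A AND B) OR (A AND C)  ->  [{A, B}, {A, C}],
    // return the operands shared by all disjuncts ({A} here).
    // Precondition (for this sketch): disjuncts is non-empty.
    public static Set<String> extractCommon(List<Set<String>> disjuncts) {
        Set<String> common = new LinkedHashSet<>(disjuncts.get(0));
        for (int i = 1; i < disjuncts.size(); i++) {
            common.retainAll(disjuncts.get(i));
            if (common.isEmpty()) {
                return common;   // short circuit: nothing shared, bail out
            }
        }
        return common;
    }
}
```

In the TPC-DS predicate above, `ca_country = 'United States'` plays the role of the shared operand that can be factored out of the three-way disjunction and pushed to the scan.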
[jira] [Updated] (HIVE-10829) ATS hook fails for explainTask
[ https://issues.apache.org/jira/browse/HIVE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10829: --- Attachment: HIVE-10829.01.patch > ATS hook fails for explainTask > -- > > Key: HIVE-10829 > URL: https://issues.apache.org/jira/browse/HIVE-10829 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Minor > Attachments: HIVE-10829.01.patch > > > Commands: > create table idtable(id string); > create table ctastable as select * from idtable; > With ATS hook: > 2015-05-22 18:54:47,092 INFO [ATS Logger 0]: hooks.ATSHook > (ATSHook.java:run(136)) - Failed to submit plan to ATS: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:589) > at > org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:576) > at > org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:821) > at > org.apache.hadoop.hive.ql.exec.ExplainTask.outputStagePlans(ExplainTask.java:965) > at > org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:219) > at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:120) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7723: -- Attachment: HIVE-7723.11.patch > Explain plan for complex query with lots of partitions is slow due to > in-efficient collection used to find a matching ReadEntity > > > Key: HIVE-7723 > URL: https://issues.apache.org/jira/browse/HIVE-7723 > Project: Hive > Issue Type: Bug > Components: CLI, Physical Optimizer >Affects Versions: 0.13.1 >Reporter: Mostafa Mokhtar >Assignee: Mostafa Mokhtar > Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, > HIVE-7723.11.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, > HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, > HIVE-7723.9.patch > > > Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it > showed that ReadEntity.equals takes ~40% of the CPU. > ReadEntity.equals is called from the snippet below. > The set is iterated over again and again to find the actual match; a HashMap > is a better option for this case, as Set doesn't have a get method. > Also, for ReadEntity, equals is case-insensitive while hashCode is not, which is > undesired behavior. > {code} > public static ReadEntity addInput(Set inputs, ReadEntity > newInput) { > // If the input is already present, make sure the new parent is added to > the input. 
> if (inputs.contains(newInput)) { > for (ReadEntity input : inputs) { > if (input.equals(newInput)) { > if ((newInput.getParents() != null) && > (!newInput.getParents().isEmpty())) { > input.getParents().addAll(newInput.getParents()); > input.setDirect(input.isDirect() || newInput.isDirect()); > } > return input; > } > } > assert false; > } else { > inputs.add(newInput); > return newInput; > } > // make compile happy > return null; > } > {code} > This is the query used : > {code} > select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number > ,cs1.b_streen_name ,cs1.b_city > ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city > ,cs1.c_zip ,cs1.syear ,cs1.cnt > ,cs1.s1 ,cs1.s2 ,cs1.s3 > ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt > from > (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as > store_name > ,s_zip as store_zip ,ad1.ca_street_number as b_street_number > ,ad1.ca_street_name as b_streen_name > ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as > c_street_number > ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip > as c_zip > ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) > as cnt > ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 > ,sum(ss_coupon_amt) as s3 > FROM store_sales > JOIN store_returns ON store_sales.ss_item_sk = > store_returns.sr_item_sk and store_sales.ss_ticket_number = > store_returns.sr_ticket_number > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk > JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk > JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk > JOIN store ON store_sales.ss_store_sk = store.s_store_sk > JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= > cd1.cd_demo_sk > JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = > cd2.cd_demo_sk > JOIN promotion ON 
store_sales.ss_promo_sk = promotion.p_promo_sk > JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = > hd1.hd_demo_sk > JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = > hd2.hd_demo_sk > JOIN customer_address ad1 ON store_sales.ss_addr_sk = > ad1.ca_address_sk > JOIN customer_address ad2 ON customer.c_current_addr_sk = > ad2.ca_address_sk > JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk > JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk > JOIN item ON store_sales.ss_item_sk = item.i_item_sk > JOIN > (select cs_item_sk > ,sum(cs_ext_list_price) as > sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund > from catalog_sales JOIN catalog_returns > ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk > and catalog_sales.cs_order_number = catalog_returns.cr_order_number > group by cs_item_sk > having > sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) > cs_ui >
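The Set-iteration pattern criticized in the description can indeed be replaced by a keyed map lookup. The sketch below is hypothetical, a simplified stand-in for ReadEntity rather than the actual HIVE-7723 patch; it turns the O(n) "contains, then iterate to find the match" into an O(1) map get, and derives the key consistently so the equals/hashCode mismatch mentioned above cannot occur:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Locale;

public class InputRegistry {
    // Simplified stand-in for ReadEntity: identified by a case-insensitive name
    // and a "direct" flag that must be merged when the same input recurs.
    public static final class Entity {
        public final String name;
        public boolean direct;
        public Entity(String name, boolean direct) {
            this.name = name;
            this.direct = direct;
        }
        // One canonical key for both lookup and storage, so matching is
        // case-insensitive everywhere (no equals/hashCode disagreement).
        public String key() {
            return name.toLowerCase(Locale.ROOT);
        }
    }

    private final Map<String, Entity> inputs = new HashMap<>();

    // O(1) lookup replaces iterating the whole set to locate the match:
    // merge flags into the existing entry, or register the new one.
    public Entity addInput(Entity newInput) {
        Entity existing = inputs.get(newInput.key());
        if (existing != null) {
            existing.direct = existing.direct || newInput.direct;
            return existing;
        }
        inputs.put(newInput.key(), newInput);
        return newInput;
    }
}
```

With thousands of partitions each producing a ReadEntity, this turns the quadratic scan in addInput into a single hash probe per input, which is the gist of the optimization the attachment targets.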
[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7723: -- Attachment: (was: HIVE-7723.11.patch) > Explain plan for complex query with lots of partitions is slow due to > in-efficient collection used to find a matching ReadEntity > > > Key: HIVE-7723 > URL: https://issues.apache.org/jira/browse/HIVE-7723 > Project: Hive > Issue Type: Bug > Components: CLI, Physical Optimizer >Affects Versions: 0.13.1 >Reporter: Mostafa Mokhtar >Assignee: Mostafa Mokhtar > Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, > HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, > HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch > > > Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it > showed that ReadEntity.equals takes ~40% of the CPU. > ReadEntity.equals is called from the snippet below. > The set is iterated over again and again to find the actual match; a HashMap > is a better option for this case, as Set doesn't have a get method. > Also, for ReadEntity, equals is case-insensitive while hashCode is not, which is > undesired behavior. > {code} > public static ReadEntity addInput(Set inputs, ReadEntity > newInput) { > // If the input is already present, make sure the new parent is added to > the input. 
> if (inputs.contains(newInput)) { > for (ReadEntity input : inputs) { > if (input.equals(newInput)) { > if ((newInput.getParents() != null) && > (!newInput.getParents().isEmpty())) { > input.getParents().addAll(newInput.getParents()); > input.setDirect(input.isDirect() || newInput.isDirect()); > } > return input; > } > } > assert false; > } else { > inputs.add(newInput); > return newInput; > } > // make compile happy > return null; > } > {code} > This is the query used : > {code} > select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number > ,cs1.b_streen_name ,cs1.b_city > ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city > ,cs1.c_zip ,cs1.syear ,cs1.cnt > ,cs1.s1 ,cs1.s2 ,cs1.s3 > ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt > from > (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as > store_name > ,s_zip as store_zip ,ad1.ca_street_number as b_street_number > ,ad1.ca_street_name as b_streen_name > ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as > c_street_number > ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip > as c_zip > ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) > as cnt > ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 > ,sum(ss_coupon_amt) as s3 > FROM store_sales > JOIN store_returns ON store_sales.ss_item_sk = > store_returns.sr_item_sk and store_sales.ss_ticket_number = > store_returns.sr_ticket_number > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk > JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk > JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk > JOIN store ON store_sales.ss_store_sk = store.s_store_sk > JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= > cd1.cd_demo_sk > JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = > cd2.cd_demo_sk > JOIN promotion ON 
store_sales.ss_promo_sk = promotion.p_promo_sk > JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = > hd1.hd_demo_sk > JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = > hd2.hd_demo_sk > JOIN customer_address ad1 ON store_sales.ss_addr_sk = > ad1.ca_address_sk > JOIN customer_address ad2 ON customer.c_current_addr_sk = > ad2.ca_address_sk > JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk > JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk > JOIN item ON store_sales.ss_item_sk = item.i_item_sk > JOIN > (select cs_item_sk > ,sum(cs_ext_list_price) as > sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund > from catalog_sales JOIN catalog_returns > ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk > and catalog_sales.cs_order_number = catalog_returns.cr_order_number > group by cs_item_sk > having > sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) > cs_ui > ON store_sa
[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560306#comment-14560306 ] Hive QA commented on HIVE-9069: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735433/HIVE-9069.14.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8975 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_7 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_7 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4049/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4049/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4049/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12735433 - PreCommit-HIVE-TRUNK-Build > Simplify filter predicates for CBO > -- > > Key: HIVE-9069 > URL: https://issues.apache.org/jira/browse/HIVE-9069 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Jesus Camacho Rodriguez > Fix For: 0.14.1 > > Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, > HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, > HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, > HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, > HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, > HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.patch > > > Simplify predicates for disjunctive predicates so that can get pushed down to > the scan. > Looks like this is still an issue, some of the filters can be pushed down to > the scan. > {code} > set hive.cbo.enable=true > set hive.stats.fetch.column.stats=true > set hive.exec.dynamic.partition.mode=nonstrict > set hive.tez.auto.reducer.parallelism=true > set hive.auto.convert.join.noconditionaltask.size=32000 > set hive.exec.reducers.bytes.per.reducer=1 > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager > set hive.support.concurrency=false > set hive.tez.exec.print.summary=true > explain > select substr(r_reason_desc,1,20) as r >,avg(ws_quantity) wq >,avg(wr_refunded_cash) ref >,avg(wr_fee) fee > from web_sales, web_returns, web_page, customer_demographics cd1, > customer_demographics cd2, customer_address, date_dim, reason > where web_sales.ws_web_page_sk = web_page.wp_web_page_sk >and web_sales.ws_item_sk = web_returns.wr_item_sk >and web_sales.ws_order_number = web_returns.wr_order_number >and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998 >and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk >and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk >and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk >and reason.r_reason_sk = 
web_returns.wr_reason_sk >and >( > ( > cd1.cd_marital_status = 'M' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = '4 yr Degree' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 100.00 and 150.00 > ) >or > ( > cd1.cd_marital_status = 'D' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = 'Primary' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 50.00 and 100.00 > ) >or > ( > cd1.cd_marital_status = 'U' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = 'Advanced Degree' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 150.00 and 200.00 > ) >) >and >( > ( > ca_country = 'United States' > and > ca_state in ('KY', 'GA', 'NM') > and ws_net_profit between 100 and 200 > ) > or > ( > ca_country = 'United States' > and > ca_state in ('MT', 'OR', 'IN') > and ws_net_profit between 150 and 300 > ) > or > ( > ca_country = 'United States' > and > ca_state in ('WI', 'MO', 'WV') > and ws_net_profit between 50 and 250 >
[jira] [Updated] (HIVE-10689) HS2 metadata api calls should use HiveAuthorizer interface for authorization
[ https://issues.apache.org/jira/browse/HIVE-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10689: - Attachment: HIVE-10689.1.patch > HS2 metadata api calls should use HiveAuthorizer interface for authorization > > > Key: HIVE-10689 > URL: https://issues.apache.org/jira/browse/HIVE-10689 > Project: Hive > Issue Type: Bug > Components: Authorization, SQLStandardAuthorization >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Attachments: HIVE-10689.1.patch > > > java.sql.DataBaseMetadata apis in jdbc api result in calls to HS2 metadata > api's and their execution is via separate Hive Operation implementations, > that don't use the Hive Driver class. Invocation of these api's should also > be authorized using the HiveAuthorizer api. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10761) Create codahale-based metrics system for Hive
[ https://issues.apache.org/jira/browse/HIVE-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10761: - Attachment: HIVE-10761.2.patch Some loose ends, like make it take in configured list of reporters, and add end-to-end unit test for Metastore metrics, latest patch should be ready for review. > Create codahale-based metrics system for Hive > - > > Key: HIVE-10761 > URL: https://issues.apache.org/jira/browse/HIVE-10761 > Project: Hive > Issue Type: New Feature > Components: Diagnosability >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-10761.2.patch, HIVE-10761.patch, hms-metrics.json > > > There is a current Hive metrics system that hooks up to a JMX reporting, but > all its measurements, models are custom. > This is to make another metrics system that will be based on Codahale (ie > yammer, dropwizard), which has the following advantage: > * Well-defined metric model for frequently-needed metrics (ie JVM metrics) > * Well-defined measurements for all metrics (ie max, mean, stddev, mean_rate, > etc), > * Built-in reporting frameworks like JMX, Console, Log, JSON webserver > It is used for many projects, including several Apache projects like Oozie. > Overall, monitoring tools should find it easier to understand these common > metric, measurement, reporting models. > The existing metric subsystem will be kept and can be enabled if backward > compatibility is desired. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10828) Insert...values for fewer number of columns fail
[ https://issues.apache.org/jira/browse/HIVE-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10828: -- Description: Schema on insert queries with fewer number of columns fails with below error message {noformat} ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} *Steps to reproduce:* set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.enforce.bucketing=true; drop table if exists table1; create table table1 (a int, b string, c string) partitioned by (bkt int) clustered by (a) into 2 buckets stored as orc tblproperties ('transactional'='true'); insert into table_1 partition (bkt) (b, a, bkt) values ('part one', 1, 1), ('part one', 2, 1), ('part two', 3, 2), ('part three', 4, 3); was: Schema on insert queries with fewer number of columns fails with below error message ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291) at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoo
[jira] [Commented] (HIVE-10828) Insert...values for fewer number of columns fail
[ https://issues.apache.org/jira/browse/HIVE-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560290#comment-14560290 ] Eugene Koifman commented on HIVE-10828: --- Simpler repro case {noformat} set hive.enforce.bucketing=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.cbo.enable=false; drop table if exists acid_partitioned; create table acid_partitioned (a int, c string) partitioned by (p int) clustered by (a) into 1 buckets; insert into acid_partitioned partition (p) (a,p) values(1,1); {noformat} above example disables CBO because it causes additional issues. will file separate ticket for that > Insert...values for fewer number of columns fail > > > Key: HIVE-10828 > URL: https://issues.apache.org/jira/browse/HIVE-10828 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0 >Reporter: Aswathy Chellammal Sreekumar >Assignee: Eugene Koifman > > Schema on insert queries with fewer number of columns fails with below error > message > ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: > NullPointerException null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094) > at > 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) > at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) > at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) > at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > *Steps to reproduce:* > set hive.support.concurrency=true; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > set hive.enforce.bucketing=true; > 
drop table if exists table1; > create table table1 (a int, b string, c string) >partitioned by (bkt int) >clustered by (a) into 2 buckets >stored as orc >tblproperties ('transactional'='true'); > insert into table_1 partition (bkt) (b, a, bkt) values > ('part one', 1, 1), ('part one', 2, 1), ('part two', 3, 2), ('part > three', 4, 3); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
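The failing statements above list fewer columns than the table defines. Conceptually, schema-on-insert has to reorder the supplied values into the table's column order and pad unlisted columns with NULL rather than failing during plan generation; a minimal sketch of that mapping (hypothetical helper, not Hive's code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of what "schema on insert" must do for the queries
// above: reorder the supplied values to the table's column order and pad
// columns that were not listed with NULL.
public class InsertColumnMapping {
    static List<String> mapRow(List<String> tableCols, List<String> insertCols, List<String> values) {
        List<String> row = new ArrayList<>();
        for (String col : tableCols) {
            int i = insertCols.indexOf(col);
            row.add(i >= 0 ? values.get(i) : null);  // unspecified column -> NULL
        }
        return row;
    }
}
```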
[jira] [Updated] (HIVE-10550) Dynamic RDD caching optimization for HoS.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-10550: - Attachment: HIVE-10550.5-spark.patch > Dynamic RDD caching optimization for HoS.[Spark Branch] > --- > > Key: HIVE-10550 > URL: https://issues.apache.org/jira/browse/HIVE-10550 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Chengxiang Li >Assignee: Chengxiang Li > Attachments: HIVE-10550.1-spark.patch, HIVE-10550.1.patch, > HIVE-10550.2-spark.patch, HIVE-10550.3-spark.patch, HIVE-10550.4-spark.patch, > HIVE-10550.5-spark.patch > > > A Hive query may scan the same table multiple times, as in a self-join or > self-union, or several parts of the query may share the same subquery; [TPC-DS > Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql] > is an example. Spark supports caching RDD data: it keeps the computed RDD data > in memory and serves later reads directly from memory, which avoids the > recomputation cost of that RDD (and of all its dependencies) at the cost of more > memory usage. By analyzing the query context, we should be able to identify > which parts of the query can be shared, so that we can reuse the cached RDD in the > generated Spark job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
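The core idea of the caching described above can be sketched in plain Java (hypothetical names, not the Spark or Hive API): materialize the shared intermediate result once, and serve every consumer from the cached copy.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of the optimization's core idea: when two branches of a
// query plan consume the same intermediate result (e.g. a self-join), compute
// it once and serve both consumers from the cached copy.
class CachedDataset<T> {
    private final Supplier<T> compute;
    private T cached;            // materialized result, like a cached RDD
    private boolean materialized;

    CachedDataset(Supplier<T> compute) { this.compute = compute; }

    synchronized T get() {
        if (!materialized) { cached = compute.get(); materialized = true; }
        return cached;
    }
}
```

Two consumers calling `get()` then trigger only one "table scan", which is the memory-for-computation trade-off the description mentions.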
[jira] [Updated] (HIVE-10828) Insert...values for fewer number of columns fail
[ https://issues.apache.org/jira/browse/HIVE-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aswathy Chellammal Sreekumar updated HIVE-10828: Description: Schema on insert queries with fewer number of columns fails with below error message ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Steps to reproduce: set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.enforce.bucketing=true; drop table if exists table1; create table table1 (a int, b string, c string) partitioned by (bkt int) clustered by (a) into 2 buckets stored as orc tblproperties ('transactional'='true'); insert into table_1 partition (bkt) (b, a, bkt) values ('part one', 1, 1), ('part one', 2, 1), ('part two', 3, 2), ('part three', 4, 3); was: Schema on insert queries with fewer number of columns fails with below error message ERROR ql.Driver (SessionState.java:printError(957)) - FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7277) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6291) at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8992) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8883) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9728) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9621) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10094) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:324) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10105) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.
[jira] [Commented] (HIVE-10788) Change sort_array to support non-primitive types
[ https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560240#comment-14560240 ] Hive QA commented on HIVE-10788: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735427/HIVE-10788.1.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8977 tests executed *Failed tests:* {noformat} TestCustomAuthentication - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_sort_array_wrong1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4048/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4048/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4048/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12735427 - PreCommit-HIVE-TRUNK-Build > Change sort_array to support non-primitive types > > > Key: HIVE-10788 > URL: https://issues.apache.org/jira/browse/HIVE-10788 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-10788.1.patch > > > Currently {{sort_array}} only support primitive types. 
As we already support > comparison between non-primitive types, it makes sense to remove this > restriction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
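As an illustration of what lifting the primitive-only restriction enables (a sketch, not Hive's implementation), sorting an array whose elements are themselves lists requires a lexicographic comparator over the nested values:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: sort an array of non-primitive elements (lists) with a
// lexicographic comparator, similar in spirit to Hive's comparison of complex
// types.
public class SortNested {
    static final Comparator<List<Integer>> LEX = (a, b) -> {
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());  // shorter prefix sorts first
    };

    static List<List<Integer>> sortArray(List<List<Integer>> input) {
        List<List<Integer>> out = new ArrayList<>(input);
        out.sort(LEX);
        return out;
    }
}
```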
[jira] [Commented] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286
[ https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560238#comment-14560238 ] Ferdinand Xu commented on HIVE-10819: - Hi [~sershe], [~daijy], the problematic commit is already reverted. {noformat} Repository: hive Updated Branches: refs/heads/master db8067f96 -> a00bf4f87 Revert "HIVE-10277: Unable to process Comment line '--' in HIVE-1.1.0 (Chinna via Xuefu)" This reverts commit d66a7347ab97983cc5b9fca6bdabebc81e5a77e5. {noformat} > SearchArgumentImpl for Timestamp is broken by HIVE-10286 > > > Key: HIVE-10819 > URL: https://issues.apache.org/jira/browse/HIVE-10819 > Project: Hive > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 1.2.1 > > Attachments: HIVE-10819.1.patch, HIVE-10819.2.patch, > HIVE-10819.3.patch > > > The work around for kryo bug for Timestamp is accidentally removed by > HIVE-10286. Need to bring it back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10528) Hiveserver2 in HTTP mode is not applying auth_to_local rules
[ https://issues.apache.org/jira/browse/HIVE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-10528: --- Attachment: HIVE-10528.3.patch > Hiveserver2 in HTTP mode is not applying auth_to_local rules > > > Key: HIVE-10528 > URL: https://issues.apache.org/jira/browse/HIVE-10528 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0 > Environment: Centos 6 >Reporter: Abdelrahman Shettia >Assignee: Abdelrahman Shettia > Attachments: HIVE-10528.1.patch, HIVE-10528.1.patch, > HIVE-10528.2.patch, HIVE-10528.3.patch > > > PROBLEM: Authenticating to HS2 in HTTP mode with Kerberos, auth_to_local > mappings do not get applied. Because of this various permissions checks > which rely on the local cluster name for a user are going to fail. > STEPS TO REPRODUCE: > 1. Create kerberos cluster and HS2 in HTTP mode > 2. Create a new user, test, along with a kerberos principal for this user > 3. Create a separate principal, mapped-test > 4. Create an auth_to_local rule to make sure that mapped-test is mapped to > test > 5. As the test user, connect to HS2 with beeline and create a simple table: > {code} > CREATE TABLE permtest (field1 int); > {code} > There is no need to load anything into this table. > 6. Establish that it works as the test user: > {code} > show create table permtest; > {code} > 7. Drop the test identity and become mapped-test > 8. Re-connect to HS2 with beeline, re-run the above command: > {code} > show create table permtest; > {code} > You will find that when this is done in HTTP mode, you will get an HDFS error > (because of StorageBasedAuthorization doing a HDFS permissions check) and the > user will be mapped-test and NOT test as it should be. 
> ANALYSIS: This appears to be HTTP specific and the problem seems to come in > {{ThriftHttpServlet$HttpKerberosServerAction.getPrincipalWithoutRealmAndHost()}}: > {code} > try { > fullKerberosName = > ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal); > } catch (IOException e) { > throw new HttpAuthenticationException(e); > } > return fullKerberosName.getServiceName(); > {code} > getServiceName applies no auth_to_local rules. Seems like maybe this should > be getShortName()? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
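The difference between the two calls in the analysis above can be sketched as follows (a simplified, hypothetical model: real auth_to_local rules use Hadoop's RULE syntax, not a lookup table). Service-name resolution stops after stripping host and realm, while short-name resolution additionally applies the configured mappings:

```java
import java.util.Map;

// Simplified, hypothetical sketch of the behavior at issue: taking only the
// first component of the principal applies no auth_to_local mapping, while a
// short-name lookup additionally maps the principal to its local user.
public class PrincipalMapping {
    // e.g. "mapped-test@EXAMPLE.COM" should resolve to local user "test"
    static final Map<String, String> AUTH_TO_LOCAL = Map.of("mapped-test", "test");

    static String serviceName(String principal) {
        return principal.split("[/@]")[0];              // realm/host stripped, no rules applied
    }

    static String shortName(String principal) {
        String base = serviceName(principal);
        return AUTH_TO_LOCAL.getOrDefault(base, base);  // rules applied
    }
}
```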
[jira] [Assigned] (HIVE-4239) Remove lock on compilation stage
[ https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-4239: -- Assignee: Sergey Shelukhin > Remove lock on compilation stage > > > Key: HIVE-4239 > URL: https://issues.apache.org/jira/browse/HIVE-4239 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Query Processor >Reporter: Carl Steinbach >Assignee: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10528) Hiveserver2 in HTTP mode is not applying auth_to_local rules
[ https://issues.apache.org/jira/browse/HIVE-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdelrahman Shettia updated HIVE-10528: --- Attachment: HIVE-10528.2.patch > Hiveserver2 in HTTP mode is not applying auth_to_local rules > > > Key: HIVE-10528 > URL: https://issues.apache.org/jira/browse/HIVE-10528 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0 > Environment: Centos 6 >Reporter: Abdelrahman Shettia >Assignee: Abdelrahman Shettia > Attachments: HIVE-10528.1.patch, HIVE-10528.1.patch, > HIVE-10528.2.patch > > > PROBLEM: Authenticating to HS2 in HTTP mode with Kerberos, auth_to_local > mappings do not get applied. Because of this various permissions checks > which rely on the local cluster name for a user are going to fail. > STEPS TO REPRODUCE: > 1. Create kerberos cluster and HS2 in HTTP mode > 2. Create a new user, test, along with a kerberos principal for this user > 3. Create a separate principal, mapped-test > 4. Create an auth_to_local rule to make sure that mapped-test is mapped to > test > 5. As the test user, connect to HS2 with beeline and create a simple table: > {code} > CREATE TABLE permtest (field1 int); > {code} > There is no need to load anything into this table. > 6. Establish that it works as the test user: > {code} > show create table permtest; > {code} > 7. Drop the test identity and become mapped-test > 8. Re-connect to HS2 with beeline, re-run the above command: > {code} > show create table permtest; > {code} > You will find that when this is done in HTTP mode, you will get an HDFS error > (because of StorageBasedAuthorization doing a HDFS permissions check) and the > user will be mapped-test and NOT test as it should be. 
> ANALYSIS: This appears to be HTTP specific and the problem seems to come in > {{ThriftHttpServlet$HttpKerberosServerAction.getPrincipalWithoutRealmAndHost()}}: > {code} > try { > fullKerberosName = > ShimLoader.getHadoopShims().getKerberosNameShim(fullPrincipal); > } catch (IOException e) { > throw new HttpAuthenticationException(e); > } > return fullKerberosName.getServiceName(); > {code} > getServiceName applies no auth_to_local rules. Seems like maybe this should > be getShortName()? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10731) NullPointerException in HiveParser.g
[ https://issues.apache.org/jira/browse/HIVE-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560177#comment-14560177 ] Pengcheng Xiong commented on HIVE-10731: [~jpullokkaran], this patch also needs your review. Thanks. > NullPointerException in HiveParser.g > > > Key: HIVE-10731 > URL: https://issues.apache.org/jira/browse/HIVE-10731 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.2.0 >Reporter: Xiu >Assignee: Pengcheng Xiong >Priority: Minor > Attachments: HIVE-10731.01.patch > > > In HiveParser.g: > {code:Java} > protected boolean useSQL11ReservedKeywordsForIdentifier() { > return !HiveConf.getBoolVar(hiveConf, > HiveConf.ConfVars.HIVE_SUPPORT_SQL11_RESERVED_KEYWORDS); > } > {code} > NullPointerException is thrown when hiveConf is not set. > Stack trace: > {code:Java} > java.lang.NullPointerException > at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2583) > at > org.apache.hadoop.hive.ql.parse.HiveParser.useSQL11ReservedKeywordsForIdentifier(HiveParser.java:1000) > at > org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.useSQL11ReservedKeywordsForIdentifier(HiveParser_IdentifiersParser.java:726) > at > org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:10922) > at > org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45808) > at > org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38008) > at > org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36167) > at > org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5214) > at > org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2640) > at > org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650) > at > org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109) > at > 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
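One defensive pattern for the NPE above (a sketch, not necessarily the committed fix) is to fall back to a documented default when no configuration has been injected into the parser:

```java
// Hypothetical sketch of a null guard for the NPE above: if the parser is used
// without a configuration being injected, fall back to a documented default
// instead of dereferencing a null conf.
public class KeywordConfigSketch {
    static final boolean DEFAULT_SUPPORT_SQL11_RESERVED_KEYWORDS = true;

    // stands in for the parser's hiveConf-backed setting; may legitimately be null
    static Boolean hiveConfSetting = null;

    static boolean useSQL11ReservedKeywordsForIdentifier() {
        if (hiveConfSetting == null) {
            return !DEFAULT_SUPPORT_SQL11_RESERVED_KEYWORDS;  // guarded default, no NPE
        }
        return !hiveConfSetting;
    }
}
```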
[jira] [Updated] (HIVE-10804) CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for limit 0 does not work
[ https://issues.apache.org/jira/browse/HIVE-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10804: --- Attachment: HIVE-10804.01.patch > CBO: Calcite Operator To Hive Operator (Calcite Return Path): optimizer for > limit 0 does not work > - > > Key: HIVE-10804 > URL: https://issues.apache.org/jira/browse/HIVE-10804 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-10804.01.patch > > > {code} > explain > select key,value from src order by key limit 0 > POSTHOOK: type: QUERY > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Map Reduce > Map Operator Tree: > TableScan > alias: src > Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE > Column stats: NONE > Select Operator > expressions: key (type: string), value (type: string) > outputColumnNames: key, value > Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE > Column stats: NONE > Reduce Output Operator > key expressions: key (type: string) > sort order: + > Statistics: Num rows: 500 Data size: 5312 Basic stats: > COMPLETE Column stats: NONE > value expressions: value (type: string) > Reduce Operator Tree: > Select Operator > expressions: KEY.reducesinkkey0 (type: string), VALUE.value (type: > string) > outputColumnNames: key, value > Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE > Column stats: NONE > Limit > Number of rows: 0 > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > {code} -- This message was sent by Atlassian 
JIRA (v6.3.4#6332)
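The explain output above shows the full map/reduce pipeline (scan, sort, reduce) being retained even though the limit is 0; the missing optimization is simply to short-circuit to an empty result. A minimal sketch of that intent, illustrative only (the real fix lives in the Calcite-return-path planner, not in a helper like this):

```java
import java.util.Collections;
import java.util.List;

public class LimitZeroSketch {

    // Apply "LIMIT n" to an already-computed row set. The point of the
    // sketch is the n == 0 branch: no scan or sort output is needed at all,
    // so the planner can emit an empty fetch instead of the full plan.
    public static <T> List<T> applyLimit(List<T> rows, int limit) {
        if (limit == 0) {
            return Collections.emptyList();
        }
        return rows.subList(0, Math.min(limit, rows.size()));
    }
}
```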
[jira] [Commented] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
[ https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560106#comment-14560106 ] Hive QA commented on HIVE-10809: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735409/HIVE-10809.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8974 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4047/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4047/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4047/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12735409 - PreCommit-HIVE-TRUNK-Build > HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories > -- > > Key: HIVE-10809 > URL: https://issues.apache.org/jira/browse/HIVE-10809 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.0 >Reporter: Selina Zhang >Assignee: Selina Zhang > Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch > > > When static partition is added through HCatStorer or HCatWriter > {code} > JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader(); > STORE JoinedData INTO 'selina.joined_events_e' USING > org.apache.hive.hcatalog.pig.HCatStorer('author=selina'); > {code} > The table directory looks like > {noformat} > drwx-- - selinaz users 0 2015-05-22 21:19 > /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798 > drwx-- - selinaz users 0 2015-05-22 21:19 > /user/selinaz/joined_events_e/author=selina > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560083#comment-14560083 ] Mostafa Mokhtar commented on HIVE-10704: [~apivovarov] Ditto for this one. > Errors in Tez HashTableLoader when estimated table size is 0 > > > Key: HIVE-10704 > URL: https://issues.apache.org/jira/browse/HIVE-10704 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Jason Dere >Assignee: Mostafa Mokhtar > Fix For: 1.2.1 > > Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, > HIVE-10704.3.patch > > > Couple of issues: > - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all > tables, the largest small table selection is wrong and could select the large > table (which results in NPE) > - The memory estimates can either divide-by-zero, or allocate 0 memory if the > table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
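Both failure modes described above (selecting the wrong big table, and divide-by-zero in the memory split) come from trusting a 0-byte size estimate. A hedged sketch of the guard, with an assumed 1 MB fallback; this is illustrative and not Hive's actual HashTableLoader code:

```java
public class HashTableSizing {

    // Fallback used when statistics report 0 bytes for a table.
    // The 1 MB value is an assumption for illustration, not Hive's default.
    public static final long DEFAULT_TABLE_SIZE = 1024L * 1024L;

    // Pick the position of the largest "small" table, skipping the big table.
    // Zero-sized estimates are replaced with the fallback so a 0-byte
    // estimate can never cause the big table to be chosen by accident.
    public static int pickLargestSmallTable(long[] sizes, int bigTablePos) {
        int largest = -1;
        long largestSize = -1;
        for (int pos = 0; pos < sizes.length; pos++) {
            if (pos == bigTablePos) {
                continue;
            }
            long size = sizes[pos] <= 0 ? DEFAULT_TABLE_SIZE : sizes[pos];
            if (size > largestSize) {
                largestSize = size;
                largest = pos;
            }
        }
        return largest;
    }

    // Divide available memory proportionally to the estimates, guarding
    // against a zero total that would otherwise divide by zero.
    public static long memoryForTable(long tableSize, long totalSize, long availableMem) {
        long safeTable = tableSize <= 0 ? DEFAULT_TABLE_SIZE : tableSize;
        long safeTotal = totalSize <= 0 ? DEFAULT_TABLE_SIZE : totalSize;
        return (long) (availableMem * ((double) safeTable / Math.max(safeTotal, safeTable)));
    }
}
```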
[jira] [Updated] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286
[ https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-10819: -- Attachment: HIVE-10819.3.patch The test failures don't seem related. Reattaching the patch to trigger another test run. > SearchArgumentImpl for Timestamp is broken by HIVE-10286 > > > Key: HIVE-10819 > URL: https://issues.apache.org/jira/browse/HIVE-10819 > Project: Hive > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 1.2.1 > > Attachments: HIVE-10819.1.patch, HIVE-10819.2.patch, > HIVE-10819.3.patch > > > The workaround for the kryo bug for Timestamp was accidentally removed by > HIVE-10286. Need to bring it back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10807) Invalidate basic stats for insert queries if autogather=false
[ https://issues.apache.org/jira/browse/HIVE-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10807: Attachment: HIVE-10807.2.patch > Invalidate basic stats for insert queries if autogather=false > - > > Key: HIVE-10807 > URL: https://issues.apache.org/jira/browse/HIVE-10807 > Project: Hive > Issue Type: Bug > Components: Statistics >Affects Versions: 1.2.0 >Reporter: Gopal V >Assignee: Ashutosh Chauhan > Attachments: HIVE-10807.2.patch, HIVE-10807.patch > > > if stats.autogather=false leads to incorrect basic stats in case of insert > statements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-9069: -- Attachment: HIVE-9069.14.patch > Simplify filter predicates for CBO > -- > > Key: HIVE-9069 > URL: https://issues.apache.org/jira/browse/HIVE-9069 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Jesus Camacho Rodriguez > Fix For: 0.14.1 > > Attachments: HIVE-9069.01.patch, HIVE-9069.02.patch, > HIVE-9069.03.patch, HIVE-9069.04.patch, HIVE-9069.05.patch, > HIVE-9069.06.patch, HIVE-9069.07.patch, HIVE-9069.08.patch, > HIVE-9069.08.patch, HIVE-9069.09.patch, HIVE-9069.10.patch, > HIVE-9069.11.patch, HIVE-9069.12.patch, HIVE-9069.13.patch, > HIVE-9069.14.patch, HIVE-9069.14.patch, HIVE-9069.patch > > > Simplify predicates for disjunctive predicates so that can get pushed down to > the scan. > Looks like this is still an issue, some of the filters can be pushed down to > the scan. 
> {code} > set hive.cbo.enable=true > set hive.stats.fetch.column.stats=true > set hive.exec.dynamic.partition.mode=nonstrict > set hive.tez.auto.reducer.parallelism=true > set hive.auto.convert.join.noconditionaltask.size=32000 > set hive.exec.reducers.bytes.per.reducer=1 > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager > set hive.support.concurrency=false > set hive.tez.exec.print.summary=true > explain > select substr(r_reason_desc,1,20) as r >,avg(ws_quantity) wq >,avg(wr_refunded_cash) ref >,avg(wr_fee) fee > from web_sales, web_returns, web_page, customer_demographics cd1, > customer_demographics cd2, customer_address, date_dim, reason > where web_sales.ws_web_page_sk = web_page.wp_web_page_sk >and web_sales.ws_item_sk = web_returns.wr_item_sk >and web_sales.ws_order_number = web_returns.wr_order_number >and web_sales.ws_sold_date_sk = date_dim.d_date_sk and d_year = 1998 >and cd1.cd_demo_sk = web_returns.wr_refunded_cdemo_sk >and cd2.cd_demo_sk = web_returns.wr_returning_cdemo_sk >and customer_address.ca_address_sk = web_returns.wr_refunded_addr_sk >and reason.r_reason_sk = web_returns.wr_reason_sk >and >( > ( > cd1.cd_marital_status = 'M' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = '4 yr Degree' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 100.00 and 150.00 > ) >or > ( > cd1.cd_marital_status = 'D' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = 'Primary' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 50.00 and 100.00 > ) >or > ( > cd1.cd_marital_status = 'U' > and > cd1.cd_marital_status = cd2.cd_marital_status > and > cd1.cd_education_status = 'Advanced Degree' > and > cd1.cd_education_status = cd2.cd_education_status > and > ws_sales_price between 150.00 and 200.00 > ) >) >and >( > ( > ca_country = 'United States' > and > ca_state in ('KY', 'GA', 'NM') > and 
ws_net_profit between 100 and 200 > ) > or > ( > ca_country = 'United States' > and > ca_state in ('MT', 'OR', 'IN') > and ws_net_profit between 150 and 300 > ) > or > ( > ca_country = 'United States' > and > ca_state in ('WI', 'MO', 'WV') > and ws_net_profit between 50 and 250 > ) >) > group by r_reason_desc > order by r, wq, ref, fee > limit 100 > OK > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 9 <- Map 1 (BROADCAST_EDGE) > Reducer 3 <- Map 13 (SIMPLE_EDGE), Map 2 (SIMPLE_EDGE) > Reducer 4 <- Map 9 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE) > Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE) > Reducer 6 <- Map 10 (SIMPLE_EDGE), Map 11 (BROADCAST_EDGE), Map 12 > (BROADCAST_EDGE), Reducer 5 (SIMPLE_EDGE) > Reducer 7 <- Reducer 6 (SIMPLE_EDGE) > Reducer 8 <- Reducer 7 (SIMPLE_EDGE) > DagName: mmokhtar_2014161818_f5fd23ba-d783-4b13-8507-7faa65851798:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: web_page > filterExpr: wp_web_page_sk is not null (type: boolean) > Statistics: Num rows: 4602 Data size: 2696178 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator >
[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560010#comment-14560010 ] Sergey Shelukhin commented on HIVE-10778: - I am clearing the map after build rather than just removing cacheMapWork/etc. parts pertaining to global map, in case I missed somewhere during build that it could be used. > LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2 > > > Key: HIVE-10778 > URL: https://issues.apache.org/jira/browse/HIVE-10778 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: llap >Reporter: Gopal V >Assignee: Sergey Shelukhin > Fix For: llap > > Attachments: HIVE-10778.01.patch, HIVE-10778.patch, llap-hs2-heap.png > > > 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2. > !llap-hs2-heap.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
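The leak pattern being fixed can be sketched as follows. Class and method names here are illustrative stand-ins for Utilities.gWorkMap, not Hive's exact code: a static, process-wide plan cache retains entries on a long-lived HiveServer2 heap unless each query path explicitly clears what it added.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PlanCacheSketch {

    // Process-wide cache of query plans, analogous to Utilities::gWorkMap.
    public static final Map<String, Object> gWorkMap = new ConcurrentHashMap<>();

    public static void buildPlan(String queryId) {
        // Cache the (stand-in) operator plan while the query is being built.
        gWorkMap.put(queryId, new Object());
    }

    // After the plan has been built and serialized, remove everything this
    // query put in the global map so the server process does not accumulate
    // plans on its heap across queries.
    public static void cleanupAfterBuild(String queryId) {
        gWorkMap.remove(queryId);
    }
}
```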
[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560009#comment-14560009 ] Sergey Shelukhin commented on HIVE-10778: - [~vikram.dixit] can you take a look? > LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2 > > > Key: HIVE-10778 > URL: https://issues.apache.org/jira/browse/HIVE-10778 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: llap >Reporter: Gopal V >Assignee: Sergey Shelukhin > Fix For: llap > > Attachments: HIVE-10778.01.patch, HIVE-10778.patch, llap-hs2-heap.png > > > 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2. > !llap-hs2-heap.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9105) Hive-0.13 select constant in union all followed by group by gives wrong result
[ https://issues.apache.org/jira/browse/HIVE-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-9105. --- Resolution: Fixed > Hive-0.13 select constant in union all followed by group by gives wrong result > -- > > Key: HIVE-9105 > URL: https://issues.apache.org/jira/browse/HIVE-9105 > Project: Hive > Issue Type: Bug >Affects Versions: 0.13.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > > select '1' as key from srcpart where ds="2008-04-09" > UNION all > SELECT key from srcpart where ds="2008-04-09" and hr="11" > ) tab group by key > will generate wrong results -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10788) Change sort_array to support non-primitive types
[ https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-10788: Attachment: HIVE-10788.1.patch Like HIVE-10427, UNION type is a little bit tricky to support. Will make that a follow-up JIRA. > Change sort_array to support non-primitive types > > > Key: HIVE-10788 > URL: https://issues.apache.org/jira/browse/HIVE-10788 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-10788.1.patch > > > Currently {{sort_array}} only supports primitive types. As we already support > comparison between non-primitive types, it makes sense to remove this > restriction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
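The idea can be illustrated outside Hive. This sketch shows how sort_array can order non-primitive elements (here, arrays of arrays) once element comparison is defined for the nested type; it is illustrative only and not Hive's ObjectInspector-based implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class SortArraySketch {

    // Lexicographic comparison of two integer lists, element by element;
    // a strict prefix sorts before the longer list.
    public static int compareLists(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    // sort_array semantics applied to an array whose elements are arrays.
    public static List<List<Integer>> sortArray(List<List<Integer>> input) {
        List<List<Integer>> copy = new ArrayList<>(input);
        copy.sort(SortArraySketch::compareLists);
        return copy;
    }
}
```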
[jira] [Commented] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
[ https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559982#comment-14559982 ] Swarnim Kulkarni commented on HIVE-10809: - [~selinazh] Minor feedback: 1. Instead of declaring so many individual exceptions in the throws clause, you could simply declare throws Exception to keep the test simpler. 2. To make the test stronger, is there any way we can verify that the directories actually existed before the query ran? > HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories > -- > > Key: HIVE-10809 > URL: https://issues.apache.org/jira/browse/HIVE-10809 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.0 >Reporter: Selina Zhang >Assignee: Selina Zhang > Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch > > > When static partition is added through HCatStorer or HCatWriter > {code} > JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader(); > STORE JoinedData INTO 'selina.joined_events_e' USING > org.apache.hive.hcatalog.pig.HCatStorer('author=selina'); > {code} > The table directory looks like > {noformat} > drwx-- - selinaz users 0 2015-05-22 21:19 > /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798 > drwx-- - selinaz users 0 2015-05-22 21:19 > /user/selinaz/joined_events_e/author=selina > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases
[ https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559967#comment-14559967 ] Jesus Camacho Rodriguez commented on HIVE-10811: The method {{trimChild}} trims the child columns keeping 1) the columns needed from the parent ("fieldsUsed"), and 2) the columns on which collations were specified. Currently, the method takes the collations from the parent relation "rel", which seems incorrect as we end up referencing column positions that do not exist in the child "input". Thus, I changed the method to take the collations from the relation on which we are pruning the columns i.e. "input". > RelFieldTrimmer throws NoSuchElementException in some cases > --- > > Key: HIVE-10811 > URL: https://issues.apache.org/jira/browse/HIVE-10811 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, > HIVE-10811.patch > > > RelFieldTrimmer runs into NoSuchElementException in some cases. 
> Stack trace: > {noformat} > Exception in thread "main" java.lang.AssertionError: Internal error: While > invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult > org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' > at org.apache.calcite.util.Util.newInternal(Util.java:743) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) > at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536) > ...
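The fix described in the comment above is that the collation (sort) keys used during trimming must come from the relation actually being pruned, so the key positions are valid in it. A hypothetical sketch of that invariant; names and signatures here are illustrative, not Calcite's RelFieldTrimmer code:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TrimSketch {

    // Compute the child columns to keep: the union of the fields the parent
    // needs ("fieldsUsed") and the child's own collation keys. Taking the
    // keys from the wrong relation can reference a position past the child's
    // field count -- the NoSuchElementException scenario in the stack trace.
    public static Set<Integer> fieldsToKeep(Set<Integer> fieldsUsedByParent,
                                            List<Integer> childSortKeys,
                                            int childFieldCount) {
        Set<Integer> keep = new TreeSet<>(fieldsUsedByParent);
        for (int key : childSortKeys) {
            if (key >= childFieldCount) {
                throw new IllegalArgumentException(
                    "sort key " + key + " is not a field of the child");
            }
            keep.add(key);
        }
        return keep;
    }
}
```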
[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled
[ https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559918#comment-14559918 ] Matt McCline commented on HIVE-10244: - Ya, I know, that is what I thought. But the new prune flag seems to be on in the Reducer even though isGroupingSetsPresent is false. We should talk to the author and reviewer of the change. Jedi Master [~ashutoshc], can you explain to us Padawan Learners [~jpullokkaran] [~mmccline] [~jcamachorodriguez] all about the prune flag? > Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when > hive.vectorized.execution.reduce.enabled is enabled > --- > > Key: HIVE-10244 > URL: https://issues.apache.org/jira/browse/HIVE-10244 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Matt McCline > Attachments: HIVE-10244.01.patch, explain_q80_vectorized_reduce_on.txt > > > Query > {code} > set hive.vectorized.execution.reduce.enabled=true; > with ssr as > (select s_store_id as store_id, > sum(ss_ext_sales_price) as sales, > sum(coalesce(sr_return_amt, 0)) as returns, > sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit > from store_sales left outer join store_returns on > (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), > date_dim, > store, > item, > promotion > where ss_sold_date_sk = d_date_sk >and d_date between cast('1998-08-04' as date) > and (cast('1998-09-04' as date)) >and ss_store_sk = s_store_sk >and ss_item_sk = i_item_sk >and i_current_price > 50 >and ss_promo_sk = p_promo_sk >and p_channel_tv = 'N' > group by s_store_id) > , > csr as > (select cp_catalog_page_id as catalog_page_id, > sum(cs_ext_sales_price) as sales, > sum(coalesce(cr_return_amount, 0)) as returns, > sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit > from catalog_sales left outer join catalog_returns on > (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), > date_dim, > catalog_page, > item, > 
promotion > where cs_sold_date_sk = d_date_sk >and d_date between cast('1998-08-04' as date) > and (cast('1998-09-04' as date)) > and cs_catalog_page_sk = cp_catalog_page_sk >and cs_item_sk = i_item_sk >and i_current_price > 50 >and cs_promo_sk = p_promo_sk >and p_channel_tv = 'N' > group by cp_catalog_page_id) > , > wsr as > (select web_site_id, > sum(ws_ext_sales_price) as sales, > sum(coalesce(wr_return_amt, 0)) as returns, > sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit > from web_sales left outer join web_returns on > (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), > date_dim, > web_site, > item, > promotion > where ws_sold_date_sk = d_date_sk >and d_date between cast('1998-08-04' as date) > and (cast('1998-09-04' as date)) > and ws_web_site_sk = web_site_sk >and ws_item_sk = i_item_sk >and i_current_price > 50 >and ws_promo_sk = p_promo_sk >and p_channel_tv = 'N' > group by web_site_id) > select channel > , id > , sum(sales) as sales > , sum(returns) as returns > , sum(profit) as profit > from > (select 'store channel' as channel > , concat('store', store_id) as id > , sales > , returns > , profit > from ssr > union all > select 'catalog channel' as channel > , concat('catalog_page', catalog_page_id) as id > , sales > , returns > , profit > from csr > union all > select 'web channel' as channel > , concat('web_site', web_site_id) as id > , sales > , returns > , profit > from wsr > ) x > group by channel, id with rollup > order by channel > ,id > limit 100 > {code} > Exception > {code} > Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, > diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running > task:java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing vector batch (tag=0) > 
\N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8 > \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7 > \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8 > at > org.
[jira] [Updated] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories
[ https://issues.apache.org/jira/browse/HIVE-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Selina Zhang updated HIVE-10809: Attachment: HIVE-10809.2.patch The above unit test failures do not seem relevant to this patch. Uploaded a new patch that adds verification in TestHCatStorer that the scratch directories are removed. > HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories > -- > > Key: HIVE-10809 > URL: https://issues.apache.org/jira/browse/HIVE-10809 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.0 >Reporter: Selina Zhang >Assignee: Selina Zhang > Attachments: HIVE-10809.1.patch, HIVE-10809.2.patch > > > When static partition is added through HCatStorer or HCatWriter > {code} > JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader(); > STORE JoinedData INTO 'selina.joined_events_e' USING > org.apache.hive.hcatalog.pig.HCatStorer('author=selina'); > {code} > The table directory looks like > {noformat} > drwx-- - selinaz users 0 2015-05-22 21:19 > /user/selinaz/joined_events_e/_SCRATCH0.9157208938193798 > drwx-- - selinaz users 0 2015-05-22 21:19 > /user/selinaz/joined_events_e/author=selina > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10753) hs2 jdbc url - wrong connection string cause error on beeline/jdbc/odbc client, misleading message
[ https://issues.apache.org/jira/browse/HIVE-10753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559893#comment-14559893 ] Thejas M Nair commented on HIVE-10753: -- +1 > hs2 jdbc url - wrong connection string cause error on beeline/jdbc/odbc > client, misleading message > --- > > Key: HIVE-10753 > URL: https://issues.apache.org/jira/browse/HIVE-10753 > Project: Hive > Issue Type: Bug > Components: Beeline, JDBC >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-10753.1.patch, HIVE-10753.2.patch > > > {noformat} > beeline -u > 'jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http' -n > hdiuser > scan complete in 15ms > Connecting to > jdbc:hive2://localhost:10001/default?httpPath=/;transportMode=http > Java heap space > Beeline version 0.14.0.2.2.4.1-1 by Apache Hive > 0: jdbc:hive2://localhost:10001/default (closed)> ^Chdiuser@headnode0:~$ > But it works if I use the deprecated param - > hdiuser@headnode0:~$ beeline -u > 'jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;httpPath=/' > -n hdiuser > scan complete in 12ms > Connecting to > jdbc:hive2://localhost:10001/default?hive.server2.transport.mode=http;httpPath=/ > 15/04/28 23:16:46 [main]: WARN jdbc.Utils: * JDBC param deprecation * > 15/04/28 23:16:46 [main]: WARN jdbc.Utils: The use of > hive.server2.transport.mode is deprecated. 
> 15/04/28 23:16:46 [main]: WARN jdbc.Utils: Please use transportMode like so: > jdbc:hive2://:/dbName;transportMode= > Connected to: Apache Hive (version 0.14.0.2.2.4.1-1) > Driver: Hive JDBC (version 0.14.0.2.2.4.1-1) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 0.14.0.2.2.4.1-1 by Apache Hive > 0: jdbc:hive2://localhost:10001/default> show tables; > +--+--+ > | tab_name | > +--+--+ > | hivesampletable | > +--+--+ > 1 row selected (18.181 seconds) > 0: jdbc:hive2://localhost:10001/default> ^Chdiuser@headnode0:~$ ^C > {noformat} > The reason for the above message is : > The url is wrong. Correct one: > {code} > beeline -u > 'jdbc:hive2://localhost:10001/default;httpPath=/;transportMode=http' -n > hdiuser > {code} > Note the ";" instead of "?". The deprecation msg prints the format as well: > {code} > Please use transportMode like so: > jdbc:hive2://:/dbName;transportMode= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559879#comment-14559879 ] Alan Gates commented on HIVE-10165: --- I'll review if someone else doesn't get to it first. It will take me a few days to get to it as I'm out the rest of this week. As far as the failing tests, the 5 earlier failures didn't look related to your patch. Unless we really broke the trunk it's surprising to see 600+ test failures for your later patch. Have you tried running some of these locally to see whether you can reproduce them? > Improve hive-hcatalog-streaming extensibility and support updates and deletes. > -- > > Key: HIVE-10165 > URL: https://issues.apache.org/jira/browse/HIVE-10165 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 1.2.0 >Reporter: Elliot West >Assignee: Elliot West > Labels: streaming_api > Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, > HIVE-10165.5.patch > > > h3. Overview > I'd like to extend the > [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] > API so that it also supports the writing of record updates and deletes in > addition to the already supported inserts. > h3. Motivation > We have many Hadoop processes outside of Hive that merge changed facts into > existing datasets. Traditionally we achieve this by: reading in a > ground-truth dataset and a modified dataset, grouping by a key, sorting by a > sequence and then applying a function to determine inserted, updated, and > deleted rows. However, in our current scheme we must rewrite all partitions > that may potentially contain changes. In practice the number of mutated > records is very small when compared with the records contained in a > partition. This approach results in a number of operational issues: > * Excessive amount of write activity required for small data changes. 
> * Downstream applications cannot robustly read these datasets while they are > being updated. > * Due to the scale of the updates (hundreds of partitions) the scope for > contention is high. > I believe we can address this problem by instead writing only the changed > records to a Hive transactional table. This should drastically reduce the > amount of data that we need to write and also provide a means for managing > concurrent access to the data. Our existing merge processes can read and > retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to > an updated form of the hive-hcatalog-streaming API which will then have the > required data to perform an update or insert in a transactional manner. > h3. Benefits > * Enables the creation of large-scale dataset merge processes > * Opens up Hive transactional functionality in an accessible manner to > processes that operate outside of Hive. > h3. Implementation > Our changes do not break the existing API contracts. Instead our approach has > been to consider the functionality offered by the existing API and our > proposed API as fulfilling separate and distinct use-cases. The existing API > is primarily focused on the task of continuously writing large volumes of new > data into a Hive table for near-immediate analysis. Our use-case, however, is > concerned more with the frequent but not continuous ingestion of mutations to > a Hive table from some ETL merge process. Consequently we feel it is > justifiable to add our new functionality via an alternative set of public > interfaces and leave the existing API as is. This keeps both APIs clean and > focused at the expense of presenting additional options to potential users. > Wherever possible, shared implementation concerns have been factored out into > abstract base classes that are open to third-party extension. 
A detailed > breakdown of the changes is as follows: > * We've introduced a public {{RecordMutator}} interface whose purpose is to > expose insert/update/delete operations to the user. This is a counterpart to > the write-only {{RecordWriter}}. We've also factored out life-cycle methods > common to these two interfaces into a super {{RecordOperationWriter}} > interface. Note that the row representation has been changed from {{byte[]}} > to {{Object}}. Within our data processing jobs our records are often > available in a strongly typed and decoded form such as a POJO or a Tuple > object. Therefore it seems to make sense that we are able to pass this > through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} > encoding step. This of course still allows users to use {{byte[]}} if they > wish. > * The introduction of {{RecordMutator}} requires that insert/update/delete > operations
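The mutation API described above can be sketched roughly as follows. The names RecordMutator, insert/update/delete come from the comment; the signatures (a long record id plus an Object row) are illustrative assumptions rather than the patch's actual API, and the in-memory class merely stands in for an ORC-backed updater:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MutatorSketch {

    // Counterpart to the write-only RecordWriter: exposes mutations, with
    // rows passed as Object rather than a pre-encoded byte[].
    public interface RecordMutator {
        void insert(long recordId, Object row);
        void update(long recordId, Object row);
        void delete(long recordId);
    }

    // Toy implementation: a merge process replays insert/update/delete
    // operations keyed by the retained record identifier.
    public static class InMemoryMutator implements RecordMutator {
        public final Map<Long, Object> rows = new LinkedHashMap<>();

        public void insert(long recordId, Object row) { rows.put(recordId, row); }
        public void update(long recordId, Object row) { rows.put(recordId, row); }
        public void delete(long recordId)             { rows.remove(recordId); }
    }
}
```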
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559866#comment-14559866 ] Elliot West commented on HIVE-10165: I'm not quite sure what to do next. I have a '-1' because some (unrelated) tests fail. However I (perhaps naïvely) don't believe this is connected to my patch. Could someone please review? > Improve hive-hcatalog-streaming extensibility and support updates and deletes. > -- > > Key: HIVE-10165 > URL: https://issues.apache.org/jira/browse/HIVE-10165 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 1.2.0 >Reporter: Elliot West >Assignee: Elliot West > Labels: streaming_api > Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, > HIVE-10165.5.patch > > > h3. Overview > I'd like to extend the > [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] > API so that it also supports the writing of record updates and deletes in > addition to the already supported inserts. > h3. Motivation > We have many Hadoop processes outside of Hive that merge changed facts into > existing datasets. Traditionally we achieve this by: reading in a > ground-truth dataset and a modified dataset, grouping by a key, sorting by a > sequence and then applying a function to determine inserted, updated, and > deleted rows. However, in our current scheme we must rewrite all partitions > that may potentially contain changes. In practice the number of mutated > records is very small when compared with the records contained in a > partition. This approach results in a number of operational issues: > * Excessive amount of write activity required for small data changes. > * Downstream applications cannot robustly read these datasets while they are > being updated. > * Due to the scale of the updates (hundreds of partitions) the scope for > contention is high.
> I believe we can address this problem by instead writing only the changed > records to a Hive transactional table. This should drastically reduce the > amount of data that we need to write and also provide a means for managing > concurrent access to the data. Our existing merge processes can read and > retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to > an updated form of the hive-hcatalog-streaming API which will then have the > required data to perform an update or insert in a transactional manner. > h3. Benefits > * Enables the creation of large-scale dataset merge processes > * Opens up Hive transactional functionality in an accessible manner to > processes that operate outside of Hive. > h3. Implementation > Our changes do not break the existing API contracts. Instead our approach has > been to consider the functionality offered by the existing API and our > proposed API as fulfilling separate and distinct use-cases. The existing API > is primarily focused on the task of continuously writing large volumes of new > data into a Hive table for near-immediate analysis. Our use-case however, is > concerned more with the frequent but not continuous ingestion of mutations to > a Hive table from some ETL merge process. Consequently we feel it is > justifiable to add our new functionality via an alternative set of public > interfaces and leave the existing API as is. This keeps both APIs clean and > focused at the expense of presenting additional options to potential users. > Wherever possible, shared implementation concerns have been factored out into > abstract base classes that are open to third-party extension. A detailed > breakdown of the changes is as follows: > * We've introduced a public {{RecordMutator}} interface whose purpose is to > expose insert/update/delete operations to the user. This is a counterpart to > the write-only {{RecordWriter}}. 
We've also factored out life-cycle methods > common to these two interfaces into a super {{RecordOperationWriter}} > interface. Note that the row representation has been changed from {{byte[]}} > to {{Object}}. Within our data processing jobs our records are often > available in a strongly typed and decoded form such as a POJO or a Tuple > object. Therefore it seems to make sense that we are able to pass this > through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} > encoding step. This of course still allows users to use {{byte[]}} if they > wish. > * The introduction of {{RecordMutator}} requires that insert/update/delete > operations are then also exposed on a {{TransactionBatch}} type. We've done > this with the introduction of a public {{MutatorTransactionBatch}} interface > which is a counterpart to the write-only {{TransactionBatch}}
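To make the interface split described in the HIVE-10165 proposal concrete, here is a minimal, hypothetical Java sketch. Only the names {{RecordMutator}} and {{RecordOperationWriter}} come from the issue description; the method signatures and the {{CollectingMutator}} class are illustrative assumptions, not the actual patch:

```java
// Hypothetical sketch of the proposed interface hierarchy. Only the names
// RecordMutator and RecordOperationWriter appear in the issue; the method
// signatures and CollectingMutator are illustrative assumptions.
class MutatorSketch {

  // Life-cycle methods factored out of the write and mutate interfaces.
  interface RecordOperationWriter {
    void flush() throws Exception;
    void close() throws Exception;
  }

  // Mutation counterpart to the write-only RecordWriter. Rows are Object,
  // not byte[], so strongly typed records (POJOs, tuples) can pass through
  // without an intermediate encoding step.
  interface RecordMutator extends RecordOperationWriter {
    void insert(Object record) throws Exception;
    void update(Object record) throws Exception;
    void delete(Object record) throws Exception;
  }

  // Trivial in-memory implementation used only to illustrate the contract.
  static class CollectingMutator implements RecordMutator {
    final java.util.List<String> ops = new java.util.ArrayList<>();
    public void insert(Object r) { ops.add("INSERT:" + r); }
    public void update(Object r) { ops.add("UPDATE:" + r); }
    public void delete(Object r) { ops.add("DELETE:" + r); }
    public void flush() { }
    public void close() { }
  }

  public static void main(String[] args) throws Exception {
    RecordMutator m = new CollectingMutator();
    m.insert("row1");
    m.update("row1");
    m.delete("row2");
    m.close();
    System.out.println(((CollectingMutator) m).ops);
    // prints [INSERT:row1, UPDATE:row1, DELETE:row2]
  }
}
```

Because rows arrive as {{Object}}, a caller that already has {{byte[]}} payloads can still pass them through unchanged, which is the backward-compatibility point the description makes.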
[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases
[ https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559854#comment-14559854 ] Laljo John Pullokkaran commented on HIVE-10811: --- [~jcamachorodriguez] I don't get the patch. Shouldn't we be checking collations from "rel" present in "input"? > RelFieldTrimmer throws NoSuchElementException in some cases > --- > > Key: HIVE-10811 > URL: https://issues.apache.org/jira/browse/HIVE-10811 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, > HIVE-10811.patch > > > RelFieldTrimmer runs into NoSuchElementException in some cases. > Stack trace: > {noformat} > Exception in thread "main" java.lang.AssertionError: Internal error: While > invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult > org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' > at org.apache.calcite.util.Util.newInternal(Util.java:743) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) > at 
org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536) > ... 32 more > Caused by: java.lang.AssertionError: Internal error: While invoking method > 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult > org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' > at org.apache.calcite.util.Util.newInternal(Util.java:
[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559849#comment-14559849 ] Hive QA commented on HIVE-7723: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735389/HIVE-7723.11.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4046/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4046/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4046/ Messages: {noformat} This message was trimmed, see log for full details [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java: Recompile with -Xlint:unchecked for details. [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ spark-client --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] Copying 1 resource [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ spark-client --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf [copy] Copying 11 files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ spark-client --- [INFO] Compiling 5 source files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/test-classes [INFO] [INFO] --- maven-dependency-plugin:2.8:copy (copy-guava-14) @ spark-client --- [INFO] Configured Artifact: com.google.guava:guava:14.0.1:jar [INFO] Copying guava-14.0.1.jar to /data/hive-ptest/working/apache-github-source-source/spark-client/target/dependency/guava-14.0.1.jar [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ spark-client --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ spark-client --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-1.3.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-client --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ spark-client --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/spark-client/target/spark-client-1.3.0-SNAPSHOT.jar to /home/hiveptest/.m2/repository/org/apache/hive/spark-client/1.3.0-SNAPSHOT/spark-client-1.3.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/spark-client/pom.xml to /home/hiveptest/.m2/repository/org/apache/hive/spark-client/1.3.0-SNAPSHOT/spark-client-1.3.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Query Language 1.3.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-exec --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen Generating vector expression code Generating vector expression test code [INFO] Executed tasks [INFO] [INFO] --- 
build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec --- [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java added. [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean added. [INFO] So
[jira] [Commented] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation
[ https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559839#comment-14559839 ] Hive QA commented on HIVE-10812: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735375/HIVE-10812.03.patch {color:green}SUCCESS:{color} +1 8974 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4045/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4045/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4045/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12735375 - PreCommit-HIVE-TRUNK-Build > Scaling PK/FK's selectivity for stats annotation > > > Key: HIVE-10812 > URL: https://issues.apache.org/jira/browse/HIVE-10812 > Project: Hive > Issue Type: Improvement >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch, > HIVE-10812.03.patch > > > Right now, the computation of the selectivity of FK side based on PK side > does not take into consideration the range of FK and the range of PK.
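As a hedged illustration of the range-based scaling HIVE-10812 asks for: the idea is that FK values falling outside the PK column's min/max range cannot join, so the FK-side selectivity estimate should shrink in proportion to the range overlap. The formula and every name in this sketch are assumptions for discussion, not Hive's actual implementation:

```java
// Illustrative sketch only: scale a PK/FK join selectivity estimate by the
// overlap between the FK column's value range and the PK column's range.
// The formula and all names here are assumptions, not Hive code.
class PkFkSelectivitySketch {

  // Fraction of the FK range that overlaps the PK range. FK values outside
  // the PK range cannot find a match, so they scale selectivity down.
  static double rangeOverlapRatio(double fkMin, double fkMax,
                                  double pkMin, double pkMax) {
    double fkRange = fkMax - fkMin;
    double overlap = Math.min(fkMax, pkMax) - Math.max(fkMin, pkMin);
    if (fkRange <= 0) {
      return 1.0; // degenerate FK range: no basis for scaling
    }
    if (overlap <= 0) {
      return 0.0; // disjoint ranges: no FK value can join
    }
    return Math.min(1.0, overlap / fkRange);
  }

  public static void main(String[] args) {
    // FK spans [0, 100] but the PK only covers [0, 50]: half the FK values
    // can possibly match, so a base selectivity of 0.2 is scaled to 0.1.
    double scaled = 0.2 * rangeOverlapRatio(0, 100, 0, 50);
    System.out.println(scaled); // prints 0.1
  }
}
```

Under this model, fully overlapping ranges leave the base selectivity unchanged, and disjoint ranges drive it to zero, which matches the intuition stated in the issue description.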
[jira] [Resolved] (HIVE-10777) LLAP: add pre-fragment and per-table cache details
[ https://issues.apache.org/jira/browse/HIVE-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10777. - committed to branch > LLAP: add pre-fragment and per-table cache details > -- > > Key: HIVE-10777 > URL: https://issues.apache.org/jira/browse/HIVE-10777 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: llap > > Attachments: HIVE-10777.01.patch, HIVE-10777.02.patch, > HIVE-10777.WIP.patch, HIVE-10777.patch > >
[jira] [Commented] (HIVE-10808) Inner join on Null throwing Cast Exception
[ https://issues.apache.org/jira/browse/HIVE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559800#comment-14559800 ] Swarnim Kulkarni commented on HIVE-10808: - Sounds great. Easier to review patches with tests on it which guarantee that the patch actually works ;) > Inner join on Null throwing Cast Exception > -- > > Key: HIVE-10808 > URL: https://issues.apache.org/jira/browse/HIVE-10808 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.1 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Critical > Attachments: HIVE-10808.patch > > > select > > a.col1, > > a.col2, > > a.col3, > > a.col4 > > from > > tab1 a > > inner join > > ( > > select > > max(x) as x > > from > > tab1 > > where > > x < 20130327 > > ) r > > on > > a.x = r.x > > where > > a.col1 = 'F' > > and a.col3 in ('A', 'S', 'G'); > Failed Task log snippet: > 2015-05-18 19:22:17,372 INFO [main] > org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: > __MAP_PLAN__ > 2015-05-18 19:22:17,372 INFO [main] > org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: > __MAP_PLAN__ > 2015-05-18 19:22:17,457 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.RuntimeException: Error in configuring > object > at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) > at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) > at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ... 9 more > Caused by: java.lang.RuntimeException: Error in configuring object > at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) > at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) > ... 14 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ... 17 more > Caused by: java.lang.RuntimeException: Map operator initialization failed > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157) > ... 
22 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: > org.apache.hadoop.hive.serde2.NullStructSerDe$NullStructSerDeObjectInspector > cannot be cast to > org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector > at > org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:334) > at > org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:352) > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126) > ... 22 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.serde2.NullStructSerDe$NullStructSerDeObjectInspector > cannot be cast to > org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1149) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219) > at > org.apache.hadoop.hive.serde2.objectinspec
[jira] [Resolved] (HIVE-10653) LLAP: registry logs strange lines on daemons
[ https://issues.apache.org/jira/browse/HIVE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10653. - Resolution: Fixed Fix Version/s: llap committed to branch > LLAP: registry logs strange lines on daemons > > > Key: HIVE-10653 > URL: https://issues.apache.org/jira/browse/HIVE-10653 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: llap > > > Discovered while looking at HIVE-10648; [~sseth] mentioned that this should > not be happening. > Most of the daemons described as being killed were actually alive. > Several/all LLAP daemons in the cluster logged these messages at > approximately the same time (while AM was stuck, incidentally; perhaps they > were just bored with no work). > {noformat} > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: > Starting to refresh ServiceInstanceSet 515383300 > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker f698eaee-bf6c-484d-9b90-a60d9005760c which mapped to > DynamicServiceInstance [alive=true, > host=cn057-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 9d1f50d1-f237-43c1-a8c5-32741e82d18b which mapped to > DynamicServiceInstance [alive=true, > host=cn041-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker b8a22e2f-652a-4fde-be7a-744786bc93c9 which mapped to > DynamicServiceInstance [alive=true, > host=cn042-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 
[LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 8394e271-e0d5-4589-817e-0181db0866b9 which mapped to > DynamicServiceInstance [alive=true, > host=cn056-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 1cabdcce-1089-4de6-abdf-315f18a8b4c0 which mapped to > DynamicServiceInstance [alive=true, > host=cn054-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 4027ad61-8c61-4173-90e2-d166ceaad74b which mapped to > DynamicServiceInstance [alive=true, > host=cn051-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 7f71a05f-f849-43d2-8fdb-09ba144d4b93 which mapped to > DynamicServiceInstance [alive=true, > host=cn050-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 41835ca1-69cd-4290-8c8f-8a9583a5d635 which mapped to > DynamicServiceInstance [alive=true, > host=cn053-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 54952e48-41be-48e1-922c-a39d0ee48a33 which mapped to > DynamicServiceInstance [alive=true, > host=cn055-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > 
org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 980dfe6c-d03b-462b-bee3-35d183c74aee which mapped to > DynamicServiceInstance [alive=true, > host=cn052-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker d524212a-6743-4f18-bcf6-525a0d4b1a0a which mapped to > DynamicServiceInstance [alive=true, > host=cn046-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: > Killing service instance: DynamicServiceInstance [alive=true, > host=cn048-10.l42scl.hortonworks.com:15001 with resources= vCores:6
[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled
[ https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559797#comment-14559797 ] Laljo John Pullokkaran commented on HIVE-10244: --- [~mmccline] How can you end up grouping id without grouping sets? Language prevents referring to grouping id without grouping sets. If grouping sets are present then previous line should bail out right? if (desc.isGroupingSetsPresent()) { LOG.info("Grouping sets not supported in vector mode"); return false; } > Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when > hive.vectorized.execution.reduce.enabled is enabled > --- > > Key: HIVE-10244 > URL: https://issues.apache.org/jira/browse/HIVE-10244 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Mostafa Mokhtar >Assignee: Matt McCline > Attachments: HIVE-10244.01.patch, explain_q80_vectorized_reduce_on.txt > > > Query > {code} > set hive.vectorized.execution.reduce.enabled=true; > with ssr as > (select s_store_id as store_id, > sum(ss_ext_sales_price) as sales, > sum(coalesce(sr_return_amt, 0)) as returns, > sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit > from store_sales left outer join store_returns on > (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), > date_dim, > store, > item, > promotion > where ss_sold_date_sk = d_date_sk >and d_date between cast('1998-08-04' as date) > and (cast('1998-09-04' as date)) >and ss_store_sk = s_store_sk >and ss_item_sk = i_item_sk >and i_current_price > 50 >and ss_promo_sk = p_promo_sk >and p_channel_tv = 'N' > group by s_store_id) > , > csr as > (select cp_catalog_page_id as catalog_page_id, > sum(cs_ext_sales_price) as sales, > sum(coalesce(cr_return_amount, 0)) as returns, > sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit > from catalog_sales left outer join catalog_returns on > (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), > date_dim, > catalog_page, > item, > 
promotion > where cs_sold_date_sk = d_date_sk >and d_date between cast('1998-08-04' as date) > and (cast('1998-09-04' as date)) > and cs_catalog_page_sk = cp_catalog_page_sk >and cs_item_sk = i_item_sk >and i_current_price > 50 >and cs_promo_sk = p_promo_sk >and p_channel_tv = 'N' > group by cp_catalog_page_id) > , > wsr as > (select web_site_id, > sum(ws_ext_sales_price) as sales, > sum(coalesce(wr_return_amt, 0)) as returns, > sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit > from web_sales left outer join web_returns on > (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), > date_dim, > web_site, > item, > promotion > where ws_sold_date_sk = d_date_sk >and d_date between cast('1998-08-04' as date) > and (cast('1998-09-04' as date)) > and ws_web_site_sk = web_site_sk >and ws_item_sk = i_item_sk >and i_current_price > 50 >and ws_promo_sk = p_promo_sk >and p_channel_tv = 'N' > group by web_site_id) > select channel > , id > , sum(sales) as sales > , sum(returns) as returns > , sum(profit) as profit > from > (select 'store channel' as channel > , concat('store', store_id) as id > , sales > , returns > , profit > from ssr > union all > select 'catalog channel' as channel > , concat('catalog_page', catalog_page_id) as id > , sales > , returns > , profit > from csr > union all > select 'web channel' as channel > , concat('web_site', web_site_id) as id > , sales > , returns > , profit > from wsr > ) x > group by channel, id with rollup > order by channel > ,id > limit 100 > {code} > Exception > {code} > Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, > diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running > task:java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing vector batch (tag=0) > 
\N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8 > \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7 > \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8 >
[jira] [Commented] (HIVE-10808) Inner join on Null throwing Cast Exception
[ https://issues.apache.org/jira/browse/HIVE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559785#comment-14559785 ] Naveen Gangam commented on HIVE-10808: -- [~swarnim] Agreed. However, we received this stack trace from a customer that can no longer reproduce the issue (their infra underwent some changes/upgrades). We have not been able to reproduce this using a test dataset. If I am able to reproduce this more consistently, I can create a unit test for this. Fair? > Inner join on Null throwing Cast Exception > -- > > Key: HIVE-10808 > URL: https://issues.apache.org/jira/browse/HIVE-10808 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.1 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Critical > Attachments: HIVE-10808.patch > > > select > > a.col1, > > a.col2, > > a.col3, > > a.col4 > > from > > tab1 a > > inner join > > ( > > select > > max(x) as x > > from > > tab1 > > where > > x < 20130327 > > ) r > > on > > a.x = r.x > > where > > a.col1 = 'F' > > and a.col3 in ('A', 'S', 'G'); > Failed Task log snippet: > 2015-05-18 19:22:17,372 INFO [main] > org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: > __MAP_PLAN__ > 2015-05-18 19:22:17,372 INFO [main] > org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: > __MAP_PLAN__ > 2015-05-18 19:22:17,457 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.RuntimeException: Error in configuring > object > at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) > at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ... 9 more > Caused by: java.lang.RuntimeException: Error in configuring object > at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) > at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) > ... 14 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ... 17 more > Caused by: java.lang.RuntimeException: Map operator initialization failed > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157) > ... 
22 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: > org.apache.hadoop.hive.serde2.NullStructSerDe$NullStructSerDeObjectInspector > cannot be cast to > org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector > at > org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:334) > at > org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:352) > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126) > ... 22 more > Caused by: java.lang.ClassCastException: > org.apache.hadoop.hive.serde2.NullStructSerDe$NullStructSerDeObjectInspector > cannot be cast to > org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectIns
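The root of the trace above is an unchecked downcast: the code path through MapOperator.getConvertedOI assumes every column inspector is a PrimitiveObjectInspector, while the empty inner-join side hands back a NullStructSerDe inspector. A minimal sketch of the defensive pattern, using simplified stand-in types rather than Hive's real ObjectInspector hierarchy (all names here are illustrative):

```java
// Simplified stand-ins for Hive's ObjectInspector hierarchy (illustrative only).
interface ObjectInspector { String getCategory(); }

class PrimitiveOI implements ObjectInspector {
    public String getCategory() { return "PRIMITIVE"; }
}

class NullStructOI implements ObjectInspector {
    public String getCategory() { return "STRUCT"; }
}

public class SettableCheck {
    // Mirrors the failing call: the original code cast the inspector to a
    // primitive type without first checking what kind of inspector it was.
    static boolean isSettablePrimitive(ObjectInspector oi) {
        if (!(oi instanceof PrimitiveOI)) {
            // A NullStructSerDe-style inspector lands here instead of
            // triggering a ClassCastException further down.
            return false;
        }
        PrimitiveOI poi = (PrimitiveOI) oi;  // now a safe cast
        return poi.getCategory().equals("PRIMITIVE");
    }

    public static void main(String[] args) {
        System.out.println(isSettablePrimitive(new PrimitiveOI()));
        System.out.println(isSettablePrimitive(new NullStructOI()));
    }
}
```

The same `instanceof`-before-cast guard applies wherever an inspector of unknown provenance reaches code that expects one concrete subtype.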
[jira] [Assigned] (HIVE-10653) LLAP: registry logs strange lines on daemons
[ https://issues.apache.org/jira/browse/HIVE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10653: --- Assignee: Sergey Shelukhin (was: Gopal V) > LLAP: registry logs strange lines on daemons > > > Key: HIVE-10653 > URL: https://issues.apache.org/jira/browse/HIVE-10653 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > Discovered while looking at HIVE-10648; [~sseth] mentioned that this should > not be happening. > Most of the daemons described as being killed were actually alive. > Several/all LLAP daemons in the cluster logged these messages at > approximately the same time (while AM was stuck, incidentally; perhaps they > were just bored with no work). > {noformat} > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: > Starting to refresh ServiceInstanceSet 515383300 > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker f698eaee-bf6c-484d-9b90-a60d9005760c which mapped to > DynamicServiceInstance [alive=true, > host=cn057-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 9d1f50d1-f237-43c1-a8c5-32741e82d18b which mapped to > DynamicServiceInstance [alive=true, > host=cn041-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker b8a22e2f-652a-4fde-be7a-744786bc93c9 which mapped to > DynamicServiceInstance [alive=true, > host=cn042-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > 
org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 8394e271-e0d5-4589-817e-0181db0866b9 which mapped to > DynamicServiceInstance [alive=true, > host=cn056-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 1cabdcce-1089-4de6-abdf-315f18a8b4c0 which mapped to > DynamicServiceInstance [alive=true, > host=cn054-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 4027ad61-8c61-4173-90e2-d166ceaad74b which mapped to > DynamicServiceInstance [alive=true, > host=cn051-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 7f71a05f-f849-43d2-8fdb-09ba144d4b93 which mapped to > DynamicServiceInstance [alive=true, > host=cn050-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 41835ca1-69cd-4290-8c8f-8a9583a5d635 which mapped to > DynamicServiceInstance [alive=true, > host=cn053-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 54952e48-41be-48e1-922c-a39d0ee48a33 which mapped to > DynamicServiceInstance [alive=true, > host=cn055-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker 
980dfe6c-d03b-462b-bee3-35d183c74aee which mapped to > DynamicServiceInstance [alive=true, > host=cn052-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: Adding > new worker d524212a-6743-4f18-bcf6-525a0d4b1a0a which mapped to > DynamicServiceInstance [alive=true, > host=cn046-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,016 [LlapYarnRegistryRefresher()] INFO > org.apache.hadoop.hive.llap.daemon.registry.impl.LlapYarnRegistryImpl: > Killing service instance: DynamicServiceInstance [alive=true, > host=cn048-10.l42scl.hortonworks.com:15001 with resources= vCores:6>] > 2015-05-07 12:14:30,017 [LlapYarnRegistryRe
[jira] [Updated] (HIVE-10777) LLAP: add pre-fragment and per-table cache details
[ https://issues.apache.org/jira/browse/HIVE-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10777: Attachment: HIVE-10777.02.patch Updated the name of the config setting > LLAP: add pre-fragment and per-table cache details > -- > > Key: HIVE-10777 > URL: https://issues.apache.org/jira/browse/HIVE-10777 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: llap > > Attachments: HIVE-10777.01.patch, HIVE-10777.02.patch, > HIVE-10777.WIP.patch, HIVE-10777.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559731#comment-14559731 ] Mostafa Mokhtar commented on HIVE-10711: Yes, please. > Tez HashTableLoader attempts to allocate more memory than available when > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem > -- > > Key: HIVE-10711 > URL: https://issues.apache.org/jira/browse/HIVE-10711 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Mostafa Mokhtar > Fix For: 1.2.1 > > Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, > HIVE-10711.3.patch, HIVE-10711.4.patch > > > Tez HashTableLoader bases its memory allocation on > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the > process max memory then this can result in the HashTableLoader trying to use > more memory than available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
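The fix direction described in this issue amounts to clamping the configured threshold to what the JVM can actually allocate. A hedged sketch of that clamp (method and field names are illustrative, not the actual HashTableLoader code):

```java
public class HashTableMemory {
    // Clamp the configured no-conditional-task threshold to the process's
    // max heap, so the hash table loader never plans to use more memory
    // than the JVM can hand out. (Illustrative sketch, not Hive source.)
    static long effectiveThreshold(long configuredThreshold, long processMaxMem) {
        return Math.min(configuredThreshold, processMaxMem);
    }

    public static void main(String[] args) {
        long processMax = Runtime.getRuntime().maxMemory();
        long configured = 10L * 1024 * 1024 * 1024; // e.g. a 10 GB threshold
        // The effective value can never exceed the process heap.
        System.out.println(effectiveThreshold(configured, processMax) <= processMax);
    }
}
```

`Runtime.getRuntime().maxMemory()` is the standard way to read the JVM's heap ceiling at runtime.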
[jira] [Commented] (HIVE-10825) Add parquet branch profile to jenkins-submit-build.sh
[ https://issues.apache.org/jira/browse/HIVE-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559730#comment-14559730 ] Szehon Ho commented on HIVE-10825: -- +1 > Add parquet branch profile to jenkins-submit-build.sh > - > > Key: HIVE-10825 > URL: https://issues.apache.org/jira/browse/HIVE-10825 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Minor > Attachments: HIVE-10825.1.patch > > > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559729#comment-14559729 ] Alexander Pivovarov commented on HIVE-10711: Mostafa, let's wait 24 hours before commit. Just to clarify. Do you want me to commit it to master and then do hotfix (cherry-pick) from master to branch-1.2? > Tez HashTableLoader attempts to allocate more memory than available when > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem > -- > > Key: HIVE-10711 > URL: https://issues.apache.org/jira/browse/HIVE-10711 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Mostafa Mokhtar > Fix For: 1.2.1 > > Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, > HIVE-10711.3.patch, HIVE-10711.4.patch > > > Tez HashTableLoader bases its memory allocation on > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the > process max memory then this can result in the HashTableLoader trying to use > more memory than available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10825) Add parquet branch profile to jenkins-submit-build.sh
[ https://issues.apache.org/jira/browse/HIVE-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10825: --- Description: NO PRECOMMIT TESTS (was: NO PRECOMMIT TEST) > Add parquet branch profile to jenkins-submit-build.sh > - > > Key: HIVE-10825 > URL: https://issues.apache.org/jira/browse/HIVE-10825 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Minor > Attachments: HIVE-10825.1.patch > > > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10825) Add parquet branch profile to jenkins-submit-build.sh
[ https://issues.apache.org/jira/browse/HIVE-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10825: --- Description: NO PRECOMMIT TEST > Add parquet branch profile to jenkins-submit-build.sh > - > > Key: HIVE-10825 > URL: https://issues.apache.org/jira/browse/HIVE-10825 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Minor > Attachments: HIVE-10825.1.patch > > > NO PRECOMMIT TEST -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10825) Add parquet branch profile to jenkins-submit-build.sh
[ https://issues.apache.org/jira/browse/HIVE-10825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10825: --- Attachment: HIVE-10825.1.patch > Add parquet branch profile to jenkins-submit-build.sh > - > > Key: HIVE-10825 > URL: https://issues.apache.org/jira/browse/HIVE-10825 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Sergio Peña >Assignee: Sergio Peña >Priority: Minor > Attachments: HIVE-10825.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10793: Fix Version/s: (was: 1.2.1) 1.3.0 > Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront > > > Key: HIVE-10793 > URL: https://issues.apache.org/jira/browse/HIVE-10793 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Mostafa Mokhtar > Fix For: 1.3.0 > > Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch > > > HybridHashTableContainer will allocate memory based on estimate, which means > if the actual is less than the estimate the allocated memory won't be used. > Number of partitions is calculated based on estimated data size > {code} > numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, > minNumParts, minWbSize, > nwayConf); > {code} > Then based on number of partitions writeBufferSize is set > {code} > writeBufferSize = (int)(estimatedTableSize / numPartitions); > {code} > Each hash partition will allocate 1 WriteBuffer, with no further allocation > if the estimated data size is correct. > Suggested solution is to reduce writeBufferSize by a factor such that only X% > of the memory is preallocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
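The sizing arithmetic quoted in the description, together with the suggested X% scaling, can be sketched as follows (simplified; the real HybridHashTableContainer takes several more inputs, and `preallocFraction` is a hypothetical parameter standing in for the proposed factor):

```java
public class WriteBufferSizing {
    // Current behavior: one write buffer per partition, sized so the entire
    // estimated table size is preallocated up front.
    static long upfrontBufferSize(long estimatedTableSize, int numPartitions) {
        return estimatedTableSize / numPartitions;
    }

    // Suggested fix (sketch): scale the buffer down so only a fraction of
    // the estimated memory is committed before any real data arrives; the
    // buffers grow later only if the estimate turns out to be accurate.
    static long scaledBufferSize(long estimatedTableSize, int numPartitions,
                                 double preallocFraction) {
        return (long) (upfrontBufferSize(estimatedTableSize, numPartitions)
                       * preallocFraction);
    }

    public static void main(String[] args) {
        // With a 1 GB estimate split over 16 partitions, preallocating 25%
        // commits 16 MB per buffer instead of 64 MB.
        System.out.println(scaledBufferSize(1L << 30, 16, 0.25));
    }
}
```

When the estimate overshoots the actual input, the unscaled scheme wastes the entire difference; the scaled scheme bounds the waste to the chosen fraction.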
[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7723: -- Attachment: HIVE-7723.11.patch > Explain plan for complex query with lots of partitions is slow due to > in-efficient collection used to find a matching ReadEntity > > > Key: HIVE-7723 > URL: https://issues.apache.org/jira/browse/HIVE-7723 > Project: Hive > Issue Type: Bug > Components: CLI, Physical Optimizer >Affects Versions: 0.13.1 >Reporter: Mostafa Mokhtar >Assignee: Mostafa Mokhtar > Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, > HIVE-7723.11.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, > HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, > HIVE-7723.9.patch > > > Explain on TPC-DS query 64 took 11 seconds, when the CLI was profiled it > showed that ReadEntity.equals is taking ~40% of the CPU. > ReadEntity.equals is called from the snippet below. > Again and again the set is iterated over to get the actual match, a HashMap > is a better option for this case as Set doesn't have a Get method. > Also for ReadEntity equals is case-insensitive while hash is not, which is an > undesired behavior. > {code} > public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity > newInput) { > // If the input is already present, make sure the new parent is added to > the input. 
> if (inputs.contains(newInput)) { > for (ReadEntity input : inputs) { > if (input.equals(newInput)) { > if ((newInput.getParents() != null) && > (!newInput.getParents().isEmpty())) { > input.getParents().addAll(newInput.getParents()); > input.setDirect(input.isDirect() || newInput.isDirect()); > } > return input; > } > } > assert false; > } else { > inputs.add(newInput); > return newInput; > } > // make compile happy > return null; > } > {code} > This is the query used : > {code} > select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number > ,cs1.b_streen_name ,cs1.b_city > ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city > ,cs1.c_zip ,cs1.syear ,cs1.cnt > ,cs1.s1 ,cs1.s2 ,cs1.s3 > ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt > from > (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as > store_name > ,s_zip as store_zip ,ad1.ca_street_number as b_street_number > ,ad1.ca_street_name as b_streen_name > ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as > c_street_number > ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip > as c_zip > ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) > as cnt > ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 > ,sum(ss_coupon_amt) as s3 > FROM store_sales > JOIN store_returns ON store_sales.ss_item_sk = > store_returns.sr_item_sk and store_sales.ss_ticket_number = > store_returns.sr_ticket_number > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk > JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk > JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk > JOIN store ON store_sales.ss_store_sk = store.s_store_sk > JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= > cd1.cd_demo_sk > JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = > cd2.cd_demo_sk > JOIN promotion ON 
store_sales.ss_promo_sk = promotion.p_promo_sk > JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = > hd1.hd_demo_sk > JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = > hd2.hd_demo_sk > JOIN customer_address ad1 ON store_sales.ss_addr_sk = > ad1.ca_address_sk > JOIN customer_address ad2 ON customer.c_current_addr_sk = > ad2.ca_address_sk > JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk > JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk > JOIN item ON store_sales.ss_item_sk = item.i_item_sk > JOIN > (select cs_item_sk > ,sum(cs_ext_list_price) as > sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund > from catalog_sales JOIN catalog_returns > ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk > and catalog_sales.cs_order_number = catalog_returns.cr_order_number > group by cs_item_sk > having > sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) > cs_ui >
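The HashMap-based lookup suggested in the description could look like the sketch below. `Entity` is a simplified stand-in for ReadEntity, and keying the map by a lower-cased name also keeps hashing consistent with the case-insensitive equals, the second problem the issue calls out:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class InputIndex {
    // Simplified stand-in for ReadEntity: identified by a case-insensitive
    // name and carrying a set of parents to merge. (Illustrative only.)
    static class Entity {
        final String name;
        final Set<String> parents = new HashSet<>();
        Entity(String name) { this.name = name; }
    }

    // Sketch of the suggested fix: key entities by normalized name so the
    // existing entry is found in O(1), instead of scanning the whole Set
    // once per addInput call (O(n) per call, O(n^2) overall).
    static Entity addInput(Map<String, Entity> inputs, Entity newInput) {
        String key = newInput.name.toLowerCase(); // match equals' case rules
        Entity existing = inputs.get(key);
        if (existing != null) {
            existing.parents.addAll(newInput.parents); // merge parents, as before
            return existing;
        }
        inputs.put(key, newInput);
        return newInput;
    }
}
```

With thousands of partition ReadEntity objects, replacing the per-call Set scan with a single hashed `get` removes the quadratic cost the profile attributed to ReadEntity.equals.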
[jira] [Commented] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
[ https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559704#comment-14559704 ] Mostafa Mokhtar commented on HIVE-10793: [~sushanth] [~sershe] Can this go to 1.2.1 as well? > Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront > > > Key: HIVE-10793 > URL: https://issues.apache.org/jira/browse/HIVE-10793 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Mostafa Mokhtar > Fix For: 1.2.1 > > Attachments: HIVE-10793.1.patch, HIVE-10793.2.patch > > > HybridHashTableContainer will allocate memory based on estimate, which means > if the actual is less than the estimate the allocated memory won't be used. > Number of partitions is calculated based on estimated data size > {code} > numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, > minNumParts, minWbSize, > nwayConf); > {code} > Then based on number of partitions writeBufferSize is set > {code} > writeBufferSize = (int)(estimatedTableSize / numPartitions); > {code} > Each hash partition will allocate 1 WriteBuffer, with no further allocation > if the estimated data size is correct. > Suggested solution is to reduce writeBufferSize by a factor such that only X% > of the memory is preallocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559699#comment-14559699 ] Mostafa Mokhtar commented on HIVE-10711: [~sushanth] FYI [~apivovarov] Can you please commit the change to 1.2.1? > Tez HashTableLoader attempts to allocate more memory than available when > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem > -- > > Key: HIVE-10711 > URL: https://issues.apache.org/jira/browse/HIVE-10711 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Mostafa Mokhtar > Fix For: 1.2.1 > > Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, > HIVE-10711.3.patch, HIVE-10711.4.patch > > > Tez HashTableLoader bases its memory allocation on > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the > process max memory then this can result in the HashTableLoader trying to use > more memory than available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10819) SearchArgumentImpl for Timestamp is broken by HIVE-10286
[ https://issues.apache.org/jira/browse/HIVE-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559696#comment-14559696 ] Sergey Shelukhin commented on HIVE-10819: - this breaks a lot of tests... > SearchArgumentImpl for Timestamp is broken by HIVE-10286 > > > Key: HIVE-10819 > URL: https://issues.apache.org/jira/browse/HIVE-10819 > Project: Hive > Issue Type: Bug >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 1.2.1 > > Attachments: HIVE-10819.1.patch, HIVE-10819.2.patch > > > The work around for kryo bug for Timestamp is accidentally removed by > HIVE-10286. Need to bring it back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559667#comment-14559667 ] Alexander Pivovarov commented on HIVE-10711: +1 > Tez HashTableLoader attempts to allocate more memory than available when > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem > -- > > Key: HIVE-10711 > URL: https://issues.apache.org/jira/browse/HIVE-10711 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Mostafa Mokhtar > Fix For: 1.2.1 > > Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, > HIVE-10711.3.patch, HIVE-10711.4.patch > > > Tez HashTableLoader bases its memory allocation on > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the > process max memory then this can result in the HashTableLoader trying to use > more memory than available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10749) Implement Insert ACID statement for parquet [Parquet branch]
[ https://issues.apache.org/jira/browse/HIVE-10749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10749: --- Attachment: HIVE-10749.2-parquet.patch Re-attaching patch to allow jenkins job to execute tests on parquet branch > Implement Insert ACID statement for parquet [Parquet branch] > > > Key: HIVE-10749 > URL: https://issues.apache.org/jira/browse/HIVE-10749 > Project: Hive > Issue Type: Sub-task >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-10749.1.patch, HIVE-10749.1.patch, > HIVE-10749.2-parquet.patch, HIVE-10749.2.patch, HIVE-10749.patch > > > We need to implement insert statement for parquet format like ORC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10749) Implement Insert ACID statement for parquet [Parquet branch]
[ https://issues.apache.org/jira/browse/HIVE-10749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10749: --- Summary: Implement Insert ACID statement for parquet [Parquet branch] (was: Implement Insert ACID statement for parquet) > Implement Insert ACID statement for parquet [Parquet branch] > > > Key: HIVE-10749 > URL: https://issues.apache.org/jira/browse/HIVE-10749 > Project: Hive > Issue Type: Sub-task >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-10749.1.patch, HIVE-10749.1.patch, > HIVE-10749.2.patch, HIVE-10749.patch > > > We need to implement insert statement for parquet format like ORC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
[ https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559618#comment-14559618 ] Mostafa Mokhtar commented on HIVE-10711: [~apivovarov] do you have any more feedback? > Tez HashTableLoader attempts to allocate more memory than available when > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem > -- > > Key: HIVE-10711 > URL: https://issues.apache.org/jira/browse/HIVE-10711 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Mostafa Mokhtar > Fix For: 1.2.1 > > Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, > HIVE-10711.3.patch, HIVE-10711.4.patch > > > Tez HashTableLoader bases its memory allocation on > HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the > process max memory then this can result in the HashTableLoader trying to use > more memory than available to the process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation
[ https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-10812: --- Attachment: HIVE-10812.03.patch > Scaling PK/FK's selectivity for stats annotation > > > Key: HIVE-10812 > URL: https://issues.apache.org/jira/browse/HIVE-10812 > Project: Hive > Issue Type: Improvement >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch, > HIVE-10812.03.patch > > > Right now, the computation of the selectivity of FK side based on PK side > does not take into consideration of the range of FK and the range of PK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10812) Scaling PK/FK's selectivity for stats annotation
[ https://issues.apache.org/jira/browse/HIVE-10812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559538#comment-14559538 ] Laljo John Pullokkaran commented on HIVE-10812: --- +1 > Scaling PK/FK's selectivity for stats annotation > > > Key: HIVE-10812 > URL: https://issues.apache.org/jira/browse/HIVE-10812 > Project: Hive > Issue Type: Improvement >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-10812.01.patch, HIVE-10812.02.patch > > > Right now, the computation of the selectivity of FK side based on PK side > does not take into consideration of the range of FK and the range of PK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases
[ https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559540#comment-14559540 ] Hive QA commented on HIVE-10811: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735331/HIVE-10811.02.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8973 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_crc32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sha1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join30 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4044/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4044/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4044/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12735331 - PreCommit-HIVE-TRUNK-Build > RelFieldTrimmer throws NoSuchElementException in some cases > --- > > Key: HIVE-10811 > URL: https://issues.apache.org/jira/browse/HIVE-10811 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, > HIVE-10811.patch > > > RelFieldTrimmer runs into NoSuchElementException in some cases. 
> Stack trace: > {noformat} > Exception in thread "main" java.lang.AssertionError: Internal error: While > invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult > org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' > at org.apache.calcite.util.Util.newInternal(Util.java:743) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) > at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorIm
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559395#comment-14559395 ] Gopal V commented on HIVE-10792: [~gaodayue]: not sure if the patch can retain PPD for map-joins. {{alias.size == 1}} might jump out of the PPD cases even when the dummy operators are present. > PPD leads to wrong answer when mapper scans the same table with multiple > aliases > > > Key: HIVE-10792 > URL: https://issues.apache.org/jira/browse/HIVE-10792 > Project: Hive > Issue Type: Bug > Components: File Formats, Query Processor >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1 >Reporter: Dayue Gao >Assignee: Dayue Gao >Priority: Critical > Fix For: 1.2.1 > > Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, > HIVE-10792.test.sql > > > Here's the steps to reproduce the bug. > First of all, prepare a simple ORC table with one row > {code} > create table test_orc (c0 int, c1 int) stored as ORC; > {code} > Table: test_orc > ||c0||c1|| > |0|1| > The following SQL gets empty result which is not expected > {code} > select * from test_orc t1 > union all > select * from test_orc t2 > where t2.c0 = 1 > {code} > Self join is also broken > {code} > set hive.auto.convert.join=false; -- force common join > select * from test_orc t1 > left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); > {code} > It gets empty result while the expected answer is > ||t1.c0||t1.c1||t2.c0||t2.c1|| > |0|1|NULL|NULL| > In these cases, we pushdown predicates into OrcInputFormat. As a result, > TableScanOperator for "t1" can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6991) History not able to disable/enable after session started
[ https://issues.apache.org/jira/browse/HIVE-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559359#comment-14559359 ] Hive QA commented on HIVE-6991: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735306/HIVE-6991.2.patch {color:red}ERROR:{color} -1 due to 637 failed/errored test(s), 8973 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_protect_mode org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition_authorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_6 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_change_schema org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_comments org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_compression_enabled org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_compression_enabled_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_date org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_deserialize_map_null org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_evolved_schemas org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_nullable_fields org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_sanity_test org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_timestamp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_type_evolution org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_udfs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketconte
[jira] [Comment Edited] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559325#comment-14559325 ] Xuefu Zhang edited comment on HIVE-10304 at 5/26/15 4:23 PM: - The final decision will be replacing Hive CLI's implementation with Beeline (HIVE-10511). You will still have the script file (hive.sh). Since you have so many scripts using Hive CLI, it would be great if you can test them with it once HIVE-10511 is in place. Thanks. was (Author: xuefuz): The final decision will be replacing Hive CLI's implementation with beeline (HIVE-10511). You still have the script. Since you have so many scripts using Hive CLI. When HIVE-10511 is in place, it would be great if you can test it with your script. Thanks. > Add deprecation message to HiveCLI > -- > > Key: HIVE-10304 > URL: https://issues.apache.org/jira/browse/HIVE-10304 > Project: Hive > Issue Type: Sub-task > Components: CLI >Affects Versions: 1.1.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Labels: TODOC1.2 > Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch > > > As Beeline is now the recommended command line tool to Hive, we should add a > message to HiveCLI to indicate that it is deprecated and redirect them to > Beeline. > This is not suggesting to remove HiveCLI for now, but just a helpful > direction for user to know the direction to focus attention in Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559325#comment-14559325 ] Xuefu Zhang commented on HIVE-10304: The final decision will be replacing Hive CLI's implementation with beeline (HIVE-10511). You still have the script. Since you have so many scripts using Hive CLI. When HIVE-10511 is in place, it would be great if you can test it with your script. Thanks. > Add deprecation message to HiveCLI > -- > > Key: HIVE-10304 > URL: https://issues.apache.org/jira/browse/HIVE-10304 > Project: Hive > Issue Type: Sub-task > Components: CLI >Affects Versions: 1.1.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Labels: TODOC1.2 > Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch > > > As Beeline is now the recommended command line tool to Hive, we should add a > message to HiveCLI to indicate that it is deprecated and redirect them to > Beeline. > This is not suggesting to remove HiveCLI for now, but just a helpful > direction for user to know the direction to focus attention in Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10277) Unable to process Comment line '--' in HIVE-1.1.0
[ https://issues.apache.org/jira/browse/HIVE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559321#comment-14559321 ] Xuefu Zhang commented on HIVE-10277: Thank you. I have reverted it. @Chinna, I'll reopen the JIRA. Could you resubmit a patch if it's still a problem, and make sure that the tests pass. Thanks, Xuefu On Tue, May 26, 2015 at 7:00 AM, Ferdinand Xu (JIRA) > Unable to process Comment line '--' in HIVE-1.1.0 > - > > Key: HIVE-10277 > URL: https://issues.apache.org/jira/browse/HIVE-10277 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.0.0 >Reporter: Kaveen Raajan >Assignee: Chinna Rao Lalam >Priority: Minor > Labels: hive > Fix For: 1.3.0 > > Attachments: HIVE-10277-1.patch, HIVE-10277.2.patch, HIVE-10277.patch > > > I tried to use comment line (*--*) in HIVE-1.1.0 grunt shell like, > ~hive>--this is comment line~ > ~hive>show tables;~ > I got error like > {quote} > NoViableAltException(-1@[]) > at > org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java: > 1020) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:19 > 9) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:16 > 6) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:2 > 07) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754 > ) > at 
org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. > java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces > sorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > FAILED: ParseException line 2:0 cannot recognize input near '' '' > ' F>' > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10304) Add deprecation message to HiveCLI
[ https://issues.apache.org/jira/browse/HIVE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559313#comment-14559313 ] Hari Sekhon commented on HIVE-10304: If this is just recommending that users use Beeline instead of Hive CLI, that is fine, but if the Hive 1 CLI were ever removed it would cause major headaches for users such as myself who have lots of scripts and programs that make calls to Hive CLI. Rewriting things that have already worked fine for years is not cool. In fact it's the opposite of cool. > Add deprecation message to HiveCLI > -- > > Key: HIVE-10304 > URL: https://issues.apache.org/jira/browse/HIVE-10304 > Project: Hive > Issue Type: Sub-task > Components: CLI >Affects Versions: 1.1.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Labels: TODOC1.2 > Attachments: HIVE-10304.2.patch, HIVE-10304.3.patch, HIVE-10304.patch > > > As Beeline is now the recommended command line tool to Hive, we should add a > message to HiveCLI to indicate that it is deprecated and redirect them to > Beeline. > This is not suggesting to remove HiveCLI for now, but just a helpful > direction for user to know the direction to focus attention in Beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-10277) Unable to process Comment line '--' in HIVE-1.1.0
[ https://issues.apache.org/jira/browse/HIVE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reopened HIVE-10277: Patch is reverted because of test failures. Please resubmit patch if problem remains. > Unable to process Comment line '--' in HIVE-1.1.0 > - > > Key: HIVE-10277 > URL: https://issues.apache.org/jira/browse/HIVE-10277 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.0.0 >Reporter: Kaveen Raajan >Assignee: Chinna Rao Lalam >Priority: Minor > Labels: hive > Fix For: 1.3.0 > > Attachments: HIVE-10277-1.patch, HIVE-10277.2.patch, HIVE-10277.patch > > > I tried to use comment line (*--*) in HIVE-1.1.0 grunt shell like, > ~hive>--this is comment line~ > ~hive>show tables;~ > I got error like > {quote} > NoViableAltException(-1@[]) > at > org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java: > 1020) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:19 > 9) > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:16 > 6) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:2 > 07) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754 > ) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. > java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces > sorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > FAILED: ParseException line 2:0 cannot recognize input near '' '' > ' F>' > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly
[ https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-10815: - Attachment: (was: HIVE-10815.patch) > Let HiveMetaStoreClient Choose MetaStore Randomly > - > > Key: HIVE-10815 > URL: https://issues.apache.org/jira/browse/HIVE-10815 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, Metastore >Affects Versions: 1.2.0 >Reporter: Nemon Lou >Assignee: Nemon Lou > Attachments: HIVE-10815.patch > > > Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs > when multiple metastores are configured. > Choosing a MetaStore randomly would be good for load balancing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly
[ https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-10815: - Attachment: HIVE-10815.patch > Let HiveMetaStoreClient Choose MetaStore Randomly > - > > Key: HIVE-10815 > URL: https://issues.apache.org/jira/browse/HIVE-10815 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, Metastore >Affects Versions: 1.2.0 >Reporter: Nemon Lou >Assignee: Nemon Lou > Attachments: HIVE-10815.patch, HIVE-10815.patch > > > Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs > when multiple metastores are configured. > Choosing a MetaStore randomly would be good for load balancing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
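The randomization HIVE-10815 proposes can be sketched as follows. This is an illustration with simplified types only, not HiveMetaStoreClient's actual code; `MetastoreUriSelector` and `shuffledUris` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration of the proposed behavior (not Hive's implementation):
// instead of always trying metastore URIs in their configured order,
// shuffle them once per client so connection load spreads across metastores.
public class MetastoreUriSelector {

    // Returns a copy of the configured URIs in random order; a client would
    // then try each URI in turn until a connection succeeds.
    static List<String> shuffledUris(List<String> configuredUris) {
        List<String> uris = new ArrayList<>(configuredUris);
        Collections.shuffle(uris);
        return uris;
    }
}
```

Shuffling a copy keeps the configured list intact, so failover order is randomized per client without mutating shared configuration.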
[jira] [Assigned] (HIVE-10802) Table join query with some constant field in select fails
[ https://issues.apache.org/jira/browse/HIVE-10802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-10802: --- Assignee: Aihua Xu > Table join query with some constant field in select fails > - > > Key: HIVE-10802 > URL: https://issues.apache.org/jira/browse/HIVE-10802 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > > The following query fails: > {noformat} > create table tb1 (year string, month string); > create table tb2(month string); > select unix_timestamp(a.year) > from (select * from tb1 where year='2001') a join tb2 b on (a.month=b.month); > {noformat} > with the exception {noformat} > Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) > at java.util.ArrayList.get(ArrayList.java:411) > at > org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118) > at > org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.(StandardStructObjectInspector.java:109) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275) > at > org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175) > {noformat} > The issue seems to be: during the query compilation, the field in the select > should be replaced with the constant when some UDFs are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
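The fix direction the description suggests, substituting the known constant for the column reference during compilation, can be illustrated abstractly. All names here are hypothetical and this is not Hive's planner code:

```java
import java.util.Map;

// Abstract illustration of constant propagation (hypothetical, not Hive's
// planner): when a filter such as year='2001' pins a column to a constant,
// references to that column in the outer select can be replaced by the
// constant, so later operators never look up a column that no longer exists
// in the row schema.
public class ConstantPropagationSketch {

    // Replace a column reference with its constant when one is known;
    // otherwise leave the reference unchanged.
    static String resolve(String columnRef, Map<String, String> knownConstants) {
        return knownConstants.getOrDefault(columnRef, columnRef);
    }
}
```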
[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559231#comment-14559231 ] Dayue Gao commented on HIVE-10792: -- I don't think the failed tests are related to this patch. [~gopalv] [~thejas] [~sershe] Could you have a look at this? Should it be backported to old releases? > PPD leads to wrong answer when mapper scans the same table with multiple > aliases > > > Key: HIVE-10792 > URL: https://issues.apache.org/jira/browse/HIVE-10792 > Project: Hive > Issue Type: Bug > Components: File Formats, Query Processor >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1 >Reporter: Dayue Gao >Assignee: Dayue Gao >Priority: Critical > Fix For: 1.2.1 > > Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, > HIVE-10792.test.sql > > > Here's the steps to reproduce the bug. > First of all, prepare a simple ORC table with one row > {code} > create table test_orc (c0 int, c1 int) stored as ORC; > {code} > Table: test_orc > ||c0||c1|| > |0|1| > The following SQL gets empty result which is not expected > {code} > select * from test_orc t1 > union all > select * from test_orc t2 > where t2.c0 = 1 > {code} > Self join is also broken > {code} > set hive.auto.convert.join=false; -- force common join > select * from test_orc t1 > left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); > {code} > It gets empty result while the expected answer is > ||t1.c0||t1.c1||t2.c0||t2.c1|| > |0|1|NULL|NULL| > In these cases, we pushdown predicates into OrcInputFormat. As a result, > TableScanOperator for "t1" can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9069) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559220#comment-14559220 ] Hive QA commented on HIVE-9069: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735298/HIVE-9069.14.patch {color:red}ERROR:{color} -1 due to 636 failed/errored test(s), 8974 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_protect_mode org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition_authorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_serde2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_6 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_add_column3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_change_schema org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_comments org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_compression_enabled org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_compression_enabled_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_date org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_deserialize_map_null org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_evolved_schemas org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_nullable_fields org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_sanity_test org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_timestamp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_type_evolution org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_udfs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcont
[jira] [Updated] (HIVE-10811) RelFieldTrimmer throws NoSuchElementException in some cases
[ https://issues.apache.org/jira/browse/HIVE-10811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10811: --- Attachment: HIVE-10811.02.patch > RelFieldTrimmer throws NoSuchElementException in some cases > --- > > Key: HIVE-10811 > URL: https://issues.apache.org/jira/browse/HIVE-10811 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-10811.01.patch, HIVE-10811.02.patch, > HIVE-10811.patch > > > RelFieldTrimmer runs into NoSuchElementException in some cases. > Stack trace: > {noformat} > Exception in thread "main" java.lang.AssertionError: Internal error: While > invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult > org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' > at org.apache.calcite.util.Util.newInternal(Util.java:743) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:269) > at > org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:175) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:947) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:820) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:768) > at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:730) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:105) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:607) > 
at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:244) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at 
org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:536) > ... 32 more > Caused by: java.lang.AssertionError: Internal error: While invoking method > 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult > org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Sort,org.apache.calcite.util.ImmutableBitSet,java.util.Set)' > at org.apache.calcite.util.Util.newInternal(Util.java:743) > at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:543) > at > org.apache.calcite.sql2rel.RelFiel
[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()
[ https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8458: - Description: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, getFileSystem() call would produce NPE was: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, getFileSystem() call would produce NPE > Potential null dereference in Utilities#clearWork() > --- > > Key: HIVE-8458 > URL: https://issues.apache.org/jira/browse/HIVE-8458 > Project: Hive > Issue Type: Bug >Affects Versions: 0.13.1 >Reporter: Ted Yu >Assignee: skrho >Priority: Minor > Attachments: HIVE-8458_001.patch > > > {code} > Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); > Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); > // if the plan path hasn't been initialized just return, nothing to clean. > if (mapPath == null && reducePath == null) { > return; > } > try { > FileSystem fs = mapPath.getFileSystem(conf); > {code} > If mapPath is null but reducePath is not null, getFileSystem() call would > produce NPE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
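The guard the report asks for can be sketched like this, with `String` standing in for Hadoop's `Path` so the example is self-contained. This is an illustration of the idea, not the attached patch:

```java
// Sketch of the null guard HIVE-8458 describes (String stands in for
// org.apache.hadoop.fs.Path so this compiles on its own). The quoted code
// dereferences mapPath unconditionally, which throws an NPE when mapPath
// is null but reducePath is not.
public class ClearWorkGuard {

    // Returns the path that should be used to obtain the FileSystem,
    // or null when there is nothing to clean.
    static String pathForFileSystem(String mapPath, String reducePath) {
        if (mapPath == null && reducePath == null) {
            return null; // plan path never initialized; nothing to clean
        }
        // Use whichever path is non-null instead of assuming mapPath is.
        return mapPath != null ? mapPath : reducePath;
    }
}
```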
[jira] [Reopened] (HIVE-9605) Remove parquet nested objects from wrapper writable objects
[ https://issues.apache.org/jira/browse/HIVE-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reopened HIVE-9605: Sorry [~spena], seems master has lots of failed cases. I will commit after it comes back to normal. > Remove parquet nested objects from wrapper writable objects > --- > > Key: HIVE-9605 > URL: https://issues.apache.org/jira/browse/HIVE-9605 > Project: Hive > Issue Type: Sub-task >Affects Versions: 0.14.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: parquet-branch > > Attachments: HIVE-9605.3.patch, HIVE-9605.4.patch, HIVE-9605.5.patch, > HIVE-9605.6.patch > > > Parquet nested types are using an extra wrapper object (ArrayWritable) as a > wrapper of map and list elements. This extra object is not needed and causing > unnecessary memory allocations. > An example of code is on HiveCollectionConverter.java: > {noformat} > public void end() { > parent.set(index, wrapList(new ArrayWritable( > Writable.class, list.toArray(new Writable[list.size()]; > } > {noformat} > This object is later unwrapped on AbstractParquetMapInspector, i.e.: > {noformat} > final Writable[] mapContainer = ((ArrayWritable) data).get(); > final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get(); > for (final Writable obj : mapArray) { > ... > } > {noformat} > We should get rid of this wrapper object to save time and memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
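The double unwrap quoted in the HIVE-9605 description can be illustrated with plain arrays, with `Object[]` standing in for Hadoop's `ArrayWritable`. This is a simplified sketch of the before/after access pattern, not the Parquet/Hive code:

```java
// Simplified illustration of the wrapper removal (Object[] stands in for
// ArrayWritable; not the actual Hive/Parquet code). Removing the wrapper
// means one cast instead of two and one fewer intermediate allocation
// per map or list value.
public class WrapperSketch {

    // Before: elements are wrapped in an extra one-element array,
    // so the inspector must unwrap twice.
    static Object[] elementsBefore(Object data) {
        Object[] container = (Object[]) data; // outer wrapper
        return (Object[]) container[0];       // inner element array
    }

    // After: the element array is stored directly, so one cast suffices.
    static Object[] elementsAfter(Object data) {
        return (Object[]) data;
    }
}
```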