[jira] [Created] (HIVE-10408) LLAP: query fails - execution is rejected when it shouldn't be
Sergey Shelukhin created HIVE-10408: --- Summary: LLAP: query fails - execution is rejected when it shouldn't be Key: HIVE-10408 URL: https://issues.apache.org/jira/browse/HIVE-10408 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Siddharth Seth {noformat} Caused by: org.apache.hadoop.ipc.RemoteException(java.util.concurrent.RejectedExecutionException): Queues are full. Rejecting request. at org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.schedule(TaskExecutorService.java:182) at org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.schedule(TaskExecutorService.java:63) at org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:202) at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:258) at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.submitWork(LlapDaemonProtocolServerImpl.java:71) at org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:8698) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:972) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2056) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2052) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2050) at org.apache.hadoop.ipc.Client.call(Client.java:1492) at org.apache.hadoop.ipc.Client.call(Client.java:1423) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) ... 8 more {noformat} The query, running alone on 10-node cluster, dumped 1000 mappers into running; with 3 completed it failed with that. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: VOTE: move to git
INFRA jira has been filed: https://issues.apache.org/jira/browse/INFRA-9488 On 15/4/16, 17:38, Vikram Dixit vik...@hortonworks.com wrote: +1. Hope to see this happen soon. On 4/16/15, 11:42 AM, Sergey Shelukhin ser...@hortonworks.com wrote: It’s hard to tell from the bylaws how long this vote should run. I’d say 3 days (release vote length) should be appropriate. After ~51 more hours (on Monday) I will file a JIRA with INFRA if the consensus stands. On 15/4/15, 23:08, Thejas Nair thejas.n...@gmail.com wrote: Thanks Sergey, that's a useful link. From there, I found the reference to the Accumulo git process guide, which looks like a great starting point for Hive as well - http://accumulo.apache.org/git.html . We could use some of the suggestions there to make the git history saner. For example, do a rebase of the pull request before merging, or do a 'merge --squash'. We should create a copy of that in the Hive wiki. +1 to the move to git. A big plus is also that we won't need to separately create review board entries. On Wed, Apr 15, 2015 at 6:21 PM, Sergey Shelukhin ser...@hortonworks.com wrote: The example process can be viewed here: https://issues.apache.org/jira/browse/INFRA-7768 I assume we will use commits and not pull requests. On 15/4/15, 16:59, Thejas Nair thejas.n...@gmail.com wrote: We need to be clear on how the process works with the move to git; I am not familiar with the process followed by other groups that use git currently. A [DISCUSS] thread for discussing the specifics might be appropriate. I assume the central git repository would still be the one in Apache. Does that mean we work using *github* pull requests like what Spark does? - https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode I haven't done this, but I assume it is possible to pull a github pull request into the apache git. Would that be the way to merge the patch in? One (misplaced?) 
concern I have about git is that merging pull requests would result in lots of branches and it would be hard to understand the sequence of changes. (Maybe rebasing the changes in the pull request before commit would help.) I would like to see more clarity on this before we move ahead. On Wed, Apr 15, 2015 at 2:46 PM, Sergey Shelukhin ser...@apache.org wrote: Hi. We’ve been discussing this some time ago; this time I'd like to start an official vote about moving the Hive project to git from svn. I volunteer to facilitate the move; that seems to be just filing an INFRA jira, and following instructions such as verifying that the new repo is sane. Please vote: +1 move to git 0 don’t care -1 stay on svn +1.
[jira] [Created] (HIVE-10409) Webhcat tests need to be updated, to accommodate HADOOP-10193
Aswathy Chellammal Sreekumar created HIVE-10409: --- Summary: Webhcat tests need to be updated, to accommodate HADOOP-10193 Key: HIVE-10409 URL: https://issues.apache.org/jira/browse/HIVE-10409 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Fix For: 1.2.0 Webhcat tests need to be updated to accommodate the URL change brought in by HADOOP-10193. Add ?user.name=user-name for the templeton calls.
[jira] [Created] (HIVE-10407) separate out the timestamp ranges for testing purposes
Owen O'Malley created HIVE-10407: Summary: separate out the timestamp ranges for testing purposes Key: HIVE-10407 URL: https://issues.apache.org/jira/browse/HIVE-10407 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Some platforms have limits for date ranges, so separate out the test cases that are outside of the range 1970 to 2038.
[jira] [Created] (HIVE-10411) LLAP: NPE caused by HIVE-10397
Prasanth Jayachandran created HIVE-10411: Summary: LLAP: NPE caused by HIVE-10397 Key: HIVE-10411 URL: https://issues.apache.org/jira/browse/HIVE-10411 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix NPE caused by HIVE-10397
[jira] [Created] (HIVE-10412) CBO : Calculate join selectivity when computing HiveJoin cost
Mostafa Mokhtar created HIVE-10412: -- Summary: CBO : Calculate join selectivity when computing HiveJoin cost Key: HIVE-10412 URL: https://issues.apache.org/jira/browse/HIVE-10412 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran This is from TPC-DS Q7. Because we don't compute the selectivity of sub-expressions in a HiveJoin, we assume that selective and non-selective joins have a similar cost. {code} select i_item_id, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, item where store_sales.ss_item_sk = item.i_item_sk and store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and cd_gender = 'F' and cd_marital_status = 'W' and cd_education_status = 'Primary' group by i_item_id order by i_item_id limit 100 {code} Cardinality {code} item 462,000 customer_demographics 1,920,800 store_sales 82,510,879,939 {code} NDVs {code} item.i_item_sk 439501 customer_demographics.cd_demo_sk 1835839 store_sales.ss_cdemo_sk 1835839 {code} From the logs {code} 2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for: HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 io}]) HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]]) HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3]) HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) 
HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]]) 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io} 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 rows, 2.1362E11 cpu, 1.07207098E7 io} 2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(78)) - MapJoin selected 2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for: HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not available]) HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], cost=[{8.2511341939E10 rows, 2.1362E11 cpu, 1.07207098E7 io}]) HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]]) HiveProject(i_item_sk=[$0], i_item_id=[$1]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]]) HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], cd_education_status=[$3]) HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))]) HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]]) 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.25108951834E10 rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io} 2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel (HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 io} {code}
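For context, the standard textbook estimate for equi-join output cardinality, which a selectivity computation along these lines would feed into the HiveJoin cost, is |R join S| ~= |R| * |S| / max(NDV(R.k), NDV(S.k)). A minimal sketch (illustrative only, not Hive's actual cost-model code; class and method names are hypothetical):

```java
public class JoinCardinality {
    // Textbook equi-join cardinality estimate:
    // |R join S on R.k = S.k| ~= |R| * |S| / max(ndv(R.k), ndv(S.k)).
    // Predicate selectivity on either input (e.g. the cd_gender/cd_marital_status
    // filters above) should be applied to the row counts before calling this.
    public static double estimate(double leftRows, double rightRows,
                                  double leftNdv, double rightNdv) {
        return leftRows * rightRows / Math.max(leftNdv, rightNdv);
    }

    public static void main(String[] args) {
        // Using the cardinalities/NDVs from the description: store_sales joined
        // to customer_demographics on the demographics key.
        double est = estimate(82_510_879_939d, 1_920_800d, 1_835_839d, 1_835_839d);
        System.out.println(est);
    }
}
```

Without per-sub-expression selectivity, both join orders above get essentially the same estimated row count, which is why the selective and non-selective plans end up with similar costs.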
[jira] [Created] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
Richard Williams created HIVE-10410: --- Summary: Apparent race condition in HiveServer2 causing intermittent query failures Key: HIVE-10410 URL: https://issues.apache.org/jira/browse/HIVE-10410 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Environment: CDH 5.3.3 CentOS 6.5 Reporter: Richard Williams On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally trigger odd Thrift exceptions with messages such as "Read a negative frame size (-2147418110)!" or "out of sequence response" in HiveServer2's connections to the metastore. For certain metastore calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and thus causes the failure to bubble out to the JDBC client. Note that as far as we can tell, this issue appears to only affect queries that are submitted with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce the issue by spawning about 100 simple, concurrent Hive queries (we have been using show databases), two or three of which typically fail. However, when either of these conditions does not hold, we are no longer able to reproduce the issue. Some example stack traces from the HiveServer2 logs:
Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33251/ --- (Updated April 21, 2015, 1:37 a.m.) Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Changes --- Changed the assumption. The small tables are cached only for the same work. Bugs: HIVE-10302 https://issues.apache.org/jira/browse/HIVE-10302 Repository: hive-git Description --- Cached the small table container so that mapjoin tasks can use it if the task is executed on the same Spark executor. The cache is released right before the next job after the mapjoin job is done. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java fe108c4 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 97b3471 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 72ab913 Diff: https://reviews.apache.org/r/33251/diff/ Testing --- Ran several queries in a live cluster. ptest pending. Thanks, Jimmy Xiang
[jira] [Created] (HIVE-10413) [CBO] Return path assumes distinct column can't be same as grouping column
Ashutosh Chauhan created HIVE-10413: --- Summary: [CBO] Return path assumes distinct column can't be same as grouping column Key: HIVE-10413 URL: https://issues.apache.org/jira/browse/HIVE-10413 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Found in cbo_udf_udaf.q tests.
[jira] [Created] (HIVE-10405) LLAP: Provide runtime information to daemons to decide on preemption order
Siddharth Seth created HIVE-10405: - Summary: LLAP: Provide runtime information to daemons to decide on preemption order Key: HIVE-10405 URL: https://issues.apache.org/jira/browse/HIVE-10405 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap
herfindahl index UDAF
Hi everyone, Do you think it is possible to create a UDAF to calculate the Herfindahl Index (HHI)? http://en.wikipedia.org/wiki/Herfindahl_index Calculation:

  v     ratio  ratio^2
  100   0.1    0.01
  100   0.1    0.01
  300   0.3    0.09
  100   0.1    0.01
  200   0.2    0.04
  200   0.2    0.04
  SUM   1000   1.0    0.2

  HHI = 0.2

Currently I use the following SQL: select sum(pow(t.v / t2.sum_v, 2)) from t join (select sum(v) sum_v from t) t2 Is it possible to create a UDAF for it? e.g. select hhi(v) from t; Let's assume the set of v can fit in reducer memory.
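It should be, and the aggregation itself is simple. Since HHI = sum((v/total)^2) = sum(v^2) / sum(v)^2, a UDAF only needs two running sums and never has to buffer the individual values. A minimal sketch of that computation in plain Java (class and method names are illustrative, not a real Hive API; a full Hive UDAF would implement this incrementally, e.g. via a GenericUDAFEvaluator):

```java
public class Hhi {
    // HHI = sum((v/total)^2) = sum(v^2) / total^2, so one pass with two
    // running sums suffices -- no need to hold all values in memory.
    public static double hhi(double[] values) {
        double sum = 0.0, sumSquares = 0.0;
        for (double v : values) {
            sum += v;
            sumSquares += v * v;
        }
        return sumSquares / (sum * sum);
    }

    public static void main(String[] args) {
        // The example from the thread: HHI of {100,100,300,100,200,200} is 0.2.
        double result = hhi(new double[]{100, 100, 300, 100, 200, 200});
        if (Math.abs(result - 0.2) > 1e-9) throw new AssertionError(result);
    }
}
```

Because of this identity, the aggregation buffer is fixed-size (two doubles), so the "fits in reducer memory" assumption would not even be necessary.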
[jira] [Created] (HIVE-10406) LLAP: Make use of additional information to determine run/preemption order
Siddharth Seth created HIVE-10406: - Summary: LLAP: Make use of additional information to determine run/preemption order Key: HIVE-10406 URL: https://issues.apache.org/jira/browse/HIVE-10406 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Fix For: llap The preemption will evolve as it's tested. Had a discussion offline with [~hagleitn]. The initial policy will likely be the following. Within a running DAG, the priority / topo order decides which fragment runs / is a candidate for preemption. Beyond this, the number of tasks in the current vertex + upstream vertices will be used as a measure of the size of the query to determine which fragment gets to run, if there are multiple fragments queued up. Fragments with a lower count will be preferred, to push through what are expected to be shorter-running queries.
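As a concrete illustration of that policy, ordering queued fragments by current-plus-upstream task count might look like the sketch below (Fragment and its fields are hypothetical stand-ins for the daemon's internal structures, not actual LLAP classes):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FragmentOrder {
    // Hypothetical stand-in for a queued fragment.
    static class Fragment {
        final String name;
        final int currentVertexTasks;
        final int upstreamTasks;
        Fragment(String name, int currentVertexTasks, int upstreamTasks) {
            this.name = name;
            this.currentVertexTasks = currentVertexTasks;
            this.upstreamTasks = upstreamTasks;
        }
        // Task count of current vertex + upstream vertices, used as a
        // proxy for query size.
        int queryTaskCount() { return currentVertexTasks + upstreamTasks; }
    }

    // Fragments from smaller queries (lower task counts) run first.
    static final Comparator<Fragment> RUN_ORDER =
        Comparator.comparingInt(Fragment::queryTaskCount);

    public static void main(String[] args) {
        List<Fragment> queue = Arrays.asList(
            new Fragment("bigQueryMap", 1000, 5000),
            new Fragment("smallQueryMap", 10, 0));
        Fragment next = queue.stream().min(RUN_ORDER).get();
        System.out.println(next.name);
    }
}
```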
[jira] [Created] (HIVE-10399) from_unixtime_millis() Hive UDF
Hari Sekhon created HIVE-10399: -- Summary: from_unixtime_millis() Hive UDF Key: HIVE-10399 URL: https://issues.apache.org/jira/browse/HIVE-10399 Project: Hive Issue Type: New Feature Components: UDF Environment: HDP 2.2 Reporter: Hari Sekhon Priority: Minor Feature request for a {code}from_unixtime_millis(){code} Hive UDF - from_unixtime() accepts only seconds since the epoch, and right now the solution is to create a custom UDF, but millisecond-precision dates seem like quite a standard thing for Hive to support natively. Hari Sekhon http://www.linkedin.com/in/harisekhon
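For reference, the conversion such a UDF would wrap is a one-liner over java.time. A minimal sketch with a hypothetical name, fixed to UTC here (a real UDF would presumably honor the session time zone, as from_unixtime() does):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class FromUnixtimeMillis {
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
                         .withZone(ZoneOffset.UTC);

    // Millisecond-precision counterpart of from_unixtime(): takes epoch
    // milliseconds instead of epoch seconds.
    public static String fromUnixtimeMillis(long epochMillis) {
        return FMT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(fromUnixtimeMillis(0L)); // 1970-01-01 00:00:00.000
    }
}
```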
[jira] [Created] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
Prasanth Jayachandran created HIVE-10397: Summary: LLAP: Implement Tez SplitSizeEstimator for Orc Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran This is a patch for HIVE-7428. For now this will be in the llap branch, as Hive has not bumped up the Tez version yet.
[jira] [Created] (HIVE-10398) Variables to raise errors on select from non-existent or empty partitions rather than just return 0 rows
Hari Sekhon created HIVE-10398: -- Summary: Variables to raise errors on select from non-existent or empty partitions rather than just return 0 rows Key: HIVE-10398 URL: https://issues.apache.org/jira/browse/HIVE-10398 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.14.0 Environment: HDP 2.2 Reporter: Hari Sekhon Priority: Minor Feature request to add 2 new variables to raise errors on queries that select from empty or non-existent partitions, e.g.: {code}set hive.error.select.non-existent.partition=true; SELECT * FROM myTable WHERE date='2015-02-29'; raise some error here {code} Currently the behaviour is to return success with zero rows, which doesn't make practical sense in many cases, and I only detected this because my bulk jobs started completing too quickly. I work around this now by listing all partitions and then checking against that before launching the bulk job, but it would be more convenient to have the query just fail, as is logical in these sorts of scenarios. There should also be a similar variable for selecting from empty partitions, for people who expect their partitions to be populated (a very common expectation), such as: {code}hive.error.select.empty.partition{code} This is somewhat similar to the existing variable {code}hive.error.on.empty.partition{code} except this existing one only covers dynamic partition inserts that generate empty partitions, so the suggested query-side variable would be a logical counterpart to that. Having these variables default to 'false' would make these completely backwards-compatible improvements. Hari Sekhon http://www.linkedin.com/in/harisekhon
[jira] [Created] (HIVE-10401) splitCondition does not behave correctly when one side of the condition references columns from different inputs
Jesus Camacho Rodriguez created HIVE-10401: -- Summary: splitCondition does not behave correctly when one side of the condition references columns from different inputs Key: HIVE-10401 URL: https://issues.apache.org/jira/browse/HIVE-10401 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez
[jira] [Created] (HIVE-10400) CBO (Calcite Return Path): Exception when column name contains dot or colon characters
Jesus Camacho Rodriguez created HIVE-10400: -- Summary: CBO (Calcite Return Path): Exception when column name contains dot or colon characters Key: HIVE-10400 URL: https://issues.apache.org/jira/browse/HIVE-10400 Project: Hive Issue Type: Sub-task Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez If return path is on, this query produces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 = 0) and (cbo_t1.c_int 0 or cbo_t1.c_float = 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 = 0) and (cbo_t2.c_int 0 or cbo_t2.c_float = 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q = 0) and (b 0 or c_int = 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat}
Re: Review Request 33171: HIVE-10307:Support to use number literals in partition column
On April 17, 2015, 12:28 p.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 1951-1954 https://reviews.apache.org/r/33171/diff/4/?file=931959#file931959line1951 typo: covert should be convert in line 1953 Looks good in new patch. Thanks Chaoyu Tang! (Oops, forgot to publish.) - Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33171/#review80511 --- On April 17, 2015, 1:30 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33171/ --- (Updated April 17, 2015, 1:30 p.m.) Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-10307 https://issues.apache.org/jira/browse/HIVE-10307 Repository: hive-git Description --- Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as literals with postfix like Y, S, L, or BD appended to the number. These literals work in most Hive queries, but do not when they are used as partition column values. This patch is to address the issue of number literals used in partition specification. Highlights of the changes: 1. Validate, convert and normalize the partVal in partSpec to match its column type when hive.partition.check.column.type is set to true (default). It not only applies to the insert operation, which used to be controlled by hive.typecheck.on.insert, but also to other partition operations (e.g. alter table .. partition, partition statistics, etc.). The hive.typecheck.on.insert is now removed. 2. Convert and normalize legacy partition column data by using alter table partition .. rename with hive.partition.check.old.column.type.in.rename set to true. This property only allows the partVal in the old PartSpec to skip the type check and conversion in partition rename. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e138800 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 19234b5 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java e8066be ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 8302067 ql/src/test/queries/clientpositive/alter_partition_coltype.q 8c9945c ql/src/test/queries/clientpositive/partition_coltype_literals.q PRE-CREATION ql/src/test/queries/clientpositive/partition_type_check.q c9bca99 ql/src/test/results/clientnegative/archive_partspec1.q.out da4817c ql/src/test/results/clientnegative/archive_partspec5.q.out c18de52 ql/src/test/results/clientpositive/partition_coltype_literals.q.out PRE-CREATION ql/src/test/results/clientpositive/partition_timestamp.q.out bc6ab10 ql/src/test/results/clientpositive/partition_timestamp2.q.out 365df69 Diff: https://reviews.apache.org/r/33171/diff/ Testing --- 1. Manual tests covering various number literals (Y, S, L, BD) 2. new qfile test (partition_coltype_literals.q) 3. Precommit build Thanks, Chaoyu Tang
Re: Review Request 33171: HIVE-10307:Support to use number literals in partition column
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33171/#review80737 --- Thanks for the explanation. The patch looks good to me. - Jimmy Xiang On April 17, 2015, 8:30 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33171/ --- (Updated April 17, 2015, 8:30 p.m.) Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-10307 https://issues.apache.org/jira/browse/HIVE-10307 Repository: hive-git Description --- Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as literals with postfix like Y, S, L, or BD appended to the number. These literals work in most Hive queries, but do not when they are used as partition column values. This patch is to address the issue of number literals used in partition specification. Highlights of the changes: 1. Validate, convert and normalize the partVal in partSpec to match its column type when hive.partition.check.column.type is set to true (default). It not only applies to the insert operation, which used to be controlled by hive.typecheck.on.insert, but also to other partition operations (e.g. alter table .. partition, partition statistics, etc.). The hive.typecheck.on.insert is now removed. 2. Convert and normalize legacy partition column data by using alter table partition .. rename with hive.partition.check.old.column.type.in.rename set to true. This property only allows the partVal in the old PartSpec to skip the type check and conversion in partition rename. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e138800 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 19234b5 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java e8066be ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 8302067 ql/src/test/queries/clientpositive/alter_partition_coltype.q 8c9945c ql/src/test/queries/clientpositive/partition_coltype_literals.q PRE-CREATION ql/src/test/queries/clientpositive/partition_type_check.q c9bca99 ql/src/test/results/clientnegative/archive_partspec1.q.out da4817c ql/src/test/results/clientnegative/archive_partspec5.q.out c18de52 ql/src/test/results/clientpositive/partition_coltype_literals.q.out PRE-CREATION ql/src/test/results/clientpositive/partition_timestamp.q.out bc6ab10 ql/src/test/results/clientpositive/partition_timestamp2.q.out 365df69 Diff: https://reviews.apache.org/r/33171/diff/ Testing --- 1. Manual tests covering various number literals (Y, S, L, BD) 2. new qfile test (partition_coltype_literals.q) 3. Precommit build Thanks, Chaoyu Tang
[jira] [Created] (HIVE-10404) hive.exec.parallel=true causes out of sequence response and SocketTimeoutException: Read timed out
Eugene Koifman created HIVE-10404: - Summary: hive.exec.parallel=true causes out of sequence response and SocketTimeoutException: Read timed out Key: HIVE-10404 URL: https://issues.apache.org/jira/browse/HIVE-10404 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Eugene Koifman With hive.exec.parallel=true, Driver.launchTask() calls Task.initialize() from 1 thread on several Tasks. It then starts new threads to run those tasks. Task.initialize() gets an instance of Hive and holds on to it. Hive.java internally uses ThreadLocal to hand out instances, but since Task.initialize() is called by a single thread from the Driver, multiple tasks share an instance of Hive. Each Hive instance has a single instance of MetaStoreClient; the latter is not thread-safe. With hive.exec.parallel=true, different threads actually execute the tasks, so different threads end up sharing the same MetaStoreClient. If you make 2 concurrent calls, for example Hive.getTable(String), the Thrift responses may return to the wrong caller. Thus the first caller gets an out-of-sequence response, drops this message and reconnects. If the timing is right, it will consume the other's response, but the other caller will block for hive.metastore.client.socket.timeout since its response message has now been lost. This is just one concrete example. One possible fix is to make Task.db use ThreadLocal. This could be related to HIVE-6893.
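The suggested fix (making Task.db use ThreadLocal) amounts to the standard pattern below; HiveLike is a hypothetical stand-in for the non-thread-safe Hive/MetaStoreClient pair, not an actual Hive class:

```java
public class ThreadLocalDb {
    // Stand-in for the non-thread-safe client each task thread needs.
    static class HiveLike {}

    // One instance per executing thread, created lazily on first use,
    // instead of one instance captured during single-threaded initialize().
    private static final ThreadLocal<HiveLike> db =
        ThreadLocal.withInitial(HiveLike::new);

    public static HiveLike get() { return db.get(); }

    public static void main(String[] args) throws InterruptedException {
        HiveLike mainDb = get();
        final HiveLike[] other = new HiveLike[1];
        Thread t = new Thread(() -> other[0] = get());
        t.start();
        t.join();
        // Each thread got its own instance, so no two task threads can
        // interleave requests on the same Thrift connection.
        System.out.println(mainDb != other[0]);
    }
}
```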
[jira] [Created] (HIVE-10402) LLAP: scheduling occasionally schedules all the work on one machine
Sergey Shelukhin created HIVE-10402: --- Summary: LLAP: scheduling occasionally schedules all the work on one machine Key: HIVE-10402 URL: https://issues.apache.org/jira/browse/HIVE-10402 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Gopal V That seems to have started happening after the random scheduling changes. [~seth.siddha...@gmail.com] and I have run into situations where all (or the vast majority) of the work is scheduled on one machine (out of 6 and 3, respectively), the first one in the list in my case, even in the absence of failures and after taking all the setup precautions. Logs were available some time earlier.
[jira] [Created] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join
Wei Zheng created HIVE-10403: Summary: Add n-way join support for Hybrid Grace Hash Join Key: HIVE-10403 URL: https://issues.apache.org/jira/browse/HIVE-10403 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Currently Hybrid Grace Hash Join only supports 2-way join (one big table and one small table). This task will enable n-way join (one big table and multiple small tables).
Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/ --- Review request for hive. Bugs: HIVE-10382 https://issues.apache.org/jira/browse/HIVE-10382 Repository: hive-git Description --- Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e Diff: https://reviews.apache.org/r/33367/diff/ Testing --- Thanks, Vaibhav Gumashta
Re: Review Request 32549: HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32549/ --- (Updated April 20, 2015, 6:42 p.m.) Review request for hive, Gunther Hagleitner and Vikram Dixit Kumaraswamy. Repository: hive-git Description --- In q.test environment with src table, execute the following query: {code} CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE; CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE; FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1 UNION all select s2.key as key, s2.value as value from src s2) unionsrc INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value; select * from DEST1; select * from DEST2; {code} DEST1 and DEST2 should both have 310 rows. However, DEST2 only has 1 row tst1 500 1 Diffs (updated) - common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Vertex.java b45c782 itests/src/test/resources/testconfiguration.properties 0a5d839 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java 90616ad ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 4dcdf91 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 0990894 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWorkWalker.java 08fd61e ql/src/test/queries/clientpositive/explainuser_2.q 03264ca ql/src/test/queries/clientpositive/tez_union_multiinsert.q PRE-CREATION ql/src/test/results/clientpositive/tez/explainuser_2.q.out ea6b558 ql/src/test/results/clientpositive/tez/tez_union_multiinsert.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32549/diff/ Testing --- Thanks, pengcheng xiong
Re: Review Request 33299: HIVE-10376: Move code to create jar for ivydownload.q to a separate id in maven ant-run-plugin in itests/pom.xml
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33299/#review80658 --- LGTM, just one suggestion. itests/qtest/pom.xml https://reviews.apache.org/r/33299/#comment130788 I think we might add the required Java classes in the itests project and use them in qtest from the Maven installation folder. - cheng xu On April 17, 2015, 11:12 a.m., Anant Nag wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33299/ --- (Updated April 17, 2015, 11:12 a.m.) Review request for hive. Bugs: HIVE-10376 https://issues.apache.org/jira/browse/HIVE-10376 Repository: hive-git Description --- Currently the code to create an example jar for ivyDownload.q is piggybacked on the download-spark ant-run-plugin id. This code should be moved to a separate execution id called something like create-ivytest-jar or, more generally, itests-setup. Diffs - itests/pom.xml 6f6cf742c41a11647589692ac4f266f467be2812 itests/qtest/pom.xml 1c3f74c113df4c1d45664f68098ec428666e1d3e Diff: https://reviews.apache.org/r/33299/diff/ Testing --- Thanks, Anant Nag
[jira] [Created] (HIVE-10396) decimal_precision2.q test is failing on trunk
Ashutosh Chauhan created HIVE-10396: --- Summary: decimal_precision2.q test is failing on trunk Key: HIVE-10396 URL: https://issues.apache.org/jira/browse/HIVE-10396 Project: Hive Issue Type: Test Components: Types Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Seems like a missing golden file update.