[jira] [Created] (HIVE-10408) LLAP: query fails - execution is rejected when it shouldn't be

2015-04-20 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10408:
---

 Summary: LLAP: query fails - execution is rejected when it 
shouldn't be
 Key: HIVE-10408
 URL: https://issues.apache.org/jira/browse/HIVE-10408
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth


{noformat}
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.util.concurrent.RejectedExecutionException):
 Queues are full. Rejecting request.
at 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.schedule(TaskExecutorService.java:182)
at 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.schedule(TaskExecutorService.java:63)
at 
org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:202)
at 
org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:258)
at 
org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.submitWork(LlapDaemonProtocolServerImpl.java:71)
at 
org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:8698)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:972)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2056)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2052)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2050)

at org.apache.hadoop.ipc.Client.call(Client.java:1492)
at org.apache.hadoop.ipc.Client.call(Client.java:1423)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
... 8 more
{noformat}

The query, running alone on a 10-node cluster, put 1000 mappers into the running 
state; after 3 of them completed, it failed with the exception above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: VOTE: move to git

2015-04-20 Thread Sergey Shelukhin
INFRA jira has been filed:
https://issues.apache.org/jira/browse/INFRA-9488


On 15/4/16, 17:38, Vikram Dixit vik...@hortonworks.com wrote:

+1. Hope to see this happen soon.

On 4/16/15, 11:42 AM, Sergey Shelukhin ser...@hortonworks.com wrote:

It’s hard to tell from the bylaws how long this vote should run. I’d say 3
days (release vote length) should be appropriate. After ~51 more hours (on
Monday) I will file a JIRA with INFRA if the consensus stands.

On 15/4/15, 23:08, Thejas Nair thejas.n...@gmail.com wrote:

Thanks Sergey, that's a useful link.
From there, I found the reference to the accumulo git process guide,
that looks like a great starting point for hive as well -
http://accumulo.apache.org/git.html .

We could use some of the suggestions there to make the git history
saner. For example, do a rebase of the pull request before merging, or
do a 'merge --squash'. We should create a copy of that in hive wiki.

+1 to the move to git. A big plus is also that we won't need to
separately create review board entries.




On Wed, Apr 15, 2015 at 6:21 PM, Sergey Shelukhin
ser...@hortonworks.com wrote:
 The example process can be viewed here:
 https://issues.apache.org/jira/browse/INFRA-7768

 I assume we will use commits and not pull requests.

 On 15/4/15, 16:59, Thejas Nair thejas.n...@gmail.com wrote:

We need to be clear on how the process works with the move to git. I am
not familiar with the process followed by other groups that currently use
git.

A [DISCUSS] thread for discussing the specifics might be appropriate.
I assume the central git repository would still be the one in apache.
Does that mean we work using *github* pull requests, like Spark does? -
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode

I haven't done this, but I assume it is possible to pull a github pull
request into the apache git. Would that be the way to merge the patch
in ?
One (misplaced?) concern I have about git, is that merging pull
requests would result in lots of branches and it would be hard to
understand the sequence of changes. (maybe rebasing the changes in
pull request before commit would help).

I would like to see more clarity on this before we move ahead.


On Wed, Apr 15, 2015 at 2:46 PM, Sergey Shelukhin ser...@apache.org
wrote:
 Hi.
 We discussed this some time ago; this time I’d like to start an
 official vote about moving the Hive project from svn to git.

 I volunteer to facilitate the move; that seems to be just filing an INFRA
 jira and following instructions, such as verifying that the new repo is
 sane.

 Please vote:
 +1 move to git
 0 don’t care
 -1 stay on svn

 +1.









[jira] [Created] (HIVE-10409) Webhcat tests need to be updated, to accommodate HADOOP-10193

2015-04-20 Thread Aswathy Chellammal Sreekumar (JIRA)
Aswathy Chellammal Sreekumar created HIVE-10409:
---

 Summary: Webhcat tests need to be updated, to accommodate 
HADOOP-10193
 Key: HIVE-10409
 URL: https://issues.apache.org/jira/browse/HIVE-10409
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 1.2.0
Reporter: Aswathy Chellammal Sreekumar
Assignee: Aswathy Chellammal Sreekumar
Priority: Minor
 Fix For: 1.2.0


Webhcat tests need to be updated to accommodate the url change brought in by 
HADOOP-10193. Add ?user.name=user-name for the templeton calls.





[jira] [Created] (HIVE-10407) separate out the timestamp ranges for testing purposes

2015-04-20 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10407:


 Summary: separate out the timestamp ranges for testing purposes
 Key: HIVE-10407
 URL: https://issues.apache.org/jira/browse/HIVE-10407
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Some platforms have limits for date ranges, so separate out the test cases that 
are outside of the range 1970 to 2038.
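
As a rough illustration of how such a split could be expressed, here is a minimal Python sketch. It assumes the platform limit in question is a signed 32-bit count of seconds since the epoch (which tops out at 2038-01-19 03:14:07 UTC); the helper names are hypothetical and are not taken from Hive's tests.

```python
from datetime import datetime, timezone

# Assumption: the limit is a signed 32-bit count of seconds since the epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_SECONDS = 2**31 - 1  # corresponds to 2038-01-19 03:14:07 UTC


def in_portable_range(ts: datetime) -> bool:
    """Return True if ts falls inside the 1970-to-2038 range."""
    seconds = (ts - EPOCH).total_seconds()
    return 0 <= seconds <= MAX_SECONDS


def split_cases(cases):
    """Partition timestamps into (portable, out_of_range) lists."""
    portable = [c for c in cases if in_portable_range(c)]
    out_of_range = [c for c in cases if not in_portable_range(c)]
    return portable, out_of_range
```

Test cases in the second list would then live in a separate test file that can be skipped on platforms with the limit.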





[jira] [Created] (HIVE-10411) LLAP: NPE caused by HIVE-10397

2015-04-20 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-10411:


 Summary: LLAP: NPE caused by HIVE-10397
 Key: HIVE-10411
 URL: https://issues.apache.org/jira/browse/HIVE-10411
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


Fix NPE caused by HIVE-10397





[jira] [Created] (HIVE-10412) CBO : Calculate join selectivity when computing HiveJoin cost

2015-04-20 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-10412:
--

 Summary: CBO : Calculate join selectivity when computing HiveJoin 
cost
 Key: HIVE-10412
 URL: https://issues.apache.org/jira/browse/HIVE-10412
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran


This is from TPC-DS Q7.
Because we don't compute the selectivity of sub-expressions in a HiveJoin, we 
assume that selective and non-selective joins have similar cost.

{code}
select  i_item_id, 
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4 
 from store_sales, customer_demographics, item
 where store_sales.ss_item_sk = item.i_item_sk and
   store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk and
   cd_gender = 'F' and 
   cd_marital_status = 'W' and
   cd_education_status = 'Primary'
 group by i_item_id
 order by i_item_id
 limit 100
{code}

Cardinality 
{code}
item 462,000
customer_demographics 1,920,800
store_sales 82,510,879,939
{code}

NDVs
{code}
item.i_item_sk 439501
customer_demographics.cd_demo_sk 1835839
store_sales.ss_cdemo_sk 1835839
{code}
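
To illustrate why selectivity matters here, below is a minimal Python sketch of the textbook join-cardinality estimate, |L| * |R| / max(NDV_L, NDV_R), scaled by the selectivity of a filter on one input. This is not Hive's cost model; the function name and the 0.01 filter selectivity are assumptions made up for illustration.

```python
def estimate_join_rows(left_rows, right_rows, left_ndv, right_ndv,
                       filter_selectivity=1.0):
    """Textbook estimate: |L| * |R| / max(ndv), scaled by filter selectivity."""
    return left_rows * right_rows * filter_selectivity / max(left_ndv, right_ndv)


# Cardinalities and NDVs quoted above.
store_sales = 82_510_879_939
customer_demographics = 1_920_800
ndv_demo_sk = 1_835_839

unfiltered = estimate_join_rows(store_sales, customer_demographics,
                                ndv_demo_sk, ndv_demo_sk)
# Hypothetical selectivity for the three cd_* predicates (assumed value).
filtered = estimate_join_rows(store_sales, customer_demographics,
                              ndv_demo_sk, ndv_demo_sk,
                              filter_selectivity=0.01)
print(f"unfiltered: {unfiltered:.3e} rows, filtered: {filtered:.3e} rows")
```

Under this kind of estimate, the join against the filtered customer_demographics produces far fewer rows than the join against item, so the two join orders should not come out with similar cost.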



From the logs 
{code}
2015-04-20 21:09:58,055 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
HiveJoin(condition=[=($0, $10)], joinType=[inner], algorithm=[none], cost=[not 
available])
  HiveJoin(condition=[=($1, $6)], joinType=[inner], algorithm=[MapJoin], 
cost=[{8.25108951834E10 rows, 2.324083308641975E8 cpu, 275417.56 
io}])
HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], 
ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
  HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]])
HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], 
cd_education_status=[$3])
  HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])

HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]])
  HiveProject(i_item_sk=[$0], i_item_id=[$1])
HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]])

2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {6.553102534841269E8 
rows, 4.0217814199458417E18 cpu, 3.499540319862703E7 io}
2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {6.553102534841269E8 rows, 
2.1362E11 cpu, 1.07207098E7 io}
2015-04-20 21:09:58,056 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(78)) - MapJoin selected
2015-04-20 21:09:58,057 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(60)) - Join algorithm selection for:
HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], cost=[not 
available])
  HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[MapJoin], 
cost=[{8.2511341939E10 rows, 2.1362E11 cpu, 1.07207098E7 io}])
HiveProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_quantity=[$9], 
ss_list_price=[$11], ss_sales_price=[$12], ss_coupon_amt=[$18])
  HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.store_sales]])
HiveProject(i_item_sk=[$0], i_item_id=[$1])
  HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.item]])
  HiveProject(cd_demo_sk=[$0], cd_gender=[$1], cd_marital_status=[$2], 
cd_education_status=[$3])
HiveFilter(condition=[AND(=($1, 'F'), =($2, 'W'), =($3, 'Primary'))])
  
HiveTableScan(table=[[tpcds_bin_partitioned_orc_3.customer_demographics]])

2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(69)) - CommonJoin cost: {8.25108951834E10 
rows, 2.6089279242468144E21 cpu, 4.901146588836599E9 io}
2015-04-20 21:09:58,058 DEBUG [main]: cost.HiveCostModel 
(HiveCostModel.java:getJoinCost(69)) - MapJoin cost: {8.25108951834E10 
rows, 2.324083308641975E8 cpu, 275417.56 io}
{code}





[jira] [Created] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures

2015-04-20 Thread Richard Williams (JIRA)
Richard Williams created HIVE-10410:
---

 Summary: Apparent race condition in HiveServer2 causing 
intermittent query failures
 Key: HIVE-10410
 URL: https://issues.apache.org/jira/browse/HIVE-10410
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.1
 Environment: CDH 5.3.3
CentOS 6.5
Reporter: Richard Williams


On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC 
occasionally trigger odd Thrift exceptions with messages such as "Read a 
negative frame size (-2147418110)!" or "out of sequence response" in 
HiveServer2's connections to the metastore. For certain metastore calls (for 
example, showDatabases), these Thrift exceptions are converted to 
MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient 
from retrying these calls and thus causes the failure to propagate to the JDBC 
client.

Note that, as far as we can tell, this issue affects only queries that 
are submitted with the runAsync flag on TExecuteStatementReq set to true 
(which, in practice, seems to mean all JDBC queries), and it manifests only 
when HiveServer2 is using the new HTTP transport mechanism. When both 
conditions hold, we can fairly reliably reproduce the issue by 
spawning about 100 simple, concurrent Hive queries (we have been using "show 
databases"), two or three of which typically fail. When either 
condition does not hold, we can no longer reproduce the issue.

Some example stack traces from the HiveServer2 logs:






Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]

2015-04-20 Thread Jimmy Xiang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33251/
---

(Updated April 21, 2015, 1:37 a.m.)


Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.


Changes
---

Changed the assumption. The small tables are cached only for the same work.


Bugs: HIVE-10302
https://issues.apache.org/jira/browse/HIVE-10302


Repository: hive-git


Description
---

Cached the small table container so that mapjoin tasks can use it if the task 
is executed on the same Spark executor.
The cache is released right before the next job after the mapjoin job is done.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java fe108c4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
3f240f5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
97b3471 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 72ab913 

Diff: https://reviews.apache.org/r/33251/diff/


Testing
---

Ran several queries in live cluster. ptest pending.


Thanks,

Jimmy Xiang



[jira] [Created] (HIVE-10413) [CBO] Return path assumes distinct column can't be same as grouping column

2015-04-20 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-10413:
---

 Summary: [CBO] Return path assumes distinct column can't be same as 
grouping column
 Key: HIVE-10413
 URL: https://issues.apache.org/jira/browse/HIVE-10413
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Found in cbo_udf_udaf.q tests.





[jira] [Created] (HIVE-10405) LLAP: Provide runtime information to daemons to decide on preemption order

2015-04-20 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10405:
-

 Summary: LLAP: Provide runtime information to daemons to decide on 
preemption order
 Key: HIVE-10405
 URL: https://issues.apache.org/jira/browse/HIVE-10405
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap








herfindahl index UDAF

2015-04-20 Thread Alexander Pivovarov
Hi Everyone

Do you think it is possible to create a UDAF to calculate the Herfindahl Index
(HHI)?
http://en.wikipedia.org/wiki/Herfindahl_index

Calculation:

v     ratio  ratio^2
100   0.1    0.01
100   0.1    0.01
300   0.3    0.09
100   0.1    0.01
200   0.2    0.04
200   0.2    0.04

SUM
1000  1      0.2

HHI = 0.2

Currently I use the following SQL:
select sum(pow(t.v / t2.sum_v, 2))
from t
join (select sum(v) sum_v from t) t2


Is it possible to create UDAF for it?
e.g.
select hhi(v) from t;

Let's assume the set of v values can fit in reducer memory.
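
One observation that may help: since HHI = sum(v_i^2) / (sum(v_i))^2, an aggregate only needs two running sums, and partial results merge by simple addition. Below is a minimal sketch of that iterate/merge/terminate shape, written in Python rather than against Hive's UDAF API; the class and method names are illustrative only.

```python
class HhiAccumulator:
    """Keeps the two running sums needed for HHI = sum(v^2) / sum(v)^2."""

    def __init__(self):
        self.sum_v = 0.0
        self.sum_v_sq = 0.0

    def iterate(self, v):
        # Fold one input value into the partial aggregate.
        self.sum_v += v
        self.sum_v_sq += v * v

    def merge(self, other):
        # Combine partial aggregates computed by different tasks.
        self.sum_v += other.sum_v
        self.sum_v_sq += other.sum_v_sq

    def terminate(self):
        # HHI is undefined for empty input or a zero total; return None then.
        if self.sum_v == 0:
            return None
        return self.sum_v_sq / (self.sum_v * self.sum_v)


def hhi(values):
    acc = HhiAccumulator()
    for v in values:
        acc.iterate(v)
    return acc.terminate()


print(hhi([100, 100, 300, 100, 200, 200]))  # 0.2, matching the table above
```

With this formulation the values would not need to be held in memory at all, only the two sums per partial aggregate.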


[jira] [Created] (HIVE-10406) LLAP: Make use of additional information to determine run/preemption order

2015-04-20 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10406:
-

 Summary: LLAP: Make use of additional information to determine 
run/preemption order
 Key: HIVE-10406
 URL: https://issues.apache.org/jira/browse/HIVE-10406
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
 Fix For: llap


The preemption will evolve as it's tested.
Had a discussion offline with [~hagleitn]. The initial policy will likely be 
the following.

Within a running DAG, the priority / topo order decides which fragment runs / 
is a candidate for preemption.
Beyond this, the number of tasks in the current vertex + upstream vertices will 
be used as a measure of the size of the query to determine which fragment gets 
to run if there are multiple fragments queued up. Fragments with a lower count 
will be preferred, to push through what are expected to be shorter-running 
queries.
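
The initial policy described above could be sketched roughly as follows. This is an illustration only: the Fragment fields, and the assumption that a lower priority number means "runs earlier in topological order", are not taken from the LLAP code.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Fragment:
    dag_id: str
    priority: int        # assumed: lower value = earlier in topological order
    upstream_tasks: int  # tasks in the current vertex + upstream vertices


def prefer(a: Fragment, b: Fragment) -> Fragment:
    """Return whichever of two queued fragments should run first."""
    if a.dag_id == b.dag_id:
        # Same DAG: the priority / topological order decides.
        return a if a.priority <= b.priority else b
    # Different DAGs: favour the fragment from the smaller-looking query.
    return a if a.upstream_tasks <= b.upstream_tasks else b


def pick_next(queued):
    """Pick the next fragment to run from a non-empty queue."""
    return reduce(prefer, queued)
```

A preemption candidate would presumably be chosen by the mirror image of prefer(), i.e. the running fragment that loses the comparison.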





[jira] [Created] (HIVE-10399) from_unixtime_millis() Hive UDF

2015-04-20 Thread Hari Sekhon (JIRA)
Hari Sekhon created HIVE-10399:
--

 Summary: from_unixtime_millis() Hive UDF
 Key: HIVE-10399
 URL: https://issues.apache.org/jira/browse/HIVE-10399
 Project: Hive
  Issue Type: New Feature
  Components: UDF
 Environment: HDP 2.2
Reporter: Hari Sekhon
Priority: Minor


Feature request for a
{code}from_unixtime_millis(){code}
Hive UDF. from_unixtime() accepts only seconds since the epoch, and right now the 
solution is to create a custom UDF, but native support for millisecond-precision 
dates seems like a fairly standard thing for Hive to provide.
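
For illustration, the requested behaviour might look like the following Python sketch. The output format (yyyy-MM-dd HH:mm:ss.SSS) and the use of UTC are assumptions; from_unixtime_millis() here is just a stand-in for the proposed UDF, not an existing Hive function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unixtime_millis(millis: int) -> str:
    """Format milliseconds since the epoch as 'yyyy-MM-dd HH:mm:ss.SSS' (UTC)."""
    ts = EPOCH + timedelta(milliseconds=millis)
    # strftime's %f yields microseconds (6 digits); drop the last 3 for millis.
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


print(from_unixtime_millis(1429563600123))  # 2015-04-20 21:00:00.123
```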

Hari Sekhon
http://www.linkedin.com/in/harisekhon





[jira] [Created] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc

2015-04-20 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-10397:


 Summary: LLAP: Implement Tez SplitSizeEstimator for Orc
 Key: HIVE-10397
 URL: https://issues.apache.org/jira/browse/HIVE-10397
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


This is the patch for HIVE-7428. For now this will be in the llap branch, as Hive has 
not bumped up the Tez version yet.





[jira] [Created] (HIVE-10398) Variables to raise errors on select from non-existent or empty partitions rather than just return 0 rows

2015-04-20 Thread Hari Sekhon (JIRA)
Hari Sekhon created HIVE-10398:
--

 Summary: Variables to raise errors on select from non-existent or 
empty partitions rather than just return 0 rows
 Key: HIVE-10398
 URL: https://issues.apache.org/jira/browse/HIVE-10398
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: HDP 2.2
Reporter: Hari Sekhon
Priority: Minor


Feature request to add 2 new variables to raise errors on queries that select 
from empty or non-existent partitions, eg:
{code}set hive.error.select.non-existent.partition=true;
SELECT * FROM myTable WHERE date='2015-02-29';
-- raise some error here
{code}
Currently the behaviour is to return success with zero rows, which doesn't make 
practical sense in many cases; I only detected this because my bulk jobs 
started completing too quickly. I work around this now by listing all 
partitions and checking against that list before launching the bulk job, but it 
would be more convenient to have the query just fail, as is logical in these 
sorts of scenarios.

There should also be a similar variable for selecting from empty partitions for 
people who expect their partitions to be populated (a very common expectation), 
such as:
{code}hive.error.select.empty.partition{code}
This is somewhat similar to the existing variable
{code}hive.error.on.empty.partition{code}
except this existing one only covers dynamic partition inserts that generate 
empty partitions, so the suggested query side variable would be a logical 
counterpart to that.

Defaulting these variables to 'false' would make these improvements completely 
backwards compatible.
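
The workaround described above (list the partitions, then check before launching the job) might look like this Python sketch. How the partition list is obtained is left out; known_partitions stands in for the parsed output of something like SHOW PARTITIONS, and all names here are hypothetical.

```python
class MissingPartitionError(Exception):
    """Raised when a job targets a partition that does not exist."""


def require_partition(known_partitions, wanted):
    """Fail fast instead of letting the query return zero rows."""
    if wanted not in set(known_partitions):
        raise MissingPartitionError(f"partition {wanted!r} does not exist")
    return wanted


partitions = ["date=2015-02-27", "date=2015-02-28"]
require_partition(partitions, "date=2015-02-28")      # passes
try:
    require_partition(partitions, "date=2015-02-29")  # no such partition
except MissingPartitionError as err:
    print(err)
```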

Hari Sekhon
http://www.linkedin.com/in/harisekhon





[jira] [Created] (HIVE-10401) splitCondition does not behave correctly when one side of the condition references columns from different inputs

2015-04-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-10401:
--

 Summary: splitCondition does not behave correctly when one side of 
the condition references columns from different inputs
 Key: HIVE-10401
 URL: https://issues.apache.org/jira/browse/HIVE-10401
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez








[jira] [Created] (HIVE-10400) CBO (Calcite Return Path): Exception when column name contains dot or colon characters

2015-04-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-10400:
--

 Summary: CBO (Calcite Return Path): Exception when column name 
contains dot or colon characters
 Key: HIVE-10400
 URL: https://issues.apache.org/jira/browse/HIVE-10400
 Project: Hive
  Issue Type: Sub-task
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


If return path is on, this query produces the problem:

{noformat}
select cbo_t3.c_int, c, count(*)
from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1
where (cbo_t1.c_int + 1 = 0) and (cbo_t1.c_int  0 or cbo_t1.c_float = 0)
group by c_float, cbo_t1.c_int, key order by a) cbo_t1
join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2
where (cbo_t2.c_int + 1 = 0) and (cbo_t2.c_int  0 or cbo_t2.c_float = 0)
group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on 
cbo_t1.a=p
join cbo_t3 on cbo_t1.a=key
where (b + cbo_t2.q = 0) and (b  0 or c_int = 0)
group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c;
{noformat}





Re: Review Request 33171: HIVE-10307:Support to use number literals in partition column

2015-04-20 Thread Lefty Leverenz


 On April 17, 2015, 12:28 p.m., Lefty Leverenz wrote:
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 1951-1954
  https://reviews.apache.org/r/33171/diff/4/?file=931959#file931959line1951
 
  typo:  covert should be convert in line 1953

Looks good in new patch.  Thanks Chaoyu Tang!

(Oops, forgot to publish.)


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33171/#review80511
---


On April 17, 2015, 1:30 p.m., Chaoyu Tang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33171/
 ---
 
 (Updated April 17, 2015, 1:30 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
 
 
 Bugs: HIVE-10307
 https://issues.apache.org/jira/browse/HIVE-10307
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as 
 literals with postfix like Y, S, L, or BD appended to the number. These 
 literals work in most Hive queries, but do not when they are used as a 
 partition column value. This patch is to address the issue of number literals 
 used in partition specification.
 Highlights of the changes:
 1. Validate, convert and normalize the partVal in partSpec to match its 
 column type when hive.partition.check.column.type is set to true (default). 
 This applies not only to the insert operation, which used to be controlled by 
 hive.typecheck.on.insert, but also to other partition operations (e.g. 
 alter table .. partition, partition statistics etc). The 
 hive.typecheck.on.insert property is now removed.
 2. Convert and normalize legacy partition column data by using alter table 
 partition .. rename with hive.partition.check.old.column.type.in.rename set 
 to true. This property only allows the partVal in the old PartSpec to skip the 
 type check and conversion in partition rename.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e138800 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 19234b5 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
 e8066be 
   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
 8302067 
   ql/src/test/queries/clientpositive/alter_partition_coltype.q 8c9945c 
   ql/src/test/queries/clientpositive/partition_coltype_literals.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/partition_type_check.q c9bca99 
   ql/src/test/results/clientnegative/archive_partspec1.q.out da4817c 
   ql/src/test/results/clientnegative/archive_partspec5.q.out c18de52 
   ql/src/test/results/clientpositive/partition_coltype_literals.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/partition_timestamp.q.out bc6ab10 
   ql/src/test/results/clientpositive/partition_timestamp2.q.out 365df69 
 
 Diff: https://reviews.apache.org/r/33171/diff/
 
 
 Testing
 ---
 
 1. Manual tests covering various number literals (Y, S, L, BD)
 2. new qfile test (partition_coltype_literals.q)
 3. Precommit build
 
 
 Thanks,
 
 Chaoyu Tang
 




Re: Review Request 33171: HIVE-10307:Support to use number literals in partition column

2015-04-20 Thread Jimmy Xiang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33171/#review80737
---


Thanks for the explanation. The patch looks good to me.

- Jimmy Xiang


On April 17, 2015, 8:30 p.m., Chaoyu Tang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33171/
 ---
 
 (Updated April 17, 2015, 8:30 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
 
 
 Bugs: HIVE-10307
 https://issues.apache.org/jira/browse/HIVE-10307
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as 
 literals with postfix like Y, S, L, or BD appended to the number. These 
 literals work in most Hive queries, but do not when they are used as a 
 partition column value. This patch is to address the issue of number literals 
 used in partition specification.
 Highlights of the changes:
 1. Validate, convert and normalize the partVal in partSpec to match its 
 column type when hive.partition.check.column.type is set to true (default). 
 This applies not only to the insert operation, which used to be controlled by 
 hive.typecheck.on.insert, but also to other partition operations (e.g. 
 alter table .. partition, partition statistics etc). The 
 hive.typecheck.on.insert property is now removed.
 2. Convert and normalize legacy partition column data by using alter table 
 partition .. rename with hive.partition.check.old.column.type.in.rename set 
 to true. This property only allows the partVal in the old PartSpec to skip the 
 type check and conversion in partition rename.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e138800 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 19234b5 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
 e8066be 
   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
 8302067 
   ql/src/test/queries/clientpositive/alter_partition_coltype.q 8c9945c 
   ql/src/test/queries/clientpositive/partition_coltype_literals.q 
 PRE-CREATION 
   ql/src/test/queries/clientpositive/partition_type_check.q c9bca99 
   ql/src/test/results/clientnegative/archive_partspec1.q.out da4817c 
   ql/src/test/results/clientnegative/archive_partspec5.q.out c18de52 
   ql/src/test/results/clientpositive/partition_coltype_literals.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/partition_timestamp.q.out bc6ab10 
   ql/src/test/results/clientpositive/partition_timestamp2.q.out 365df69 
 
 Diff: https://reviews.apache.org/r/33171/diff/
 
 
 Testing
 ---
 
 1. Manual tests covering various number literals (Y, S, L, BD)
 2. new qfile test (partition_coltype_literals.q)
 3. Precommit build
 
 
 Thanks,
 
 Chaoyu Tang
 




[jira] [Created] (HIVE-10404) hive.exec.parallel=true causes out of sequence response and SocketTimeoutException: Read timed out

2015-04-20 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-10404:
-

 Summary: hive.exec.parallel=true causes out of sequence response 
and SocketTimeoutException: Read timed out
 Key: HIVE-10404
 URL: https://issues.apache.org/jira/browse/HIVE-10404
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Eugene Koifman


With hive.exec.parallel=true, Driver.launchTask() calls Task.initialize() from 1 
thread on several Tasks.  It then starts new threads to run those tasks.
Task.initialize() gets an instance of Hive and holds on to it.  Hive.java 
internally uses ThreadLocal to hand out instances, but since Task.initialize() 
is called by a single thread from the Driver, multiple tasks share an instance 
of Hive.

Each Hive instance has a single instance of MetaStoreClient; the latter is not 
thread safe.

With hive.exec.parallel=true, different threads actually execute the tasks, so 
different threads end up sharing the same MetaStoreClient.

If you make 2 concurrent calls, for example Hive.getTable(String), the Thrift 
responses may return to the wrong caller.
Thus the first caller gets an out of sequence response, drops this message and 
reconnects.  If the timing is right, it will consume the other's response, but 
the other caller will block for hive.metastore.client.socket.timeout since 
its response message has now been lost.

This is just one concrete example.

One possible fix is to make Task.db use ThreadLocal.

This could be related to HIVE-6893
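
The sharing pattern described above can be reproduced in miniature. The following is a Python analogue, not Hive code: Client stands in for the Hive/MetaStoreClient pair, EagerTask mimics capturing the instance in initialize() on the driver thread, and LazyTask mimics looking it up through the thread-local at run time, which is roughly what the suggested fix amounts to. All names are hypothetical.

```python
import threading


class Client:
    """Stand-in for a per-thread, non-thread-safe client object."""


_local = threading.local()


def get_client():
    # Hand out one Client per calling thread, like a ThreadLocal would.
    if not hasattr(_local, "client"):
        _local.client = Client()
    return _local.client


class EagerTask:
    def initialize(self):
        # Captured on whichever thread calls initialize() (the driver).
        self.db = get_client()

    def run(self, seen):
        seen.append(self.db)


class LazyTask:
    def initialize(self):
        pass

    def run(self, seen):
        # Looked up on the thread that actually executes the task.
        seen.append(get_client())


def distinct_clients(task_cls, n=3):
    tasks = [task_cls() for _ in range(n)]
    for t in tasks:
        t.initialize()  # all initialized from the single calling thread
    seen = []
    threads = [threading.Thread(target=t.run, args=(seen,)) for t in tasks]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return len({id(c) for c in seen})


print(distinct_clients(EagerTask))  # 1: every task shares the driver's client
print(distinct_clients(LazyTask))   # 3: each worker thread gets its own client
```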





[jira] [Created] (HIVE-10402) LLAP: scheduling occasionally schedules all the work on one machine

2015-04-20 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10402:
---

 Summary: LLAP: scheduling occasionally schedules all the work on 
one machine
 Key: HIVE-10402
 URL: https://issues.apache.org/jira/browse/HIVE-10402
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Gopal V


That seems to have started happening after the random scheduling changes. 
[~seth.siddha...@gmail.com] and I have run into situations where all (or the vast 
majority) of the work is scheduled on one machine (out of 6 and 3), the one 
first in the list in my case, even in the absence of failures and after taking all 
the setup precautions.
Logs were available some time earlier.






[jira] [Created] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join

2015-04-20 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-10403:


 Summary: Add n-way join support for Hybrid Grace Hash Join
 Key: HIVE-10403
 URL: https://issues.apache.org/jira/browse/HIVE-10403
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Wei Zheng
Assignee: Wei Zheng


Currently Hybrid Grace Hash Join only supports 2-way join (one big table and 
one small table). This task will enable n-way join (one big table and multiple 
small tables).





Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

2015-04-20 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/
---

Review request for hive.


Bugs: HIVE-10382
https://issues.apache.org/jira/browse/HIVE-10382


Repository: hive-git


Description
---

Similar to the work done on the HBase branch (HIVE-9693), the stats cache can 
potentially yield performance gains.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
  common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
  common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
  common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
  common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
  metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
  ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
  ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 

Diff: https://reviews.apache.org/r/33367/diff/


Testing
---


Thanks,

Vaibhav Gumashta



Re: Review Request 32549: HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data

2015-04-20 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32549/
---

(Updated April 20, 2015, 6:42 p.m.)


Review request for hive, Gunther Hagleitner and Vikram Dixit Kumaraswamy.


Repository: hive-git


Description
---

In the q.test environment with the src table, execute the following queries: 
{code}
CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE;

CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE;

FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
      UNION all
      select s2.key as key, s2.value as value from src s2) unionsrc
INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
  GROUP BY unionsrc.key
INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
  GROUP BY unionsrc.key, unionsrc.value;

select * from DEST1;
select * from DEST2;
{code}

DEST1 and DEST2 should both have 310 rows. However, DEST2 has only 1 row: 
tst1    500    1
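
The intended semantics of the multi-insert query above can be sketched in plain Python. This is only a model of what the query should compute, not a reproduction of the Tez bug. The function name run_multi_insert and the representation of src as a list of (key, value) pairs are assumptions for illustration, and all values are assumed to be non-NULL.

```python
def run_multi_insert(src):
    """Model the union followed by two GROUP BY inserts.

    `src` is a hypothetical stand-in for the src table: a list of
    (key, value) string pairs with no NULLs.
    """
    # UNION ALL of the single count row and every row of src.
    unionsrc = [("tst1", str(len(src)))] + list(src)

    dest1_groups = {}  # GROUP BY key
    dest2_groups = {}  # GROUP BY key, value
    for key, value in unionsrc:
        suffix = value[4:]  # SUBSTR(value, 5) is 1-based in HiveQL
        dest1_groups.setdefault(key, set()).add(suffix)
        dest2_groups.setdefault((key, value), set()).add(suffix)

    # COUNT(DISTINCT ...) per group.
    dest1 = [(k, len(s)) for k, s in dest1_groups.items()]
    dest2 = [(k, v, len(s)) for (k, v), s in dest2_groups.items()]
    return dest1, dest2
```

Both branches consume the same unionsrc rows, so every key that reaches DEST1 must also appear in at least one DEST2 group. A DEST2 holding only the tst1 row therefore means the rows from the second branch of the union were lost on the way to the second insert.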


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Vertex.java b45c782 
  itests/src/test/resources/testconfiguration.properties 0a5d839 
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java 90616ad 
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 4dcdf91 
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 0990894 
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWorkWalker.java 08fd61e 
  ql/src/test/queries/clientpositive/explainuser_2.q 03264ca 
  ql/src/test/queries/clientpositive/tez_union_multiinsert.q PRE-CREATION 
  ql/src/test/results/clientpositive/tez/explainuser_2.q.out ea6b558 
  ql/src/test/results/clientpositive/tez/tez_union_multiinsert.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/32549/diff/


Testing
---


Thanks,

pengcheng xiong



Re: Review Request 33299: HIVE-10376: Move code to create jar for ivydownload.q to a separate id in maven ant-run-plugin in itests/pom.xml

2015-04-20 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33299/#review80658
---


LGTM, just one suggestion.


itests/qtest/pom.xml
https://reviews.apache.org/r/33299/#comment130788

I think we could add the required Java classes to the itest project and use 
them in qtest from the Maven installation folder.


- cheng xu


On April 17, 2015, 11:12 a.m., Anant  Nag wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33299/
 ---
 
 (Updated April 17, 2015, 11:12 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-10376
 https://issues.apache.org/jira/browse/HIVE-10376
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently the code to create an example jar for ivyDownload.q is piggybacked 
 on the download-spark ant-run-plugin id. This code should be moved to a 
 separate execution id called something like create-ivytest-jar or more 
 generally itests-setup.
 
 
 Diffs
 -
 
   itests/pom.xml 6f6cf742c41a11647589692ac4f266f467be2812 
   itests/qtest/pom.xml 1c3f74c113df4c1d45664f68098ec428666e1d3e 
 
 Diff: https://reviews.apache.org/r/33299/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Anant  Nag
 




[jira] [Created] (HIVE-10396) decimal_precision2.q test is failing on trunk

2015-04-20 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-10396:
---

 Summary: decimal_precision2.q test is failing on trunk
 Key: HIVE-10396
 URL: https://issues.apache.org/jira/browse/HIVE-10396
 Project: Hive
  Issue Type: Test
  Components: Types
Affects Versions: 1.2.0
Reporter: Ashutosh Chauhan


Seems like a missing golden file update.


