[jira] [Created] (HIVE-17406) Non-generic UDAF throws IllegalArgumentException for a complex input when column stats is not provided

2017-08-29 Thread Makoto Yui (JIRA)
Makoto Yui created HIVE-17406:
-

 Summary: Non-generic UDAF throws IllegalArgumentException for a 
complex input when column stats is not provided
 Key: HIVE-17406
 URL: https://issues.apache.org/jira/browse/HIVE-17406
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.2.0
Reporter: Makoto Yui
Priority: Minor


I found that a non-generic UDAF in Hive v2.2.0 throws an IllegalArgumentException 
for a complex input when column stats are not provided. The exception does not 
occur in v2.1.0.

https://github.com/apache/hive/blob/34eebff194e81180202d198200e84058c4910d95/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L1156

{code:sql}
select version();
> 2.3.0-amzn-0 rcb482944667f96f43c89932dcb66d61ee7e4ac1d

with t2 as ( 
  select array(1,2) as c1 
  union all 
  select array(2,3) as c1
) 
select collect_list(c1) from t2;

> FAILED: IllegalArgumentException Size requested for unknown type: 
> java.util.Collection
{code}

On the other hand, it succeeds when column stats are provided, as follows:

{code:sql}
create table t1 as (
  select array(1,2) as c1 
  union all
  select array(2,3) as c1
);

> select collect_list(c1) from t1;
[[1,2],[2,3]]

> desc formatted t1;
...   
Table Parameters:
    COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
    numFiles                2
    numRows                 2
    rawDataSize             6
    totalSize               8
    transient_lastDdlTime   1503990290
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17407) TPC-DS/query65 hangs on HoS in certain settings

2017-08-29 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created HIVE-17407:
---

 Summary: TPC-DS/query65 hangs on HoS in certain settings
 Key: HIVE-17407
 URL: https://issues.apache.org/jira/browse/HIVE-17407
 Project: Hive
  Issue Type: Bug
Reporter: liyunzhang_intel


[TPC-DS/query65.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query65.sql]
 hangs at the 3TB scale when the following setting is used:
{code}
set hive.auto.convert.join.noconditionaltask.size=300;
{code}
The explain output is attached as explain65. The screenshot shows that the 
query hung in Stage5.

Here is why it hangs.
{code}
Reducer 10 <- Map 9 (GROUP, 1009)
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 5 (PARTITION-LEVEL SORT, 1), Reducer 7 (PARTITION-LEVEL SORT, 1)
Reducer 3 <- Reducer 10 (PARTITION-LEVEL SORT, 1009), Reducer 2 (PARTITION-LEVEL SORT, 1009)
Reducer 4 <- Reducer 3 (SORT, 1)
Reducer 7 <- Map 6 (GROUP PARTITION-LEVEL SORT, 1009)
{code}

The numPartitions of the SparkEdgeProperty that connects Reducer 2 and Reducer 3 
is 1. This is because of the following code in 
org.apache.hadoop.hive.ql.parse.spark.GenSparkUtils#createReduceWork:
{code}
public ReduceWork createReduceWork(GenSparkProcContext context, Operator root,
    SparkWork sparkWork) throws SemanticException {

  for (Operator parentOfRoot : root.getParentOperators()) {
    Preconditions.checkArgument(parentOfRoot instanceof ReduceSinkOperator,
        "AssertionError: expected parentOfRoot to be an "
        + "instance of ReduceSinkOperator, but was "
        + parentOfRoot.getClass().getName());
    ReduceSinkOperator reduceSink = (ReduceSinkOperator) parentOfRoot;
    maxExecutors = Math.max(maxExecutors, reduceSink.getConf().getNumReducers());
  }
  reduceWork.setNumReduceTasks(maxExecutors);

{code}
Here the numReducers of every parentOfRoot is 1 (in the explain output, the 
parallelism of Map 1, Map 5, and Reducer 7 is 1), so the numPartitions of the 
SparkEdgeProperty that connects Reducer 2 and Reducer 3 is 1. 
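
A minimal sketch of the max-over-parents computation described above, assuming each parent ReduceSink is reduced to just its reducer count. The class and method names are hypothetical, and the starting value of maxExecutors is passed in as a parameter because the excerpt does not show how it is initialized.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for the loop in GenSparkUtils#createReduceWork.
final class ReducerParallelismSketch {

    // Returns the largest reducer count requested by any parent ReduceSink,
    // never going below the caller-supplied starting value (an assumption here).
    static int maxParentReducers(int initial, List<Integer> parentNumReducers) {
        int maxExecutors = initial;
        for (int numReducers : parentNumReducers) {
            maxExecutors = Math.max(maxExecutors, numReducers);
        }
        return maxExecutors;
    }

    public static void main(String[] args) {
        // The three parents in the plan above each report a parallelism of 1.
        System.out.println(maxParentReducers(-1, Arrays.asList(1, 1, 1)));
    }
}
```

If every parent reports 1, the maximum is 1 as well, so the reduce work gets a single task however much data flows into it.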
Next, why the parallelism of Map 1, Map 5, and Reducer 7 is 1. The physical 
plan of the query is:
{code}
TS[0]-FIL[50]-RS[2]-JOIN[5]-FIL[49]-SEL[7]-GBY[8]-RS[9]-GBY[10]-SEL[11]-GBY[15]-SEL[16]-RS[33]-JOIN[34]-RS[36]-JOIN[39]-FIL[48]-SEL[41]-RS[42]-SEL[43]-LIM[44]-FS[45]
TS[1]-FIL[51]-RS[4]-JOIN[5]
TS[17]-FIL[53]-RS[19]-JOIN[22]-FIL[52]-SEL[24]-GBY[25]-RS[26]-GBY[27]-RS[38]-JOIN[39]
TS[18]-FIL[54]-RS[21]-JOIN[22]
TS[29]-FIL[55]-RS[31]-JOIN[34]
TS[30]-FIL[56]-RS[32]-JOIN[34]
{code}
The related RS operators of Map 1, Map 5, and Reducer 7 are RS\[31\], RS\[32\], 
and RS\[33\]. The parallelism is set by 
[SemanticAnalyzer#genJoinReduceSinkChild|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L8267].
There seems to be no logical error in the code, but it is not reasonable to use 
a single task to process this much data (more than 30GB). Is there any way to 
make the query pass in this situation? (I set 
hive.auto.convert.join.noconditionaltask.size to 300 because, if the join is 
converted to a map join, it throws a disk error.)





[GitHub] hive pull request #239: HIVE-17359 Split out the typeinfo

2017-08-29 Thread alanfgates
GitHub user alanfgates opened a pull request:

https://github.com/apache/hive/pull/239

HIVE-17359 Split out the typeinfo

I took the approach of duplicating the necessary parts of TypeInfo on the 
metastore side.

Note that this does not solve the SerDe used by 
HiveMetaStore.get_fields_with_environment_context.  We'll solve that separately.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alanfgates/hive hive17359

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #239


commit c799a52aeab95542aa5d9eb55dc3987af26f2878
Author: Alan Gates 
Date:   2017-08-02T18:38:27Z

Removed use of TypeInfo from metastore classes.  Added metastore specific 
type information.

commit 1e5e99e897df83bd2384f181797bbd9d4b810179
Author: Alan Gates 
Date:   2017-08-17T23:32:41Z

Fixed broken compilation in test.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Review Request 61956: HIVE-17323

2017-08-29 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61956/
---

(Updated Aug. 29, 2017, 6:24 p.m.)


Review request for hive, Gopal V and Jason Dere.


Changes
---

Added a fix for a ConcurrentModificationException and used the correct way to 
implement DFS.


Bugs: HIVE-17323
https://issues.apache.org/jira/browse/HIVE-17323


Repository: hive-git


Description
---

HIVE-16260 allows removal of parallel edges of semijoin with mapjoins.
https://issues.apache.org/jira/browse/HIVE-16260
However, it should also consider dynamic partition pruning edge like semijoin 
without removing it while traversing the query tree.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1671773d4a 
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q b22890bc9d 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 478b0828a3


Diff: https://reviews.apache.org/r/61956/diff/2/

Changes: https://reviews.apache.org/r/61956/diff/1-2/


Testing
---


Thanks,

Deepak Jaiswal



Re: Review Request 61956: HIVE-17323

2017-08-29 Thread Gopal V

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61956/#review184080
---




ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out
Line 2667 (original), 2893 (patched)


Result change?


- Gopal V





[jira] [Created] (HIVE-17408) replication distcp should only be invoked if number of files AND file size cross configured limits

2017-08-29 Thread anishek (JIRA)
anishek created HIVE-17408:
--

 Summary: replication distcp should only be invoked if number of 
files AND file size cross configured limits
 Key: HIVE-17408
 URL: https://issues.apache.org/jira/browse/HIVE-17408
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
Assignee: anishek
Priority: Trivial
 Fix For: 3.0.0


CopyUtils currently invokes distcp when either the "hive.exec.copyfile.maxnumfiles" 
or the "hive.exec.copyfile.maxsize" condition is breached. It should be invoked 
only when both are breached, so the check should be AND rather than OR. 

distcp cannot do a distributed copy of a single large file, which is one more 
reason to make the above change.
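
A minimal sketch of the two decision rules, under stated assumptions: the class and method names are hypothetical, "breached" is taken to mean strictly greater than the limit, and the limits are plain parameters rather than values read from HiveConf.

```java
// Hypothetical illustration of the OR check versus the proposed AND check.
final class DistCpDecisionSketch {

    // Behaviour described as current: either limit being exceeded triggers distcp.
    static boolean useDistCpCurrent(long numFiles, long totalBytes,
                                    long maxNumFiles, long maxBytes) {
        return numFiles > maxNumFiles || totalBytes > maxBytes;
    }

    // Proposed behaviour: both limits must be exceeded before distcp is used.
    static boolean useDistCpProposed(long numFiles, long totalBytes,
                                     long maxNumFiles, long maxBytes) {
        return numFiles > maxNumFiles && totalBytes > maxBytes;
    }
}
```

With a single 10GB file and illustrative limits of 1 file and 32MB, the OR rule chooses distcp while the AND rule does not, which is the single-large-file case mentioned above.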





Re: Review Request 61956: HIVE-17323

2017-08-29 Thread Deepak Jaiswal


> On Aug. 29, 2017, 6:28 p.m., Gopal V wrote:
> > ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out
> > Line 2667 (original), 2893 (patched)
> > 
> >
> > Result change?

There are two changes to the tests.
1. An accidental copy-paste mistake led to the same test being used twice; 
fixing it changed the results (note the query modification in the .q file).
2. Added a new test, for which new results are added to the output.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61956/#review184080
---





Re: Review Request 61956: HIVE-17323

2017-08-29 Thread Deepak Jaiswal


> On Aug. 29, 2017, 6:28 p.m., Gopal V wrote:
> > ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out
> > Line 2667 (original), 2893 (patched)
> > 
> >
> > Result change?
> 
> Deepak Jaiswal wrote:
> There are two changes to the tests.
> 1. An accidental copy-paste mistake led to same test being used twice, 
> that was fixed causing the results to change. (notice the query modification 
> in .q file)
> 2. Added a new test for which new results are added to the output.

Oh wait, I missed this, let me take a look.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61956/#review184080
---





Re: Review Request 61956: HIVE-17323

2017-08-29 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61956/
---

(Updated Aug. 29, 2017, 7:28 p.m.)


Review request for hive, Gopal V and Jason Dere.


Changes
---

Updated the test to work with and without semijoin reduction for better 
comparison.
The result is different because it is working on a smaller table created with 
only 40 rows.


Bugs: HIVE-17323
https://issues.apache.org/jira/browse/HIVE-17323


Repository: hive-git


Description
---

HIVE-16260 allows removal of parallel edges of semijoin with mapjoins.
https://issues.apache.org/jira/browse/HIVE-16260
However, it should also consider dynamic partition pruning edge like semijoin 
without removing it while traversing the query tree.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1671773d4a 
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q b22890bc9d 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 478b0828a3


Diff: https://reviews.apache.org/r/61956/diff/3/

Changes: https://reviews.apache.org/r/61956/diff/2-3/


Testing
---


Thanks,

Deepak Jaiswal



[jira] [Created] (HIVE-17409) refactor LLAP ZK registry to make the ZK-registry part reusable

2017-08-29 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17409:
---

 Summary: refactor LLAP ZK registry to make the ZK-registry part 
reusable
 Key: HIVE-17409
 URL: https://issues.apache.org/jira/browse/HIVE-17409
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin








Review Request 61976: HIVE-17409 refactor LLAP ZK registry to make the ZK-registry part reusable

2017-08-29 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61976/
---

Review request for hive and Prasanth_J.


Repository: hive-git


Description
---

see jira


Diffs
-

  
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/LlapServiceInstanceSet.java PRE-CREATION
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceInstance.java 70515c4ad3
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceInstanceSet.java cc124e76ee
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceInstanceStateChangeListener.java 92eb8bdd13
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceRegistry.java 5739d72994
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/InactiveServiceInstance.java 9f2f3b4c3b
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java ebc32a155c
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapRegistryService.java 76fc9c73a2
  llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java ad17144177
  llap-client/src/java/org/apache/hadoop/hive/llap/security/LlapTokenClient.java ace94759ac
  llap-client/src/java/org/apache/hadoop/hive/registry/ServiceInstance.java PRE-CREATION
  llap-client/src/java/org/apache/hadoop/hive/registry/impl/ServiceInstanceBase.java PRE-CREATION
  llap-client/src/java/org/apache/hadoop/hive/registry/impl/ZkRegistryBase.java PRE-CREATION
  llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java 201f5fa555
  llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java 1b57e38b05
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/services/impl/LlapWebServices.java ebc3437245
  llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java ff00aba110
  llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java 6bedccbd18
  llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskSchedulerService.java 339f513eb4
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/Utils.java 2b57d906a2
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java a5ed308da1


Diff: https://reviews.apache.org/r/61976/diff/1/


Testing
---


Thanks,

Sergey Shelukhin



Re: Review Request 61603: HIVE-17304: ThreadMXBean based memory allocation monitory for hash table loader

2017-08-29 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61603/
---

(Updated Aug. 29, 2017, 8:41 p.m.)


Review request for hive, Gopal V and Sergey Shelukhin.


Bugs: HIVE-17304
https://issues.apache.org/jira/browse/HIVE-17304


Repository: hive-git


Description (updated)
---

removed dead code


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 056f2d78346b6b306d34dfb610e3a7fed4ca68aa
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 5bb9d7efcc09dc5df7b5a84d664ecd273793b2a5
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastHashTableLoader.java 6c1ae2c356065dce4ff064f5c50103111c26c780


Diff: https://reviews.apache.org/r/61603/diff/2/

Changes: https://reviews.apache.org/r/61603/diff/1-2/


Testing
---


Thanks,

Prasanth_J



[jira] [Created] (HIVE-17410) repl load task during subsequent DAG generation does not start from the last partition processed

2017-08-29 Thread anishek (JIRA)
anishek created HIVE-17410:
--

 Summary: repl load task during subsequent DAG generation does not 
start from the last partition processed
 Key: HIVE-17410
 URL: https://issues.apache.org/jira/browse/HIVE-17410
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek


The DAG for the repl load task was meant to be generated dynamically, so that if 
the load breaks while a partition is being loaded, subsequent runs start after 
the last partition processed.

We currently identify the point from which we have to process the event, but 
reinitialize the iterator to start from the beginning of all the partitions to process.
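
A minimal sketch of the intended resume behaviour, assuming partitions can be represented by their names in load order. The class and method names are hypothetical and are not taken from the repl load code.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration: resume iteration after the last processed partition
// instead of restarting from the first one.
final class ResumeFromPartitionSketch {

    static Iterator<String> resumeAfter(List<String> partitions, String lastProcessed) {
        // With no checkpoint, or a checkpoint that is not in the list
        // (indexOf returns -1), iteration starts from index 0.
        int start = (lastProcessed == null) ? 0 : partitions.indexOf(lastProcessed) + 1;
        return partitions.subList(start, partitions.size()).iterator();
    }
}
```

Given partitions p1, p2, p3 and a checkpoint of p2, the returned iterator yields only p3. The behaviour reported above corresponds to always starting at index 0.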





Re: Review Request 61165: HIVE-16811 Estimate statistics in absence of stats

2017-08-29 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61165/#review184071
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 1672 (patched)


varchar has length in its type. We shall use that.



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
Lines 199 (patched)


Need to pass shouldEstimateStats to this function so that it respects it.



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
Lines 231 (patched)


If the config is false, it will return 1 as numrows; is that what we want?
It seems the caller has no way to figure out that stats are not estimated, 
so that it can turn off stats-based optimization.



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
Line 1156 (original), 1316 (patched)


Can you leave a TODO here?



ql/src/test/results/clientpositive/llap/explainuser_1.q.out
Line 138 (original), 138 (patched)


This doesn't look correct. Is this overflow?



ql/src/test/results/clientpositive/llap/insert1.q.out
Line 46 (original), 46 (patched)


Is change to COMPLETE expected?



ql/src/test/results/clientpositive/llap/jdbc_handler.q.out
Line 129 (original), 129 (patched)


Expected?



ql/src/test/results/clientpositive/llap/metadata_only_queries.q.out
Line 231 (original), 231 (patched)


Overflow in data size calculation?



ql/src/test/results/clientpositive/llap/orc_predicate_pushdown.q.out
Line 145 (original), 145 (patched)


Seems to happen only with Gby operator.



ql/src/test/results/clientpositive/llap/sqlmerge.q.out
Line 55 (original), 55 (patched)


Expected change?



ql/src/test/results/clientpositive/llap/subquery_select.q.out
Line 2490 (original), 2490 (patched)


stats change.. expected?



ql/src/test/results/clientpositive/llap/vector_leftsemi_mapjoin.q.out
Line  (original),  (patched)


Expected?



ql/src/test/results/clientpositive/llap/vector_number_compare_projection.q.out
Line 200 (original), 200 (patched)


Overflow.



ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out
Line 3567 (original), 3567 (patched)


State change expected?


- Ashutosh Chauhan


On Aug. 27, 2017, 11:11 p.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61165/
> ---
> 
> (Updated Aug. 27, 2017, 11:11 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-16811
> https://issues.apache.org/jira/browse/HIVE-16811
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This patch introduces estimation of statistics if stats don't already exist.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 0d8d7ae030 
>   itests/src/test/resources/testconfiguration.properties fa6a2aaea0 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java 22790de209
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java ad29d65abb
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ColStatistics.java f2d2e2dc0b 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 487a823d92 
>   ql/src/test/queries/clientpositive/join_reordering_no_stats.q PRE-CREATION 
>   ql/src/test/results/clientpositive/annotate_stats_filter.q.out e22c3ef0fc 
>   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out fccfabd5d1 
>   ql/src/test/results/clientpositive/annotate_stats_part.q.out 866d30a8ea 
>   ql/src/test/results/clientpositive/annotate_stats_select.q.out e3f08ea555 
>   ql/src/test/results/clientpositive/annotate_stats_table.q.out efc3c1f123 
>   ql/src/test/results/clientpositive/auto_join_reordering_values.q.out 156be41502
>   ql/src/test/results/clientpositive/auto_join_stats.q.out e80af96fcb 
>   ql/src/test/results/clientpositive/auto_join_stats2.q.out 6ea5afa920 
>   ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out d129807f55 
>   ql/src/test/results/clientpositive/cbo_rp_annot

[jira] [Created] (HIVE-17411) LLAP IO may incorrectly release a refcount in some cases

2017-08-29 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17411:
---

 Summary: LLAP IO may incorrectly release a refcount in some cases
 Key: HIVE-17411
 URL: https://issues.apache.org/jira/browse/HIVE-17411
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Not sure why this doesn't happen much more often, actually.
In a large stream whose buffers are not reused (e.g. a dictionary, which is 
locked once for all RGs) and which is separated into many buffers (e.g. due to 
a small ORC compression buffer size), it may happen that some, but not all, 
buffers are evicted from cache.
If a CacheBuffer follows a BufferChunk in the buffer list, the latter will be 
converted to a ProcCacheChunk; it is possible for the early refcount release 
logic from the former to release the refcount (for a dictionary it would always 
be released, because by definition there's no reuse), then backtrack to the 
latter and try to decref an uninitialized MemoryBuffer in the ProcCacheChunk, 
because ProcCacheChunk looks like a CacheChunk. PCC initial refcounts are 
released separately, after the data is uncompressed.





Re: Review Request 61956: HIVE-17323

2017-08-29 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61956/
---

(Updated Aug. 30, 2017, 2:21 a.m.)


Review request for hive, Gopal V and Jason Dere.


Changes
---

Redid the code to track the parallel semijoin edges when it encounters a 
MapJoinOperator in small table side.


Bugs: HIVE-17323
https://issues.apache.org/jira/browse/HIVE-17323


Repository: hive-git


Description
---

HIVE-16260 allows removal of parallel edges of semijoin with mapjoins.
https://issues.apache.org/jira/browse/HIVE-16260
However, it should also consider dynamic partition pruning edge like semijoin 
without removing it while traversing the query tree.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1671773d4a 
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q b22890bc9d 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 478b0828a3


Diff: https://reviews.apache.org/r/61956/diff/4/

Changes: https://reviews.apache.org/r/61956/diff/3-4/


Testing
---


Thanks,

Deepak Jaiswal



Review Request 61985: HIVE-17399

2017-08-29 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61985/
---

Review request for hive, Gopal V and Jason Dere.


Bugs: HIVE-17399
https://issues.apache.org/jira/browse/HIVE-17399


Repository: hive-git


Description
---

Do not remove semijoin branch if it feeds to TS->DPP_EVENT


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java 5d7b9e5c6d
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1671773d4a
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q b22890bc9d
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 3bd35bf2d8


Diff: https://reviews.apache.org/r/61985/diff/1/


Testing
---


Thanks,

Deepak Jaiswal



Re: Review Request 61985: HIVE-17399

2017-08-29 Thread Gopal V

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61985/#review184120
---




ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java
Lines 30 (patched)


I prefer a Boolean object, which goes from null -> false/true, so that you 
can encode (not-set, true, false) in one field.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java
Lines 57 (patched)


Add a Preconditions checkState that prevents this from going from true -> 
false.
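
A minimal sketch of the two suggestions above taken together, under stated assumptions: the class, field, and method names are hypothetical (the review does not name the field), and the state check is written by hand to mirror what Guava's Preconditions.checkState would throw, so that the example has no dependencies.

```java
// Hypothetical tri-state flag: null = not set, TRUE/FALSE = decided.
final class TriStateFlagSketch {

    private Boolean flag; // starts as null, meaning "not set yet"

    Boolean get() {
        return flag;
    }

    void set(boolean value) {
        // Equivalent in spirit to Preconditions.checkState(...):
        // once the flag is true it may not be reset to false.
        if (Boolean.TRUE.equals(flag) && !value) {
            throw new IllegalStateException("flag must not go from true to false");
        }
        flag = value;
    }
}
```

A single Boolean field then encodes all three states, and any attempt to flip true back to false fails fast instead of silently changing the decision.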



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1371 (patched)


NPE issues possible?



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1404 (patched)


Is that a break or a continue?

If this is a break out of the loop, then what happens to the rest of the 
items in dequeue?
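
To make the question concrete, here is a small sketch of how break and continue differ when draining a deque. It is a generic illustration, not the TezCompiler code; the class and method names are hypothetical, and a negative item stands in for whatever condition triggers the early exit.

```java
import java.util.Deque;

// Hypothetical illustration of break versus continue while draining a deque.
final class BreakVsContinueSketch {

    // Stops at the first negative item; anything still queued is never visited.
    static int visitedWithBreak(Deque<Integer> deque) {
        int visited = 0;
        while (!deque.isEmpty()) {
            int item = deque.pollFirst();
            if (item < 0) {
                break;
            }
            visited++;
        }
        return visited;
    }

    // Skips negative items but keeps draining until the deque is empty.
    static int visitedWithContinue(Deque<Integer> deque) {
        int visited = 0;
        while (!deque.isEmpty()) {
            int item = deque.pollFirst();
            if (item < 0) {
                continue;
            }
            visited++;
        }
        return visited;
    }
}
```

For the input 1, -1, 2, 3 the break variant visits one item and leaves 2 and 3 in the deque, while the continue variant visits three items and empties it.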



ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q
Lines 115 (patched)


Pick a query with a non-zero result, so that we can see when it has false 
negatives (i.e., loses rows it is meant to have).


- Gopal V


On Aug. 30, 2017, 3:07 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61985/
> ---
> 
> (Updated Aug. 30, 2017, 3:07 a.m.)
> 
> 
> Review request for hive, Gopal V and Jason Dere.
> 
> 
> Bugs: HIVE-17399
> https://issues.apache.org/jira/browse/HIVE-17399
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Do not remove semijoin branch if it feeds to TS->DPP_EVENT
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java 5d7b9e5c6d
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1671773d4a
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q b22890bc9d
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 3bd35bf2d8
> 
> 
> Diff: https://reviews.apache.org/r/61985/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 61663: WebUI query plan graphs

2017-08-29 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61663/#review184121
---



Patch looks good. I didn't read the js files, but hope they're okay. A few minor 
comments.


ql/src/java/org/apache/hadoop/hive/ql/QueryDisplay.java
Lines 127 (patched)


What happens if this is a map-only task?



ql/src/java/org/apache/hadoop/hive/ql/QueryDisplay.java
Lines 132 (patched)


It might be better if the null check is put in getCountersJson() method.



service/src/resources/hive-webapps/static/css/query-plan-graph.css
Lines 1 (patched)


Apache license header if possible.



service/src/resources/hive-webapps/static/js/query-plan-graph.js
Lines 1 (patched)


I think we need apache license header.



service/src/resources/hive-webapps/static/js/vis.min.js
Lines 1 (patched)


Apache license header.


- Xuefu Zhang


On Aug. 16, 2017, 1:55 p.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61663/
> ---
> 
> (Updated Aug. 16, 2017, 1:55 p.m.)
> 
> 
> Review request for hive, Peter Vary and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below.
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info.
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/LogUtils.java 0a3e0c7201 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3c158a6692 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 4e7c80f184 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 4b6051485e 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryDisplay.java bf6cb91745 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 3c0719717c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 1bd4db7805 
>   service/src/jamon/org/apache/hive/tmpl/QueryProfileTmpl.jamon ff7476ee02 
>   service/src/resources/hive-webapps/static/css/query-plan-graph.css PRE-CREATION
>   service/src/resources/hive-webapps/static/js/query-plan-graph.js PRE-CREATION
>   service/src/resources/hive-webapps/static/js/vis.min.js PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/61663/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



[jira] [Created] (HIVE-17412) Add "-- SORT_QUERY_RESULTS" for spark_vectorized_dynamic_partition_pruning.q

2017-08-29 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created HIVE-17412:
---

 Summary: Add "-- SORT_QUERY_RESULTS" for 
spark_vectorized_dynamic_partition_pruning.q
 Key: HIVE-17412
 URL: https://issues.apache.org/jira/browse/HIVE-17412
 Project: Hive
  Issue Type: Bug
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel


For the query:
{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.vectorized.execution.enabled=true;
set hive.strict.checks.cartesian.product=false;
select distinct ds from srcpart;
{code}

The result is:
{code}
2008-04-09
2008-04-08
{code}
The result of the group-by in Spark is not ordered. Sometimes it returns:
{code}
2008-04-08
2008-04-09
{code}
Sometimes it returns:
{code}
2008-04-09
2008-04-08
{code}
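
The flakiness described above comes from comparing rows in the order they happen to arrive. In Hive's q-file tests, the "-- SORT_QUERY_RESULTS" directive sorts the query output before it is compared with the golden file. The following is only a minimal sketch of that idea, not the test framework's code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hive: compares two result sets while ignoring row order.
final class SortedResultCompare {
    private SortedResultCompare() {}

    static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
        // Copy before sorting so the callers' lists are left untouched.
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> run1 = Arrays.asList("2008-04-09", "2008-04-08");
        List<String> run2 = Arrays.asList("2008-04-08", "2008-04-09");
        System.out.println(run1.equals(run2));                 // order-sensitive: false
        System.out.println(sameRowsIgnoringOrder(run1, run2)); // order-insensitive: true
    }
}
```

With sorting applied, both orderings shown above compare equal, so the test no longer depends on how Spark happens to order the groups.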






Re: Review Request 61985: HIVE-17399

2017-08-29 Thread Deepak Jaiswal


> On Aug. 30, 2017, 3:35 a.m., Gopal V wrote:
> >

Thanks for the quick turnaround. Working on the comments.


> On Aug. 30, 2017, 3:35 a.m., Gopal V wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java
> > Lines 30 (patched)
> > 
> >
> > I prefer a Boolean object, which goes from null -> false/true, so that 
> > you can encode (not-set, true, false) in one field.

Will do that.
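
For readers unfamiliar with the suggestion, here is a minimal sketch of a three-state flag held in a single Boolean field. It is an illustration only, not the SemiJoinBranchInfo code; the class, field, and method names are hypothetical.

```java
// Hypothetical stand-in for the kind of field discussed above; names are illustrative.
final class TriStateFlag {
    // null = not set yet; Boolean.TRUE / Boolean.FALSE = explicitly decided.
    private Boolean shouldRemove;

    boolean isSet() {
        return shouldRemove != null;
    }

    void set(boolean value) {
        shouldRemove = value; // autoboxed to Boolean
    }

    // Returns the decided value, or the supplied default when nothing was set.
    boolean getOrDefault(boolean defaultValue) {
        return shouldRemove != null ? shouldRemove : defaultValue;
    }
}
```

The trade-off is that every reader of the field has to handle null, which a primitive boolean avoids when only two states are needed.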


> On Aug. 30, 2017, 3:35 a.m., Gopal V wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java
> > Lines 57 (patched)
> > 
> >
> > Add a Preconditions checkState that prevents this from going from true 
> > -> false.

Makes sense.
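
A minimal sketch of such a guard, assuming a plain boolean field. The local checkState helper stands in for Guava's Preconditions.checkState(boolean, Object) so that the example has no dependencies; the class and method names are hypothetical and this is not the patched code.

```java
// Hypothetical illustration of a flag that may be raised but never lowered again.
final class StickyFlag {
    private boolean value;

    // Local stand-in for Guava's Preconditions.checkState(boolean, Object).
    private static void checkState(boolean expression, String message) {
        if (!expression) {
            throw new IllegalStateException(message);
        }
    }

    void set(boolean newValue) {
        // Reject the only forbidden transition: true -> false.
        checkState(!(value && !newValue), "flag cannot go from true back to false");
        value = newValue;
    }

    boolean get() {
        return value;
    }
}
```

Calling set(true) and then set(false) throws IllegalStateException; every other sequence of calls is accepted.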


> On Aug. 30, 2017, 3:35 a.m., Gopal V wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
> > Lines 1371 (patched)
> > 
> >
> > NPE issues possible?

It can't happen. However, I can put this in a try-catch block and assert if it 
does.


> On Aug. 30, 2017, 3:35 a.m., Gopal V wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
> > Lines 1404 (patched)
> > 
> >
> > Is that a break or a continue?
> > 
> > If this is a break out of the loop, then what happens to the rest of 
> > the items in dequeue?

Yes, it is a break. A given TS has only one DPP event, so there is no need to 
process the rest of the graph.
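
The difference between break and continue in this kind of traversal can be sketched as follows. This is not the TezCompiler code; the graph representation, the class name, and the node labels are assumptions made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: walk a graph from a start node and stop at the first match.
final class FirstMatchSearch {
    private FirstMatchSearch() {}

    static boolean reaches(Map<String, List<String>> children, String start, String target) {
        Deque<String> deque = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        deque.add(start);
        boolean found = false;
        while (!deque.isEmpty()) {
            String node = deque.poll();
            if (!seen.add(node)) {
                continue; // already processed: skip this node but keep draining the deque
            }
            if (node.equals(target)) {
                found = true;
                break; // at most one match is expected, so the remaining entries are abandoned
            }
            deque.addAll(children.getOrDefault(node, Collections.emptyList()));
        }
        return found;
    }
}
```

Using break is safe only under the stated assumption that at most one match exists; if several matches were possible, the entries left in the deque would have to be processed as well.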


> On Aug. 30, 2017, 3:35 a.m., Gopal V wrote:
> > ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q
> > Lines 115 (patched)
> > 
> >
> > Pick a query with a non-zero result, so that we can see when it has 
> > false negatives (i.e., loses rows it is meant to have).

Sure.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61985/#review184120
---


On Aug. 30, 2017, 3:07 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61985/
> ---
> 
> (Updated Aug. 30, 2017, 3:07 a.m.)
> 
> 
> Review request for hive, Gopal V and Jason Dere.
> 
> 
> Bugs: HIVE-17399
> https://issues.apache.org/jira/browse/HIVE-17399
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Do not remove semijoin branch if it feeds to TS->DPP_EVENT
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java 
> 5d7b9e5c6d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1671773d4a 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q b22890bc9d 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 3bd35bf2d8 
> 
> 
> Diff: https://reviews.apache.org/r/61985/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



[GitHub] hive pull request #212: HIVE-17195: Long chain of tasks created by REPL LOAD...

2017-08-29 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/212




Re: Review Request 61985: HIVE-17399

2017-08-29 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61985/
---

(Updated Aug. 30, 2017, 5:58 a.m.)


Review request for hive, Gopal V and Jason Dere.


Changes
---

Implemented the review suggestions except for converting to a Boolean object. The 
use case is pretty basic here.


Bugs: HIVE-17399
https://issues.apache.org/jira/browse/HIVE-17399


Repository: hive-git


Description
---

Do not remove semijoin branch if it feeds to TS->DPP_EVENT


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java 
5d7b9e5c6d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1671773d4a 
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q b22890bc9d 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
3bd35bf2d8 


Diff: https://reviews.apache.org/r/61985/diff/2/

Changes: https://reviews.apache.org/r/61985/diff/1-2/


Testing
---


Thanks,

Deepak Jaiswal



Re: Review Request 61985: HIVE-17399

2017-08-29 Thread Deepak Jaiswal


> On Aug. 30, 2017, 3:35 a.m., Gopal V wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemiJoinBranchInfo.java
> > Lines 30 (patched)
> > 
> >
> > I prefer a Boolean object, which goes from null -> false/true, so that 
> > you can encode (not-set, true, false) in one field.
> 
> Deepak Jaiswal wrote:
> Will do that.

In this case the states are very straightforward, so I am keeping it as it is.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61985/#review184120
---

