[jira] [Commented] (HIVE-16967) hive use tez engine occur org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)

2017-06-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064319#comment-16064319
 ] 

Gopal V commented on HIVE-16967:


This looks like a duplicate of HIVE-9832, please confirm with an explain plan.

> hive use tez engine occur org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row (tag=0)
> ---
>
> Key: HIVE-16967
> URL: https://issues.apache.org/jira/browse/HIVE-16967
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.1.0
> Environment: hive version:1.1.0
> tez version:0.85
> hadoop version:hadoop2.6 -cdh-5.10.0
>Reporter: xujie
>
> Status: Running (Executing on YARN cluster with App id 
> application_1494499520849_1063145)
> 
> VERTICES      STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> 
> Map 1         KILLED        15         14        0        1       0       1
> Map 10        SUCCEEDED     15         15        0        0       0       0
> Map 12        SUCCEEDED     15         15        0        0       0       0
> Map 13        SUCCEEDED      1          1        0        0       0       0
> Map 14        SUCCEEDED      1          1        0        0       0       0
> Map 15        SUCCEEDED      1          1        0        0       0       0
> Map 16        SUCCEEDED      1          1        0        0       0       0
> Map 17        SUCCEEDED      1          1        0        0       0       0
> Map 4         SUCCEEDED     15         15        0        0       0       0
> Map 5         SUCCEEDED      1          1        0        0       0       0
> Map 6         SUCCEEDED      1          1        0        0       0       0
> Map 7         SUCCEEDED      1          1        0        0       0       0
> Map 8         SUCCEEDED      1          1        0        0       0       0
> Map 9         SUCCEEDED      1          1        0        0       0       0
> Reducer 11    FAILED       315          0        0      315     774     314
> Reducer 2     KILLED       315          0        0      315       0     315
> 
> VERTICES: 14/16  [==>>] 9%  ELAPSED TIME: 130.89 s
> 
> Status: Failed
> Vertex re-running, vertexName=Map 12, 
> vertexId=vertex_1494499520849_1063145_1_07
> Vertex re-running, vertexName=Map 1, 
> vertexId=vertex_1494499520849_1063145_1_11
> Vertex re-running, vertexName=Map 10, 
> vertexId=vertex_1494499520849_1063145_1_06
> Vertex re-running, vertexName=Map 4, 
> vertexId=vertex_1494499520849_1063145_1_03
> Vertex re-running, vertexName=Map 10, 
> vertexId=vertex_1494499520849_1063145_1_06
> Vertex re-running, vertexName=Map 4, 
> vertexId=vertex_1494499520849_1063145_1_03
> Vertex re-running, vertexName=Map 12, 
> vertexId=vertex_1494499520849_1063145_1_07
> Vertex re-running, vertexName=Map 4, 
> vertexId=vertex_1494499520849_1063145_1_03
> Vertex re-running, vertexName=Map 10, 
> vertexId=vertex_1494499520849_1063145_1_06
> Vertex re-running, vertexName=Map 5, 
> vertexId=vertex_1494499520849_1063145_1_02
> Vertex failed, vertexName=Reducer 11, 
> vertexId=vertex_1494499520849_1063145_1_14, diagnostics=[Task failed, 
> taskId=task_1494499520849_1063145_1_14_13, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1494499520849_1063145_1_14_13_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row (tag=0) 
> {"key":{"reducesinkkey0":"0002991000585141671"},"value":{"_col0":"","_col1":""}}
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:187)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hado

[jira] [Commented] (HIVE-16969) Improvement performance of MapOperator for Parquet

2017-06-26 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064252#comment-16064252
 ] 

Ferdinand Xu commented on HIVE-16969:
-

+1 pending the test results

> Improvement performance of MapOperator for Parquet
> --
>
> Key: HIVE-16969
> URL: https://issues.apache.org/jira/browse/HIVE-16969
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16969.001.patch
>
>
> For a table with many partition files, 
> MapOperator.cloneConfsForNestedColPruning() will update the 
> hive.io.file.readNestedColumn.paths many times. The larger value of 
> hive.io.file.readNestedColumn.paths will cause the poor performance for 
> ParquetHiveSerDe.processRawPrunedPaths(). 
> So, the unnecessary paths should not be appended to 
> hive.io.file.readNestedColumn.paths.
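The append-per-partition pattern described above can be sketched outside Hive. The following is a hypothetical, self-contained Java illustration (not the actual HIVE-16969 patch; `appendAlways`, `appendIfAbsent`, and the path `s.a.b` are invented names) of why de-duplicating before appending keeps a comma-separated config value such as hive.io.file.readNestedColumn.paths small:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustration only: appending the same nested-column path once per partition
// file makes the comma-separated value grow linearly with the partition count,
// while de-duplicating keeps it bounded by the number of distinct paths.
public class NestedPathConfigDemo {
  // Naive approach: blindly append for every partition.
  static String appendAlways(String current, String path) {
    return current.isEmpty() ? path : current + "," + path;
  }

  // De-duplicated approach: only append paths not already present.
  static String appendIfAbsent(String current, String path) {
    Set<String> seen = new LinkedHashSet<>();
    if (!current.isEmpty()) {
      for (String p : current.split(",")) {
        seen.add(p);
      }
    }
    seen.add(path);
    return String.join(",", seen);
  }

  public static void main(String[] args) {
    String naive = "";
    String deduped = "";
    // Simulate 1825 partitions (the largest table in the reported test)
    // all pruning the same nested column.
    for (int i = 0; i < 1825; i++) {
      naive = appendAlways(naive, "s.a.b");
      deduped = appendIfAbsent(deduped, "s.a.b");
    }
    System.out.println(naive.length());   // grows with the partition count
    System.out.println(deduped.length()); // stays at one path
  }
}
```

Any consumer that must parse the value (as ParquetHiveSerDe.processRawPrunedPaths() does) pays for the redundant entries, which is why the patch avoids appending unnecessary paths in the first place.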



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16970) General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils

2017-06-26 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16970:
---
Status: Patch Available  (was: Open)

> General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils
> -
>
> Key: HIVE-16970
> URL: https://issues.apache.org/jira/browse/HIVE-16970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16970.1.patch
>
>
> # Simplify
> # Do not initiate empty collections
> # Parsing is incorrect:
> {code:title=org.apache.hadoop.hive.metastore.cache.CacheUtils}
>   public static String buildKey(String dbName, String tableName, List<String> partVals) {
>     String key = buildKey(dbName, tableName);
>     if (partVals == null || partVals.size() == 0) {
>       return key;
>     }
>     // missing a delimiter between the "tableName" and the first "partVal"
>     for (int i = 0; i < partVals.size(); i++) {
>       key += partVals.get(i);
>       if (i != partVals.size() - 1) {
>         key += delimit;
>       }
>     }
>     return key;
>   }
>
>   public static Object[] splitPartitionColStats(String key) {
>     // ...
>   }
> {code}
> The result of passing the key to the "split" method is:
> {code}
> buildKey("db","Table",["Part1","Part2","Part3"], "col");
> [db, tablePart1, [Part2, Part3], col]
> // "table" and "Part1" is mistakenly concatenated
> {code}
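A corrected buildKey can be sketched as follows. This is a hedged illustration, not the committed HIVE-16970 patch: the delimiter value "#" is assumed for the `delimit` constant, and the lower-casing in the two-argument buildKey is inferred from the quoted output ("tablePart1" from input "Table"). The fix is to emit the delimiter before every partition value, including the first, so the table name and the first partition value stay separable when the key is split back apart.

```java
import java.util.List;

// Hypothetical corrected sketch of CacheUtils.buildKey. The delimiter "#"
// and the lower-casing are assumptions for illustration only.
public class CacheKeyDemo {
  private static final String delimit = "#"; // assumed value

  public static String buildKey(String dbName, String tableName) {
    return dbName.toLowerCase() + delimit + tableName.toLowerCase();
  }

  public static String buildKey(String dbName, String tableName, List<String> partVals) {
    String key = buildKey(dbName, tableName);
    if (partVals == null || partVals.isEmpty()) {
      return key;
    }
    StringBuilder sb = new StringBuilder(key);
    for (String partVal : partVals) {
      // Delimiter before EVERY value, including the first one.
      sb.append(delimit).append(partVal);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildKey("db", "Table", List.of("Part1", "Part2", "Part3")));
    // db#table#Part1#Part2#Part3 -- "table" and "Part1" no longer fused
  }
}
```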





[jira] [Updated] (HIVE-16970) General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils

2017-06-26 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16970:
---
Description: 
# Simplify
# Do not initiate empty collections
# Parsing is incorrect:

{code:title=org.apache.hadoop.hive.metastore.cache.CacheUtils}
  public static String buildKey(String dbName, String tableName, List<String> partVals) {
    String key = buildKey(dbName, tableName);
    if (partVals == null || partVals.size() == 0) {
      return key;
    }
    // missing a delimiter between the "tableName" and the first "partVal"
    for (int i = 0; i < partVals.size(); i++) {
      key += partVals.get(i);
      if (i != partVals.size() - 1) {
        key += delimit;
      }
    }
    return key;
  }

  public static Object[] splitPartitionColStats(String key) {
    // ...
  }
{code}

The result of passing the key to the "split" method is:

{code}
buildKey("db","Table",["Part1","Part2","Part3"], "col");
[db, tablePart1, [Part2, Part3], col]
// "table" and "Part1" is mistakenly concatenated
{code}

  was:
# Simplify
# Do not initiate empty collections
# Parsing is incorrect:

{code:title=org.apache.hadoop.hive.metastore.cache.CacheUtils}
  public static String buildKey(String dbName, String tableName, List 
partVals) {
String key = buildKey(dbName, tableName);
if (partVals == null || partVals.size() == 0) {
  return key;
}
for (int i = 0; i < partVals.size(); i++) {
  key += partVals.get(i);
  if (i != partVals.size() - 1) {
key += delimit;
  }
}
return key;
  }

public static Object[] splitPartitionColStats(String key) {
// ...
}
{code}

The result of passing the key to the "split" method is:

{code}
buildKey("db","Table",["Part1","Part2","Part3"], "col");
[db, tablePart1, [Part2, Part3], col]
// "table" and "Part1" is mistakenly concatenated
{code}


> General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils
> -
>
> Key: HIVE-16970
> URL: https://issues.apache.org/jira/browse/HIVE-16970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16970.1.patch
>
>
> # Simplify
> # Do not initiate empty collections
> # Parsing is incorrect:
> {code:title=org.apache.hadoop.hive.metastore.cache.CacheUtils}
>   public static String buildKey(String dbName, String tableName, List 
> partVals) {
> String key = buildKey(dbName, tableName);
> if (partVals == null || partVals.size() == 0) {
>   return key;
> }
> // missing a delimiter between the "tableName" and the first "partVal"
> for (int i = 0; i < partVals.size(); i++) {
>   key += partVals.get(i);
>   if (i != partVals.size() - 1) {
> key += delimit;
>   }
> }
> return key;
>   }
> public static Object[] splitPartitionColStats(String key) {
> // ...
> }
> {code}
> The result of passing the key to the "split" method is:
> {code}
> buildKey("db","Table",["Part1","Part2","Part3"], "col");
> [db, tablePart1, [Part2, Part3], col]
> // "table" and "Part1" is mistakenly concatenated
> {code}





[jira] [Updated] (HIVE-16970) General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils

2017-06-26 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16970:
---
Attachment: HIVE-16970.1.patch

> General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils
> -
>
> Key: HIVE-16970
> URL: https://issues.apache.org/jira/browse/HIVE-16970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16970.1.patch
>
>
> # Simplify
> # Do not initiate empty collections
> # Parsing is incorrect:
> {code:title=org.apache.hadoop.hive.metastore.cache.CacheUtils}
>   public static String buildKey(String dbName, String tableName, List 
> partVals) {
> String key = buildKey(dbName, tableName);
> if (partVals == null || partVals.size() == 0) {
>   return key;
> }
> // missing a delimiter between the "tableName" and the first "partVal"
> for (int i = 0; i < partVals.size(); i++) {
>   key += partVals.get(i);
>   if (i != partVals.size() - 1) {
> key += delimit;
>   }
> }
> return key;
>   }
> public static Object[] splitPartitionColStats(String key) {
> // ...
> }
> {code}
> The result of passing the key to the "split" method is:
> {code}
> buildKey("db","Table",["Part1","Part2","Part3"], "col");
> [db, tablePart1, [Part2, Part3], col]
> // "table" and "Part1" is mistakenly concatenated
> {code}





[jira] [Updated] (HIVE-16970) General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils

2017-06-26 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-16970:
---
Description: 
# Simplify
# Do not initiate empty collections
# Parsing is incorrect:

{code:title=org.apache.hadoop.hive.metastore.cache.CacheUtils}
  public static String buildKey(String dbName, String tableName, List 
partVals) {
String key = buildKey(dbName, tableName);
if (partVals == null || partVals.size() == 0) {
  return key;
}
for (int i = 0; i < partVals.size(); i++) {
  key += partVals.get(i);
  if (i != partVals.size() - 1) {
key += delimit;
  }
}
return key;
  }

public static Object[] splitPartitionColStats(String key) {
// ...
}
{code}

The result of passing the key to the "split" method is:

{code}
buildKey("db","Table",["Part1","Part2","Part3"], "col");
[db, tablePart1, [Part2, Part3], col]
// "table" and "Part1" is mistakenly concatenated
{code}

  was:
# Simplify
# Do not initiate empty collections


> General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils
> -
>
> Key: HIVE-16970
> URL: https://issues.apache.org/jira/browse/HIVE-16970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
>
> # Simplify
> # Do not initiate empty collections
> # Parsing is incorrect:
> {code:title=org.apache.hadoop.hive.metastore.cache.CacheUtils}
>   public static String buildKey(String dbName, String tableName, List 
> partVals) {
> String key = buildKey(dbName, tableName);
> if (partVals == null || partVals.size() == 0) {
>   return key;
> }
> for (int i = 0; i < partVals.size(); i++) {
>   key += partVals.get(i);
>   if (i != partVals.size() - 1) {
> key += delimit;
>   }
> }
> return key;
>   }
> public static Object[] splitPartitionColStats(String key) {
> // ...
> }
> {code}
> The result of passing the key to the "split" method is:
> {code}
> buildKey("db","Table",["Part1","Part2","Part3"], "col");
> [db, tablePart1, [Part2, Part3], col]
> // "table" and "Part1" is mistakenly concatenated
> {code}





[jira] [Updated] (HIVE-16969) Improvement performance of MapOperator for Parquet

2017-06-26 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16969:

Description: 
For a table with many partition files, 
MapOperator.cloneConfsForNestedColPruning() will update the 
hive.io.file.readNestedColumn.paths many times. The larger value of 
hive.io.file.readNestedColumn.paths will cause the poor performance for 
ParquetHiveSerDe.processRawPrunedPaths(). 
So, the unnecessary paths should not be appended to 
hive.io.file.readNestedColumn.paths.

  was:
For a table with many partition files, 
MapOperator.cloneConfsForNestedColPruning() will update the 
hive.io.file.readNestedColumn.paths many times. The larger value of 
hive.io.file.readNestedColumn.paths will cause the poor performance for 
ParquetHiveSerDe.processRawPrunedPaths(). 
So, the unnecessary paths should be appended to 
hive.io.file.readNestedColumn.paths.


> Improvement performance of MapOperator for Parquet
> --
>
> Key: HIVE-16969
> URL: https://issues.apache.org/jira/browse/HIVE-16969
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16969.001.patch
>
>
> For a table with many partition files, 
> MapOperator.cloneConfsForNestedColPruning() will update the 
> hive.io.file.readNestedColumn.paths many times. The larger value of 
> hive.io.file.readNestedColumn.paths will cause the poor performance for 
> ParquetHiveSerDe.processRawPrunedPaths(). 
> So, the unnecessary paths should not be appended to 
> hive.io.file.readNestedColumn.paths.





[jira] [Commented] (HIVE-16969) Improvement performance of MapOperator for Parquet

2017-06-26 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064191#comment-16064191
 ] 

Colin Ma commented on HIVE-16969:
-

[~csun], [~Ferd], can you help review the patch? Thanks.

> Improvement performance of MapOperator for Parquet
> --
>
> Key: HIVE-16969
> URL: https://issues.apache.org/jira/browse/HIVE-16969
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16969.001.patch
>
>
> For a table with many partition files, 
> MapOperator.cloneConfsForNestedColPruning() will update the 
> hive.io.file.readNestedColumn.paths many times. The larger value of 
> hive.io.file.readNestedColumn.paths will cause the poor performance for 
> ParquetHiveSerDe.processRawPrunedPaths(). 
> So, the unnecessary paths should be appended to 
> hive.io.file.readNestedColumn.paths.





[jira] [Updated] (HIVE-16969) Improvement performance of MapOperator for Parquet

2017-06-26 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16969:

Status: Patch Available  (was: Open)

> Improvement performance of MapOperator for Parquet
> --
>
> Key: HIVE-16969
> URL: https://issues.apache.org/jira/browse/HIVE-16969
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16969.001.patch
>
>
> For a table with many partition files, 
> MapOperator.cloneConfsForNestedColPruning() will update the 
> hive.io.file.readNestedColumn.paths many times. The larger value of 
> hive.io.file.readNestedColumn.paths will cause the poor performance for 
> ParquetHiveSerDe.processRawPrunedPaths(). 
> So, the unnecessary paths should be appended to 
> hive.io.file.readNestedColumn.paths.





[jira] [Updated] (HIVE-16969) Improvement performance of MapOperator for Parquet

2017-06-26 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-16969:

Attachment: HIVE-16969.001.patch

With the patch, I tested query13 of TPC-DS on my local cluster. The cluster 
includes 6 nodes with 128G memory per node, Intel(R) Xeon(R) E5-2680 CPUs, and a 
1G network, using a 10G data scale and Spark as the execution engine. The table 
is stored as Parquet, and the partition count of the largest table is 1825. 
The result shows the execution time dropped from {color:red}85s{color} to 
{color:#14892c}71s{color}, and the MapOperator initialization time from 
{color:red}15s{color} to {color:#14892c}less than 1s{color}.

> Improvement performance of MapOperator for Parquet
> --
>
> Key: HIVE-16969
> URL: https://issues.apache.org/jira/browse/HIVE-16969
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
> Attachments: HIVE-16969.001.patch
>
>
> For a table with many partition files, 
> MapOperator.cloneConfsForNestedColPruning() will update the 
> hive.io.file.readNestedColumn.paths many times. The larger value of 
> hive.io.file.readNestedColumn.paths will cause the poor performance for 
> ParquetHiveSerDe.processRawPrunedPaths(). 
> So, the unnecessary paths should be appended to 
> hive.io.file.readNestedColumn.paths.





[jira] [Assigned] (HIVE-16971) improve explain when invalidate stats

2017-06-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16971:
--


> improve explain when invalidate stats
> -
>
> Key: HIVE-16971
> URL: https://issues.apache.org/jira/browse/HIVE-16971
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> for example, in a load statement, we use statsTask to invalidate stats.





[jira] [Assigned] (HIVE-16970) General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils

2017-06-26 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR reassigned HIVE-16970:
--

Assignee: BELUGA BEHR

> General Improvements To org.apache.hadoop.hive.metastore.cache.CacheUtils
> -
>
> Key: HIVE-16970
> URL: https://issues.apache.org/jira/browse/HIVE-16970
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
>
> # Simplify
> # Do not initiate empty collections





[jira] [Assigned] (HIVE-16969) Improvement performance of MapOperator for Parquet

2017-06-26 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma reassigned HIVE-16969:
---


> Improvement performance of MapOperator for Parquet
> --
>
> Key: HIVE-16969
> URL: https://issues.apache.org/jira/browse/HIVE-16969
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
> Fix For: 3.0.0
>
>
> For a table with many partition files, 
> MapOperator.cloneConfsForNestedColPruning() will update the 
> hive.io.file.readNestedColumn.paths many times. The larger value of 
> hive.io.file.readNestedColumn.paths will cause the poor performance for 
> ParquetHiveSerDe.processRawPrunedPaths(). 
> So, the unnecessary paths should be appended to 
> hive.io.file.readNestedColumn.paths.





[jira] [Commented] (HIVE-16958) Setting hive.merge.sparkfiles=true will return an error when generating parquet databases

2017-06-26 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064070#comment-16064070
 ] 

liyunzhang_intel commented on HIVE-16958:
-

[~lirui]: this also happened in MR.

> Setting hive.merge.sparkfiles=true will return an error when generating 
> parquet databases 
> --
>
> Key: HIVE-16958
> URL: https://issues.apache.org/jira/browse/HIVE-16958
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0
> Environment: centos7 hadoop2.7.3 spark2.0.0
>Reporter: Liu Chunxiao
>Priority: Minor
> Attachments: parquet-hivemergesparkfiles.txt, sale.sql
>
>
> The process will return 
> Job failed with java.lang.NullPointerException
> FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> java.util.concurrent.ExecutionException: Exception thrown by job
>   at 
> org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:272)
>   at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277)
>   at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
>   at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 1.0 (TID 31, bdpe822n1): java.io.IOException: 
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:217)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:695)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:246)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedConstructorAccessor26.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
>   ... 17 more
> Caused by: java.lang.NullPointerException
>   at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:118)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:189)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:84)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:74)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReader

[jira] [Commented] (HIVE-16967) hive use tez engine occur org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)

2017-06-26 Thread xujie (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064053#comment-16064053
 ] 

xujie commented on HIVE-16967:
--

set hive.execution.engine=tez;
WITH TEMP_S02_CARD_DATE_H AS
(SELECT
CARD_NBR AS CARD_NUM
,MIN(CARD_ACTIVE_DATE) AS CARD_ACTIVE_DATE
,MAX(DATE_ONE_ACTIVE) AS DATE_ONE_ACTIVE
,MAX(DATE_EXPIRE) AS DATE_EXPIRE
--FROM MID.FS_VP_AMED_CUR
FROM BASE.VP_AMED
WHERE DT = '20170604'
GROUP BY CARD_NBR
)
,
-- Card information
TMP_CREDIT_CARD_01 AS
(SELECT
POST_TO_ACCT AS POST_TO_ACCT -- Account number of the corresponding account
,CARD_NBR AS CARD_NBR -- Card number
,ORG AS ORG -- Currency
,USER_CODE_2 AS USER_CODE_2 -- Flag for whether this is the account's first card
,DATE_EXPIRE AS DATE_EXPIRE -- Card expiration date
,BLOCK_CODE AS BLOCK_CODE -- BLOCK_CODE
,CARD_ACTIVE_DATE AS CARD_ACTIVE_DATE -- Card activation date
,DATE_ONE_ACTIVE AS DATE_ONE_ACTIVE -- All-in-One account activation date
,LOGO AS LOGO -- TYPE card kind/class
,CARD_STATUS_CD AS CARD_STATUS_CD -- Card status code
,FIRST_CARD_VERIFY_DATE AS FIRST_CARD_VERIFY_DATE -- Activation-eligible date --20130914
--,ROW_NUMBER() OVER(PARTITION BY POST_TO_ACCT ORDER BY APPLICATION_NO) AS CARD_SEQ
FROM(SELECT
E1.POST_TO_ACCT AS POST_TO_ACCT
,E1.CARD_NBR AS CARD_NBR
,E1.ORG AS ORG
,E1.USER_CODE_2 AS USER_CODE_2
,E1.USER_3 AS APPLICATION_NO
,E3.DATE_EXPIRE AS DATE_EXPIRE
,E1.BLOCK_CODE AS BLOCK_CODE
,E3.CARD_ACTIVE_DATE AS CARD_ACTIVE_DATE
,E3.DATE_ONE_ACTIVE AS DATE_ONE_ACTIVE
,E1.LOGO AS LOGO
,E1.CURR_FIRST_USAGE_FLAG AS CARD_STATUS_CD
,E1.DATE_FIRST_CARD_VERIFY AS FIRST_CARD_VERIFY_DATE
--FROM MID.FS_VP_AMED_CUR E1
FROM BASE.VP_AMED E1
LEFT JOIN TEMP_S02_CARD_DATE_H E3
ON E1.CARD_NBR = E3.CARD_NUM
WHERE E1.DT = '20170604'
) T
)
,
-- Written-off account check
TEMP_ACCT_WRITE_OFF AS
(SELECT
ACCT
,CASE WHEN SUM(IND) > 0
THEN 1
ELSE 0
END AS PASS_WRITE_OFF_IND
-- Each side of the dual-currency account has at least one BLOCK CODE among B, Q, A, M, S
FROM
(SELECT
ACCT
,CASE WHEN SUM(IND) >= 1 THEN 1
ELSE 0
END AS IND
FROM(SELECT
ACCT
,ORG
,CASE WHEN (BLOCK_CODE_1 IN ('B','Q','A','M','S') OR BLOCK_CODE_2 IN 
('B','Q','A','M','S')) THEN 1
ELSE 0
END AS IND
FROM BASE.VP_AMBS_TMP_CREDIT_ACCT_H_01
WHERE DT = '20170604'
) T
GROUP BY ACCT
UNION ALL
-- A card with block code X exists
SELECT
ACCT
,CASE WHEN SUM(CARD_NBR_SUM) >= 1 THEN 1
ELSE 0
END AS IND
FROM(SELECT
POST_TO_ACCT AS ACCT
,SUM(CASE WHEN COALESCE(BLOCK_CODE,'') ='X' THEN 1
ELSE 0
END) AS CARD_NBR_SUM
FROM TMP_CREDIT_CARD_01
GROUP BY POST_TO_ACCT) T
GROUP BY ACCT
) T
GROUP BY ACCT)
,
-- Get the statement date and the next statement date
TEMP_STMT_DATE AS
(SELECT
F1.ACCT AS ACCOUNT_NUM
,F1.ORG AS ACCOUNT_MODIFIER_NUM
,MAX(CASE WHEN F1.DATE_LAST_CYCLE IS NOT NULL THEN F1.DATE_LAST_CYCLE
ELSE CONCAT('201705',F1.BILLING_CYCLE)
END) AS STMT_DATE -- Statement date
,MAX(CONCAT('201706',F1.BILLING_CYCLE)) AS NEXT_STMT_DATE -- Next statement date
FROM BASE.VP_AMBS F1 -- Use last month's data
WHERE F1.DT = '20170531'
GROUP BY F1.ACCT,F1.ORG
)
,
TMP_CREDIT_ACCT_H_G1 AS
(SELECT
ACCT
,COALESCE(BLOCK_CODE_1,' ') AS BLOCK_CODE_1
,COALESCE(BLOCK_CODE_2,' ') AS BLOCK_CODE_2
,BLOCK_CODE_1_SET_DATE AS BLOCK_CODE_1_SET_DATE
,BLOCK_CODE_2_SET_DATE AS BLOCK_CODE_2_SET_DATE
,CUST_TYPE
FROM BASE.VP_AMBS_TMP_CREDIT_ACCT_H_01
WHERE DT = '20170604' 
AND ORG = '242'
)
,
TMP_CREDIT_ACCT_H_G2 AS
(SELECT
ACCT
,COALESCE(BLOCK_CODE_1,' ') AS BLOCK_CODE_1
,CASE WHEN BLOCK_CODE_2 = 'Z' AND COALESCE(BLOCK_CODE_2_MEMO,' ') IN ('UCF','CUCF')
THEN ' ' -- On the USD side, a block_code of Z with UCF/CUCF must be changed to ' '
ELSE COALESCE(BLOCK_CODE_2,' ')
END AS BLOCK_CODE_2
,BLOCK_CODE_1_SET_DATE AS BLOCK_CODE_1_SET_DATE
,BLOCK_CODE_2_SET_DATE AS BLOCK_CODE_2_SET_DATE
,CUST_TYPE
FROM BASE.VP_AMBS_TMP_CREDIT_ACCT_H_01
WHERE DT = '20170604'
AND ORG = '241'
)
,
-- Aggregate the block_codes from both sides of the account
TMP_CREDIT_ACCT_H_04 AS
(SELECT
G1.ACCT
,G1.BLOCK_CODE_1 AS BLOCK_CODE_1 -- RMB-side block_code_1
,coalesce(G3.PRI,0) AS PRI_1 -- Priority of RMB-side block_code_1
,coalesce(G1.BLOCK_CODE_1_SET_DATE,'29991231') AS BLOCK_CODE_1_SET_DATE -- RMB-side Block_Code_1_Set_Date
,G1.BLOCK_CODE_2 AS BLOCK_CODE_2 -- RMB-side block_code_2
,coalesce(G4.PRI,0) AS PRI_2 -- Priority of RMB-side block_code_2
,coalesce(G1.BLOCK_CODE_2_SET_DATE,'29991231') AS BLOCK_CODE_2_SET_DATE -- RMB-side Block_Code_2_Set_Date
,G2.BLOCK_CODE_1 AS BLOCK_CODE_3 -- USD-side block_code_1
,coalesce(G5.PRI,0) AS PRI_3 -- Priority of USD-side block_code_1
,coalesce(G2.BLOCK_CODE_1_SET_DATE,'29991231') AS BLOCK_CODE_3_SET_DATE -- USD-side Block_Code_1_Set_Date
,G2.BLOCK_CODE_2 AS BLOCK_CODE_4 -- USD-side block_code_2
,coalesce(G6.PRI,0) AS PRI_4 -- Priority of USD-side block_code_2
,coalesce(G2.BLOCK_CODE_2_SET_DATE,'29991231') AS BLOCK_CODE_4_SET_DATE -- USD-side Block_Code_2_Set_Date
,G1.CUST_TYPE AS CUST_TYPE_1 -- Fee-only flag customer type
,G2.CUST_TYPE AS CUST_TYPE_2 -- Fee-only flag customer type
FROM TMP_CREDIT_ACCT_H_G1 G1
LEFT JOIN TMP_CREDIT_ACCT_H_G2 G2
ON G1.ACCT = G2.ACCT
LEFT JOIN DIM.CCM_CODE_CFG_BLOCK_CODE G3
ON COALESCE(G1.BLOCK_CODE_1,' ') = COALESCE(G3.BLOCK_CODE,' ')
LEFT JOIN DIM.CCM_CODE_CFG_BLOCK_CODE G4
ON COALESCE(G1.BLOCK_CODE_2,' ') = COALESCE(G4.BLOCK_CODE,' ')
LEFT JOIN DIM.CCM_CODE_CFG_BLOCK_CODE G5
ON COALESCE(G2.BLOCK_CODE_1,' ') = COALESCE(G5.BLOCK_CODE,' ')
LEFT JOIN DIM.CCM_CODE_CFG_BLOCK_CODE G6
ON COALESCE(G2.BLOCK_CODE_2,' ') = COALESCE(G6.BLOCK_CODE,' ')
),
TEMP_TMP_CREDIT_ACCT_H_04_1 AS (
SELECT
ACCT
,'242' AS OR

[jira] [Updated] (HIVE-16951) ACID Compactor, PartialScanTask, MergeFileTask, ColumnTruncateTask, HCatUtil don't close JobClient

2017-06-26 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-16951:

Description: 
When a compaction job is launched, we create a new JobClient every time we run 
the MR job:
{code}
  private void launchCompactionJob(JobConf job, Path baseDir, CompactionType compactionType,
                                   StringableList dirsToSearch,
                                   List<AcidUtils.ParsedDelta> parsedDeltas,
                                   int curDirNumber, int obsoleteDirNumber, HiveConf hiveConf,
                                   TxnStore txnHandler, long id, String jobName) throws IOException {
    job.setBoolean(IS_MAJOR, compactionType == CompactionType.MAJOR);
    if (dirsToSearch == null) {
      dirsToSearch = new StringableList();
    }
    StringableList deltaDirs = new StringableList();
    long minTxn = Long.MAX_VALUE;
    long maxTxn = Long.MIN_VALUE;
    for (AcidUtils.ParsedDelta delta : parsedDeltas) {
      LOG.debug("Adding delta " + delta.getPath() + " to directories to search");
      dirsToSearch.add(delta.getPath());
      deltaDirs.add(delta.getPath());
      minTxn = Math.min(minTxn, delta.getMinTransaction());
      maxTxn = Math.max(maxTxn, delta.getMaxTransaction());
    }

    if (baseDir != null) job.set(BASE_DIR, baseDir.toString());
    job.set(DELTA_DIRS, deltaDirs.toString());
    job.set(DIRS_TO_SEARCH, dirsToSearch.toString());
    job.setLong(MIN_TXN, minTxn);
    job.setLong(MAX_TXN, maxTxn);

    if (hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST)) {
      mrJob = job;
    }

    LOG.info("Submitting " + compactionType + " compaction job '" +
        job.getJobName() + "' to " + job.getQueueName() + " queue.  " +
        "(current delta dirs count=" + curDirNumber +
        ", obsolete delta dirs count=" + obsoleteDirNumber + ". TxnIdRange[" + minTxn + "," + maxTxn + "]");
    RunningJob rj = new JobClient(job).submitJob(job);
    LOG.info("Submitted compaction job '" + job.getJobName() + "' with jobID=" + rj.getID() + " compaction ID=" + id);
    txnHandler.setHadoopJobId(rj.getID().toString(), id);
    rj.waitForCompletion();
    if (!rj.isSuccessful()) {
      throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
          " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
    }
  }
{code}

We should close the JobClient to release resources (cached FS objects, etc.).

Similarly for other classes listed above.
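
A minimal sketch of the fix being proposed, using a stub class (StubJobClient is hypothetical; the real client is org.apache.hadoop.mapred.JobClient, which exposes close() to release cached FileSystem objects): wrap the submission in try/finally so the client is always closed.

```java
// Stand-in illustration of the proposed resource cleanup. Only the
// try/finally pattern is the point; class and method names are invented.
class CloseJobClientSketch {
    static class StubJobClient implements java.io.Closeable {
        boolean closed = false;
        String submitJob() { return "job_0001"; }          // pretend submission
        @Override public void close() { closed = true; }   // release resources
    }

    static boolean submitAndClose() {
        StubJobClient jc = new StubJobClient();
        try {
            jc.submitJob();        // submit and wait, as launchCompactionJob does
        } finally {
            jc.close();            // the fix: always close the client
        }
        return jc.closed;
    }

    public static void main(String[] args) {
        System.out.println("client closed: " + submitAndClose());
    }
}
```

The same pattern applies to the other classes named in the summary wherever they construct a JobClient per invocation.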


> ACID Compactor, PartialScanTask, MergeFileTask, ColumnTruncateTask, HCatUtil 
> don't close JobClient
> --

[jira] [Updated] (HIVE-16951) ACID Compactor, PartialScanTask, MergeFileTask, ColumnTruncateTask, HCatUtil don't close JobClient

2017-06-26 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-16951:

Summary: ACID Compactor, PartialScanTask, MergeFileTask, 
ColumnTruncateTask, HCatUtil don't close JobClient  (was: ACID: Compactor 
doesn't close JobClient)

> ACID Compactor, PartialScanTask, MergeFileTask, ColumnTruncateTask, HCatUtil 
> don't close JobClient
> --
>
> Key: HIVE-16951
> URL: https://issues.apache.org/jira/browse/HIVE-16951
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.2, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> When a compaction job is launched, we create a new JobClient every time we run 
> the MR job:
> {code}
>   private void launchCompactionJob(JobConf job, Path baseDir, CompactionType 
> compactionType,
>StringableList dirsToSearch,
>List parsedDeltas,
>int curDirNumber, int obsoleteDirNumber, 
> HiveConf hiveConf,
>TxnStore txnHandler, long id, String 
> jobName) throws IOException {
> job.setBoolean(IS_MAJOR, compactionType == CompactionType.MAJOR);
> if(dirsToSearch == null) {
>   dirsToSearch = new StringableList();
> }
> StringableList deltaDirs = new StringableList();
> long minTxn = Long.MAX_VALUE;
> long maxTxn = Long.MIN_VALUE;
> for (AcidUtils.ParsedDelta delta : parsedDeltas) {
>   LOG.debug("Adding delta " + delta.getPath() + " to directories to 
> search");
>   dirsToSearch.add(delta.getPath());
>   deltaDirs.add(delta.getPath());
>   minTxn = Math.min(minTxn, delta.getMinTransaction());
>   maxTxn = Math.max(maxTxn, delta.getMaxTransaction());
> }
> if (baseDir != null) job.set(BASE_DIR, baseDir.toString());
> job.set(DELTA_DIRS, deltaDirs.toString());
> job.set(DIRS_TO_SEARCH, dirsToSearch.toString());
> job.setLong(MIN_TXN, minTxn);
> job.setLong(MAX_TXN, maxTxn);
> if (hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST)) {
>   mrJob = job;
> }
> LOG.info("Submitting " + compactionType + " compaction job '" +
>   job.getJobName() + "' to " + job.getQueueName() + " queue.  " +
>   "(current delta dirs count=" + curDirNumber +
>   ", obsolete delta dirs count=" + obsoleteDirNumber + ". TxnIdRange[" + 
> minTxn + "," + maxTxn + "]");
> RunningJob rj = new JobClient(job).submitJob(job);
> LOG.info("Submitted compaction job '" + job.getJobName() + "' with 
> jobID=" + rj.getID() + " compaction ID=" + id);
> txnHandler.setHadoopJobId(rj.getID().toString(), id);
> rj.waitForCompletion();
> if (!rj.isSuccessful()) {
>   throw new IOException(compactionType == CompactionType.MAJOR ? "Major" 
> : "Minor" +
>   " compactor job failed for " + jobName + "! Hadoop JobId: " + 
> rj.getID() );
> }
>   }
> {code}
> We should close the JobClient to release resources (cached FS objects etc).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16942) INFORMATION_SCHEMA: schematool for setting it up is not idempotent

2017-06-26 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16942:
--
Status: Patch Available  (was: Open)

> INFORMATION_SCHEMA: schematool for setting it up is not idempotent
> --
>
> Key: HIVE-16942
> URL: https://issues.apache.org/jira/browse/HIVE-16942
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16942.1.patch
>
>
> If you run schematool to set up information schema, but the SYS database 
> already exists, here's what happens:
> {code}
> [vagrant@trunk apache-hive-3.0.0-SNAPSHOT-bin]$ schematool -metaDbType mysql 
> -dbType hive -initSchema -url jdbc:hive2://localhost:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> Metastore connection URL:  jdbc:hive2://localhost:1/default
> Metastore Connection Driver :  org.apache.hive.jdbc.HiveDriver
> Metastore connection User: hive
> Starting metastore schema initialization to 3.0.0
> Initialization script hive-schema-3.0.0.hive.sql
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. Database SYS already exists
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254)
> {code}
> Why is this a problem, you ask?
> If you run schematool without hive.metastore.db.type set (or set to the wrong
> value), it will create the SYS database but fail to create any of the tables
> within it. If you then fix hive.metastore.db.type and re-run, you'll get this
> failure until you drop the SYS database (which must be done as the hive
> user).
> Can the init script use "create database if not exists sys" rather than just 
> "create database sys"?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16965) SMB join may produce incorrect results

2017-06-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16965:
---

Assignee: Deepak Jaiswal

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for two selects. The SMB one looks incorrect. cc 
> [~djaiswal] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-06-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16965:

Description: 
Running the following on MiniTez
{noformat}
set hive.mapred.mode=nonstrict;
SET hive.vectorized.execution.enabled=true;
SET hive.exec.orc.default.buffer.size=32768;
SET hive.exec.orc.default.row.index.stride=1000;
SET hive.optimize.index.filter=true;
set hive.fetch.task.conversion=none;
set hive.exec.dynamic.partition.mode=nonstrict;

DROP TABLE orc_a;
DROP TABLE orc_b;

CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
smallint)
  CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
CREATE TABLE orc_b (id bigint, cfloat float)
  CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;

insert into table orc_a partition (y=2000, q)
select cbigint, cdouble, csmallint % 10 from alltypesorc
  where cbigint is not null and csmallint > 0 order by cbigint asc;
insert into table orc_a partition (y=2001, q)
select cbigint, cdouble, csmallint % 10 from alltypesorc
  where cbigint is not null and csmallint > 0 order by cbigint asc;

insert into table orc_b 
select cbigint, cfloat from alltypesorc
  where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;

set hive.cbo.enable=false;

select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;

set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=10;

explain
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;

DROP TABLE orc_a;
DROP TABLE orc_b;
{noformat}

Produces different results for the two selects. The SMB one looks incorrect. cc 
[~djaiswal] [~hagleitn]



> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into tab

[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-26 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063930#comment-16063930
 ] 

Deepak Jaiswal commented on HIVE-16761:
---

I am working on the incorrect results in SMB. Can you provide the test names 
which are failing due to that? It would be helpful in fixing the issues.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.03.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}
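
The quoted stack trace reduces to an unchecked downcast: the reader produced a LongColumnVector where BatchToRowReader.nextString expected a BytesColumnVector. A toy illustration of that failure mode (the classes below are stand-ins, not the real Hive vector types):

```java
// Hedged sketch of the ClassCastException reported above: the schema says
// "string", but the produced batch carries a long column, so the cast throws.
class ColumnVectorCastSketch {
    static class ColumnVector {}
    static class LongColumnVector extends ColumnVector {}
    static class BytesColumnVector extends ColumnVector {}

    static String readString(ColumnVector produced) {
        try {
            // Mirrors the downcast inside BatchToRowReader.nextString().
            BytesColumnVector bytes = (BytesColumnVector) produced;
            return "ok: " + bytes.getClass().getSimpleName();
        } catch (ClassCastException e) {
            return "ClassCastException: " + produced.getClass().getSimpleName()
                + " cannot be cast to BytesColumnVector";
        }
    }

    public static void main(String[] args) {
        System.out.println(readString(new LongColumnVector()));
    }
}
```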



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16761:

Attachment: HIVE-16761.03.patch

This patch also does some other cleanup and fixes the paths where the IO elevator 
would start processing and then decide the split is not compatible, leaving the 
reader running with no one to read its results.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.03.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063922#comment-16063922
 ] 

Sergey Shelukhin commented on HIVE-16761:
-

Looks like the incorrect result is not specific to the IO elevator; MiniTez is 
also broken. I will file a separate JIRA. For this one, the IO elevator will be 
disabled for SMB for now, to avoid further muddying the waters.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16962) Better error msg for Hive on Spark in case user cancels query and closes session

2017-06-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16962:
---
Attachment: HIVE-16962.patch

> Better error msg for Hive on Spark in case user cancels query and closes 
> session
> 
>
> Key: HIVE-16962
> URL: https://issues.apache.org/jira/browse/HIVE-16962
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16962.patch
>
>
> In case user cancels a query and closes the session, Hive marks the query as 
> failed. However, the error message is a little confusing. It still says:
> {quote}
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create spark 
> client. This is likely because the queue you assigned to does not have free 
> resource at the moment to start the job. Please check your queue usage and 
> try the query again later.
> {quote}
> followed by some InterruptedException.
> Ideally, the error message should clearly indicate that the user cancelled
> the execution.
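
A sketch of the improvement being requested (the method and flag names below are hypothetical, not the actual SparkTask API): check for user cancellation first, and only fall back to the generic queue-resource message otherwise.

```java
// Illustrative only: choose a failure message based on whether the user
// cancelled, instead of always emitting the "failed to create spark client"
// queue-resource explanation.
class CancelMessageSketch {
    static String failureMessage(boolean userCancelled) {
        if (userCancelled) {
            // Proposed: state plainly that the user cancelled the query.
            return "Query was cancelled by the user and the session was closed.";
        }
        // Existing generic message (paraphrased from the report).
        return "Failed to create Spark client. This is likely because the queue "
            + "you assigned to does not have free resources at the moment.";
    }

    public static void main(String[] args) {
        System.out.println(failureMessage(true));
    }
}
```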



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16962) Better error msg for Hive on Spark in case user cancels query and closes session

2017-06-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16962:
---
Status: Patch Available  (was: Open)

> Better error msg for Hive on Spark in case user cancels query and closes 
> session
> 
>
> Key: HIVE-16962
> URL: https://issues.apache.org/jira/browse/HIVE-16962
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16962.patch
>
>
> In case user cancels a query and closes the session, Hive marks the query as 
> failed. However, the error message is a little confusing. It still says:
> {quote}
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create spark 
> client. This is likely because the queue you assigned to does not have free 
> resource at the moment to start the job. Please check your queue usage and 
> try the query again later.
> {quote}
> followed by some InterruptedException.
> Ideally, the error message should clearly indicate that the user cancelled
> the execution.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15041) Specify GCE network name on Hive ptest

2017-06-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15041:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

This is not an issue anymore.

> Specify GCE network name on Hive ptest
> --
>
> Key: HIVE-15041
> URL: https://issues.apache.org/jira/browse/HIVE-15041
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-15041.1.patch, HIVE-15041.2.patch
>
>
> NO PRECOMMIT TESTS
> A new option should be added to cloudhost.properties to specify the GCE
> network name:
> # GCE network option
> network = 
> https://www.googleapis.com/compute/v1/projects//global/networks/default



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063840#comment-16063840
 ] 

Hive QA commented on HIVE-16793:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874525/HIVE-16793.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=102)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5778/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5778/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5778/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874525 - PreCommit-HIVE-Build

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch
>
>
> This query has an sq_count check, though is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=10

[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063803#comment-16063803
 ] 

Sergey Shelukhin commented on HIVE-16761:
-

Found an issue causing the failure; however, after fixing it I see that the results 
are incorrect. The SMB join seems to rely on something about its inputs that is no 
longer true with the elevator.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16838) Improve plans for subqueries with non-equi co-related predicates

2017-06-26 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16838:
---
Status: Open  (was: Patch Available)

> Improve plans for subqueries with non-equi co-related predicates
> 
>
> Key: HIVE-16838
> URL: https://issues.apache.org/jira/browse/HIVE-16838
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-16838.1.patch, HIVE-16838.2.patch, 
> HIVE-16838.3.patch, HIVE-16838.4.patch, HIVE-16838.5.patch, HIVE-16838.6.patch
>
>






[jira] [Updated] (HIVE-16838) Improve plans for subqueries with non-equi co-related predicates

2017-06-26 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16838:
---
Status: Patch Available  (was: Open)

> Improve plans for subqueries with non-equi co-related predicates
> 
>
> Key: HIVE-16838
> URL: https://issues.apache.org/jira/browse/HIVE-16838
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-16838.1.patch, HIVE-16838.2.patch, 
> HIVE-16838.3.patch, HIVE-16838.4.patch, HIVE-16838.5.patch, HIVE-16838.6.patch
>
>






[jira] [Updated] (HIVE-16838) Improve plans for subqueries with non-equi co-related predicates

2017-06-26 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16838:
---
Attachment: HIVE-16838.6.patch

> Improve plans for subqueries with non-equi co-related predicates
> 
>
> Key: HIVE-16838
> URL: https://issues.apache.org/jira/browse/HIVE-16838
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-16838.1.patch, HIVE-16838.2.patch, 
> HIVE-16838.3.patch, HIVE-16838.4.patch, HIVE-16838.5.patch, HIVE-16838.6.patch
>
>






[jira] [Commented] (HIVE-16750) Support change management for rename table/partition.

2017-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063711#comment-16063711
 ] 

Hive QA commented on HIVE-16750:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874524/HIVE-16750.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10848 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=238)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5777/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5777/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5777/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874524 - PreCommit-HIVE-Build

> Support change management for rename table/partition.
> -
>
> Key: HIVE-16750
> URL: https://issues.apache.org/jira/browse/HIVE-16750
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16750.01.patch
>
>
> Currently, rename table/partition updates the data location by renaming the 
> directory, which is equivalent to moving the files to a new path and deleting 
> the old path. So, this should trigger a move of the files into $CMROOT.
> Scenario:
> 1. Create a table (T1)
> 2. Insert a record
> 3. Rename the table(T1 -> T2)
> 4. Repl Dump till Insert.
> 5. Repl Load from the dump.
> 6. Target DB should have table T1 with the record.
> Similar scenario with rename partition as well.
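The intended behavior in the description above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Hive change-management code: `rename_with_cm`, the `cmroot` argument, and the flat archive layout are all made up for the sketch.

```python
import shutil
import tempfile
from pathlib import Path


def rename_with_cm(old_dir: Path, new_dir: Path, cmroot: Path) -> None:
    """Archive the data files of old_dir into cmroot, then perform the
    rename, so a later REPL LOAD can still find the pre-rename files."""
    cmroot.mkdir(parents=True, exist_ok=True)
    for f in old_dir.rglob("*"):
        if f.is_file():
            # copy into the CM root before the move destroys the old path
            shutil.copy2(f, cmroot / f.name)
    # the rename itself: equivalent to moving files to the new path
    old_dir.rename(new_dir)
```

With this ordering, step 4 of the scenario (a dump taken up to the Insert) can still resolve the pre-rename files from the archive even though the table directory has moved.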





[jira] [Updated] (HIVE-16954) LLAP IO: better debugging

2017-06-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16954:

Attachment: HIVE-16954.patch

The master patch is going to be almost the same as the branch-2 patch until the 
off-heap metadata patch is committed. Or, if this one is committed first, that one 
will have to be adjusted.

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch
>
>






[jira] [Updated] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-26 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-16938:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master.

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Fix For: 3.0.0
>
> Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things, like statistics, in a SYS table.
> One common thing users want to know is the number of rows in tables, 
> system-wide.
> This information is in the table_params table, but the structure of that table 
> makes it quite inconvenient to access, since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a 
> first-class table.
> For what it's worth, I deal with the current table by pivoting it into 
> something easier to work with, as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better not to have users provide workarounds, and to make table 
> stats first-class, as column stats currently are.
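For what it's worth, the pivot pattern in the view above is plain SQL and can be sanity-checked against any engine. Below is a minimal sketch using SQLite with a toy `table_params` table; only a subset of the stat keys is shown, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_params (tbl_id INTEGER, param_key TEXT, param_value TEXT);
INSERT INTO table_params VALUES
  (1, 'numRows',   '42'),
  (1, 'totalSize', '4096'),
  (2, 'numRows',   '7');
-- pivot the key-value pairs into one row of stats per table;
-- MAX over the CASE expressions picks the single non-NULL value per key
CREATE VIEW table_stats AS
SELECT
  tbl_id,
  MAX(CASE param_key WHEN 'numRows'   THEN param_value END) AS numRows,
  MAX(CASE param_key WHEN 'totalSize' THEN param_value END) AS totalSize
FROM table_params GROUP BY tbl_id;
""")

rows = conn.execute(
    "SELECT tbl_id, numRows, totalSize FROM table_stats ORDER BY tbl_id"
).fetchall()
# rows == [(1, '42', '4096'), (2, '7', None)]
```

Keys missing for a table simply come out as NULL, which is part of why a first-class stats table would be nicer than the workaround.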





[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063691#comment-16063691
 ] 

Sergio Peña commented on HIVE-16559:


+1
The patch looks good.

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch, 
> HIVE-16559.06.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables 
> backed by files with different schemas. Hive should match the table columns 
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns that are present in the 
> serde of a partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>'mary' as name,
>5 AS favnumber,
>'blue' AS favcolor,
>35 AS age,
>'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   

[jira] [Updated] (HIVE-16961) Hive on Spark leaks spark application in case user cancels query and closes session

2017-06-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16961:
---
Attachment: HIVE-16961.patch

> Hive on Spark leaks spark application in case user cancels query and closes 
> session
> ---
>
> Key: HIVE-16961
> URL: https://issues.apache.org/jira/browse/HIVE-16961
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16961.patch
>
>
> It's found that a Spark application is leaked when the user cancels a query and 
> closes the session while Hive is waiting for the remote driver to connect back. 
> This was found for asynchronous query execution, but it is seemingly equally 
> applicable to synchronous submission when the session is abruptly closed. The 
> leaked Spark application that runs the Spark driver connects back to Hive 
> successfully and runs forever (until HS2 restarts), but receives no job 
> submissions because the session is already closed. Ideally, Hive should 
> reject the connection from the driver so that the driver will exit.





[jira] [Updated] (HIVE-16961) Hive on Spark leaks spark application in case user cancels query and closes session

2017-06-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16961:
---
Status: Patch Available  (was: Open)

> Hive on Spark leaks spark application in case user cancels query and closes 
> session
> ---
>
> Key: HIVE-16961
> URL: https://issues.apache.org/jira/browse/HIVE-16961
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16961.patch
>
>
> It's found that a Spark application is leaked when the user cancels a query and 
> closes the session while Hive is waiting for the remote driver to connect back. 
> This was found for asynchronous query execution, but it is seemingly equally 
> applicable to synchronous submission when the session is abruptly closed. The 
> leaked Spark application that runs the Spark driver connects back to Hive 
> successfully and runs forever (until HS2 restarts), but receives no job 
> submissions because the session is already closed. Ideally, Hive should 
> reject the connection from the driver so that the driver will exit.





[jira] [Commented] (HIVE-16905) Add zookeeper ACL for hiveserver2

2017-06-26 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063660#comment-16063660
 ] 

Vaibhav Gumashta commented on HIVE-16905:
-

[~txhsj] Thanks a lot for the patch and the document. 

In your patch, it appears that you are improving the unsecured-cluster case. The 
current model is as follows: in a secure cluster (with Kerberos), the znode for 
HiveServer2 is created with these ACLs: Read permission for everyone (the JDBC 
client needs this) and Create/Delete/Write/Admin for the SASL-authenticated 
HiveServer2 user. In an unsecured cluster, the znode for HiveServer2 is created 
with Read/Create/Delete/Write/Admin access for all users. 

I have a few questions: what other authentication modes do you plan to 
support with this (can you give an example)? How will that affect the 
interaction between JDBC - ZooKeeper and HiveServer2 - ZooKeeper? Also, in 
ZooKeeperHiveClientHelper, you are reading the config from the server's HiveConf. 
However, on the remote JDBC client machine, we do not have access to the 
server's hive-site.xml (we also don't want the JDBC client to depend on HiveConf - 
typically any configuration needed on the client side is passed through the 
JDBC connection string and handled appropriately in the JDBC driver - for 
example, check how we pass the ZooKeeper namespace for HiveServer2 via the 
connection string). 
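For reference, the two ACL configurations described above can be written down in terms of ZooKeeper's permission bits. The bit values below match `org.apache.zookeeper.ZooDefs.Perms`; the helper function name and the SASL principal string are placeholders for illustration, not actual Hive code.

```python
# ZooKeeper permission bits, as defined in org.apache.zookeeper.ZooDefs.Perms
READ, WRITE, CREATE, DELETE, ADMIN = 1, 2, 4, 8, 16
ALL = READ | WRITE | CREATE | DELETE | ADMIN  # "cdrwa"


def hs2_znode_acls(secure: bool) -> list:
    """Return (scheme, id, perms) triples for the HiveServer2 znode, per the
    model described above: world-readable plus SASL-owned in a secure
    cluster, fully open in an unsecured one."""
    if secure:
        return [
            ("world", "anyone", READ),  # JDBC clients only need to read
            # owner keeps create/delete/write/admin; principal is a placeholder
            ("sasl", "hiveserver2-principal", CREATE | DELETE | WRITE | ADMIN),
        ]
    return [("world", "anyone", ALL)]
```

The unsecured case is exactly the `world:anyone:cdrwa` setting that makes the znode deletable by anyone, which is what the patch is trying to tighten.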

> Add zookeeper ACL for hiveserver2
> -
>
> Key: HIVE-16905
> URL: https://issues.apache.org/jira/browse/HIVE-16905
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Saijin Huang
>Assignee: Saijin Huang
> Attachments: HIVE-16905.1.patch, HIVE ACL FOR HIVESERVER2.pdf
>
>
> Adding a ZooKeeper ACL for HiveServer2 is necessary so that Hive can protect 
> the HiveServer2 znode from being deleted by accident.
> --
> Case:
> When I made Beeline connections through Hive HA with ZooKeeper, I suddenly 
> found that Beeline could not connect to HiveServer2. The cause of the problem 
> was that someone had wrongly deleted /hiveserver2, so the Beeline 
> connection failed and could not read the configs from ZooKeeper.
> -
> Because the ACL of /hiveserver2 is set to world:anyone:cdrwa, 
> anyone can easily delete /hiveserver2 and its child znodes at any time. This is 
> unsafe, and it is necessary to protect the /hiveserver2 znode.





[jira] [Updated] (HIVE-4239) Remove lock on compilation stage

2017-06-26 Thread zhangzr (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangzr updated HIVE-4239:
--
Description: *strong text*

> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch
>
>
> *strong text*





[jira] [Updated] (HIVE-4239) Remove lock on compilation stage

2017-06-26 Thread zhangzr (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangzr updated HIVE-4239:
--
Description: (was: *strong text*)

> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch
>
>






[jira] [Assigned] (HIVE-16964) Compactor is not writing _orc_acid_version file

2017-06-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16964:
-


> Compactor is not writing _orc_acid_version file
> ---
>
> Key: HIVE-16964
> URL: https://issues.apache.org/jira/browse/HIVE-16964
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> OrcRecordUpdater creates OrcRecordUpdater.ACID_FORMAT in the dir that it 
> creates.
> It doesn't look like CompactorMR does the same.





[jira] [Commented] (HIVE-16785) Ensure replication actions are idempotent if any series of events are applied again.

2017-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063593#comment-16063593
 ] 

Hive QA commented on HIVE-16785:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874523/HIVE-16785.05.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10848 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_windowing2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5776/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5776/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5776/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874523 - PreCommit-HIVE-Build

> Ensure replication actions are idempotent if any series of events are applied 
> again.
> 
>
> Key: HIVE-16785
> URL: https://issues.apache.org/jira/browse/HIVE-16785
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16785.01.patch, HIVE-16785.02.patch, 
> HIVE-16785.03.patch, HIVE-16785.04.patch, HIVE-16785.05.patch
>
>
> Some of the events (ALTER, RENAME, TRUNCATE) are not idempotent and hence 
> lead to failure of REPL LOAD if they are applied twice, or applied to an object 
> that is newer than the current event. For example, a TRUNCATE applied to a 
> table that is already dropped will fail instead of being a no-op.
> Also, we need to consider the scenario where the object is missing while 
> applying an event. For example, if a RENAME_TABLE event is applied on a target 
> where the old table is missing, we should determine whether the table should be 
> recreated or the event treated as a no-op. This can be done by verifying the 
> DB-level last repl ID against the current event ID.
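The guard described in the last sentence, comparing the DB-level last replicated event ID with the incoming event's ID, might look like the following sketch. `apply_event` and the dict-based state are illustrative assumptions, not Hive's actual replication code.

```python
def apply_event(db_state: dict, event_id: int, event) -> str:
    """Apply a replication event idempotently: skip any event at or below
    the last replicated event ID recorded on the target database."""
    if event_id <= db_state.get("last_repl_id", 0):
        return "noop"          # already applied, e.g. a replayed TRUNCATE
    event(db_state)            # mutate the target state
    db_state["last_repl_id"] = event_id
    return "applied"
```

Replaying the same event stream then becomes harmless: the second pass sees that every event ID is already covered and turns each one into a no-op.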





[jira] [Commented] (HIVE-16750) Support change management for rename table/partition.

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063500#comment-16063500
 ] 

ASF GitHub Bot commented on HIVE-16750:
---

GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/199

HIVE-16750: Support change management for rename table/partition.

Rename to copy data files to CM path.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-16750

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #199


commit 0e47852b50f410b145eb6fbeb95a7add4c4653af
Author: Sankar Hariappan 
Date:   2017-05-24T12:41:44Z

HIVE-16750: Support change management for rename table/partition.




> Support change management for rename table/partition.
> -
>
> Key: HIVE-16750
> URL: https://issues.apache.org/jira/browse/HIVE-16750
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16750.01.patch
>
>
> Currently, rename table/partition updates the data location by renaming the 
> directory, which is equivalent to moving the files to a new path and deleting 
> the old path. So, this should trigger a move of the files into $CMROOT.
> Scenario:
> 1. Create a table (T1)
> 2. Insert a record
> 3. Rename the table(T1 -> T2)
> 4. Repl Dump till Insert.
> 5. Repl Load from the dump.
> 6. Target DB should have table T1 with the record.
> Similar scenario with rename partition as well.





[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-06-26 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Status: Patch Available  (was: Open)

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>   <-Select Operator [SEL_59] (rows=2000

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-06-26 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Attachment: HIVE-16793.3.patch

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>   <-Select Operator [SEL_59] (rows=2 
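The plan above guards the scalar subquery with sq_count_check even though a GROUP BY key forced to a constant can produce at most one group. A minimal sketch of that fact, using SQLite and a hypothetical miniature `part` table (names taken from the plan, data invented):

```python
import sqlite3

# Hypothetical miniature of the tpch "part" table used in the explain plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (p_type TEXT, p_size INTEGER)")
conn.executemany("INSERT INTO part VALUES (?, ?)",
                 [("1", 10), ("1", 25), ("2", 7), ("3", 40)])

# The scalar subquery: a GROUP BY key pinned to the constant '1' by the
# WHERE clause yields at most one group, so a runtime
# sq_count_check(count) <= 1 guard is trivially satisfied.
rows = conn.execute(
    "SELECT max(p_size) FROM part WHERE p_type = '1' GROUP BY p_type"
).fetchall()
assert len(rows) <= 1   # the condition the sq_count_check branch enforces
print(rows)             # [(25,)]
```

Since the optimizer can prove the single-group property statically here, the entire Reducer 3/4 branch computing the count becomes dead weight.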

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-06-26 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Status: Open  (was: Patch Available)


[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-06-26 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Attachment: (was: HIVE-16793.2.patch)


[jira] [Updated] (HIVE-16750) Support change management for rename table/partition.

2017-06-26 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16750:

Attachment: HIVE-16750.01.patch

Added 01.patch:
- Rename now copies data files to the CM path.

Requesting [~anishek]/[~sushanth] to review the patch!

> Support change management for rename table/partition.
> -
>
> Key: HIVE-16750
> URL: https://issues.apache.org/jira/browse/HIVE-16750
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16750.01.patch
>
>
> Currently, rename table/partition updates the data location by renaming the 
> directory, which is equivalent to moving files to the new path and deleting 
> the old path. So, this should trigger a move of the files into $CMROOT.
> Scenario:
> 1. Create a table (T1)
> 2. Insert a record
> 3. Rename the table(T1 -> T2)
> 4. Repl Dump till Insert.
> 5. Repl Load from the dump.
> 6. Target DB should have table T1 with the record.
> Similar scenario with rename partition as well.
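The idea behind change management here — preserve the old data files under a CM root before the directory rename destroys the old path, so a replication dump of pre-rename events can still read them — can be sketched with local files. This is a hypothetical model (`rename_with_cm` is an invented helper, not Hive's actual metastore code):

```python
import os
import shutil
import tempfile

def rename_with_cm(old_path, new_path, cmroot):
    """Sketch of a change-managed rename: copy the old directory's files
    into CMROOT before renaming, so readers of pre-rename replication
    events can still find the data. (Hypothetical helper.)"""
    os.makedirs(cmroot, exist_ok=True)
    for name in os.listdir(old_path):
        shutil.copy2(os.path.join(old_path, name), cmroot)
    os.rename(old_path, new_path)

base = tempfile.mkdtemp()
t1 = os.path.join(base, "t1")
os.makedirs(t1)
with open(os.path.join(t1, "000000_0"), "w") as f:
    f.write("1\n")  # step 2 of the scenario: insert a record

rename_with_cm(t1, os.path.join(base, "t2"), os.path.join(base, "cmroot"))

# The data file is still reachable via CMROOT even though t1 is gone.
assert os.path.exists(os.path.join(base, "cmroot", "000000_0"))
assert not os.path.exists(t1)
```

Without the CM copy, step 5 (REPL LOAD of a dump taken before the rename) would look for T1's files at a path that no longer exists.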



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16750) Support change management for rename table/partition.

2017-06-26 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16750:

Fix Version/s: 3.0.0



[jira] [Updated] (HIVE-16750) Support change management for rename table/partition.

2017-06-26 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16750:

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-16785) Ensure replication actions are idempotent if any series of events are applied again.

2017-06-26 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16785:

Status: Patch Available  (was: Open)

> Ensure replication actions are idempotent if any series of events are applied 
> again.
> 
>
> Key: HIVE-16785
> URL: https://issues.apache.org/jira/browse/HIVE-16785
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16785.01.patch, HIVE-16785.02.patch, 
> HIVE-16785.03.patch, HIVE-16785.04.patch, HIVE-16785.05.patch
>
>
> Some of the events (ALTER, RENAME, TRUNCATE) are not idempotent and hence 
> lead to failure of REPL LOAD if applied twice or applied on an object that is 
> newer than the current event. For example, TRUNCATE applied on a table that 
> is already dropped will fail instead of being a no-op.
> Also, we need to consider the scenario where the object is missing while 
> applying an event. For example, if a RENAME_TABLE event is applied on a 
> target where the old table is missing, we should validate whether the table 
> should be recreated or the event treated as a no-op. This can be done by 
> verifying the DB-level last repl ID against the current event ID.
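The replay-safety rule described above — compare each event's ID against the DB-level last replicated ID and turn already-applied or stale events into no-ops — can be sketched in a few lines. This is a toy model for illustration only, not Hive's replication code:

```python
def apply_event(db_state, event):
    """Toy model of idempotent event replay (hypothetical, not Hive code).
    An event whose ID is not newer than the DB-level last replicated ID
    has already been applied, so it is skipped instead of failing."""
    if event["id"] <= db_state["last_repl_id"]:
        return "noop"  # replayed event: skip safely
    if event["op"] == "TRUNCATE" and event["table"] not in db_state["tables"]:
        return "noop"  # table already dropped downstream: no-op, not failure
    if event["op"] == "TRUNCATE":
        db_state["tables"][event["table"]] = []
    db_state["last_repl_id"] = event["id"]
    return "applied"

db = {"last_repl_id": 10, "tables": {"t1": [1, 2, 3]}}
ev = {"id": 11, "op": "TRUNCATE", "table": "t1"}
assert apply_event(db, ev) == "applied"  # first delivery
assert apply_event(db, ev) == "noop"     # same event replayed: idempotent
assert apply_event(db, {"id": 12, "op": "TRUNCATE", "table": "t9"}) == "noop"
```

The second delivery of the same event and the TRUNCATE on a missing table both degrade to no-ops, which is exactly the behavior the patch is after.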





[jira] [Updated] (HIVE-16785) Ensure replication actions are idempotent if any series of events are applied again.

2017-06-26 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16785:

Attachment: HIVE-16785.05.patch



[jira] [Updated] (HIVE-16785) Ensure replication actions are idempotent if any series of events are applied again.

2017-06-26 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16785:

Attachment: (was: HIVE-16785.05.patch)



[jira] [Updated] (HIVE-16785) Ensure replication actions are idempotent if any series of events are applied again.

2017-06-26 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16785:

Status: Open  (was: Patch Available)



[jira] [Resolved] (HIVE-16963) rely on AcidUtils.getAcidState() for read path

2017-06-26 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng resolved HIVE-16963.
--
   Resolution: Fixed
Fix Version/s: hive-14535

> rely on AcidUtils.getAcidState() for read path
> --
>
> Key: HIVE-16963
> URL: https://issues.apache.org/jira/browse/HIVE-16963
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: hive-14535
>
> Attachments: HIVE-16963.patch
>
>
> This is to make MM tables more consistent with full ACID tables. It is also a 
> prerequisite for Insert Overwrite support for MM tables (refer to HIVE-14988).





[jira] [Updated] (HIVE-16963) rely on AcidUtils.getAcidState() for read path

2017-06-26 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-16963:
-
Attachment: HIVE-16963.patch

The patch also fixes an issue with Import.



[jira] [Assigned] (HIVE-16963) rely on AcidUtils.getAcidState() for read path

2017-06-26 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng reassigned HIVE-16963:





[jira] [Assigned] (HIVE-16962) Better error msg for Hive on Spark in case user cancels query and closes session

2017-06-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-16962:
--


> Better error msg for Hive on Spark in case user cancels query and closes 
> session
> 
>
> Key: HIVE-16962
> URL: https://issues.apache.org/jira/browse/HIVE-16962
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> When a user cancels a query and closes the session, Hive marks the query as 
> failed. However, the error message is a little confusing. It still says:
> {quote}
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create spark 
> client. This is likely because the queue you assigned to does not have free 
> resource at the moment to start the job. Please check your queue usage and 
> try the query again later.
> {quote}
> followed by some InterruptedException.
> Ideally, the error message should clearly indicate that the user canceled 
> the execution.





[jira] [Assigned] (HIVE-16961) Hive on Spark leaks spark application in case user cancels query and closes session

2017-06-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-16961:
--


> Hive on Spark leaks spark application in case user cancels query and closes 
> session
> ---
>
> Key: HIVE-16961
> URL: https://issues.apache.org/jira/browse/HIVE-16961
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> A Spark application is leaked when the user cancels a query and closes the 
> session while Hive is waiting for the remote driver to connect back. This was 
> found for asynchronous query execution, but is seemingly equally applicable 
> to synchronous submission when the session is abruptly closed. The leaked 
> Spark application running the Spark driver connects back to Hive successfully 
> and runs forever (until HS2 restarts), but receives no job submissions 
> because the session is already closed. Ideally, Hive should reject the 
> connection from the driver so the driver will exit.





[jira] [Assigned] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set

2017-06-26 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani reassigned HIVE-16960:
--

Assignee: Janaki Lahorani

> Hive throws an ugly error exception when HDFS sticky bit is set
> ---
>
> Key: HIVE-16960
> URL: https://issues.apache.org/jira/browse/HIVE-16960
> Project: Hive
>  Issue Type: Bug
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
>Priority: Critical
>
> When calling LOAD DATA INPATH ... OVERWRITE INTO TABLE ... from a Hive user 
> other than the HDFS file owner, and the HDFS sticky bit is set, Hive throws 
> an exception saying the file cannot be moved due to permission issues.
> Caused by: org.apache.hadoop.security.AccessControlException: Permission 
> denied by sticky bit setting: user=hive, 
> inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk
> The permission denial is expected, but the error message does not make sense 
> to users, and the stack trace displayed is huge. We should display a better 
> error message and perhaps provide help information about how to fix it.
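For context, the sticky bit behind the denial is the same mode bit POSIX filesystems use: on a world-writable directory it restricts delete/rename of entries to their owners. A small local sketch (assumes a conventional Unix `/tmp` with mode 1777; HDFS applies the analogous rule server-side):

```python
import os
import stat

# The sticky bit (S_ISVTX, octal 0o1000) on a shared directory means only a
# file's owner may delete or rename it there -- the same semantics HDFS
# enforces in the AccessControlException quoted above.
mode = os.stat("/tmp").st_mode
sticky = bool(mode & stat.S_ISVTX)
print(sticky)  # True on a conventional /tmp (mode 1777)
```

Hive's move of the loaded file is exactly such a rename by a non-owner, hence the denial when the bit is set on the source directory.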





[jira] [Commented] (HIVE-16784) Missing lineage information when hive.blobstore.optimizations.enabled is true

2017-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063335#comment-16063335
 ] 

Hive QA commented on HIVE-16784:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874514/HIVE-16784.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5775/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5775/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5775/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874514 - PreCommit-HIVE-Build

> Missing lineage information when hive.blobstore.optimizations.enabled is true
> -
>
> Key: HIVE-16784
> URL: https://issues.apache.org/jira/browse/HIVE-16784
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16784.01.patch, HIVE-16784.02.patch
>
>
> Running the commands of the add_part_multiple.q test on S3 with 
> hive.blobstore.optimizations.enabled=true fails because of missing lineage 
> information.
> Running the command on HDFS
> {noformat}
> from src TABLESAMPLE (1 ROWS)
> insert into table add_part_test PARTITION (ds='2010-01-01') select 100,100
> insert into table add_part_test PARTITION (ds='2010-02-01') select 200,200
> insert into table add_part_test PARTITION (ds='2010-03-01') select 400,300
> insert into table add_part_test PARTITION (ds='2010-04-01') select 500,400;
> {noformat}
> produces the following posthook output 
> {noformat}
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-02-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-02-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-03-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-03-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-04-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-04-01).value EXPRESSION []
> {noformat}
> These lines are not printed when running the command on the table located in 
> S3.
> If hive.blobstore.optimizations.enabled=false, the lineage information is 
> printed.





[jira] [Commented] (HIVE-16938) INFORMATION_SCHEMA usability: difficult to access # of table records

2017-06-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063263#comment-16063263
 ] 

Thejas M Nair commented on HIVE-16938:
--

+1

> INFORMATION_SCHEMA usability: difficult to access # of table records
> 
>
> Key: HIVE-16938
> URL: https://issues.apache.org/jira/browse/HIVE-16938
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16938.1.patch, HIVE-16938.2.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> One common thing users want to know is the number of rows in tables, system 
> wide.
> This information is in the table_params table but the structure of this table 
> makes it quite inconvenient to access since it is essentially a table of 
> key-value pairs. More table stats are likely to be added over time, 
> especially because of ACID. It would be a lot better if this were a first 
> class table.
> For what it's worth, I deal with the current table by pivoting it into 
> something easier to work with as follows:
> {code}
> create view table_stats as
> select
>   tbl_id,
>   max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as 
> COLUMN_STATS_ACCURATE,
>   max(case param_key when 'numFiles' then param_value end) as numFiles,
>   max(case param_key when 'numRows' then param_value end) as numRows,
>   max(case param_key when 'rawDataSize' then param_value end) as rawDataSize,
>   max(case param_key when 'totalSize' then param_value end) as totalSize,
>   max(case param_key when 'transient_lastDdlTime' then param_value end) as 
> transient_lastDdlTime
> from table_params group by tbl_id;
> {code}
> It would be better not to require workarounds from users, and instead make 
> table stats first-class, as column stats currently are.
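The conditional-aggregation pivot in the view above is a general technique for flattening key-value tables. Here is a standalone sketch of the same idea using Python's sqlite3, with a toy table that only mimics the metastore's table_params (the data and column set are illustrative, not the real metastore schema):

```python
import sqlite3

# Toy stand-in for the metastore's key-value table_params table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_params (tbl_id INTEGER, param_key TEXT, param_value TEXT);
INSERT INTO table_params VALUES
  (1, 'numRows', '100'), (1, 'totalSize', '2048'),
  (2, 'numRows', '5'),   (2, 'totalSize', '96');
-- Pivot: one CASE per key; MAX ignores the NULLs from non-matching rows.
CREATE VIEW table_stats AS
SELECT tbl_id,
       MAX(CASE param_key WHEN 'numRows'   THEN param_value END) AS numRows,
       MAX(CASE param_key WHEN 'totalSize' THEN param_value END) AS totalSize
FROM table_params GROUP BY tbl_id;
""")
rows = conn.execute("SELECT * FROM table_stats ORDER BY tbl_id").fetchall()
print(rows)  # [(1, '100', '2048'), (2, '5', '96')]
```

The same MAX(CASE ...) GROUP BY shape works in HiveQL, which is exactly what the view in the comment does.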





[jira] [Commented] (HIVE-16942) INFORMATION_SCHEMA: schematool for setting it up is not idempotent

2017-06-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063257#comment-16063257
 ] 

Thejas M Nair commented on HIVE-16942:
--

+1


> INFORMATION_SCHEMA: schematool for setting it up is not idempotent
> --
>
> Key: HIVE-16942
> URL: https://issues.apache.org/jira/browse/HIVE-16942
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16942.1.patch
>
>
> If you run schematool to set up information schema, but the SYS database 
> already exists, here's what happens:
> {code}
> [vagrant@trunk apache-hive-3.0.0-SNAPSHOT-bin]$ schematool -metaDbType mysql 
> -dbType hive -initSchema -url jdbc:hive2://localhost:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> Metastore connection URL:  jdbc:hive2://localhost:1/default
> Metastore Connection Driver :  org.apache.hive.jdbc.HiveDriver
> Metastore connection User: hive
> Starting metastore schema initialization to 3.0.0
> Initialization script hive-schema-3.0.0.hive.sql
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. Database SYS already exists
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254)
> {code}
> Why is this a problem, you ask?
> If you run schematool without hive.metastore.db.type set (or set to the wrong 
> thing), it will create the SYS database but fail to create any of the tables 
> within it. If you then fix hive.metastore.db.type and re-run, you'll get this 
> failure until you drop the SYS database (which must be done as the hive 
> user).
> Can the init script use "create database if not exists sys" rather than just 
> "create database sys"?
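The "if not exists" guard the reporter asks for is the standard way to make DDL scripts idempotent. A standalone illustration with Python's sqlite3 (a table stands in for the SYS database, since SQLite has no CREATE DATABASE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
ddl = "CREATE TABLE IF NOT EXISTS sys_marker (id INTEGER)"

# Running the same init script twice must not fail: IF NOT EXISTS
# turns the second run into a no-op instead of an error.
conn.execute(ddl)
conn.execute(ddl)  # no "already exists" error on the re-run

# Without the guard, a re-run of the plain CREATE fails, which is the
# same shape of error schematool hits when SYS already exists.
err = None
try:
    conn.execute("CREATE TABLE sys_marker (id INTEGER)")
except sqlite3.OperationalError as e:
    err = e
print(err)  # table sys_marker already exists
```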





[jira] [Commented] (HIVE-16869) Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader

2017-06-26 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063242#comment-16063242
 ] 

Yongzhi Chen commented on HIVE-16869:
-

Returning null when any of the "or" sub-conditions returns null is effectively 
the same as turning off hive.optimize.index.filter when the filter references 
columns that do not exist in the Parquet file. It is a quick fix until the 
partition filter issue is handled on the Parquet side.
The change looks good. +1

> Hive returns wrong result when predicates on non-existing columns are pushed 
> down to Parquet reader
> ---
>
> Key: HIVE-16869
> URL: https://issues.apache.org/jira/browse/HIVE-16869
> Project: Hive
>  Issue Type: Bug
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Critical
> Attachments: HIVE-16869.1.patch, HIVE-16869.2.patch
>
>
> When {{hive.optimize.ppd}} and {{hive.optimize.index.filter}} are turned, and 
> a select query has a condition on a column that doesn't exist in Parquet file 
> (such as a partition column), Hive often returns wrong result.
> Please see below example for details:
> {noformat}
> hive> create table test_parq (a int, b int) partitioned by (p int) stored as 
> parquet;
> OK
> Time taken: 0.292 seconds
> hive> insert overwrite table test_parq partition (p=1) values (1, 2);
> OK
> Time taken: 5.08 seconds
> hive> select * from test_parq where a=1 and p=1;
> OK
> 1 2   1
> Time taken: 0.441 seconds, Fetched: 1 row(s)
> hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
> OK
> 1 2   1
> Time taken: 0.197 seconds, Fetched: 1 row(s)
> hive> set hive.optimize.index.filter=true;
> hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
> OK
> Time taken: 0.167 seconds
> hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1);
> OK
> Time taken: 0.563 seconds
> {noformat}
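The fix described in the comment amounts to three-valued logic in the pushed-down filter: a predicate on a column absent from the file evaluates to "unknown", and a row group may only be skipped when every OR branch is definitely false. A minimal Python sketch of that idea, using simplified min/max stats (this is an illustration, not Hive's or Parquet's actual code):

```python
# Each leaf predicate is col == val; file_stats maps a column present in
# the file to its (min, max) range for the row group.
def eval_leaf(pred_col, pred_val, file_stats):
    if pred_col not in file_stats:
        return None                      # column missing: unknown
    lo, hi = file_stats[pred_col]
    return lo <= pred_val <= hi          # value could be in this row group

def can_skip_or(preds, file_stats):
    results = [eval_leaf(c, v, file_stats) for c, v in preds]
    if any(r is True for r in results) or any(r is None for r in results):
        return False                     # might match: must read the group
    return True                          # every disjunct is definitely false

# Row group holds a=1..100; partition column p is not in the Parquet file.
stats = {"a": (1, 100)}
print(can_skip_or([("a", 999), ("p", 999)], stats))  # False: p is unknown
print(can_skip_or([("a", 999), ("a", 500)], stats))  # True: both false
```

Treating the missing-column branch as false instead of unknown is precisely what made Hive drop the matching row in the example above.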





[jira] [Commented] (HIVE-6348) Order by/Sort by in subquery

2017-06-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063234#comment-16063234
 ] 

Ashutosh Chauhan commented on HIVE-6348:


I don't think it's 'breaking' those cases, so the change should be OK.
[~lirui] It seems some of the q.out files need updating. Can you please update 
those?

> Order by/Sort by in subquery
> 
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Rui Li
>Priority: Minor
>  Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch, HIVE-6348.3.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> In Hive, this sorts the data set twice. The optimizer should probably remove 
> any order by/sort by in the subquery unless 'limit ' is used. It could even 
> go so far as barring it at the semantic level.
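The redundancy can be shown in a few lines; this is a standalone illustration of why the inner sort is wasted work, not Hive's optimizer:

```python
# Subquery ORDER BY followed by an outer ORDER BY: the inner sort has no
# effect on the final result, so the optimizer can drop it (absent a
# LIMIT in the subquery, which would make the inner ordering meaningful).
data = [3, 1, 2]

inner = sorted(data)                         # subquery: ORDER BY c ASC
with_inner = sorted(inner, reverse=True)     # outer: ORDER BY c DESC
without_inner = sorted(data, reverse=True)   # outer sort only

print(with_inner == without_inner)  # True: the inner sort changed nothing
```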





[jira] [Updated] (HIVE-16784) Missing lineage information when hive.blobstore.optimizations.enabled is true

2017-06-26 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16784:
---
Attachment: HIVE-16784.02.patch

Fixed unit tests.

> Missing lineage information when hive.blobstore.optimizations.enabled is true
> -
>
> Key: HIVE-16784
> URL: https://issues.apache.org/jira/browse/HIVE-16784
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16784.01.patch, HIVE-16784.02.patch
>
>
> Running the commands of the add_part_multiple.q test on S3 with 
> hive.blobstore.optimizations.enabled=true fails because of missing lineage 
> information.
> Running the command on HDFS
> {noformat}
> from src TABLESAMPLE (1 ROWS)
> insert into table add_part_test PARTITION (ds='2010-01-01') select 100,100
> insert into table add_part_test PARTITION (ds='2010-02-01') select 200,200
> insert into table add_part_test PARTITION (ds='2010-03-01') select 400,300
> insert into table add_part_test PARTITION (ds='2010-04-01') select 500,400;
> {noformat}
> results in the following posthook outputs:
> {noformat}
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-02-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-02-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-03-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-03-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-04-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-04-01).value EXPRESSION []
> {noformat}
> These lines are not printed when running the command on the table located in 
> S3.
> If hive.blobstore.optimizations.enabled=false, the lineage information is 
> printed.





[jira] [Updated] (HIVE-16877) NPE when issue query like alter table ... cascade onto non-partitioned table

2017-06-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16877:

   Resolution: Fixed
Fix Version/s: (was: 2.2.0)
   3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Wang!

> NPE when issue query like alter table ... cascade onto non-partitioned table 
> -
>
> Key: HIVE-16877
> URL: https://issues.apache.org/jira/browse/HIVE-16877
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Fix For: 3.0.0
>
> Attachments: HIVE-16877.1.patch, HIVE-16877.2.patch
>
>
> Since HIVE-8839 in 1.1.0, Hive supports "alter table ... cascade" to cascade 
> table changes to partitions as well. However, an NPE is thrown when a query 
> like "alter table ... cascade" is issued against a non-partitioned table.
> Sample Query:
> {code}
> create table test_cascade_npe (id int);
> alter table test_cascade_npe add columns (name string ) cascade;
> {code}
> Exception stack:
> {code}
> 2017-06-09T22:16:05,913 ERROR [main] ql.Driver: FAILED: NullPointerException 
> null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:547)
> at 
> org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:489)
> at 
> org.apache.hadoop.hive.ql.metadata.Partition.getName(Partition.java:198)
> at org.apache.hadoop.hive.ql.hooks.Entity.computeName(Entity.java:339)
> at org.apache.hadoop.hive.ql.hooks.Entity.(Entity.java:208)
> at 
> org.apache.hadoop.hive.ql.hooks.WriteEntity.(WriteEntity.java:104)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1496)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1473)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableModifyCols(DDLSemanticAnalyzer.java:2685)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:284)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:474)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1245)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1387)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> {code}
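The stack trace shows makePartName failing on an empty partition spec. The defensive pattern is to branch on whether the table is partitioned before composing any partition names. A hypothetical Python sketch of that check (the function and field names here are illustrative, not Hive's actual API):

```python
# Building a partition name only makes sense when the table has
# partition columns; an empty spec is rejected up front instead of being
# dereferenced later (the NPE in the report).
def make_part_name(part_cols, part_vals):
    if not part_cols or not part_vals:
        raise ValueError("table has no partitions; nothing to cascade to")
    return "/".join(f"{c}={v}" for c, v in zip(part_cols, part_vals))

def alter_table_cascade(table):
    # Cascade only applies to partitioned tables; a non-partitioned
    # table is altered directly rather than per partition.
    if not table.get("part_cols"):
        return ["table"]
    return [make_part_name(table["part_cols"], vals)
            for vals in table["partitions"]]

print(alter_table_cascade({"part_cols": [], "partitions": []}))
# ['table']  -- non-partitioned table: no crash, altered directly
print(alter_table_cascade({"part_cols": ["ds"],
                           "partitions": [["2010-01-01"]]}))
# ['ds=2010-01-01']
```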





[jira] [Commented] (HIVE-16845) INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE

2017-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063217#comment-16063217
 ] 

Hive QA commented on HIVE-16845:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874504/HIVE-16845.1.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10848 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=241)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5774/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5774/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5774/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874504 - PreCommit-HIVE-Build

> INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE
> -
>
> Key: HIVE-16845
> URL: https://issues.apache.org/jira/browse/HIVE-16845
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
> Attachments: HIVE-16845.1.patch
>
>
> *How to reproduce*
> - Create a partitioned table on S3:
> {noformat}
> CREATE EXTERNAL TABLE s3table(user_id string COMMENT '', event_name string 
> COMMENT '') PARTITIONED BY (reported_date string, product_id int) LOCATION 
> 's3a://'; 
> {noformat}
> - Create a temp table:
> {noformat}
> create table tmp_table (id string, name string, date string, pid int) row 
> format delimited fields terminated by '\t' lines terminated by '\n' stored as 
> textfile;
> {noformat}
> - Load the following rows to the tmp table:
> {noformat}
> u1  value1  2017-04-10  1
> u2  value2  2017-04-10  1
> u3  value3  2017-04-10  10001
> {noformat}
> - Set the following parameters:
> -- hive.exec.dynamic.partition.mode=nonstrict
> -- mapreduce.input.fileinputformat.split.maxsize=10
> -- hive.blobstore.optimizations.enabled=true
> -- hive.blobstore.use.blobstore.as.scratchdir=false
> -- hive.merge.mapfiles=true
> - Insert the rows from the temp table into the s3 table:
> {noformat}
> INSERT OVERWRITE TABLE s3table
> PARTITION (reported_date, product_id)
> SELECT
>   t.id as user_id,
>   t.name as event_name,
>   t.date as reported_date,
>   t.pid as product_id
> FROM tmp_table t;
> {noformat}
> A NPE will occur with the following stacktrace:
> {noformat}
> 2017-05-08 21:32:50,607 ERROR 
> org.apache.hive.service.cli.operation.Operation: 
> [HiveServer2-Background-Pool: Thread-184028]: Error running hive query: 
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.ConditionalTask. null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:239)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at 
> org.apache.hive.ser

[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-26 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063204#comment-16063204
 ] 

Barna Zsombor Klara commented on HIVE-16559:


Failures are unrelated:
- HIVE-16908 - for HCat failures
- HIVE-16785 - is taking care of replication failures
- HIVE-15776 - for vector_if_expr
- HIVE-16931 - PerfTests
- HIVE-16959 - insert_overwrite_local_directory_1
- tez_smb_main seems to be failing constantly

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch, 
> HIVE-16559.06.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables 
>  backed by files with different schemas. Hive should match the table columns 
> with file columns based on the column name if possible.
> However, if the table's serde is missing columns that are present in the 
> partition's serde, Hive fails to match the columns.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>'mary' as name,
>5 AS favnumber,
>'blue' AS favcolor,
>35 AS age,
>'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   

[jira] [Updated] (HIVE-16943) MoveTask should separate src FileSystem from dest FileSystem

2017-06-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16943:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Fei!

> MoveTask should separate src FileSystem from dest FileSystem 
> -
>
> Key: HIVE-16943
> URL: https://issues.apache.org/jira/browse/HIVE-16943
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Fix For: 3.0.0
>
> Attachments: HIVE-16943.1.patch
>
>
> {code:title=MoveTask.java|borderStyle=solid} 
>  private void moveFileInDfs (Path sourcePath, Path targetPath, FileSystem fs)
>   throws HiveException, IOException {
> // if source exists, rename. Otherwise, create a empty directory
> if (fs.exists(sourcePath)) {
>   Path deletePath = null;
>   // If it multiple level of folder are there fs.rename is failing so 
> first
>   // create the targetpath.getParent() if it not exist
>   if (HiveConf.getBoolVar(conf, 
> HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
> deletePath = createTargetPath(targetPath, fs);
>   }
>   Hive.clearDestForSubDirSrc(conf, targetPath, sourcePath, false);
>   if (!Hive.moveFile(conf, sourcePath, targetPath, true, false)) {
> try {
>   if (deletePath != null) {
> fs.delete(deletePath, true);
>   }
> } catch (IOException e) {
>   LOG.info("Unable to delete the path created for facilitating rename"
>   + deletePath);
> }
> throw new HiveException("Unable to rename: " + sourcePath
> + " to: " + targetPath);
>   }
> } else if (!fs.mkdirs(targetPath)) {
>   throw new HiveException("Unable to make directory: " + targetPath);
> }
>   }
> {code}
> sourcePath and targetPath may come from different filesystems, so we should 
> resolve them separately.
> I see that HIVE-11568 already did this in Hive.java.
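The suggested fix amounts to resolving a filesystem handle per path, from each path's scheme, rather than reusing one handle for both source and destination. A toy Python sketch of the idea (the registry and client names below are made up; Hadoop's real entry point is FileSystem.get(uri, conf)):

```python
from urllib.parse import urlparse

# Made-up scheme -> client registry, standing in for FileSystem.get().
FILESYSTEMS = {"hdfs": "hdfs-client", "s3a": "s3a-client"}

def get_filesystem(path):
    scheme = urlparse(path).scheme or "hdfs"  # default fs if no scheme
    return FILESYSTEMS[scheme]

def move(source, target):
    src_fs = get_filesystem(source)  # resolved per path, not shared
    dst_fs = get_filesystem(target)
    if src_fs == dst_fs:
        return f"rename on {src_fs}"
    # Cross-filesystem moves cannot rename; they need copy + delete.
    return f"copy {src_fs} -> {dst_fs}, then delete source"

print(move("hdfs://nn/tmp/stage", "s3a://bucket/table"))
# copy hdfs-client -> s3a-client, then delete source
```

Using a single handle for both paths, as the quoted moveFileInDfs does, silently assumes source and target share a filesystem, which breaks for HDFS-to-S3 moves.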





[jira] [Commented] (HIVE-16784) Missing lineage information when hive.blobstore.optimizations.enabled is true

2017-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063119#comment-16063119
 ] 

Hive QA commented on HIVE-16784:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874493/HIVE-16784.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 10845 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_blobstore_to_blobstore]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_empty_into_blobstore]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[orc_format_nonpart]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[orc_format_part]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[orc_nonstd_partitions_loc]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[rcfile_format_nonpart]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[rcfile_format_part]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[rcfile_nonstd_partitions_loc]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[zero_rows_blobstore]
 (batchId=241)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.ql.optimizer.TestGenMapRedUtilsCreateConditionalTask.testConditionalMoveTaskIsOptimized
 (batchId=269)
org.apache.hadoop.hive.ql.optimizer.TestGenMapRedUtilsCreateConditionalTask.testMergePathValidMoveWorkReturnsNewMoveWork
 (batchId=269)
org.apache.hadoop.hive.ql.optimizer.TestGenMapRedUtilsCreateConditionalTask.testMergePathWithInvalidMoveWorkThrowsException
 (batchId=269)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5773/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5773/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5773/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874493 - PreCommit-HIVE-Build

> Missing lineage information when hive.blobstore.optimizations.enabled is true
> -
>
> Key: HIVE-16784
> URL: https://issues.apache.org/jira/browse/HIVE-16784
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16784.01.patch
>
>
> Running the commands of the add_part_multiple.q test on S3 with 
> hive.blobstore.optimizations.enabled=true fails because of missing lineage 
> information.
> Running the command on HDFS
> {noformat}
> from src TABLESAMPLE (1 ROWS)
> insert into table add_part_test PARTITION (ds='2010-01-01') select 100,100
> insert into table add_part_test PARTITION (ds='2010-02-01') select 200,200
> insert into table add_part_test PARTITION (ds='2010-03-01') select 400,300
> insert into table add_part_test PARTITION (ds='2010-04-01') select 500,400;
> {noformat}
> results in the following posthook outputs:
> {noformat}
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(

[jira] [Updated] (HIVE-16845) INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE

2017-06-26 Thread Marta Kuczora (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-16845:
-
Status: Patch Available  (was: Open)

The patch and the link to the review board are attached.

> INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE
> -
>
> Key: HIVE-16845
> URL: https://issues.apache.org/jira/browse/HIVE-16845
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
> Attachments: HIVE-16845.1.patch
>
>
> *How to reproduce*
> - Create a partitioned table on S3:
> {noformat}
> CREATE EXTERNAL TABLE s3table(user_id string COMMENT '', event_name string 
> COMMENT '') PARTITIONED BY (reported_date string, product_id int) LOCATION 
> 's3a://'; 
> {noformat}
> - Create a temp table:
> {noformat}
> create table tmp_table (id string, name string, date string, pid int) row 
> format delimited fields terminated by '\t' lines terminated by '\n' stored as 
> textfile;
> {noformat}
> - Load the following rows to the tmp table:
> {noformat}
> u1  value1  2017-04-10  1
> u2  value2  2017-04-10  1
> u3  value3  2017-04-10  10001
> {noformat}
> - Set the following parameters:
> -- hive.exec.dynamic.partition.mode=nonstrict
> -- mapreduce.input.fileinputformat.split.maxsize=10
> -- hive.blobstore.optimizations.enabled=true
> -- hive.blobstore.use.blobstore.as.scratchdir=false
> -- hive.merge.mapfiles=true
> - Insert the rows from the temp table into the s3 table:
> {noformat}
> INSERT OVERWRITE TABLE s3table
> PARTITION (reported_date, product_id)
> SELECT
>   t.id as user_id,
>   t.name as event_name,
>   t.date as reported_date,
>   t.pid as product_id
> FROM tmp_table t;
> {noformat}
> A NPE will occur with the following stacktrace:
> {noformat}
> 2017-05-08 21:32:50,607 ERROR 
> org.apache.hive.service.cli.operation.Operation: 
> [HiveServer2-Background-Pool: Thread-184028]: Error running hive query: 
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.ConditionalTask. null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:239)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ConditionalResolverMergeFiles.generateActualTasks(ConditionalResolverMergeFiles.java:290)
> at 
> org.apache.hadoop.hive.ql.plan.ConditionalResolverMergeFiles.getTasks(ConditionalResolverMergeFiles.java:175)
> at 
> org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1977)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1690)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1422)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1201)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
> ... 11 more 
> {noformat}





[jira] [Updated] (HIVE-16845) INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE

2017-06-26 Thread Marta Kuczora (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-16845:
-
Attachment: HIVE-16845.1.patch

> INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE
> -
>
> Key: HIVE-16845
> URL: https://issues.apache.org/jira/browse/HIVE-16845
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
> Attachments: HIVE-16845.1.patch
>
>
> *How to reproduce*
> - Create a partitioned table on S3:
> {noformat}
> CREATE EXTERNAL TABLE s3table(user_id string COMMENT '', event_name string 
> COMMENT '') PARTITIONED BY (reported_date string, product_id int) LOCATION 
> 's3a://'; 
> {noformat}
> - Create a temp table:
> {noformat}
> create table tmp_table (id string, name string, date string, pid int) row 
> format delimited fields terminated by '\t' lines terminated by '\n' stored as 
> textfile;
> {noformat}
> - Load the following rows to the tmp table:
> {noformat}
> u1  value1  2017-04-10  1
> u2  value2  2017-04-10  1
> u3  value3  2017-04-10  10001
> {noformat}
> - Set the following parameters:
> -- hive.exec.dynamic.partition.mode=nonstrict
> -- mapreduce.input.fileinputformat.split.maxsize=10
> -- hive.blobstore.optimizations.enabled=true
> -- hive.blobstore.use.blobstore.as.scratchdir=false
> -- hive.merge.mapfiles=true
> - Insert the rows from the temp table into the s3 table:
> {noformat}
> INSERT OVERWRITE TABLE s3table
> PARTITION (reported_date, product_id)
> SELECT
>   t.id as user_id,
>   t.name as event_name,
>   t.date as reported_date,
>   t.pid as product_id
> FROM tmp_table t;
> {noformat}
> A NPE will occur with the following stacktrace:
> {noformat}
> 2017-05-08 21:32:50,607 ERROR 
> org.apache.hive.service.cli.operation.Operation: 
> [HiveServer2-Background-Pool: Thread-184028]: Error running hive query: 
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.ConditionalTask. null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:239)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ConditionalResolverMergeFiles.generateActualTasks(ConditionalResolverMergeFiles.java:290)
> at 
> org.apache.hadoop.hive.ql.plan.ConditionalResolverMergeFiles.getTasks(ConditionalResolverMergeFiles.java:175)
> at 
> org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1977)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1690)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1422)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1201)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
> ... 11 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063052#comment-16063052
 ] 

Hive QA commented on HIVE-16559:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874485/HIVE-16559.06.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10846 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5772/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5772/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5772/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874485 - PreCommit-HIVE-Build

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch, 
> HIVE-16559.06.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables 
>  backed by files with different schemas. Hive should match the table columns 
> with file columns based on the column name if possible.
> However, if the serde of a table is missing columns that are present in the 
> serde of a partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>'mary' as name,
>5 AS favnumber,
>'blue' AS favcolor,
>35 AS age,
>'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   
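The name-based matching described above can be illustrated with a minimal, hypothetical Python sketch (this is not Hive's implementation; the function and schemas are made up for illustration). After REPLACE COLUMNS the table schema has only favnumber and age, while the Parquet files written earlier still carry all five columns; matching by name should resolve each table column to its position in the file schema and simply ignore the extra file columns:

```python
# Hypothetical sketch of name-based column matching (not Hive's actual code).
def match_columns(table_cols, file_cols):
    """Map each table column to its index in the file schema by name.
    Columns absent from the file map to None (read back as NULL)."""
    file_index = {name: i for i, name in enumerate(file_cols)}
    return [file_index.get(name) for name in table_cols]

# File schema as written before REPLACE COLUMNS; table schema after it.
file_schema = ["name", "favnumber", "favcolor", "age", "favpet"]
table_schema = ["favnumber", "age"]

print(match_columns(table_schema, file_schema))  # [1, 3]
```

The bug report suggests the breakage comes from the table serde and partition serde disagreeing, not from the name-mapping idea itself being wrong.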

[jira] [Updated] (HIVE-16784) Missing lineage information when hive.blobstore.optimizations.enabled is true

2017-06-26 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16784:
---
Fix Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> Missing lineage information when hive.blobstore.optimizations.enabled is true
> -
>
> Key: HIVE-16784
> URL: https://issues.apache.org/jira/browse/HIVE-16784
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16784.01.patch
>
>
> Running the commands of the add_part_multiple.q test on S3 with 
> hive.blobstore.optimizations.enabled=true fails because of missing lineage 
> information.
> Running the command on HDFS
> {noformat}
> from src TABLESAMPLE (1 ROWS)
> insert into table add_part_test PARTITION (ds='2010-01-01') select 100,100
> insert into table add_part_test PARTITION (ds='2010-02-01') select 200,200
> insert into table add_part_test PARTITION (ds='2010-03-01') select 400,300
> insert into table add_part_test PARTITION (ds='2010-04-01') select 500,400;
> {noformat}
> results in the following posthook outputs:
> {noformat}
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-02-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-02-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-03-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-03-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-04-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-04-01).value EXPRESSION []
> {noformat}
> These lines are not printed when running the command on the table located in 
> S3.
> If hive.blobstore.optimizations.enabled=false, the lineage information is 
> printed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16784) Missing lineage information when hive.blobstore.optimizations.enabled is true

2017-06-26 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16784:
---
Attachment: HIVE-16784.01.patch

First version of the patch.

> Missing lineage information when hive.blobstore.optimizations.enabled is true
> -
>
> Key: HIVE-16784
> URL: https://issues.apache.org/jira/browse/HIVE-16784
> Project: Hive
>  Issue Type: Bug
>Reporter: Marta Kuczora
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-16784.01.patch
>
>
> Running the commands of the add_part_multiple.q test on S3 with 
> hive.blobstore.optimizations.enabled=true fails because of missing lineage 
> information.
> Running the command on HDFS
> {noformat}
> from src TABLESAMPLE (1 ROWS)
> insert into table add_part_test PARTITION (ds='2010-01-01') select 100,100
> insert into table add_part_test PARTITION (ds='2010-02-01') select 200,200
> insert into table add_part_test PARTITION (ds='2010-03-01') select 400,300
> insert into table add_part_test PARTITION (ds='2010-04-01') select 500,400;
> {noformat}
> results in the following posthook outputs:
> {noformat}
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-01-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-02-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-02-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-03-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-03-01).value EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-04-01).key EXPRESSION []
> POSTHOOK: Lineage: add_part_test2 PARTITION(ds=2010-04-01).value EXPRESSION []
> {noformat}
> These lines are not printed when running the command on the table located in 
> S3.
> If hive.blobstore.optimizations.enabled=false, the lineage information is 
> printed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-13060) Exception in parsing hive query in standalone utility

2017-06-26 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-13060.
-
Resolution: Not A Problem

I'm afraid what you are trying to do is not really supported, although using 
those classes for other purposes is fine.

Anyway, 0.13 is quite old; I think you may have better luck using classes 
from a more recent version of Hive. Try a 2.x release - since you are using 
the parser only, I think it will be fine :)

> Exception in parsing hive query in standalone utility
> 
>
> Key: HIVE-13060
> URL: https://issues.apache.org/jira/browse/HIVE-13060
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.13.1
>Reporter: mahendra dattatraya tonape
>Priority: Blocker
>  Labels: newbie, test
> Fix For: 0.13.1
>
>
> I have a requirement in which I am parsing Hive queries using the following 
> classes from the hive-exec jar: org.apache.hadoop.hive.ql.parse.ParseDriver, 
> org.apache.hadoop.hive.ql.parse.ASTNode, 
> org.apache.hadoop.hive.ql.parse.ParseUtils and 
> org.apache.hadoop.hive.ql.parse.HiveParser. To access these classes from the 
> hive-exec jar I am using the following Maven dependency in my project:
> {code}
> <dependency>
>   <groupId>org.apache.hive</groupId>
>   <artifactId>hive-exec</artifactId>
>   <version>0.13.1-cdh5.3.0</version>
> </dependency>
> {code}
> My Hive query parsing utility works in almost all cases, but surprisingly it 
> fails on the following query:
> {noformat}
> INSERT INTO db_lineage.many_one_hv SELECT * FROM (SELECT * FROM 
> db_lineage.one_many1_hv UNION ALL SELECT * FROM db_lineage.one_many2_hv) 
> FINAL;
> {noformat}
> This query executes successfully on Hive clusters with versions 0.14 and 1.2, 
> but it fails on my local system. I never connect to a Hive cluster or any 
> database; my utility does standalone parsing of the Hive query and retrieves 
> the source and destination nodes from it using only the hive-exec 
> dependencies. Please let me know if you can provide any input on this.
> Thanks and Regards, Mahendra Tonape.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-06-26 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Attachment: HIVE-16559.06.patch

Updated after RB comments.

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch, HIVE-16559.02.patch, 
> HIVE-16559.03.patch, HIVE-16559.04.patch, HIVE-16559.05.patch, 
> HIVE-16559.06.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables 
>  backed by files with different schemas. Hive should match the table columns 
> with file columns based on the column name if possible.
> However, if the serde of a table is missing columns that are present in the 
> serde of a partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>'mary' as name,
>5 AS favnumber,
>'blue' AS favcolor,
>35 AS age,
>'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   

[jira] [Commented] (HIVE-16958) Setting hive.merge.sparkfiles=true will return an error when generating parquet databases

2017-06-26 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062960#comment-16062960
 ] 

Rui Li commented on HIVE-16958:
---

Hi [~Liu765940375], does the issue happen for MR too? I.e. when 
hive.merge.mapfiles=true and hive.merge.mapredfiles=true.
And could you share your data to reproduce it?

> Setting hive.merge.sparkfiles=true will return an error when generating 
> parquet databases 
> --
>
> Key: HIVE-16958
> URL: https://issues.apache.org/jira/browse/HIVE-16958
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0
> Environment: centos7 hadoop2.7.3 spark2.0.0
>Reporter: Liu Chunxiao
>Priority: Minor
> Attachments: parquet-hivemergesparkfiles.txt, sale.sql
>
>
> The process will return 
> Job failed with java.lang.NullPointerException
> FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> java.util.concurrent.ExecutionException: Exception thrown by job
>   at 
> org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:272)
>   at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277)
>   at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
>   at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 1.0 (TID 31, bdpe822n1): java.io.IOException: 
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:695)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:246)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedConstructorAccessor26.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
>   ... 17 more
> Caused by: java.lang.NullPointerException
>   at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:118)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:189)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:84)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrappe

[jira] [Comment Edited] (HIVE-12745) Hive Timestamp value change after joining two tables

2017-06-26 Thread Rajat (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062910#comment-16062910
 ] 

Rajat edited comment on HIVE-12745 at 6/26/17 10:55 AM:


We have also faced the same issue.
We have a simple select statement with a join of two tables.
The timestamp columns are getting shifted back by 1 hour for one of the tables.

However, individually firing a select query against each table works as 
expected.
We have not been able to find the root cause and are still looking for it.
Could it be due to a clock-out-of-sync issue on one of the mapper/reducer 
nodes?


was (Author: srajat):
We have also faced the same issue.
We have a simple select statement fetching data using join of 2 tables.
The timestamp columns are getting shifted back by 1 hour for one table.

Although, individually firing select query for each table is working fine as 
expected.
We are not able to find the root cause and still looking for it.
Can it be due to clock out of sync issue of one of the mapper/reducer node?

> Hive Timestamp value change after joining two tables
> 
>
> Key: HIVE-12745
> URL: https://issues.apache.org/jira/browse/HIVE-12745
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 1.2.1
>Reporter: wyp
>Assignee: Dmitry Tolpeko
>
> I have two Hive tables:test and test1:
> {code}
> CREATE TABLE `test`( `t` timestamp)
> CREATE TABLE `test1`( `t` timestamp)
> {code}
> They both hold a t value of Timestamp datatype; the contents of the two 
> tables are as follows:
> {code}
> hive> select * from test1;
> OK
> 1970-01-01 00:00:00
> 1970-03-02 00:00:00
> Time taken: 0.091 seconds, Fetched: 2 row(s)
> hive> select * from test;
> OK
> 1970-01-01 00:00:00
> 1970-01-02 00:00:00
> Time taken: 0.085 seconds, Fetched: 2 row(s)
> {code}
> However, when joining these two tables, the returned timestamp values changed:
> {code}
> hive> select test.t, test1.t from test, test1;
> OK
> 1969-12-31 23:00:00   1970-01-01 00:00:00
> 1970-01-01 23:00:00   1970-01-01 00:00:00
> 1969-12-31 23:00:00   1970-03-02 00:00:00
> 1970-01-01 23:00:00   1970-03-02 00:00:00
> Time taken: 54.347 seconds, Fetched: 4 row(s)
> {code}
> and I found that the result changes every time:
> {code}
> hive> select test.t, test1.t from test, test1;
> OK
> 1970-01-01 00:00:00   1970-01-01 00:00:00
> 1970-01-02 00:00:00   1970-01-01 00:00:00
> 1970-01-01 00:00:00   1970-03-02 00:00:00
> 1970-01-02 00:00:00   1970-03-02 00:00:00
> Time taken: 26.308 seconds, Fetched: 4 row(s)
> {code}
> Any suggestion? Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-12745) Hive Timestamp value change after joining two tables

2017-06-26 Thread Rajat (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062910#comment-16062910
 ] 

Rajat commented on HIVE-12745:
--

We have also faced the same issue.
We have a simple select statement with a join of two tables.
The timestamp columns are getting shifted back by 1 hour for one of the tables.

However, individually firing a select query against each table works as 
expected.
We have not been able to find the root cause and are still looking for it.
Could it be due to a clock-out-of-sync issue on one of the mapper/reducer 
nodes?
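A timezone mismatch between task JVMs would produce exactly this symptom without any clock skew: the same epoch value rendered under UTC and under a UTC-1 zone differs by one hour in wall-clock form, matching the shifted 1969-12-31 23:00:00 values in the report. A small sketch of the effect (the specific zones are illustrative assumptions, not diagnosed from the report):

```python
from datetime import datetime, timezone, timedelta

# The same instant (epoch 0) rendered under two different default timezones.
epoch_utc = datetime.fromtimestamp(0, tz=timezone.utc)
epoch_minus_1 = epoch_utc.astimezone(timezone(timedelta(hours=-1)))

print(epoch_utc.strftime("%Y-%m-%d %H:%M:%S"))      # 1970-01-01 00:00:00
print(epoch_minus_1.strftime("%Y-%m-%d %H:%M:%S"))  # 1969-12-31 23:00:00
```

If this is the cause, pinning the JVM default zone on every node (for example `-Duser.timezone=UTC` in the map/reduce task JVM options) is one way to rule it out.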

> Hive Timestamp value change after joining two tables
> 
>
> Key: HIVE-12745
> URL: https://issues.apache.org/jira/browse/HIVE-12745
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 1.2.1
>Reporter: wyp
>Assignee: Dmitry Tolpeko
>
> I have two Hive tables:test and test1:
> {code}
> CREATE TABLE `test`( `t` timestamp)
> CREATE TABLE `test1`( `t` timestamp)
> {code}
> They both hold a t value of Timestamp datatype; the contents of the two 
> tables are as follows:
> {code}
> hive> select * from test1;
> OK
> 1970-01-01 00:00:00
> 1970-03-02 00:00:00
> Time taken: 0.091 seconds, Fetched: 2 row(s)
> hive> select * from test;
> OK
> 1970-01-01 00:00:00
> 1970-01-02 00:00:00
> Time taken: 0.085 seconds, Fetched: 2 row(s)
> {code}
> However, when joining these two tables, the returned timestamp values changed:
> {code}
> hive> select test.t, test1.t from test, test1;
> OK
> 1969-12-31 23:00:00   1970-01-01 00:00:00
> 1970-01-01 23:00:00   1970-01-01 00:00:00
> 1969-12-31 23:00:00   1970-03-02 00:00:00
> 1970-01-01 23:00:00   1970-03-02 00:00:00
> Time taken: 54.347 seconds, Fetched: 4 row(s)
> {code}
> and I found that the result changes every time:
> {code}
> hive> select test.t, test1.t from test, test1;
> OK
> 1970-01-01 00:00:00   1970-01-01 00:00:00
> 1970-01-02 00:00:00   1970-01-01 00:00:00
> 1970-01-01 00:00:00   1970-03-02 00:00:00
> 1970-01-02 00:00:00   1970-03-02 00:00:00
> Time taken: 26.308 seconds, Fetched: 4 row(s)
> {code}
> Any suggestion? Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16877) NPE when issue query like alter table ... cascade onto non-partitioned table

2017-06-26 Thread Wang Haihua (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062878#comment-16062878
 ] 

Wang Haihua commented on HIVE-16877:


The failed tests seem unrelated to this patch.

> NPE when issue query like alter table ... cascade onto non-partitioned table 
> -
>
> Key: HIVE-16877
> URL: https://issues.apache.org/jira/browse/HIVE-16877
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Fix For: 2.2.0
>
> Attachments: HIVE-16877.1.patch, HIVE-16877.2.patch
>
>
> HIVE-8839 (in 1.1.0) added support for "alter table ... cascade" to cascade 
> table changes to partitions as well. But an NPE is thrown when a query like 
> "alter table ... cascade" is issued against a non-partitioned table.
> Sample Query:
> {code}
> create table test_cascade_npe (id int);
> alter table test_cascade_npe add columns (name string ) cascade;
> {code}
> Exception stack:
> {code}
> 2017-06-09T22:16:05,913 ERROR [main] ql.Driver: FAILED: NullPointerException 
> null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:547)
> at 
> org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:489)
> at 
> org.apache.hadoop.hive.ql.metadata.Partition.getName(Partition.java:198)
> at org.apache.hadoop.hive.ql.hooks.Entity.computeName(Entity.java:339)
> at org.apache.hadoop.hive.ql.hooks.Entity.<init>(Entity.java:208)
> at 
> org.apache.hadoop.hive.ql.hooks.WriteEntity.<init>(WriteEntity.java:104)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1496)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1473)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableModifyCols(DDLSemanticAnalyzer.java:2685)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:284)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:474)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1245)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1387)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1174)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> {code}
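The NPE is reached because partition-name computation (Warehouse.makePartName) runs for a table that has no partition spec. A minimal, hypothetical sketch of the class of fix (not the actual HIVE-16877 patch): guard the partition-name path so a non-partitioned table is rejected, or skipped, before any spec is dereferenced.

```python
# Hypothetical guard around partition-name construction (illustrative only).
def make_part_name(part_cols, part_vals):
    """Build a Hive-style partition name like 'ds=2010-01-01'.
    A non-partitioned table (empty column list) is rejected up front
    instead of failing later with a null dereference."""
    if not part_cols:
        raise ValueError("table has no partition columns")
    return "/".join(f"{col}={val}" for col, val in zip(part_cols, part_vals))

print(make_part_name(["ds"], ["2010-01-01"]))  # ds=2010-01-01
```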



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-26 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062808#comment-16062808
 ] 

liyunzhang_intel commented on HIVE-16840:
-

Limit pushdown is implemented in HIVE-3562.

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that, on 1TB of TPC-DS data, q17 hung.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as_store_returns_quantitycount
>,avg(sr_return_quantity) as_store_returns_quantityave
>,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The reason the script hung is that we use only 1 task to implement the 
> sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order by limit case, so we use 1 
> task to ensure correctness, but the performance is poor.
> The reason why we use 1 task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
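The limit pushdown referenced above can be sketched as a two-phase top-k (an illustration of the idea, not Hive's or Spark's implementation): each parallel task keeps only its local top k rows, and a single final task merges those small candidate sets, so the unavoidable serial stage processes at most k rows per upstream task instead of the full dataset.

```python
import heapq

def local_top_k(rows, k, key):
    """Phase 1: each parallel task keeps only its k smallest rows."""
    return heapq.nsmallest(k, rows, key=key)

def merge_top_k(partials, k, key):
    """Phase 2: one final task merges the small per-task candidate lists."""
    return heapq.nsmallest(k, (row for part in partials for row in part), key=key)

# Two simulated tasks; global ORDER BY value LIMIT 3.
task_a = [(5, "e"), (1, "b"), (9, "f"), (3, "d")]
task_b = [(2, "c"), (8, "g"), (0, "a")]
partials = [local_top_k(t, 3, key=lambda r: r[0]) for t in (task_a, task_b)]

print(merge_top_k(partials, 3, key=lambda r: r[0]))  # [(0, 'a'), (1, 'b'), (2, 'c')]
```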



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16958) Setting hive.merge.sparkfiles=true will return an error when generating parquet databases

2017-06-26 Thread Liu Chunxiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Chunxiao updated HIVE-16958:

Attachment: parquet-hivemergesparkfiles.txt
sale.sql

> Setting hive.merge.sparkfiles=true will return an error when generating 
> parquet databases 
> --
>
> Key: HIVE-16958
> URL: https://issues.apache.org/jira/browse/HIVE-16958
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0
> Environment: centos7 hadoop2.7.3 spark2.0.0
>Reporter: Liu Chunxiao
>Priority: Minor
> Attachments: parquet-hivemergesparkfiles.txt, sale.sql
>
>
> The process will return 
> Job failed with java.lang.NullPointerException
> FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> java.util.concurrent.ExecutionException: Exception thrown by job
>   at 
> org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:272)
>   at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277)
>   at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
>   at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 1.0 (TID 31, bdpe822n1): java.io.IOException: 
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:695)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:246)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedConstructorAccessor26.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
>   ... 17 more
> Caused by: java.lang.NullPointerException
>   at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:118)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:189)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:84)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:74)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderW
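The innermost frames blame java.util.AbstractCollection.addAll, which throws NullPointerException when handed a null collection. A minimal standalone sketch of that failure mode (hypothetical; which collection ends up null inside ProjectionPusher is not shown by the trace):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class AddAllNpeDemo {
    // HashSet inherits addAll from AbstractCollection, which matches the
    // AbstractCollection.addAll frame in the trace above.
    static boolean addAllThrowsNpe(Collection<String> source) {
        Set<String> target = new HashSet<>();
        try {
            target.addAll(source); // iterates source; a null source throws NPE
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Any code path that passes an unpopulated (null) list into addAll would produce exactly this frame.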

[jira] [Commented] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062685#comment-16062685
 ] 

Hive QA commented on HIVE-13567:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874439/HIVE-13567.20.patch

{color:green}SUCCESS:{color} +1 due to 25 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 210 failed/errored test(s), 10849 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_11] 
(batchId=238)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_10] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_1] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_2] 
(batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_7] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] 
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join14] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join17] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join19] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join19_inclause] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join1] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join26] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join2] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join3] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join4] (batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join5] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join6] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join7] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join8] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join9] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_13] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket1] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket3] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark1] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark2] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark3] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_1]
 (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_3]
 (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_4]
 (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_5]
 (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join17] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby2_map_multi_distinct]
 (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[column_pruner_multiple_children]
 (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer5] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cp_sel] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby10] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby11] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby12] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby1_limit] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby1_map] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby1_map_nomap] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby1_map_skew] 
(batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby1_noskew] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby2_map] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby2_map_multi_distinct]
 (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby2_map_skew] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby2_noskew] 
(batchId=1)
org.apache.hadoop.hive.

[jira] [Commented] (HIVE-16840) Investigate the performance of order by limit in HoS

2017-06-26 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062651#comment-16062651
 ] 

liyunzhang_intel commented on HIVE-16840:
-

[~lirui]: thanks for your investigation.
bq. hive should have already pushed down the limit to the upstream of shuffle. 
Looking at the RS code, it uses a TopN hash to track the top N keys in input. 
Ideally, each RS will only output N records. I tried some simple queries to 
verify how this saves shuffled data.
I saw the TopN in ReduceSinkOperator, but let me spend some time to 
investigate. [~xuefuz]: can you give us some suggestions?
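As a rough illustration of the TopN-hash idea mentioned above (a hypothetical sketch, not Hive's actual TopNHash implementation): each ReduceSink tracks at most N keys with a bounded heap, so at most N records per sink ever enter the shuffle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    // Returns the N smallest keys seen, in ascending order.
    static List<Integer> topN(Iterable<Integer> keys, int n) {
        // Max-heap of the N smallest keys so far: the root is the largest
        // retained key and is evicted when a smaller key arrives.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int k : keys) {
            if (heap.size() < n) {
                heap.add(k);
            } else if (k < heap.peek()) {
                heap.poll(); // drop the largest retained key
                heap.add(k);
            }
            // otherwise the key cannot be in the top N; emit nothing for it
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(null);
        return out;
    }
}
```

With this filter in place, shuffled data per ReduceSink is bounded by N regardless of input size, which is where the savings come from.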

> Investigate the performance of order by limit in HoS
> 
>
> Key: HIVE-16840
> URL: https://issues.apache.org/jira/browse/HIVE-16840
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16840.patch
>
>
> We found that on 1 TB of TPC-DS data, q17 hanged.
> {code}
>  select  i_item_id
>,i_item_desc
>,s_state
>,count(ss_quantity) as store_sales_quantitycount
>,avg(ss_quantity) as store_sales_quantityave
>,stddev_samp(ss_quantity) as store_sales_quantitystdev
>,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov
>,count(sr_return_quantity) as store_returns_quantitycount
>,avg(sr_return_quantity) as store_returns_quantityave
>,stddev_samp(sr_return_quantity) as store_returns_quantitystdev
>,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as 
> store_returns_quantitycov
>,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) 
> as catalog_sales_quantityave
>,stddev_samp(cs_quantity)/avg(cs_quantity) as 
> catalog_sales_quantitystdev
>,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov
>  from store_sales
>  ,store_returns
>  ,catalog_sales
>  ,date_dim d1
>  ,date_dim d2
>  ,date_dim d3
>  ,store
>  ,item
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = store_sales.ss_sold_date_sk
>and item.i_item_sk = store_sales.ss_item_sk
>and store.s_store_sk = store_sales.ss_store_sk
>and store_sales.ss_customer_sk = store_returns.sr_customer_sk
>and store_sales.ss_item_sk = store_returns.sr_item_sk
>and store_sales.ss_ticket_number = store_returns.sr_ticket_number
>and store_returns.sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
>and store_returns.sr_item_sk = catalog_sales.cs_item_sk
>and catalog_sales.cs_sold_date_sk = d3.d_date_sk
>and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3')
>  group by i_item_id
>  ,i_item_desc
>  ,s_state
>  order by i_item_id
>  ,i_item_desc
>  ,s_state
> limit 100;
> {code}
> The reason the script hanged is that we only use 1 task to implement the 
> sort.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Spark
>   Edges:
> Reducer 10 <- Reducer 9 (SORT, 1)
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 889), Map 11 
> (PARTITION-LEVEL SORT, 889)
> Reducer 3 <- Map 12 (PARTITION-LEVEL SORT, 1009), Reducer 2 
> (PARTITION-LEVEL SORT, 1009)
> Reducer 4 <- Map 13 (PARTITION-LEVEL SORT, 683), Reducer 3 
> (PARTITION-LEVEL SORT, 683)
> Reducer 5 <- Map 14 (PARTITION-LEVEL SORT, 751), Reducer 4 
> (PARTITION-LEVEL SORT, 751)
> Reducer 6 <- Map 15 (PARTITION-LEVEL SORT, 826), Reducer 5 
> (PARTITION-LEVEL SORT, 826)
> Reducer 7 <- Map 16 (PARTITION-LEVEL SORT, 909), Reducer 6 
> (PARTITION-LEVEL SORT, 909)
> Reducer 8 <- Map 17 (PARTITION-LEVEL SORT, 1001), Reducer 7 
> (PARTITION-LEVEL SORT, 1001)
> Reducer 9 <- Reducer 8 (GROUP, 2)
> {code}
> The parallelism of Reducer 9 is 1. It is an order-by-limit case, so we use 1 
> task to execute it to ensure correctness, but the performance is poor.
> The reason why we use 1 task to implement order by limit is 
> [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
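The single-task bottleneck described above has a well-known two-stage alternative, sketched here as an illustration (this is not Hive's current code): each parallel reducer sorts its own partition and keeps only its local top N, and the final single task merges those small lists, so the serial work is bounded by N times the number of reducers instead of the full data size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParallelOrderByLimit {
    // Stage 1, run in parallel: local ORDER BY + LIMIT within one partition.
    static List<Integer> localTopN(List<Integer> partition, int n) {
        List<Integer> copy = new ArrayList<>(partition);
        Collections.sort(copy);
        return copy.subList(0, Math.min(n, copy.size()));
    }

    // Stage 2, single task: merge the small per-partition lists and
    // apply the final ORDER BY + LIMIT.
    static List<Integer> merge(List<List<Integer>> locals, int n) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> l : locals) {
            all.addAll(l); // at most n elements per input list
        }
        Collections.sort(all);
        return all.subList(0, Math.min(n, all.size()));
    }
}
```

The final task then only sorts at most n * locals.size() rows rather than the whole table, which is why plans that push the limit below the shuffle avoid the hang.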





[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Attachment: HIVE-13567.20.patch

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch, 
> HIVE-13567.18.patch, HIVE-13567.19.patch, HIVE-13567.20.patch
>
>
> In phase 2, we are going to turn auto-gather column stats on by default. This 
> requires updating the golden files.





[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Open  (was: Patch Available)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch, 
> HIVE-13567.18.patch, HIVE-13567.19.patch, HIVE-13567.20.patch
>
>
> In phase 2, we are going to turn auto-gather column stats on by default. This 
> requires updating the golden files.





[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-06-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Patch Available  (was: Open)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch, HIVE-13567.02.patch, 
> HIVE-13567.03.patch, HIVE-13567.04.patch, HIVE-13567.05.patch, 
> HIVE-13567.06.patch, HIVE-13567.07.patch, HIVE-13567.08.patch, 
> HIVE-13567.09.patch, HIVE-13567.10.patch, HIVE-13567.11.patch, 
> HIVE-13567.12.patch, HIVE-13567.13.patch, HIVE-13567.14.patch, 
> HIVE-13567.15.patch, HIVE-13567.16.patch, HIVE-13567.17.patch, 
> HIVE-13567.18.patch, HIVE-13567.19.patch, HIVE-13567.20.patch
>
>
> In phase 2, we are going to turn auto-gather column stats on by default. This 
> requires updating the golden files.


