[jira] [Work started] (HIVE-13873) Column pruning for nested fields

2016-06-14 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13873 started by Ferdinand Xu.
---
> Column pruning for nested fields
> 
>
> Key: HIVE-13873
> URL: https://issues.apache.org/jira/browse/HIVE-13873
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Reporter: Xuefu Zhang
>Assignee: Ferdinand Xu
>
> Some columnar file formats, such as Parquet, also store the fields of a struct 
> column by column, using the encoding described in Google's Dremel paper. In big 
> data it is very common for data to be stored in structs while queries need only 
> a subset of the fields in those structs. However, Hive presently still needs to 
> read the whole struct regardless of whether all of its fields are selected. 
> Therefore, pruning unwanted sub-fields of structs or other nested fields at file 
> reading time would be a big performance boost in such scenarios.
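Because a Dremel-style layout stores every primitive leaf of a struct as its own column, nested pruning can be thought of as choosing which leaf columns to read. The sketch below is a hypothetical illustration of that idea under the assumption that leaf columns are named by dotted paths such as "s.c.x"; the class and method names are invented and this is not Hive's or Parquet's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hive's implementation: each primitive leaf of a struct is
// assumed to be its own column, named by a dotted path such as "s.c.x".
final class NestedColumnPruner {

    /** Returns the leaf columns needed to answer the requested (possibly nested) field paths. */
    static List<String> leavesToRead(List<String> leafColumns, List<String> requestedPaths) {
        List<String> result = new ArrayList<>();
        for (String leaf : leafColumns) {
            for (String path : requestedPaths) {
                // Keep the leaf if it is the requested field itself or lies beneath it.
                // The "." suffix stops "s.a" from also matching a sibling such as "s.ab".
                if (leaf.equals(path) || leaf.startsWith(path + ".")) {
                    result.add(leaf);
                    break;
                }
            }
        }
        return result;
    }
}
```

For example, with leaves s.a, s.b, s.c.x, s.c.y and t, a query touching only s.c and t would read s.c.x, s.c.y and t and skip s.a and s.b entirely.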



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13873) Column pruning for nested fields

2016-06-14 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-13873:
---

Assignee: Ferdinand Xu

> Column pruning for nested fields
> 
>
> Key: HIVE-13873
> URL: https://issues.apache.org/jira/browse/HIVE-13873
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Reporter: Xuefu Zhang
>Assignee: Ferdinand Xu
>
> Some columnar file formats, such as Parquet, also store the fields of a struct 
> column by column, using the encoding described in Google's Dremel paper. In big 
> data it is very common for data to be stored in structs while queries need only 
> a subset of the fields in those structs. However, Hive presently still needs to 
> read the whole struct regardless of whether all of its fields are selected. 
> Therefore, pruning unwanted sub-fields of structs or other nested fields at file 
> reading time would be a big performance boost in such scenarios.





[jira] [Updated] (HIVE-13840) Orc split generation is reading file footers twice

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13840:
-
Attachment: HIVE-13840-branch-1.patch

Committed to branch-1 as well

> Orc split generation is reading file footers twice
> --
>
> Key: HIVE-13840
> URL: https://issues.apache.org/jira/browse/HIVE-13840
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13840-branch-1.patch, HIVE-13840.1.patch, 
> HIVE-13840.2.patch, HIVE-13840.3.patch
>
>
> Recent refactorings to move ORC out introduced a regression in split 
> generation, which causes the ORC file footers to be read twice during split 
> generation.
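One generic way to guarantee a single footer read per file is to memoize footer fetches during split generation. The sketch below is a hypothetical illustration of that idea, not the committed fix; the footer type is a generic placeholder and the reader function stands in for real ORC footer I/O.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not the committed fix: memoize footer reads per file so split
// generation fetches each footer once, however many code paths ask for it.
// F is a placeholder for a parsed footer; the reader function stands in for real ORC I/O.
final class FooterMemo<F> {
    private final Map<String, F> cache = new ConcurrentHashMap<>();
    private final AtomicInteger readCount = new AtomicInteger();
    private final Function<String, F> reader;

    FooterMemo(Function<String, F> reader) {
        this.reader = reader;
    }

    /** Returns the footer for path, invoking the reader only on the first request. */
    F footer(String path) {
        return cache.computeIfAbsent(path, p -> {
            readCount.incrementAndGet();
            return reader.apply(p);
        });
    }

    /** Number of times the underlying reader was actually invoked. */
    int physicalReads() {
        return readCount.get();
    }
}
```

Requesting the footer of the same file twice and another file once results in two physical reads, not three.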





[jira] [Updated] (HIVE-13840) Orc split generation is reading file footers twice

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13840:
-
Fix Version/s: 1.3.0

> Orc split generation is reading file footers twice
> --
>
> Key: HIVE-13840
> URL: https://issues.apache.org/jira/browse/HIVE-13840
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13840-branch-1.patch, HIVE-13840.1.patch, 
> HIVE-13840.2.patch, HIVE-13840.3.patch
>
>
> Recent refactorings to move ORC out introduced a regression in split 
> generation, which causes the ORC file footers to be read twice during split 
> generation.





[jira] [Updated] (HIVE-13841) Orc split generation returns different strategies with cache enabled vs disabled

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13841:
-
Fix Version/s: 1.3.0

> Orc split generation returns different strategies with cache enabled vs 
> disabled
> 
>
> Key: HIVE-13841
> URL: https://issues.apache.org/jira/browse/HIVE-13841
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13841-branch-1.patch, HIVE-13841.1.patch
>
>
> The split strategy chosen by OrcInputFormat should not change when the footer 
> cache is enabled or disabled. Currently, if the footer cache is disabled, 
> minSplits in OrcInputFormat.Context is set to -1, and this value is used when 
> determining the split strategy. minSplits should be set to the requested value 
> or some default instead of the cache size.
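To see why a minSplits of -1 changes the outcome, consider a decision rule that compares the file count against minSplits. The rule below is a hypothetical, simplified illustration loosely modeled on a hybrid ETL/BI choice, not OrcInputFormat's actual code; the names and the default value are assumptions.

```java
// Hypothetical, simplified sketch: the decision rule is an illustration loosely modeled
// on a hybrid ETL/BI choice, not OrcInputFormat's actual code.
final class SplitStrategySketch {
    enum Strategy { ETL, BI }

    // Few files or large files favour ETL; otherwise BI.
    static Strategy choose(int totalFiles, long avgFileSize, long maxSplitSize, int minSplits) {
        return (avgFileSize > maxSplitSize || totalFiles <= minSplits) ? Strategy.ETL : Strategy.BI;
    }

    // Resolve minSplits from the requested value alone, ignoring the footer cache;
    // a non-positive value (such as -1) falls back to the supplied default.
    static int resolveMinSplits(int requested, int defaultMinSplits) {
        return requested > 0 ? requested : defaultMinSplits;
    }
}
```

With minSplits = -1 the file-count test can never hold, so the same input may flip from ETL to BI purely because the cache was disabled; resolving minSplits independently of the cache removes that dependency.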





[jira] [Commented] (HIVE-13913) LLAP: introduce backpressure to recordreader

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330996#comment-15330996
 ] 

Sergey Shelukhin commented on HIVE-13913:
-

isClosed is not thread-safe; I might just restore it to not working for now. 
There's also some other bug I got distracted from... I will update the patch 
eventually.
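Backpressure between an IO producer and a record reader is commonly built on a bounded queue, and a closed flag shared by both threads needs to be safely published. The sketch below is a hypothetical illustration of both points using standard java.util.concurrent classes; it is not the HIVE-13913 patch, and the class name is invented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a bounded queue between a producer (IO thread) and a consumer
// (record reader). The producer waits while the queue is full, which is the backpressure.
// The closed flag is an AtomicBoolean so both threads see updates safely.
final class BackpressureBuffer<T> {
    private final BlockingQueue<T> queue;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    BackpressureBuffer(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Waits for space; returns false if the buffer is closed before the item is enqueued. */
    boolean put(T item) throws InterruptedException {
        while (!closed.get()) {
            // Short timed waits let a blocked producer notice close() promptly.
            if (queue.offer(item, 10, TimeUnit.MILLISECONDS)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the next item, or null if nothing arrives within the timeout. */
    T poll(long timeoutMs) throws InterruptedException {
        return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    void close() {
        closed.set(true);
    }

    boolean isClosed() {
        return closed.get();
    }
}
```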

> LLAP: introduce backpressure to recordreader
> 
>
> Key: HIVE-13913
> URL: https://issues.apache.org/jira/browse/HIVE-13913
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13913.01.patch, HIVE-13913.02.patch, 
> HIVE-13913.patch
>
>






[jira] [Updated] (HIVE-13841) Orc split generation returns different strategies with cache enabled vs disabled

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13841:
-
Attachment: HIVE-13841-branch-1.patch

Also committed patch to branch-1.

> Orc split generation returns different strategies with cache enabled vs 
> disabled
> 
>
> Key: HIVE-13841
> URL: https://issues.apache.org/jira/browse/HIVE-13841
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.1.0
>
> Attachments: HIVE-13841-branch-1.patch, HIVE-13841.1.patch
>
>
> The split strategy chosen by OrcInputFormat should not change when the footer 
> cache is enabled or disabled. Currently, if the footer cache is disabled, 
> minSplits in OrcInputFormat.Context is set to -1, and this value is used when 
> determining the split strategy. minSplits should be set to the requested value 
> or some default instead of the cache size.





[jira] [Assigned] (HIVE-14016) Vectorization: VectorGroupByRollupOperator and VectorGroupByCubeOperator

2016-06-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-14016:
--

Assignee: Gopal V

> Vectorization: VectorGroupByRollupOperator and VectorGroupByCubeOperator
> 
>
> Key: HIVE-14016
> URL: https://issues.apache.org/jira/browse/HIVE-14016
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Gopal V
>
> Rollup and Cube queries are not vectorized today because vector group by lacks 
> support for grouping sets.
> The cube and rollup operators can be shimmed onto the end of the pipeline by 
> converting a single-row writer into a multiple-row writer.
> The corresponding non-vectorized loop is as follows:
> {code}
>   if (groupingSetsPresent) {
>     Object[] newKeysArray = newKeys.getKeyArray();
>     Object[] cloneNewKeysArray = new Object[newKeysArray.length];
>     for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>       cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
>     }
>     for (int groupingSetPos = 0; groupingSetPos < groupingSets.size();
>         groupingSetPos++) {
>       for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>         newKeysArray[keyPos] = null;
>       }
>       FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>       // Some keys need to be left to null corresponding to that grouping set.
>       for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
>           keyPos = bitset.nextSetBit(keyPos + 1)) {
>         newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>       }
>       newKeysArray[groupingSetsPosition] = newKeysGroupingSets[groupingSetPos];
>       processKey(row, rowInspector);
>     }
>   }
> {code}
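The loop above turns one key row into one row per grouping set, which is what a "multiple row writer" would do. For example, ROLLUP(a, b) corresponds to the grouping sets (a, b), (a) and (). The self-contained sketch below reproduces that expansion under stated assumptions: java.util.BitSet stands in for Hive's FastBitSet, the grouping-set id is appended in one extra trailing slot, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of a "multiple row writer" for grouping sets: one input key row
// becomes one output key row per grouping set. Keys outside a set are left null and the
// set's id goes in the extra last slot. java.util.BitSet stands in for FastBitSet.
final class GroupingSetExpander {
    static List<Object[]> expand(Object[] keys, BitSet[] groupingSets, Object[] groupingSetIds) {
        List<Object[]> out = new ArrayList<>(groupingSets.length);
        for (int g = 0; g < groupingSets.length; g++) {
            Object[] row = new Object[keys.length + 1]; // every slot starts out null
            BitSet bits = groupingSets[g];
            for (int k = bits.nextSetBit(0); k >= 0 && k < keys.length; k = bits.nextSetBit(k + 1)) {
                row[k] = keys[k]; // copy only the keys that belong to this grouping set
            }
            row[keys.length] = groupingSetIds[g];
            out.add(row);
        }
        return out;
    }
}
```

A vectorized version would apply the same expansion per batch rather than per row; the sketch only shows the row-level logic.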





[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13984:
---
Status: Patch Available  (was: Open)

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch, 
> HIVE-13984.03.patch, HIVE-13984.04.patch
>
>






[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14014:
---
Status: Patch Available  (was: Open)

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch
>
>
> The same problem happens when the source table is not empty, e.g., when 
> "limit 0" is not there.





[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13984:
---
Attachment: HIVE-13984.04.patch

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch, 
> HIVE-13984.03.patch, HIVE-13984.04.patch
>
>






[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13984:
---
Status: Open  (was: Patch Available)

> Use multi-threaded approach to listing files for msck
> -
>
> Key: HIVE-13984
> URL: https://issues.apache.org/jira/browse/HIVE-13984
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch, 
> HIVE-13984.03.patch, HIVE-13984.04.patch
>
>






[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14014:
---
Status: Open  (was: Patch Available)

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch
>
>
> The same problem happens when the source table is not empty, e.g., when 
> "limit 0" is not there.





[jira] [Commented] (HIVE-13961) ACID: Major compaction fails to include the original bucket files if there's no delta directory

2016-06-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330894#comment-15330894
 ] 

Eugene Koifman commented on HIVE-13961:
---

+1

> ACID: Major compaction fails to include the original bucket files if there's 
> no delta directory
> ---
>
> Key: HIVE-13961
> URL: https://issues.apache.org/jira/browse/HIVE-13961
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-13961.1.patch, HIVE-13961.2.patch, 
> HIVE-13961.3.patch, HIVE-13961.4.patch, HIVE-13961.5.patch, HIVE-13961.6.patch
>
>
> The issue can be reproduced with the steps below:
> 1. Insert a row into a non-ACID table.
> 2. Convert the non-ACID table to an ACID table (i.e. set the transactional=true table property).
> 3. Perform a major compaction.





[jira] [Commented] (HIVE-13958) hive.strict.checks.type.safety should apply to decimals, as well as IN... and BETWEEN... ops

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330839#comment-15330839
 ] 

Sergey Shelukhin commented on HIVE-13958:
-

One nit on RB; also, this doesn't actually cover the decimal <-> string case.

> hive.strict.checks.type.safety should apply to decimals, as well as IN... and 
> BETWEEN... ops
> 
>
> Key: HIVE-13958
> URL: https://issues.apache.org/jira/browse/HIVE-13958
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Takuma Wakamori
>  Labels: patch
> Attachments: HIVE-13958.01.patch, HIVE-13958.02.patch, 
> HIVE-13958.03.patch
>
>
> String to decimal auto-casts should be prohibited for comparisons.
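A strict check of this kind rejects a comparison up front instead of inserting an implicit cast, and IN (...) and BETWEEN ... AND ... can reuse the same pairwise check for every operand. The sketch below is a hypothetical illustration that covers only the string/decimal pair; the enum, class and method names are invented and it is not how hive.strict.checks.type.safety is implemented.

```java
// Hypothetical sketch of a strict type-safety check: when strict checks are enabled,
// comparing a string with a decimal (in either order) is rejected instead of being
// auto-cast. The Type enum is illustrative and covers only the string/decimal case.
final class StrictCompareCheck {
    enum Type { STRING, DECIMAL, INT }

    static void checkComparison(boolean strict, Type left, Type right) {
        boolean stringVsDecimal =
            (left == Type.STRING && right == Type.DECIMAL)
                || (left == Type.DECIMAL && right == Type.STRING);
        if (strict && stringVsDecimal) {
            throw new IllegalArgumentException(
                "Comparing " + left + " with " + right + " is not allowed under strict type checks");
        }
    }

    // value IN (c1, c2, ...) or value BETWEEN c1 AND c2: check value against each operand.
    static void checkAgainstAll(boolean strict, Type value, Type... operands) {
        for (Type operand : operands) {
            checkComparison(strict, value, operand);
        }
    }
}
```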





[jira] [Updated] (HIVE-13958) hive.strict.checks.type.safety should apply to decimals, as well as IN... and BETWEEN... ops

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13958:

Attachment: HIVE-13958.03.patch

The same patch as 02; it looks like HiveQA died and skipped this patch.

> hive.strict.checks.type.safety should apply to decimals, as well as IN... and 
> BETWEEN... ops
> 
>
> Key: HIVE-13958
> URL: https://issues.apache.org/jira/browse/HIVE-13958
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Takuma Wakamori
>  Labels: patch
> Attachments: HIVE-13958.01.patch, HIVE-13958.02.patch, 
> HIVE-13958.03.patch
>
>
> String to decimal auto-casts should be prohibited for comparisons.





[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330823#comment-15330823
 ] 

Sergey Shelukhin commented on HIVE-13617:
-

[~spena] when files are explicitly specified in qfile, they are run regardless 
of the properties file. For now I just added the out file.

[~prasanth_j] can you please review?

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13617-wo-11417.patch, HIVE-13617-wo-11417.patch, 
> HIVE-13617.01.patch, HIVE-13617.03.patch, HIVE-13617.04.patch, 
> HIVE-13617.05.patch, HIVE-13617.06.patch, HIVE-13617.patch, HIVE-13617.patch, 
> HIVE-15396-with-oi.patch
>
>
> There are two approaches: a separate decoding path that produces rows instead 
> of VRBs, or decoding VRBs into rows at a higher level (the original 
> LlapInputFormat). I think the latter might be better: it's not a hugely 
> important path, and perf in the non-vectorized case is not the best anyway, so 
> it's better to make do with much less new code and architectural disruption. 
> Some ORC patches in progress introduce an easy-to-reuse (or so I hope, anyway) 
> VRB-to-row conversion, so we should just use that.
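Decoding VRBs into rows at a higher level amounts to walking each (selected) row index and reading one value from every column vector. The sketch below shows that idea on a hypothetical MiniBatch stand-in, not Hive's VectorizedRowBatch, although it mirrors the notion of a row count plus an optional selection vector. All columns are assumed to be long values with a null mask.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of decoding a column-oriented batch into rows. MiniBatch is a
// stand-in, not Hive's VectorizedRowBatch; every column here is a long[] with a null mask.
final class MiniBatch {
    final long[][] columns;    // columns[c][r] holds the value of column c in row r
    final boolean[][] isNull;  // isNull[c][r] is true when that value is null
    final int[] selected;      // row indices to emit when selectedInUse is true
    final boolean selectedInUse;
    final int size;            // number of rows (or of selected entries) to emit

    MiniBatch(long[][] columns, boolean[][] isNull, int[] selected, boolean selectedInUse, int size) {
        this.columns = columns;
        this.isNull = isNull;
        this.selected = selected;
        this.selectedInUse = selectedInUse;
        this.size = size;
    }

    /** Converts the batch into one Object[] per row, with null for null values. */
    List<Object[]> toRows() {
        List<Object[]> rows = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            int r = selectedInUse ? selected[i] : i;
            Object[] row = new Object[columns.length];
            for (int c = 0; c < columns.length; c++) {
                row[c] = isNull[c][r] ? null : Long.valueOf(columns[c][r]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```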





[jira] [Commented] (HIVE-14002) Extend limit propagation to subsequent RS operators

2016-06-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330819#comment-15330819
 ] 

Ashutosh Chauhan commented on HIVE-14002:
-

I am not sure we can allow *any* operators between two RSs other than GBy. 
For example, a filter can be problematic: if the first limit generates only N 
rows and the filter eliminates all of them, we will get an incorrect result. 
Even for the Select operator, we can allow this only for column references and 
constants.
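The filter problem can be seen with plain Java streams: applying the limit before the filter can discard exactly the rows the filter would have kept. This is an illustration of the ordering issue only, not Hive operator code, and the class name is hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustration (not Hive code): a LIMIT pushed ahead of a filter can drop rows the
// filter would have kept, so the two orders below can return different results.
final class LimitThroughFilter {
    // Correct order: filter first, then keep at most n rows.
    static List<Integer> filterThenLimit(List<Integer> rows, Predicate<Integer> keep, int n) {
        return rows.stream().filter(keep).limit(n).collect(Collectors.toList());
    }

    // What propagating the limit past the filter would do: limit first, then filter.
    static List<Integer> limitThenFilter(List<Integer> rows, Predicate<Integer> keep, int n) {
        return rows.stream().limit(n).filter(keep).collect(Collectors.toList());
    }
}
```

With rows 1 to 6, a filter x > 2 and a limit of 2, the correct result is [3, 4], while limiting first leaves [1, 2], which the filter then eliminates entirely.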

> Extend limit propagation to subsequent RS operators
> ---
>
> Key: HIVE-14002
> URL: https://issues.apache.org/jira/browse/HIVE-14002
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14002.patch
>
>
> On some occasions, for instance when RS dedup does not kick in, it is useful 
> to propagate the limit to subsequent RS operators, as this reduces 
> intermediate results and improves performance. This issue covers that extension.





[jira] [Resolved] (HIVE-14010) parquet-logging.properties from HIVE_CONF_DIR should be used when available

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-14010.
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   2.1.0
   1.3.0

Committed to branch-1, branch-2.1 and master.

> parquet-logging.properties from HIVE_CONF_DIR should be used when available
> ---
>
> Key: HIVE-14010
> URL: https://issues.apache.org/jira/browse/HIVE-14010
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.1.0, 2.2.0
>
> Attachments: HIVE-14010.1.patch
>
>
> Following up on HIVE-13954: when parquet-logging.properties is available in 
> HIVE_CONF_DIR, it should be used first. When it is not available, fall back to 
> the relative path from the bin directory.
> NO PRECOMMIT TESTS
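The lookup order can be expressed as "prefer HIVE_CONF_DIR, else fall back". The Java sketch below only illustrates that order; wherever the real change lives, it is not this code. The fallback location is passed in as a parameter because the exact relative path from the bin directory is not given here, and the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical sketch of the lookup order: prefer parquet-logging.properties from
// HIVE_CONF_DIR, otherwise fall back to a caller-supplied location (standing in for the
// path relative to the bin directory, which is not spelled out here).
final class ParquetLoggingConfig {
    static final String FILE_NAME = "parquet-logging.properties";

    static Optional<Path> resolve(String hiveConfDir, Path fallback) {
        if (hiveConfDir != null && !hiveConfDir.isEmpty()) {
            Path candidate = Paths.get(hiveConfDir, FILE_NAME);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate); // HIVE_CONF_DIR wins when the file exists
            }
        }
        return Files.isRegularFile(fallback) ? Optional.of(fallback) : Optional.empty();
    }
}
```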





[jira] [Commented] (HIVE-14010) parquet-logging.properties from HIVE_CONF_DIR should be used when available

2016-06-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330804#comment-15330804
 ] 

Ashutosh Chauhan commented on HIVE-14010:
-

+1

> parquet-logging.properties from HIVE_CONF_DIR should be used when available
> ---
>
> Key: HIVE-14010
> URL: https://issues.apache.org/jira/browse/HIVE-14010
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14010.1.patch
>
>
> Following up on HIVE-13954: when parquet-logging.properties is available in 
> HIVE_CONF_DIR, it should be used first. When it is not available, fall back to 
> the relative path from the bin directory.
> NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330792#comment-15330792
 ] 

Ashutosh Chauhan commented on HIVE-14014:
-

+1

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch
>
>
> The same problem happens when the source table is not empty, e.g., when 
> "limit 0" is not there.





[jira] [Commented] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330790#comment-15330790
 ] 

Pengcheng Xiong commented on HIVE-14014:


Done.

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch
>
>
> The same problem happens when the source table is not empty, e.g., when 
> "limit 0" is not there.





[jira] [Updated] (HIVE-13833) Add an initial delay when starting the heartbeat

2016-06-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13833:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   1.3.0
   Status: Resolved  (was: Patch Available)

> Add an initial delay when starting the heartbeat
> 
>
> Key: HIVE-13833
> URL: https://issues.apache.org/jira/browse/HIVE-13833
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Minor
> Fix For: 1.3.0, 2.2.0
>
> Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch, 
> HIVE-13833.3.patch, HIVE-13833.4.patch
>
>
> Since the heartbeat is scheduled immediately after lock acquisition, it is 
> unnecessary to send a heartbeat at the moment the locks are acquired. Add an 
> initial delay to skip this.
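With java.util.concurrent, an initial delay is simply the second argument of ScheduledExecutorService.scheduleAtFixedRate. The sketch below is a hypothetical illustration, not Hive's heartbeater; the class name is invented and sendHeartbeat is a placeholder for the real heartbeat call.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Hive's heartbeater: start periodic heartbeats with an initial
// delay equal to the interval, so nothing is sent at the moment the locks are acquired.
// sendHeartbeat is a placeholder for the real heartbeat call.
final class HeartbeatScheduler {
    static ScheduledFuture<?> start(ScheduledExecutorService pool, Runnable sendHeartbeat,
                                    long intervalMs) {
        // scheduleAtFixedRate(command, initialDelay, period, unit) runs the first
        // heartbeat only after initialDelay has elapsed, rather than immediately.
        return pool.scheduleAtFixedRate(sendHeartbeat, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }
}
```

Setting the initial delay equal to the period keeps the spacing between lock acquisition and the first heartbeat the same as between later heartbeats.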





[jira] [Commented] (HIVE-13833) Add an initial delay when starting the heartbeat

2016-06-14 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330787#comment-15330787
 ] 

Wei Zheng commented on HIVE-13833:
--

Committed to master and branch-1. Thanks Eugene for the review.

> Add an initial delay when starting the heartbeat
> 
>
> Key: HIVE-13833
> URL: https://issues.apache.org/jira/browse/HIVE-13833
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Minor
> Fix For: 1.3.0, 2.2.0
>
> Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch, 
> HIVE-13833.3.patch, HIVE-13833.4.patch
>
>
> Since the heartbeat is scheduled immediately after lock acquisition, it is 
> unnecessary to send a heartbeat at the moment the locks are acquired. Add an 
> initial delay to skip this.





[jira] [Commented] (HIVE-11166) HiveHBaseTableOutputFormat can't call getFileExtension(JobConf jc, boolean isCompressed, HiveOutputFormat hiveOutputFormat)

2016-06-14 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330789#comment-15330789
 ] 

Aihua Xu commented on HIVE-11166:
-

[~Yun Zhao] The change seems reasonable to me. Can we add a unit test to cover 
this HBase case?

> HiveHBaseTableOutputFormat can't call getFileExtension(JobConf jc, boolean 
> isCompressed, HiveOutputFormat hiveOutputFormat)
> -
>
> Key: HIVE-11166
> URL: https://issues.apache.org/jira/browse/HIVE-11166
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler, Spark
>Reporter: meiyoula
>Assignee: Yun Zhao
> Attachments: HIVE-11166.2.patch, HIVE-11166.patch
>
>
> I created an HBase table with HBaseStorageHandler in Spark's JDBCServer, then 
> executed an *insert into* SQL statement, and a ClassCastException occurred.
> {quote}
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 3.0 (TID 12, vm-17): java.lang.ClassCastException: 
> org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to 
> org.apache.hadoop.hive.ql.io.HiveOutputFormat
>     at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:72)
>     at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:71)
>     at org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:91)
>     at org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:115)
>     at org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:84)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:112)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
>     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:56)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> {quote}
> This is caused by the Spark code below. For an HBase table, the outputFormat is 
> HiveHBaseTableOutputFormat, which is not an instance of HiveOutputFormat.
> {quote}
> @transient private lazy val outputFormat =
>   conf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]]
> val extension = Utilities.getFileExtension(conf.value,
>   fileSinkConf.getCompressed, outputFormat)
> {quote}
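The failure comes from casting unconditionally. A guarded alternative checks the runtime type first and lets the caller choose a fallback, such as a default file extension, when the cast does not apply. The sketch below is hypothetical: the interfaces are stand-ins for illustration, not the Hive or Hadoop types, and it does not claim to be the approach taken in the attached patches.

```java
import java.util.Optional;

// Hypothetical sketch: check the runtime type before casting an output format, rather
// than casting unconditionally and failing with ClassCastException. These interfaces
// and classes are stand-ins, not Hive or Hadoop types.
interface OutputFormatLike {}

interface HiveOutputFormatLike extends OutputFormatLike {}

final class HBaseLikeOutputFormat implements OutputFormatLike {}

final class TextLikeOutputFormat implements HiveOutputFormatLike {}

final class OutputFormats {
    /** Returns the format as a HiveOutputFormatLike if it is one, otherwise empty. */
    static Optional<HiveOutputFormatLike> asHiveOutputFormat(OutputFormatLike format) {
        return format instanceof HiveOutputFormatLike
            ? Optional.of((HiveOutputFormatLike) format)
            : Optional.empty();
    }
}
```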





[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14014:
---
Status: Patch Available  (was: Open)

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch
>
>
> The same problem happens when the source table is not empty, e.g., when 
> "limit 0" is not there.





[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14014:
---
Attachment: HIVE-14014.02.patch

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch
>
>
> The same problem happens when the source table is not empty, e.g., when 
> "limit 0" is not there.





[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14014:
---
Status: Open  (was: Patch Available)

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch
>
>
> The same problem happens when the source table is not empty, e.g., when 
> "limit 0" is not there.





[jira] [Updated] (HIVE-14011) MessageFactory is not pluggable

2016-06-14 Thread Sravya Tirukkovalur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sravya Tirukkovalur updated HIVE-14011:
---
Attachment: HIVE-14011.patch

Attaching a fix.

> MessageFactory is not pluggable
> ---
>
> Key: HIVE-14011
> URL: https://issues.apache.org/jira/browse/HIVE-14011
> Project: Hive
>  Issue Type: Bug
>Reporter: Sravya Tirukkovalur
> Attachments: HIVE-14011.patch
>
>
> The property "hcatalog.message.factory.impl.json" is meant to allow a custom 
> message factory implementation. However, MessageFactory is not pluggable, 
> because it is hardcoded to use JSONMessageFactory.
> https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/messaging/MessageFactory.java#L39
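This is a minimal sketch of how the factory could be made pluggable. It assumes the implementation class is named by the property above and has a no-arg constructor. {{PluggableMessageFactory}}, {{JsonMessageFactorySketch}}, {{CustomMessageFactorySketch}} and {{forConfig}} are hypothetical names, not the actual HCatalog classes, and the real fix may wire this up differently.

```java
import java.util.Map;

// Hypothetical stand-in for HCatalog's MessageFactory: the concrete class is
// resolved from configuration instead of being hardcoded.
abstract class PluggableMessageFactory {
    // Property name taken from the issue description.
    static final String IMPL_KEY = "hcatalog.message.factory.impl.json";

    // Identifies which implementation was loaded (illustration only).
    abstract String format();

    // Instantiates the class named under IMPL_KEY via its no-arg constructor,
    // falling back to the JSON default when the property is unset.
    static PluggableMessageFactory forConfig(Map<String, String> conf) {
        String className = conf.getOrDefault(IMPL_KEY, JsonMessageFactorySketch.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.asSubclass(PluggableMessageFactory.class)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load message factory " + className, e);
        }
    }
}

// Default implementation, standing in for JSONMessageFactory.
class JsonMessageFactorySketch extends PluggableMessageFactory {
    @Override
    String format() { return "json"; }
}

// Example of a user-supplied implementation.
class CustomMessageFactorySketch extends PluggableMessageFactory {
    @Override
    String format() { return "custom"; }
}
```

If the property is unset, {{forConfig}} returns the JSON default. Setting the property to another class name makes that class the factory.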



[jira] [Updated] (HIVE-13970) refactor LLAPIF splits - get rid of SubmitWorkInfo

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13970:

Attachment: HIVE-13970.patch

The patch for HiveQA again...

> refactor LLAPIF splits - get rid of SubmitWorkInfo
> --
>
> Key: HIVE-13970
> URL: https://issues.apache.org/jira/browse/HIVE-13970
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13970.only.patch, HIVE-13970.patch, HIVE-13970.patch
>
>
> First we build the signable vertex spec, convert it into bytes (as we 
> should), and put it inside SubmitWorkInfo. Then we serialize that into byte[] 
> and put it into LlapInputSplit. Then we serialize that again to return it. We should 
> get rid of one of the steps.
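This sketch shows one way to collapse the layers. It assumes the split can carry the signed vertex spec bytes and their signature directly, written once in Writable style (write to a {{DataOutput}}, read back from a {{DataInput}}). {{SignedSpecSplit}} is hypothetical and does not reflect the actual {{LlapInputSplit}} layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical split carrying the signed vertex spec bytes directly, so the spec
// is serialized once into the split instead of first being wrapped in an
// intermediate object that is itself turned into a byte[].
class SignedSpecSplit {
    final byte[] vertexSpec; // already-serialized, signable vertex spec
    final byte[] signature;  // signature over vertexSpec

    SignedSpecSplit(byte[] vertexSpec, byte[] signature) {
        this.vertexSpec = vertexSpec;
        this.signature = signature;
    }

    // Writable-style serialization: two length-prefixed byte arrays.
    void write(DataOutput out) throws IOException {
        writeBytes(out, vertexSpec);
        writeBytes(out, signature);
    }

    static SignedSpecSplit read(DataInput in) throws IOException {
        byte[] spec = readBytes(in);
        byte[] sig = readBytes(in);
        return new SignedSpecSplit(spec, sig);
    }

    // Round-trip helpers for an in-memory buffer.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf));
        return buf.toByteArray();
    }

    static SignedSpecSplit fromBytes(byte[] data) throws IOException {
        return read(new DataInputStream(new ByteArrayInputStream(data)));
    }

    private static void writeBytes(DataOutput out, byte[] b) throws IOException {
        out.writeInt(b.length);
        out.write(b);
    }

    private static byte[] readBytes(DataInput in) throws IOException {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return b;
    }
}
```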



[jira] [Commented] (HIVE-14009) Acid DB creation error in HiveQA

2016-06-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330767#comment-15330767
 ] 

Eugene Koifman commented on HIVE-14009:
---

[~spena] could you comment?

> Acid DB creation error in HiveQA
> 
>
> Key: HIVE-14009
> URL: https://issues.apache.org/jira/browse/HIVE-14009
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Seen when running TestEncryptedHDFSCliDriver, at least with Hadoop 2.7.2 
> (HIVE-13930). 
> It looks like such issues are usually caused by concurrent DB creation from 
> multiple threads.
> {noformat}
> java.lang.RuntimeException: Unable to set up transaction database for 
> testing: Exception during creation of file 
> /home/hiveptest/54.219.24.101-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/junit_metastore_db/seg0/cc60.dat
>  for container
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkQFileTestHack(TxnHandler.java:2172)
>  ~[hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.setConf(TxnHandler.java:228) 
> ~[hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:96) 
> [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTxnHandler(HiveMetaStore.java:557)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5902)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_25]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_25]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_25]
>   at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy111.heartbeat(Unknown Source) [?:?]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:2140)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_25]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_25]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_25]
>   at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25]
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy112.heartbeat(Unknown Source) [?:?]
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.heartbeat(DbTxnManager.java:663)
>  [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:423)
>  [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:633)
>  [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_25]
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [?:1.8.0_25]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [?:1.8.0_25]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [?:1.8.0_25]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_25]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_25]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_25]
> Caused by: java.sql.SQLException: Exception during creation of file 
> /home/hiveptest/54.219.24.101-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/junit_metastore_db/seg0/cc60.dat
>  for container
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source) ~[derby-10.10.2.0.jar:?]
>   at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source) 
> 
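If the failure does come from concurrent creation of the test transaction DB, a common remedy is to guard the one-time setup so that only the first caller runs it. This is a minimal sketch that assumes the creation step can be passed in as a {{Runnable}}. {{TxnDbInitSketch}} and {{ensureCreated}} are hypothetical and are not part of {{TxnHandler}}. A static guard like this only covers threads within one JVM, not separate test processes sharing the same Derby directory.

```java
// Hypothetical guard around a non-thread-safe "create the test DB" step.
class TxnDbInitSketch {
    private static final Object LOCK = new Object();
    // volatile so the unlocked fast-path read sees the write made under LOCK.
    private static volatile boolean created = false;

    // Runs createDb at most once per JVM, however many threads call this concurrently.
    static void ensureCreated(Runnable createDb) {
        if (created) {
            return; // fast path once setup is done
        }
        synchronized (LOCK) {
            if (!created) { // re-check: another thread may have won the race
                createDb.run();
                created = true;
            }
        }
    }
}
```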

[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13930:

Attachment: HIVE-13930.03.patch

HiveQA failed silently, trying again.

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.03.patch, HIVE-13930.patch
>
>




[jira] [Commented] (HIVE-14009) Acid DB creation error in HiveQA

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330760#comment-15330760
 ] 

Sergey Shelukhin commented on HIVE-14009:
-

No idea. 

> Acid DB creation error in HiveQA
> 
>
> Key: HIVE-14009
> URL: https://issues.apache.org/jira/browse/HIVE-14009
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Seen when running TestEncryptedHDFSCliDriver, at least with Hadoop 2.7.2 
> (HIVE-13930). 
> It looks like such issues are usually caused by concurrent DB creation from 
> multiple threads.
> {noformat}
> java.lang.RuntimeException: Unable to set up transaction database for 
> testing: Exception during creation of file 
> /home/hiveptest/54.219.24.101-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/junit_metastore_db/seg0/cc60.dat
>  for container
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkQFileTestHack(TxnHandler.java:2172)
>  ~[hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.setConf(TxnHandler.java:228) 
> ~[hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:96) 
> [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTxnHandler(HiveMetaStore.java:557)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5902)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_25]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_25]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_25]
>   at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy111.heartbeat(Unknown Source) [?:?]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:2140)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_25]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_25]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_25]
>   at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25]
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
>  [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy112.heartbeat(Unknown Source) [?:?]
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.heartbeat(DbTxnManager.java:663)
>  [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:423)
>  [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:633)
>  [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_25]
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [?:1.8.0_25]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [?:1.8.0_25]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [?:1.8.0_25]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_25]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_25]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_25]
> Caused by: java.sql.SQLException: Exception during creation of file 
> /home/hiveptest/54.219.24.101-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/junit_metastore_db/seg0/cc60.dat
>  for container
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source) ~[derby-10.10.2.0.jar:?]
>   at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source) 
> 

[jira] [Updated] (HIVE-13771) LLAPIF: generate app ID

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13771:

Attachment: HIVE-13771.03.patch

Looks like HiveQA failed silently

> LLAPIF: generate app ID
> ---
>
> Key: HIVE-13771
> URL: https://issues.apache.org/jira/browse/HIVE-13771
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13771.01.patch, HIVE-13771.02.patch, 
> HIVE-13771.03.patch, HIVE-13771.patch
>
>
> See comments in the HIVE-13675 patch. The uniqueness needs to be ensured; the 
> user may be allowed to supply a prefix (e.g. their YARN app ID, if any) for 
> ease of tracking
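This sketch shows one way to meet the uniqueness requirement while still letting the user supply a prefix: append a random UUID to the optional prefix. {{LlapAppIdSketch}} and {{generate}} are hypothetical. A random UUID makes collisions vanishingly unlikely rather than impossible, and the actual patch may derive IDs differently.

```java
import java.util.UUID;

// Hypothetical ID generator: "<prefix>_<random UUID>", or just the UUID when no prefix is given.
class LlapAppIdSketch {
    static String generate(String userPrefix) {
        String unique = UUID.randomUUID().toString(); // 36-character random UUID
        if (userPrefix == null || userPrefix.isEmpty()) {
            return unique;
        }
        return userPrefix + "_" + unique; // keep the user's prefix for ease of tracking
    }
}
```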



[jira] [Commented] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330755#comment-15330755
 ] 

Ashutosh Chauhan commented on HIVE-14014:
-

The partition dir may not get created in commit(), so it's better to pass in 
the {{filescreated}} boolean.

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch
>
>
> The same problem happens when the source table is not empty, e.g., when "limit 0" 
> is not there.



[jira] [Updated] (HIVE-13986) LLAP: kill Tez AM on token errors from plugin

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13986:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master.

> LLAP: kill Tez AM on token errors from plugin
> -
>
> Key: HIVE-13986
> URL: https://issues.apache.org/jira/browse/HIVE-13986
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-13986.01.patch, HIVE-13986.patch
>
>




[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330703#comment-15330703
 ] 

Ashutosh Chauhan commented on HIVE-13985:
-

Can you create an RB entry for this?

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-2.1.patch, 
> HIVE-13985.1.patch, HIVE-13985.2.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which 
> serializes the ORC footers into the splits. However, this also serializes 
> unwanted information, like column statistics and other metadata, that is not 
> really required for reading an ORC split on the task side. We can reduce the 
> payload of the ORC splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the split payload and can potentially avoid OOMs in the 
> application master (AM) during split generation. This jira also addresses other 
> issues concerning the AM cache. The local cache used by the AM is a 
> soft-reference cache, which can introduce unpredictability across multiple 
> runs of the same query. We can cache the serialized footer in the local cache 
> and also use a strong-reference cache, which should avoid memory pressure and 
> give better predictability.
> One other improvement: when hive.orc.splits.include.file.footer is set to 
> false, the task side makes one additional file system call to get the size 
> of the file. If we can 
> serialize the file length in the orc split this can be avoided.
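The difference between the soft-reference cache and the proposed strong-reference cache can be sketched as a bounded LRU map of serialized footers. Entries leave only through the eviction policy, never because the GC cleared them, so repeated runs of a query see the same hits. {{FooterCacheSketch}} is hypothetical, and bounding by entry count rather than by total bytes is a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical strong-reference cache of serialized ORC footers, keyed by file path.
class FooterCacheSketch {
    private final Map<String, byte[]> map;

    FooterCacheSketch(final int maxEntries) {
        // accessOrder=true: iteration order is least-recently-used first,
        // so the "eldest" entry is the LRU one.
        this.map = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries; // evict only by policy, not by GC
            }
        };
    }

    synchronized byte[] get(String path) { return map.get(path); }

    synchronized void put(String path, byte[] serializedFooter) { map.put(path, serializedFooter); }

    synchronized int size() { return map.size(); }
}
```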



[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330681#comment-15330681
 ] 

Sergey Shelukhin commented on HIVE-13974:
-

Some comments on RB

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13974.01.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema, which doesn't work for adding columns to non-last STRUCT data 
> type columns.
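This is a simplified model of why the include list must follow the reader schema. It assumes ORC-style column ids assigned by a pre-order walk of the type tree, with the root struct as column 0. Under that numbering, adding a field inside a struct that is not the last column shifts the ids of every column after it. {{SchemaNode}} and {{ColumnIdSketch}} are hypothetical stand-ins, not the real ORC type classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal type tree: a node without children is a primitive column, one with children is a struct.
class SchemaNode {
    final String name;
    final List<SchemaNode> children;

    SchemaNode(String name, SchemaNode... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }
}

class ColumnIdSketch {
    // Numbers every node in pre-order (root struct = 0), keyed by dotted path.
    static Map<String, Integer> assignIds(SchemaNode root) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        walk(root, "", ids);
        return ids;
    }

    private static void walk(SchemaNode node, String parentPath, Map<String, Integer> ids) {
        String path = parentPath.isEmpty() ? node.name : parentPath + "." + node.name;
        ids.put(path, ids.size()); // next id = number of nodes numbered so far
        for (SchemaNode child : node.children) {
            walk(child, path, ids);
        }
    }
}
```

Take the file schema struct<a:struct<x,y>,b> and the reader schema struct<a:struct<x,y,z>,b>. Column b is id 4 in the file but id 5 in the reader, so an include mask built against one schema cannot be applied to the other as-is.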



[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14014:
---
Attachment: HIVE-14014.01.patch

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch
>
>
> The same problem happens when the source table is not empty, e.g., when "limit 0" 
> is not there.



[jira] [Commented] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330679#comment-15330679
 ] 

Pengcheng Xiong commented on HIVE-14014:


[~ashutoshc], could you take a look? Thanks.

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch
>
>
> The same problem happens when the source table is not empty, e.g., when "limit 0" 
> is not there.



[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)

2016-06-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14014:
---
Status: Patch Available  (was: Open)

> zero length file is being created for empty bucket in tez mode (II)
> ---
>
> Key: HIVE-14014
> URL: https://issues.apache.org/jira/browse/HIVE-14014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14014.01.patch
>
>
> The same problem happens when the source table is not empty, e.g., when "limit 0" 
> is not there.



[jira] [Commented] (HIVE-13696) Monitor fair-scheduler.xml and automatically update/validate jobs submitted to fair-scheduler

2016-06-14 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330656#comment-15330656
 ] 

Yongzhi Chen commented on HIVE-13696:
-

LGTM +1
But I do not know this part of the code well. [~prasadm], could you review the 
patch? Thanks.

> Monitor fair-scheduler.xml and automatically update/validate jobs submitted 
> to fair-scheduler
> -
>
> Key: HIVE-13696
> URL: https://issues.apache.org/jira/browse/HIVE-13696
> Project: Hive
>  Issue Type: Improvement
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
> Attachments: HIVE-13696.01.patch, HIVE-13696.02.patch, 
> HIVE-13696.06.patch, HIVE-13696.08.patch, HIVE-13696.11.patch, 
> HIVE-13696.13.patch
>
>
> Ensure that jobs are placed into the correct queue according to 
> {{fair-scheduler.xml}}. Jobs should be placed into the correct queue, and 
> users should not be able to submit jobs to queues they do not have access to.
> This patch builds on the existing functionality in {{FairSchedulerShim}} to 
> route jobs to user-specific queues based on the {{fair-scheduler.xml}} 
> configuration (leveraging the YARN {{QueuePlacementPolicy}} class). In 
> addition to configuring job routing at session connect (current behavior), 
> the routing is validated on each submission to YARN (when impersonation is off). 
> A {{FileSystemWatcher}} class is included to monitor changes in the 
> {{fair-scheduler.xml}} file (so updates are automatically reloaded when the 
> file pointed to by {{yarn.scheduler.fair.allocation.file}} is changed).
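The file monitoring described above can be sketched as a polling check. The watcher remembers the file's last-modified time and reports a change when it differs, after which the caller would reload the allocation file. {{AllocationFileWatcherSketch}} is hypothetical. The patch's {{FileSystemWatcher}} may work differently, for example via {{java.nio.file.WatchService}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical polling watcher: call hasChanged() periodically (e.g. from a
// scheduled executor) and reload the allocation file when it returns true.
class AllocationFileWatcherSketch {
    private final Path file;
    private FileTime lastSeen;

    AllocationFileWatcherSketch(Path file) {
        this.file = file;
        this.lastSeen = modTime();
    }

    // Returns true once per observed change of the file's modification time.
    synchronized boolean hasChanged() {
        FileTime now = modTime();
        if (now.equals(lastSeen)) {
            return false;
        }
        lastSeen = now;
        return true;
    }

    private FileTime modTime() {
        try {
            return Files.getLastModifiedTime(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```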



[jira] [Updated] (HIVE-13959) MoveTask should only release its query associated locks

2016-06-14 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-13959:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

HIVE-13959.patch has been committed to 2.2.0 and 2.1.1. Thanks [~ychena] for the review.

> MoveTask should only release its query associated locks
> ---
>
> Key: HIVE-13959
> URL: https://issues.apache.org/jira/browse/HIVE-13959
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-13959.1.patch, HIVE-13959.patch, HIVE-13959.patch
>
>
> releaseLocks in MoveTask releases all locks under a HiveLockObject's pathNames. 
> But some of the locks under these pathNames might belong to other queries and should 
> not be released.
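The intended filtering can be sketched as follows: of the locks held on the path, release only those whose owning query id matches the current query. {{HeldLock}} and {{MoveTaskLockSketch}} are hypothetical, and how the real lock objects record their owning query is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical lock record: the locked path plus the id of the query holding it.
class HeldLock {
    final String path;
    final String queryId;

    HeldLock(String path, String queryId) {
        this.path = path;
        this.queryId = queryId;
    }
}

class MoveTaskLockSketch {
    // Selects only the locks on `path` taken by `queryId`; locks that other
    // queries hold on the same path are left in place.
    static List<HeldLock> locksToRelease(List<HeldLock> held, String path, String queryId) {
        List<HeldLock> mine = new ArrayList<>();
        for (HeldLock lock : held) {
            if (lock.path.equals(path) && Objects.equals(lock.queryId, queryId)) {
                mine.add(lock);
            }
        }
        return mine;
    }
}
```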



[jira] [Updated] (HIVE-14013) Describe table doesn't show unicode properly

2016-06-14 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14013:

Status: Patch Available  (was: Open)

Patch-1: in various places, use UTF-8 encoding when writing to the output stream. 
We need a Hive-specific version of escape(), since the common one also escapes 
Unicode characters, which causes the issue.

> Describe table doesn't show unicode properly
> 
>
> Key: HIVE-14013
> URL: https://issues.apache.org/jira/browse/HIVE-14013
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14013.1.patch
>
>
> Describe table output shows comments as escaped Unicode sequences rather 
> than as the Unicode characters themselves.
> {noformat}
> hive> desc formatted t1;
> # Detailed Table Information 
> Table Type: MANAGED_TABLE
> Table Parameters:
> COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
> comment \u8868\u4E2D\u6587\u6D4B\u8BD5
> numFiles0   
> {noformat}
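This sketches an escape() that leaves non-ASCII text alone. Backslash, double quote and control characters are escaped, and everything else passes through unchanged, including CJK text such as the table comment in the output above. {{UnicodeFriendlyEscape}} is hypothetical, and which characters Hive's own version escapes is an assumption.

```java
// Hypothetical escape(): escapes backslash, double quote and control characters,
// but passes printable non-ASCII characters (e.g. CJK) through unchanged.
class UnicodeFriendlyEscape {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                case '\r': sb.append("\\r"); break;
                default:
                    if (Character.isISOControl(c)) {
                        // Other control characters become a backslash-u escape with 4 hex digits.
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        sb.append(c); // non-ASCII text is kept as-is
                    }
            }
        }
        return sb.toString();
    }
}
```

The escaped string would then be written through a writer that encodes UTF-8 explicitly, such as {{new OutputStreamWriter(out, StandardCharsets.UTF_8)}}, rather than one that uses the platform default charset.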



[jira] [Updated] (HIVE-14013) Describe table doesn't show unicode properly

2016-06-14 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14013:

Attachment: HIVE-14013.1.patch

> Describe table doesn't show unicode properly
> 
>
> Key: HIVE-14013
> URL: https://issues.apache.org/jira/browse/HIVE-14013
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14013.1.patch
>
>
> Describe table output shows comments as escaped Unicode sequences rather 
> than as the Unicode characters themselves.
> {noformat}
> hive> desc formatted t1;
> # Detailed Table Information 
> Table Type: MANAGED_TABLE
> Table Parameters:
> COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
> comment \u8868\u4E2D\u6587\u6D4B\u8BD5
> numFiles0   
> {noformat}



[jira] [Commented] (HIVE-14000) (ORC) Changing a numeric type column of a partitioned table to lower type set values to something other than 'NULL'

2016-06-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330266#comment-15330266
 ] 

Matt McCline commented on HIVE-14000:
-

About half of the change is removing members and parameters that became unused 
after the row-reading parts of all the tree readers were eliminated.

> (ORC) Changing a numeric type column of a partitioned table to lower type set 
> values to something other than 'NULL'
> ---
>
> Key: HIVE-14000
> URL: https://issues.apache.org/jira/browse/HIVE-14000
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14000.01.patch
>
>
> When an integer column is changed to a smaller type (e.g. bigint to int) and 
> hive.metastore.disallow.incompatible.col.type.changes is set to false, the 
> data is clipped instead of being NULL.
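This sketches the expected semantics, assuming the read path converts each stored bigint value to the new int type. Values that fit are kept, and values outside the int range become NULL. For contrast, {{truncatingCast}} shows a plain Java narrowing cast, which keeps only the low 32 bits. The issue does not say whether the current "clipping" wraps like this or saturates. {{NarrowingSketch}} is hypothetical.

```java
// Hypothetical conversion applied when a bigint column is read back as int
// after a schema change.
class NarrowingSketch {
    // Out-of-range values become NULL (null) instead of being clipped.
    static Integer longToIntOrNull(Long v) {
        if (v == null) {
            return null;
        }
        if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
            return null;
        }
        return v.intValue();
    }

    // For contrast: a plain narrowing cast silently keeps only the low 32 bits.
    static int truncatingCast(long v) {
        return (int) v;
    }
}
```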



[jira] [Commented] (HIVE-14000) (ORC) Changing a numeric type column of a partitioned table to lower type set values to something other than 'NULL'

2016-06-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330251#comment-15330251
 ] 

Matt McCline commented on HIVE-14000:
-

[~sershe] I added a link to RB. Yes, the only related failures are because I didn't 
update the MiniTez Q file outputs and mistakenly updated schema_evol_stats.

> (ORC) Changing a numeric type column of a partitioned table to lower type set 
> values to something other than 'NULL'
> ---
>
> Key: HIVE-14000
> URL: https://issues.apache.org/jira/browse/HIVE-14000
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14000.01.patch
>
>
> When an integer column is changed to a smaller type (e.g. bigint to int) and 
> hive.metastore.disallow.incompatible.col.type.changes is set to false, the 
> data is clipped instead of being NULL.



[jira] [Updated] (HIVE-14008) Duplicate line in LLAP SecretManager

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14008:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed to branches.

> Duplicate line in LLAP SecretManager
> 
>
> Key: HIVE-14008
> URL: https://issues.apache.org/jira/browse/HIVE-14008
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14008.patch
>
>




[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330193#comment-15330193
 ] 

Sergey Shelukhin commented on HIVE-13974:
-

Test failures look related. I'll look at the patch later today.

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13974.01.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema, which doesn't work for adding columns to non-last STRUCT data 
> type columns.



[jira] [Commented] (HIVE-14000) (ORC) Changing a numeric type column of a partitioned table to lower type set values to something other than 'NULL'

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330187#comment-15330187
 ] 

Sergey Shelukhin commented on HIVE-14000:
-

Is it possible to have an RB? Also, test failures look related.

> (ORC) Changing a numeric type column of a partitioned table to lower type set 
> values to something other than 'NULL'
> ---
>
> Key: HIVE-14000
> URL: https://issues.apache.org/jira/browse/HIVE-14000
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14000.01.patch
>
>
> When an integer column is changed to a smaller type (e.g. bigint to int) and 
> hive.metastore.disallow.incompatible.col.type.changes is set to false, the 
> data is clipped instead of being NULL.



[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330182#comment-15330182
 ] 

Sergey Shelukhin commented on HIVE-13957:
-

Committed there too. Thanks!

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.2.0, 2.1.1, 2.0.2
>
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.
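This sketch shows why the direction of the cast matters for {{decimal_col IN ('1.0')}}. Casting the column to string compares text, so decimal 1 renders as "1" and does not match "1.0". Casting the list element to decimal compares numerically, so it does match. {{InCastSketch}} is hypothetical and uses plain {{java.math.BigDecimal}}. Hive's own type-resolution and decimal formatting rules are not modeled.

```java
import java.math.BigDecimal;
import java.util.List;

// Two ways to evaluate `decimalCol IN ('...', ...)` when column and list types differ.
class InCastSketch {
    // Cast the column value to the list's type (string) and compare text.
    static boolean castColumn(BigDecimal col, List<String> inList) {
        return inList.contains(col.toPlainString());
    }

    // Cast each list element to the column's type (decimal) and compare numerically.
    static boolean castList(BigDecimal col, List<String> inList) {
        for (String s : inList) {
            if (new BigDecimal(s).compareTo(col) == 0) {
                return true;
            }
        }
        return false;
    }
}
```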



[jira] [Updated] (HIVE-14012) some ColumnVector-s are missing ensureSize

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14012:

Reporter: Takahiko Saito  (was: Sergey Shelukhin)

> some ColumnVector-s are missing ensureSize
> --
>
> Key: HIVE-14012
> URL: https://issues.apache.org/jira/browse/HIVE-14012
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14012.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13957:

Fix Version/s: 2.1.1

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.2.0, 2.1.1, 2.0.2
>
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13957:

Target Version/s:   (was: 2.1.1)

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.2.0, 2.0.2
>
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13648) ORC Schema Evolution doesn't support same type conversion for VARCHAR, CHAR, or DECIMAL when maxLength or precision/scale is different

2016-06-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330181#comment-15330181
 ] 

Prasanth Jayachandran commented on HIVE-13648:
--

[~mmccline] I can see enforcing precision and scale in decimal conversion, but 
I don't see the change in maxLength being enforced in the char/varchar 
conversion reader. Also, the fileType argument passed to 
StringGroupFromStringGroupTreeReader seems to be unused. 
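For reference, a minimal Java sketch of what enforcing the reader type's limits could look like: truncating a VARCHAR value to the reader's maxLength (counted in code points so a surrogate pair is never split), and fitting a DECIMAL to the reader's precision/scale, returning `null` when the rounded value needs more digits than the precision allows. The helper names are hypothetical, HALF_UP rounding is an assumption, and this is not the code in the attached patches.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ReaderTypeEnforcementSketch {

    // Truncate to at most maxLength characters, counting Unicode code points.
    static String enforceVarcharLength(String value, int maxLength) {
        if (value.codePointCount(0, value.length()) <= maxLength) {
            return value;
        }
        return value.substring(0, value.offsetByCodePoints(0, maxLength));
    }

    // Round to the reader's scale, then return null if the result needs more
    // than `precision` total digits (i.e. the integer part no longer fits).
    static BigDecimal enforceDecimal(BigDecimal value, int precision, int scale) {
        BigDecimal rounded = value.setScale(scale, RoundingMode.HALF_UP);
        return rounded.precision() > precision ? null : rounded;
    }
}
```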

> ORC Schema Evolution doesn't support same type conversion for VARCHAR, CHAR, 
> or DECIMAL when maxLength or precision/scale is different
> --
>
> Key: HIVE-13648
> URL: https://issues.apache.org/jira/browse/HIVE-13648
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-13648.01.patch, HIVE-13648.02.patch
>
>
> For example, when a data file that is copied in has a VARCHAR maxLength that 
> doesn't match the DDL's maxLength, this error is produced:
> {code}
> java.io.IOException: ORC does not support type conversion from file type 
> varchar(145) (36) to reader type varchar(114) (36)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14012) some ColumnVector-s are missing ensureSize

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330176#comment-15330176
 ] 

Sergey Shelukhin commented on HIVE-14012:
-

[~owen.omalley] can you comment on the above? I understand you added the 
complex type vectors.

> some ColumnVector-s are missing ensureSize
> --
>
> Key: HIVE-14012
> URL: https://issues.apache.org/jira/browse/HIVE-14012
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14012.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14006) Hive query with UNION ALL fails with ArrayIndexOutOfBoundsException

2016-06-14 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-14006:


Assignee: Naveen Gangam

> Hive query with UNION ALL fails with ArrayIndexOutOfBoundsException
> ---
>
> Key: HIVE-14006
> URL: https://issues.apache.org/jira/browse/HIVE-14006
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>
> {code}
> set hive.cbo.enable=false;
> DROP VIEW IF EXISTS a_view;
> DROP TABLE IF EXISTS table_a1;
> DROP TABLE IF EXISTS table_a2;
> DROP TABLE IF EXISTS table_b1;
> DROP TABLE IF EXISTS table_b2;
> CREATE TABLE table_a1
> (composite_key STRING);
> CREATE TABLE table_a2
> (composite_key STRING);
> CREATE TABLE table_b1
> (composite_key STRING, col1 STRING);
> CREATE TABLE table_b2
> (composite_key STRING);
> CREATE VIEW a_view AS
> SELECT
> substring(a1.composite_key, 1, locate('|',a1.composite_key) - 1) AS autoname,
> NULL AS col1
> FROM table_a1 a1
> FULL OUTER JOIN table_a2 a2
> ON a1.composite_key = a2.composite_key
> UNION ALL
> SELECT
> substring(b1.composite_key, 1, locate('|',b1.composite_key) - 1) AS autoname,
> b1.col1 AS col1
> FROM table_b1 b1
> FULL OUTER JOIN table_b2 b2
> ON b1.composite_key = b2.composite_key;
> INSERT INTO TABLE table_b1
> SELECT * FROM (
> SELECT 'something|awful', 'col1'
> )s ;
> SELECT autoname
> FROM a_view
> WHERE autoname='something';
> {code}
> fails with:
> {noformat}
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"_col0":"something"}
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"_col0":"something"}
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
>   ... 8 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>   at org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:134)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>   at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
> {noformat}
> The same query succeeds when {{hive.ppd.remove.duplicatefilters=false}} with 
> or without CBO on. It also succeeds with just CBO on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14012) some ColumnVector-s are missing ensureSize

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14012:

Status: Patch Available  (was: Open)

> some ColumnVector-s are missing ensureSize
> --
>
> Key: HIVE-14012
> URL: https://issues.apache.org/jira/browse/HIVE-14012
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14012.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14012) some ColumnVector-s are missing ensureSize

2016-06-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14012:

Attachment: HIVE-14012.patch

[~prasanth_j] [~mmccline] can you take a look?

Also, do List and Map vectors need ensureSize? It doesn't look like it, but I 
wonder if something needs to be done with child vectors.
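A minimal Java sketch of the `ensureSize` idea for a primitive column vector: grow the backing arrays when a batch needs more rows, optionally copying the existing values. The class is hypothetical and only loosely modeled on the `vector`/`isNull` fields of Hive's `LongColumnVector`; it does not address the open question above about child vectors of list and map vectors.

```java
import java.util.Arrays;

final class EnsureSizeSketch {

    // Hypothetical stand-in for a long column vector.
    static final class LongVector {
        long[] vector;
        boolean[] isNull;

        LongVector(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }

        // Grow the backing arrays to hold at least `size` rows.
        // When preserveData is false, old contents are discarded.
        void ensureSize(int size, boolean preserveData) {
            if (size <= vector.length) {
                return; // already large enough; never shrink
            }
            if (preserveData) {
                vector = Arrays.copyOf(vector, size);
                isNull = Arrays.copyOf(isNull, size);
            } else {
                vector = new long[size];
                isNull = new boolean[size];
            }
        }
    }
}
```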

> some ColumnVector-s are missing ensureSize
> --
>
> Key: HIVE-14012
> URL: https://issues.apache.org/jira/browse/HIVE-14012
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14012.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13985:
-
Attachment: HIVE-13985-branch-2.1.patch
HIVE-13985-branch-1.patch

Attaching branch-1 and branch-2.1 patches

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-2.1.patch, 
> HIVE-13985.1.patch, HIVE-13985.2.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the ORC footers on the splits. But this also serializes unwanted 
> information like column statistics and other metadata that are not really 
> required for reading an ORC split on the task side. We can reduce the 
> payload on the ORC splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the ORC splits and can potentially avoid OOMs in the 
> application master (AM) during split generation. This jira also addresses 
> other issues concerning the AM cache. The local cache used by the AM is a 
> soft-reference cache, which can introduce unpredictability across multiple 
> runs of the same query. We can cache the serialized footer in the local cache 
> and also use a strong-reference cache, which should avoid memory pressure and 
> give better predictability.
> One other improvement: when hive.orc.splits.include.file.footer is set to 
> false, the task side makes one additional file system call to get the size 
> of the file. Serializing the file length in the ORC split would avoid this 
> call.
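The last improvement can be sketched in Java: a split that carries the file length (and an optional minimal serialized footer) through `DataOutput`/`DataInput`, so the task side does not need a separate file-status call to learn the size. `OrcSplitSketch` and its fields are hypothetical and are not Hive's `OrcSplit`; a real split would also serialize the path, offsets and host information.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

final class OrcSplitSketch {
    long fileLength = -1;          // -1 means "unknown, ask the file system"
    byte[] footer = new byte[0];   // minimal serialized footer; empty if not included

    void write(DataOutput out) throws IOException {
        out.writeLong(fileLength);
        out.writeInt(footer.length);
        out.write(footer);
    }

    void readFields(DataInput in) throws IOException {
        fileLength = in.readLong();
        footer = new byte[in.readInt()];
        in.readFully(footer);
    }

    // Serialize and deserialize, as happens between split generation and the task.
    static OrcSplitSketch roundTrip(OrcSplitSketch split) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        split.write(new DataOutputStream(bytes));
        OrcSplitSketch copy = new OrcSplitSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```

A task that receives `fileLength >= 0` can skip the file-status lookup; writing only 8 extra bytes per split keeps the payload concern above in check.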



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330108#comment-15330108
 ] 

Sergey Shelukhin commented on HIVE-13901:
-

Um, no +1 for now, some feedback on RB

> Hivemetastore add partitions can be slow depending on filesystems
> -
>
> Key: HIVE-13901
> URL: https://issues.apache.org/jira/browse/HIVE-13901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch
>
>
> Depending on the FS, creating external tables & adding partitions can be 
> expensive (e.g. msck, which adds all partitions).
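One common way to hide per-partition file-system latency is to issue the calls concurrently. The Java sketch below assumes a hypothetical `PartitionFs` interface whose `ensureDir` call is the slow operation and runs it on a fixed thread pool. It illustrates the idea only; it is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPartitionSketch {

    // Hypothetical abstraction over the slow per-partition FS call.
    interface PartitionFs {
        boolean ensureDir(String path) throws Exception;
    }

    // Runs ensureDir for every partition path on `threads` workers and
    // returns the results in input order.
    static List<Boolean> ensureAll(PartitionFs fs, List<String> paths, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(pool.submit(() -> fs.ensureDir(path)));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                results.add(f.get()); // rethrows any failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```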



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13985:
-
Attachment: HIVE-13985.2.patch

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985.1.patch, HIVE-13985.2.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the ORC footers on the splits. But this also serializes unwanted 
> information like column statistics and other metadata that are not really 
> required for reading an ORC split on the task side. We can reduce the 
> payload on the ORC splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the ORC splits and can potentially avoid OOMs in the 
> application master (AM) during split generation. This jira also addresses 
> other issues concerning the AM cache. The local cache used by the AM is a 
> soft-reference cache, which can introduce unpredictability across multiple 
> runs of the same query. We can cache the serialized footer in the local cache 
> and also use a strong-reference cache, which should avoid memory pressure and 
> give better predictability.
> One other improvement: when hive.orc.splits.include.file.footer is set to 
> false, the task side makes one additional file system call to get the size 
> of the file. Serializing the file length in the ORC split would avoid this 
> call.
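The AM-cache part of the proposal can be sketched as a bounded, strong-reference LRU cache of serialized footers built on `java.util.LinkedHashMap` in access order. Unlike a soft-reference cache, entries leave only when the size bound is exceeded, not when the garbage collector decides, which is the predictability argument in the description. The class name and the entry bound are assumptions; Hive's actual footer cache may be implemented differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FooterCacheSketch {

    // Bounded LRU map holding strong references, so eviction depends only
    // on maxEntries and access order, never on GC pressure.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries; // evict least recently used entry
        }
    }
}
```

Keyed by file path with `byte[]` footer values, this stores the compact serialized form. `LinkedHashMap` is not thread-safe, so an AM with concurrent split generators would need to wrap it, for example with `Collections.synchronizedMap`.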



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329890#comment-15329890
 ] 

Hive QA commented on HIVE-14007:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12810403/HIVE-14007.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 35 failed/errored test(s), 9438 tests 
executed
*Failed tests:*
{noformat}
TestBitFieldReader - did not produce a TEST-*.xml file
TestBitPack - did not produce a TEST-*.xml file
TestColumnStatistics - did not produce a TEST-*.xml file
TestColumnStatisticsImpl - did not produce a TEST-*.xml file
TestDataReaderProperties - did not produce a TEST-*.xml file
TestDynamicArray - did not produce a TEST-*.xml file
TestFileDump - did not produce a TEST-*.xml file
TestInStream - did not produce a TEST-*.xml file
TestIntegerCompressionReader - did not produce a TEST-*.xml file
TestJsonFileDump - did not produce a TEST-*.xml file
TestMemoryManager - did not produce a TEST-*.xml file
TestNewIntegerEncoding - did not produce a TEST-*.xml file
TestOrcNullOptimization - did not produce a TEST-*.xml file
TestOrcTimezone1 - did not produce a TEST-*.xml file
TestOrcTimezone2 - did not produce a TEST-*.xml file
TestOrcWideTable - did not produce a TEST-*.xml file
TestOutStream - did not produce a TEST-*.xml file
TestRLEv2 - did not produce a TEST-*.xml file
TestReaderImpl - did not produce a TEST-*.xml file
TestRecordReaderImpl - did not produce a TEST-*.xml file
TestRunLengthByteReader - did not produce a TEST-*.xml file
TestRunLengthIntegerReader - did not produce a TEST-*.xml file
TestSerializationUtils - did not produce a TEST-*.xml file
TestStreamName - did not produce a TEST-*.xml file
TestStringDictionary - did not produce a TEST-*.xml file
TestStringRedBlackTree - did not produce a TEST-*.xml file
TestTypeDescription - did not produce a TEST-*.xml file
TestUnrolledBitPack - did not produce a TEST-*.xml file
TestVectorOrcFile - did not produce a TEST-*.xml file
TestZlib - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.ql.TestTxnCommands.testSimpleAcidInsert
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/123/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/123/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-123/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 35 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12810403 - PreCommit-HIVE-MASTER-Build

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329862#comment-15329862
 ] 

Prasanth Jayachandran commented on HIVE-13985:
--

Targeting this patch for branch-2.1. Since ORC is moving out in HIVE-14007, I 
will wait for the master commit. [~ashutoshc] fyi.

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985.1.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the ORC footers on the splits. But this also serializes unwanted 
> information like column statistics and other metadata that are not really 
> required for reading an ORC split on the task side. We can reduce the 
> payload on the ORC splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the ORC splits and can potentially avoid OOMs in the 
> application master (AM) during split generation. This jira also addresses 
> other issues concerning the AM cache. The local cache used by the AM is a 
> soft-reference cache, which can introduce unpredictability across multiple 
> runs of the same query. We can cache the serialized footer in the local cache 
> and also use a strong-reference cache, which should avoid memory pressure and 
> give better predictability.
> One other improvement: when hive.orc.splits.include.file.footer is set to 
> false, the task side makes one additional file system call to get the size 
> of the file. Serializing the file length in the ORC split would avoid this 
> call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13985:
-
Attachment: HIVE-13985.1.patch

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985.1.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the ORC footers on the splits. But this also serializes unwanted 
> information like column statistics and other metadata that are not really 
> required for reading an ORC split on the task side. We can reduce the 
> payload on the ORC splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the ORC splits and can potentially avoid OOMs in the 
> application master (AM) during split generation. This jira also addresses 
> other issues concerning the AM cache. The local cache used by the AM is a 
> soft-reference cache, which can introduce unpredictability across multiple 
> runs of the same query. We can cache the serialized footer in the local cache 
> and also use a strong-reference cache, which should avoid memory pressure and 
> give better predictability.
> One other improvement: when hive.orc.splits.include.file.footer is set to 
> false, the task side makes one additional file system call to get the size 
> of the file. Serializing the file length in the ORC split would avoid this 
> call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13985:
-
Status: Patch Available  (was: Open)

> ORC improvements for reducing the file system calls in task side
> 
>
> Key: HIVE-13985
> URL: https://issues.apache.org/jira/browse/HIVE-13985
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13985.1.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during 
> split generation. Similarly, this jira will fix issues with additional file 
> system invocations on the task side. To avoid reading footers on the task 
> side, users can set hive.orc.splits.include.file.footer to true, which will 
> serialize the ORC footers on the splits. But this also serializes unwanted 
> information like column statistics and other metadata that are not really 
> required for reading an ORC split on the task side. We can reduce the 
> payload on the ORC splits by serializing only the minimum required 
> information (stripe information, types, compression details). This will 
> decrease the payload on the ORC splits and can potentially avoid OOMs in the 
> application master (AM) during split generation. This jira also addresses 
> other issues concerning the AM cache. The local cache used by the AM is a 
> soft-reference cache, which can introduce unpredictability across multiple 
> runs of the same query. We can cache the serialized footer in the local cache 
> and also use a strong-reference cache, which should avoid memory pressure and 
> give better predictability.
> One other improvement: when hive.orc.splits.include.file.footer is set to 
> false, the task side makes one additional file system call to get the size 
> of the file. Serializing the file length in the ORC split would avoid this 
> call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13951) GenericUDFArray should constant fold at compile time

2016-06-14 Thread Sergey Zadoroshnyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Zadoroshnyak updated HIVE-13951:
---
Priority: Critical  (was: Major)

> GenericUDFArray should constant fold at compile time
> 
>
> Key: HIVE-13951
> URL: https://issues.apache.org/jira/browse/HIVE-13951
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Sergey Zadoroshnyak
>Priority: Critical
>
> 1. The Hive constant propagation optimizer is enabled:
> hive.optimize.constant.propagation=true;
> 2. Hive query: 
> select array('Total','Total') from some_table;
> ERROR: org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory 
> (ConstantPropagateProcFactory.java:evaluateFunction(939)) - Unable to 
> evaluate org.apache.hadoop.hive.ql.udf.generic.GenericUDFArray@3d26c423. 
> Return value unrecoginizable.
> Details:
> During query compilation, Hive checks whether any subexpression of an 
> expression can be evaluated to a constant and, if so, replaces that 
> subexpression with the constant.
> If the expression is a deterministic UDF and all of its subexpressions are 
> constants, the value is calculated during compilation (not at runtime).
> array is a deterministic UDF and 'Total' is a string constant, so Hive tries 
> to replace the UDF call with the constant result of evaluating it.
> However, it looks like Hive only supports primitive and struct objects here, 
> so array is not supported yet.
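The folding rule described in the Details can be sketched in Java on a toy expression tree: a deterministic call whose arguments are all constants is evaluated at compile time and replaced by a constant, but only if the result is a type the folder can represent as a constant; otherwise the call is kept for runtime. The classes are hypothetical, and restricting foldable results to strings, numbers and booleans mirrors the limitation the report describes rather than Hive's exact type checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class ConstantFoldSketch {

    interface Expr {}

    static final class Const implements Expr {
        final Object value;
        Const(Object value) { this.value = value; }
    }

    static final class Call implements Expr {
        final String name;
        final boolean deterministic;
        final Function<List<Object>, Object> fn;
        final List<Expr> args;
        Call(String name, boolean deterministic, Function<List<Object>, Object> fn, Expr... args) {
            this.name = name;
            this.deterministic = deterministic;
            this.fn = fn;
            this.args = Arrays.asList(args);
        }
    }

    // Only "primitive" results can become constants in this sketch.
    static boolean isFoldableResult(Object value) {
        return value == null || value instanceof String
            || value instanceof Number || value instanceof Boolean;
    }

    static Expr fold(Expr e) {
        if (!(e instanceof Call)) {
            return e;
        }
        Call call = (Call) e;
        List<Expr> foldedArgs = new ArrayList<>();
        List<Object> values = new ArrayList<>();
        boolean allConst = true;
        for (Expr arg : call.args) {
            Expr f = fold(arg); // fold children first
            foldedArgs.add(f);
            if (f instanceof Const) {
                values.add(((Const) f).value);
            } else {
                allConst = false;
            }
        }
        if (call.deterministic && allConst) {
            Object result = call.fn.apply(values);
            if (isFoldableResult(result)) {
                return new Const(result);
            }
            // e.g. array(...) yields a List: unsupported, so keep the call.
        }
        return new Call(call.name, call.deterministic, call.fn, foldedArgs.toArray(new Expr[0]));
    }
}
```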



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329638#comment-15329638
 ] 

Sergio Peña commented on HIVE-13964:


Thanks [~ayousufi]. This is working very well.

Now the only problem is the {{TestBeeLineWithArgs}} test, which fails on HiveQA. 
Whenever you see a {{did not produce a TEST-*.xml file}} message, it means that 
the test was taking too long and PTest had to kill the process. Currently, each 
test has a 40-minute timeout.

Could you take a look at it? Maybe some tests are waiting for a user/password 
to be entered, and they are hanging the test execution.

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch, 
> HIVE-13964.03.patch, HIVE-13964.04.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be useful to be able to pass the file in as a 
> parameter, such as --property-file.
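A minimal Java sketch of what a `--property-file` parameter could do: find the option in the argument list and load the named file with `java.util.Properties`. Only the option name comes from this issue; the helper name and the rest of the argument handling are assumptions, not Beeline's implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

final class PropertyFileArgSketch {

    // Returns the properties from the file named after --property-file,
    // or an empty Properties object if the option is absent.
    static Properties loadFromArgs(String[] args) throws IOException {
        Properties props = new Properties();
        for (int i = 0; i < args.length; i++) {
            if ("--property-file".equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("--property-file requires a path");
                }
                try (InputStream in = Files.newInputStream(Paths.get(args[i + 1]))) {
                    props.load(in); // standard key=value properties format
                }
                break;
            }
        }
        return props;
    }
}
```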



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14007) Replace ORC module with ORC release

2016-06-14 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14007:
-
Attachment: HIVE-14007.patch

This patch makes the change and deletes the files.

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14007) Replace ORC module with ORC release

2016-06-14 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14007:
-
Status: Patch Available  (was: Open)

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13928) Hive2: float value need to be single quoted inside where clause to return rows when it doesn't have to be

2016-06-14 Thread Takahiko Saito (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329567#comment-15329567
 ] 

Takahiko Saito commented on HIVE-13928:
---

I don't think anyone is working on it. Cc: [~mmccline] [~jdere]

> Hive2: float value need to be single quoted inside where clause to return 
> rows when it doesn't have to be
> -
>
> Key: HIVE-13928
> URL: https://issues.apache.org/jira/browse/HIVE-13928
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Takahiko Saito
>Priority: Critical
>
> The SELECT below, with a float value in the WHERE clause, does not return any rows:
> {noformat}
> 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> drop table test;
> No rows affected (0.212 seconds)
> 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> create table test (f float);
> No rows affected (1.131 seconds)
> 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> insert into table test values (-35664.76),(29497.34);
> No rows affected (2.482 seconds)
> 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> select * from test;
> +------------+--+
> |   test.f   |
> +------------+--+
> | -35664.76  |
> | 29497.34   |
> +------------+--+
> 2 rows selected (0.142 seconds)
> 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> select * from test where f = -35664.76;
> +---------+--+
> | test.f  |
> +---------+--+
> +---------+--+
> {noformat}
> The workaround is to single-quote the float value:
> {noformat}
> 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> select * from test where f = '-35664.76';
> +------------+--+
> |   test.f   |
> +------------+--+
> | -35664.76  |
> +------------+--+
> 1 row selected (0.163 seconds)
> {noformat}
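A plausible explanation, not confirmed in this thread, is that the unquoted literal is typed as DOUBLE, so the FLOAT column value is widened to DOUBLE before the comparison. The float nearest to -35664.76 is exactly -35664.76171875, which is not equal to the double nearest to -35664.76. The Java sketch below reproduces only that arithmetic; it says nothing about how Hive types the quoted literal.

```java
final class FloatCompareSketch {

    // Widen the FLOAT column value to double, then compare with a double literal.
    static boolean equalsAsDouble(float column, double literal) {
        return (double) column == literal;
    }

    // Narrow the literal to float instead, then compare.
    static boolean equalsAsFloat(float column, double literal) {
        return column == (float) literal;
    }

    public static void main(String[] args) {
        float stored = -35664.76f;
        System.out.println((double) stored);                   // -35664.76171875
        System.out.println(equalsAsDouble(stored, -35664.76)); // false
        System.out.println(equalsAsFloat(stored, -35664.76));  // true
    }
}
```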



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications

2016-06-14 Thread Reuben Kuhnert (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329382#comment-15329382
 ] 

Reuben Kuhnert commented on HIVE-13966:
---

Looking at this pattern in a number of metastore functions:

{code}
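if (!success) {
  // The operation failed: undo the metastore change and any directory we created.
  ms.rollbackTransaction();
  if (madeDir) {
    wh.deleteDir(tblPath, true);
  }
}
// Listeners are notified afterwards, whether or not the operation succeeded.
for (MetaStoreEventListener listener : listeners) {
  CreateTableEvent createTableEvent =
      new CreateTableEvent(tbl, success, this);
  createTableEvent.setEnvironmentContext(envContext);
  listener.onCreateTable(createTableEvent);
}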
{code}

I notice that {{DbNotificationListener}} is a subclass of 
{{MetaStoreEventListener}}. When you say we should not require bringing all 
post-event listeners into the transaction (but we do want to bring in 
{{DbNotificationListener}}), would that mean having a separate hierarchy for 
the listeners that *should* be part of the transaction? Is that what is meant 
by 'synchronous' (part of the transaction), or do we mean 'synchronous' as in 
not queued for processing later, per:

{code}
/**
 * Design overview:  This listener takes any event, builds a NotificationEventResponse,
 * and puts it on a queue.  There is a dedicated thread that reads entries from the queue and
 * places them in the database.  The reason for doing it in a separate thread is that we want to
 * avoid slowing down other metadata operations with the work of putting the notification into
 * the database.  Also, occasionally the thread needs to clean the database of old records.  We
 * definitely don't want to do that as part of another metadata operation.
 */
public class DbNotificationListener extends MetaStoreEventListener {
{code}
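The quoted design enqueues on the caller's thread and persists on a dedicated thread. It can be sketched roughly as follows. This is a simplified illustration, not the actual DbNotificationListener code. Events are plain strings, an in-memory list stands in for the database, and all names are hypothetical. The sketch also shows the trade-off being asked about: once onEvent returns, the metadata operation neither waits for the database write nor learns its outcome.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

class QueuedNotificationWriter {
    // Unique sentinel object, compared by identity, telling the worker to stop.
    private static final String SHUTDOWN = new String("<shutdown>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Stand-in for the notification table in the metastore database.
    private final List<String> database = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this::drain, "notification-writer");

    void start() {
        worker.start();
    }

    /** Runs on the metadata-operation thread: only enqueues, never touches the database. */
    void onEvent(String event) {
        queue.add(event);
    }

    /** Dedicated thread: moves queued events into the database in arrival order. */
    private void drain() {
        try {
            while (true) {
                String event = queue.take();
                if (event == SHUTDOWN) {
                    return;
                }
                database.add(event); // a real writer would also purge old records occasionally
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the worker after everything queued so far has been written. */
    void shutdownAndWait() throws InterruptedException {
        queue.add(SHUTDOWN);
        worker.join();
    }

    List<String> persisted() {
        return database;
    }
}
```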

> DbNotificationListener: can loose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Priority: Critical
>
> The code for each API in HiveMetaStore.java looks like this:
> 1. openTransaction()
> 2. -- operation --
> 3. commit() or rollback() based on the result of the operation.
> 4. add entry to notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> That is still acceptable, since it is only a false positive.
> If the operation succeeds but adding to the notification log fails, the 
> user gets a MetaException. The operation is not rolled back, because it has 
> already been committed. We need to handle this case so that we do not get false 
> negatives.
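One way to avoid the false negative is to write the notification inside the same transaction, before commit. A failure in either step then rolls back both. The Java sketch below shows that reordering under stated assumptions. TxnStore and TransactionalListener are hypothetical stand-ins, not Hive's RawStore or listener APIs. For brevity the sketch returns false instead of throwing MetaException.

```java
import java.util.List;

class TransactionalNotifySketch {
    /** Hypothetical subset of the metastore's transactional store. */
    interface TxnStore {
        void openTransaction();
        boolean commitTransaction();
        void rollbackTransaction();
    }

    /** Hypothetical listener that writes its notification inside the open transaction. */
    interface TransactionalListener {
        void onEvent(String event) throws Exception;
    }

    /**
     * Steps 1-4 reordered: the notification is written before commit, so a failure
     * in the operation or in the notification rolls back both together.
     */
    static boolean runDdl(TxnStore ms, Runnable operation,
                          List<TransactionalListener> listeners, String event) {
        boolean success = false;
        ms.openTransaction();
        try {
            operation.run();                      // step 2: the DDL itself
            for (TransactionalListener listener : listeners) {
                listener.onEvent(event);          // notification log entry, still uncommitted
            }
            success = ms.commitTransaction();     // DDL and log entry become visible together
        } catch (Exception e) {
            success = false;                      // sketch only: real code would rethrow
        } finally {
            if (!success) {
                ms.rollbackTransaction();
            }
        }
        return success;
    }
}
```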



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13735) Query involving only partition columns need not launch mr/tez job

2016-06-14 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329349#comment-15329349
 ] 

Rajesh Balamohan commented on HIVE-13735:
-

Thanks [~Takuma] - I have updated the assignee.

> Query involving only partition columns need not launch mr/tez job
> -
>
> Key: HIVE-13735
> URL: https://issues.apache.org/jira/browse/HIVE-13735
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Takuma Wakamori
>
> codebase: hive master
> dataset: tpc-ds 10 TB scale
> e.g. queries:
> {noformat}
> hive> show partitions web_sales;
> ...
> ...
> Time taken: 0.13 seconds, Fetched: 1824 row(s)
> hive> select distinct ws_sold_date_sk from web_sales;
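> ------------------------------------------------------------------------------------------
> VERTICES        MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ------------------------------------------------------------------------------------------
> Map 1 ........  container   SUCCEEDED       1          1        0        0       0       0
> Reducer 2 ....  container   SUCCEEDED       1          1        0        0       0       0
> ------------------------------------------------------------------------------------------
> VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 2.70 s
> ------------------------------------------------------------------------------------------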
> Status: DAG finished successfully in 2.70 seconds
> ..
> Time taken: 3.964 seconds, Fetched: 1824 row(s)
> hive> select distinct ws_sold_date_sk from web_sales order by ws_sold_date_sk;
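> ------------------------------------------------------------------------------------------
> VERTICES        MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ------------------------------------------------------------------------------------------
> Map 1 ........  container   SUCCEEDED     801        801        0        0       0       0
> Reducer 2 ....  container   SUCCEEDED       1          1        0        0       0       0
> Reducer 3 ....  container   SUCCEEDED       1          1        0        0       0       0
> ------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==>>] 100%  ELAPSED TIME: 23.05 s
> ------------------------------------------------------------------------------------------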
> Status: DAG finished successfully in 23.05 seconds
> ...
> Time taken: 27.095 seconds, Fetched: 1824 row(s)
> {noformat}
> Since this information is already available in the metastore, it may not be 
> necessary to launch these jobs.
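As an illustration of answering such a query from metadata alone, the Java sketch below derives the distinct values of one partition column from partition names. It assumes names of the form "col1=v1/col2=v2", the shape SHOW PARTITIONS prints. The class name is hypothetical, and the sketch is not Hive's metadata-only optimizer. It also ignores value escaping and the default partition. Values are sorted as strings, so a numeric column such as ws_sold_date_sk would still need a numeric sort for ORDER BY.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PartitionValueLookup {
    /** Distinct values of one partition column, taken from partition names rather than data files. */
    static SortedSet<String> distinctValues(List<String> partitionNames, String column) {
        SortedSet<String> values = new TreeSet<>();
        String prefix = column + "=";
        for (String name : partitionNames) {
            // Each partition name is a '/'-separated list of key=value pairs.
            for (String part : name.split("/")) {
                if (part.startsWith(prefix)) {
                    values.add(part.substring(prefix.length()));
                }
            }
        }
        return values;
    }
}
```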



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13735) Query involving only partition columns need not launch mr/tez job

2016-06-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-13735:

Assignee: Takuma Wakamori

> Query involving only partition columns need not launch mr/tez job
> -
>
> Key: HIVE-13735
> URL: https://issues.apache.org/jira/browse/HIVE-13735
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Takuma Wakamori
>
> codebase: hive master
> dataset: tpc-ds 10 TB scale
> e.g. queries:
> {noformat}
> hive> show partitions web_sales;
> ...
> ...
> Time taken: 0.13 seconds, Fetched: 1824 row(s)
> hive> select distinct ws_sold_date_sk from web_sales;
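> ------------------------------------------------------------------------------------------
> VERTICES        MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ------------------------------------------------------------------------------------------
> Map 1 ........  container   SUCCEEDED       1          1        0        0       0       0
> Reducer 2 ....  container   SUCCEEDED       1          1        0        0       0       0
> ------------------------------------------------------------------------------------------
> VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 2.70 s
> ------------------------------------------------------------------------------------------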
> Status: DAG finished successfully in 2.70 seconds
> ..
> Time taken: 3.964 seconds, Fetched: 1824 row(s)
> hive> select distinct ws_sold_date_sk from web_sales order by ws_sold_date_sk;
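> ------------------------------------------------------------------------------------------
> VERTICES        MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ------------------------------------------------------------------------------------------
> Map 1 ........  container   SUCCEEDED     801        801        0        0       0       0
> Reducer 2 ....  container   SUCCEEDED       1          1        0        0       0       0
> Reducer 3 ....  container   SUCCEEDED       1          1        0        0       0       0
> ------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==>>] 100%  ELAPSED TIME: 23.05 s
> ------------------------------------------------------------------------------------------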
> Status: DAG finished successfully in 23.05 seconds
> ...
> Time taken: 27.095 seconds, Fetched: 1824 row(s)
> {noformat}
> Since this information is already available in the metastore, it may not be 
> necessary to launch these jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-06-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13903:
---
Fix Version/s: 2.2.0

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, 
> HIVE-13903.02.patch
>
>
> On queries using permanent UDFs, the UDF's jar file is downloaded multiple 
> times, with each download originating from Registry.getFunctionInfo. This 
> increases query time, especially if the query is just an EXPLAIN. The jar 
> should be downloaded once, and not downloaded again if the UDF class is 
> already accessible in the current thread.
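The intended behaviour is to skip the download when the class is already visible and otherwise to download each jar at most once. It can be sketched in Java as below. UdfJarCache and the downloader callback are hypothetical stand-ins, not Hive's Registry code. The visibility check uses the standard Class.forName overload with the thread's context class loader. In this sketch, a failed download is not retried.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

class UdfJarCache {
    private final Set<String> downloadedJars = new HashSet<>();
    private final Consumer<String> downloader; // hypothetical: fetches a jar and registers it

    UdfJarCache(Consumer<String> downloader) {
        this.downloader = downloader;
    }

    /** True if the current thread's context class loader can already see the class. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only checks visibility, does not run static initializers
            Class.forName(className, false, Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Downloads the UDF jar at most once, and not at all if the class is already accessible. */
    synchronized void ensureAvailable(String udfClass, String jarUri) {
        if (isLoadable(udfClass)) {
            return;
        }
        if (downloadedJars.add(jarUri)) { // add() returns false if this jar was already fetched
            downloader.accept(jarUri);
        }
    }
}
```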



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-14 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329237#comment-15329237
 ] 

Jesus Camacho Rodriguez commented on HIVE-13982:


[~ashutoshc], the test failures are unrelated. Could you review the patch? Thanks

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> ReduceSink (RS) deduplication should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}
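All three queries group and order on the same set of columns, state and city, differing only in column order and direction. A simplified model of why one shuffle could serve both goes as follows: sorting on those columns, in any order and direction, still places rows with equal group keys next to each other. The Java sketch below captures only that key-set check. The class name is hypothetical, and a real optimizer also has to reconcile partitioning columns and reducer counts, which this ignores.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class SortedGroupByCheck {
    /** True when GROUP BY and ORDER BY use the same set of columns (order and direction ignored). */
    static boolean canShareShuffle(List<String> groupByKeys, List<String> orderByColumns) {
        return groupByKeys.size() == orderByColumns.size()
                && new HashSet<>(groupByKeys).equals(new HashSet<>(orderByColumns));
    }

    /** Reduces ORDER BY items such as "state desc" to their column names. */
    static List<String> columnsOf(List<String> orderByItems) {
        List<String> columns = new ArrayList<>();
        for (String item : orderByItems) {
            columns.add(item.trim().split("\\s+")[0]);
        }
        return columns;
    }
}
```

For the third query, columnsOf(["state desc", "city"]) gives [state, city], which matches the GROUP BY set {city, state}.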



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-14 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329225#comment-15329225
 ] 

Jesus Camacho Rodriguez commented on HIVE-13957:


[~sershe], the fix can be pushed to branch-2.1 with the fix version set to 2.1.1. As for 
2.1.0, it is waiting for your vote! :p Thanks

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.2.0, 2.0.2
>
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.
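To see how cast placement alone can change the result, consider a decimal column tested against a string list. The Java sketch below contrasts two assumed strategies. The first casts the column value to a string and compares strings. The second casts each list element to a decimal and compares numerically. The target types chosen here are assumptions for illustration, not a statement of what Hive's two code paths actually pick.

```java
import java.math.BigDecimal;
import java.util.List;

class InCastDivergence {
    /** Cast applied to the column: decimal -> string, then string equality. */
    static boolean inCastingColumn(BigDecimal column, List<String> inList) {
        return inList.contains(column.toString());
    }

    /** Cast applied to the IN() list: string -> decimal, then numeric equality. */
    static boolean inCastingList(BigDecimal column, List<String> inList) {
        for (String s : inList) {
            // compareTo ignores scale, so 1.0 and 1 are numerically equal
            if (new BigDecimal(s).compareTo(column) == 0) {
                return true;
            }
        }
        return false;
    }
}
```

With a column value of 1.0 and IN ('1'), the first strategy finds no match while the second does.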



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13735) Query involving only partition columns need not launch mr/tez job

2016-06-14 Thread Takuma Wakamori (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329101#comment-15329101
 ] 

Takuma Wakamori commented on HIVE-13735:


Hi, [~rajesh.balamohan].
If no one is working on this issue, I would like to try.

However, I expect my solution may be ad hoc; the corresponding code would 
only be executed under conditions such as:
(the query is a SELECT) && (it has a DISTINCT clause) && (the select_expr is a 
partitioning key) && (...).

If that is OK, could someone assign this issue to me? Thanks.

> Query involving only partition columns need not launch mr/tez job
> -
>
> Key: HIVE-13735
> URL: https://issues.apache.org/jira/browse/HIVE-13735
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>
> codebase: hive master
> dataset: tpc-ds 10 TB scale
> e.g. queries:
> {noformat}
> hive> show partitions web_sales;
> ...
> ...
> Time taken: 0.13 seconds, Fetched: 1824 row(s)
> hive> select distinct ws_sold_date_sk from web_sales;
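> ------------------------------------------------------------------------------------------
> VERTICES        MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ------------------------------------------------------------------------------------------
> Map 1 ........  container   SUCCEEDED       1          1        0        0       0       0
> Reducer 2 ....  container   SUCCEEDED       1          1        0        0       0       0
> ------------------------------------------------------------------------------------------
> VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 2.70 s
> ------------------------------------------------------------------------------------------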
> Status: DAG finished successfully in 2.70 seconds
> ..
> Time taken: 3.964 seconds, Fetched: 1824 row(s)
> hive> select distinct ws_sold_date_sk from web_sales order by ws_sold_date_sk;
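> ------------------------------------------------------------------------------------------
> VERTICES        MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ------------------------------------------------------------------------------------------
> Map 1 ........  container   SUCCEEDED     801        801        0        0       0       0
> Reducer 2 ....  container   SUCCEEDED       1          1        0        0       0       0
> Reducer 3 ....  container   SUCCEEDED       1          1        0        0       0       0
> ------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==>>] 100%  ELAPSED TIME: 23.05 s
> ------------------------------------------------------------------------------------------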
> Status: DAG finished successfully in 23.05 seconds
> ...
> Time taken: 27.095 seconds, Fetched: 1824 row(s)
> {noformat}
> Since this information is already available in the metastore, it may not be 
> necessary to launch these jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues

2016-06-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329052#comment-15329052
 ] 

Siddharth Seth commented on HIVE-14003:
---

[~hagleitn] - would you mind taking a look at the patch and providing some more 
information on dummyOps / mergeOps?

An interrupt would ideally stop an operation; however, it is really only a 
suggestion, and we cannot rely on libraries to handle it correctly. I suspect 
most of Hadoop has issues here. An HDFS JIRA was created and has already been 
fixed.
The abort flag protects against operations that reset the interrupt 
status, which is where the 'avoid blocking op' comment comes in. In most cases 
we'll be OK with an abort flag check.
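The pattern described here sets an abort flag and also interrupts the worker as a best-effort hint. The processing loop then checks the flag rather than trusting the interrupt status. It can be sketched as follows. AbortableProcessor is a hypothetical class, not the LLAP or Hive processor code. The per-record work stands in for library calls that may swallow an interrupt.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class AbortableProcessor {
    private final AtomicBoolean aborted = new AtomicBoolean(false);
    private volatile Thread worker;

    /** May be called from another thread, e.g. when the task is preempted. */
    void abort() {
        aborted.set(true);
        Thread t = worker;
        if (t != null) {
            t.interrupt(); // only a suggestion: code on the worker thread may clear it
        }
    }

    /** Processes up to n records and returns how many were processed before stopping. */
    int run(int n, Runnable perRecordWork) {
        worker = Thread.currentThread();
        int done = 0;
        try {
            for (int i = 0; i < n; i++) {
                // Check the flag, not only the interrupt status, which a library may have reset.
                if (aborted.get()) {
                    break;
                }
                perRecordWork.run();
                done++;
            }
        } finally {
            worker = null;
        }
        return done;
    }
}
```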

> queries running against llap hang at times - preemption issues
> --
>
> Key: HIVE-14003
> URL: https://issues.apache.org/jira/browse/HIVE-14003
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.1.0
>Reporter: Takahiko Saito
>Assignee: Siddharth Seth
> Attachments: HIVE-14003.01.patch
>
>
> The preemption logic in the Hive processor needs some more work. There are 
> definitely windows where the abort flag is completely dropped within the Hive 
> processor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13986) LLAP: kill Tez AM on token errors from plugin

2016-06-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329021#comment-15329021
 ] 

Siddharth Seth commented on HIVE-13986:
---

+1

> LLAP: kill Tez AM on token errors from plugin
> -
>
> Key: HIVE-13986
> URL: https://issues.apache.org/jira/browse/HIVE-13986
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13986.01.patch, HIVE-13986.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14008) Duplicate line in LLAP SecretManager

2016-06-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329016#comment-15329016
 ] 

Siddharth Seth commented on HIVE-14008:
---

+1

> Duplicate line in LLAP SecretManager
> 
>
> Key: HIVE-14008
> URL: https://issues.apache.org/jira/browse/HIVE-14008
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Attachments: HIVE-14008.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14010) parquet-logging.properties from HIVE_CONF_DIR should be used when available

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14010:
-
Attachment: HIVE-14010.1.patch

[~hagleitn] Can you please take a look? It's related to the previous logging issue.

> parquet-logging.properties from HIVE_CONF_DIR should be used when available
> ---
>
> Key: HIVE-14010
> URL: https://issues.apache.org/jira/browse/HIVE-14010
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14010.1.patch
>
>
> Following up on HIVE-13954, when parquet-logging.properties is available in 
> HIVE_CONF_DIR it should be used first. When it is not available, fall back to 
> the relative path from the bin directory.
> NO PRECOMMIT TESTS
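The lookup order described above can be sketched in Java, independent of where Hive actually implements it. The sketch prefers the copy in HIVE_CONF_DIR and otherwise resolves a path relative to the bin directory. The relativeFallback parameter is hypothetical, because the issue does not spell out the exact relative path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ParquetLoggingConfLocator {
    static final String FILE_NAME = "parquet-logging.properties";

    /**
     * Returns HIVE_CONF_DIR/parquet-logging.properties if that file exists; otherwise
     * binDir resolved against relativeFallback (hypothetical, e.g. "../conf/...").
     */
    static Path locate(String hiveConfDir, Path binDir, String relativeFallback) {
        if (hiveConfDir != null && !hiveConfDir.isEmpty()) {
            Path candidate = Paths.get(hiveConfDir, FILE_NAME);
            if (Files.isRegularFile(candidate)) {
                return candidate; // the HIVE_CONF_DIR copy wins when present
            }
        }
        return binDir.resolve(relativeFallback).normalize();
    }
}
```

A caller would typically pass System.getenv("HIVE_CONF_DIR") as the first argument.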



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14010) parquet-logging.properties from HIVE_CONF_DIR should be used when available

2016-06-14 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14010:
-
Description: 
Following up on HIVE-13954, when parquet-logging.properties is available in 
HIVE_CONF_DIR it should be used first. When it is not available, fall back to the 
relative path from the bin directory.

NO PRECOMMIT TESTS

  was:Following up on HIVE-13954, when parquet-logging.properties is available 
in HIVE_CONF_DIR it should be used first. When it is not available, fall back to 
the relative path from the bin directory.


> parquet-logging.properties from HIVE_CONF_DIR should be used when available
> ---
>
> Key: HIVE-14010
> URL: https://issues.apache.org/jira/browse/HIVE-14010
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Following up on HIVE-13954, when parquet-logging.properties is available in 
> HIVE_CONF_DIR it should be used first. When it is not available, fall back to 
> the relative path from the bin directory.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13956) LLAP: external client output is writing to channel before it is writable again

2016-06-14 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328992#comment-15328992
 ] 

Prasanth Jayachandran commented on HIVE-13956:
--

[~jdere] Can you please rebase the patch? The test failures look unrelated.

> LLAP: external client output is writing to channel before it is writable again
> --
>
> Key: HIVE-13956
> URL: https://issues.apache.org/jira/browse/HIVE-13956
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-13956.1.patch
>
>
> Rows are being written/flushed on the output channel without checking if the 
> channel is writable. Introduce a writability check/wait.
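A writability check with a wait can be sketched as a small gate. The I/O layer flips the gate when the channel's writability changes, and the row writer blocks until the gate is open or a timeout expires. WritabilityGate is hypothetical. If the output channel is a Netty channel, Channel.isWritable() and the channelWritabilityChanged handler callback would be the natural sources for setWritable, but that wiring is an assumption and is not shown.

```java
class WritabilityGate {
    private final Object lock = new Object();
    private boolean writable = true;

    /** Called by the I/O layer whenever the channel's writability changes. */
    void setWritable(boolean nowWritable) {
        synchronized (lock) {
            writable = nowWritable;
            if (nowWritable) {
                lock.notifyAll(); // wake any writer blocked in awaitWritable
            }
        }
    }

    /** Blocks until the channel is writable; returns false if the timeout elapses first. */
    boolean awaitWritable(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (lock) {
            while (!writable) { // loop also guards against spurious wake-ups
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false;
                }
                lock.wait(remainingMs);
            }
            return true;
        }
    }
}
```

The writer would call awaitWritable before each write/flush and write only when it returns true.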



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in

2016-06-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328989#comment-15328989
 ] 

Hive QA commented on HIVE-13964:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12810164/HIVE-13964.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10191 tests executed
*Failed tests:*
{noformat}
TestBeeLineWithArgs - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/120/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/120/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-120/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12810164 - PreCommit-HIVE-MASTER-Build

> Add a parameter to beeline to allow a properties file to be passed in
> -
>
> Key: HIVE-13964
> URL: https://issues.apache.org/jira/browse/HIVE-13964
> Project: Hive
>  Issue Type: New Feature
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch, 
> HIVE-13964.03.patch, HIVE-13964.04.patch
>
>
> HIVE-6652 removed the ability to pass in a properties file as a beeline 
> parameter. It may be useful to be able to pass the file in as a parameter, 
> such as --property-file.
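The following minimal Java sketch shows how a --property-file argument could be located in the argument list and loaded with java.util.Properties. It is not Beeline's actual option parser. Only the flag name comes from the issue, and which keys the file should contain is left to the caller.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

class PropertyFileArg {
    static final String FLAG = "--property-file"; // flag name as proposed in the issue

    /** Returns the argument following --property-file, or null if absent or missing its value. */
    static String findPropertyFile(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if (FLAG.equals(args[i])) {
                return args[i + 1];
            }
        }
        return null;
    }

    /** Loads key=value pairs using standard java.util.Properties syntax. */
    static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(Paths.get(path))) {
            props.load(reader);
        }
        return props;
    }
}
```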



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call

2016-06-14 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-13903:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   Status: Resolved  (was: Patch Available)

Committed. Thanks [~prongs]. 

Thanks [~jcamachorodriguez] for the review.

> getFunctionInfo is downloading jar on every call
> 
>
> Key: HIVE-13903
> URL: https://issues.apache.org/jira/browse/HIVE-13903
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Fix For: 2.1.1
>
> Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, 
> HIVE-13903.02.patch
>
>
> On queries using permanent UDFs, the UDF's jar file is downloaded multiple 
> times, with each download originating from Registry.getFunctionInfo. This 
> increases query time, especially if the query is just an EXPLAIN. The jar 
> should be downloaded once, and not downloaded again if the UDF class is 
> already accessible in the current thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)