[jira] [Commented] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935783#comment-15935783
 ] 

Lefty Leverenz commented on HIVE-16178:
---

Doc note:  The descriptions of corr() and covar_samp() need to be updated in 
the wiki, with version information.

* [Hive Operators and UDFs -- Built-in Aggregate Functions (UDAF) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)]

Added a TODOC2.2 label.

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-16178.1.patch, HIVE-16178.2.patch
>
>
> h3. corr
> the standard defines corner cases in which it should return null, but the
> current result is NaN:
> If N * SUMX2 equals SUMX * SUMX, then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY, then the result is the null value.
> h3. covar_samp
> returns 0 instead of null when N is 1:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> the standard uses (y,x) order, and some of the function names also contain
> X and Y, so the order does matter. Currently at least corr uses (x,y) order,
> which is okay because it's symmetric; but it would be great to have the
> same order everywhere (check the others).
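The corner cases quoted above can be sketched as follows (a minimal Python illustration of the intended SQL-standard null semantics, not Hive's actual UDAF code; the function and argument names are hypothetical):

```python
import math

def corr(pairs):
    """SQL-standard CORR semantics: return None (SQL null) in the
    degenerate cases instead of NaN. `pairs` is a list of (y, x)
    tuples with nulls already removed."""
    n = len(pairs)
    if n == 0:
        return None
    sumx = sum(x for _, x in pairs)
    sumy = sum(y for y, _ in pairs)
    sumx2 = sum(x * x for _, x in pairs)
    sumy2 = sum(y * y for y, _ in pairs)
    sumxy = sum(x * y for y, x in pairs)
    # Standard: if N*SUMX2 equals SUMX*SUMX (zero variance in x), the
    # result is null; likewise for y. Hive previously returned NaN here.
    if n * sumx2 == sumx * sumx or n * sumy2 == sumy * sumy:
        return None
    return (n * sumxy - sumx * sumy) / (
        math.sqrt(n * sumx2 - sumx * sumx) *
        math.sqrt(n * sumy2 - sumy * sumy))

def covar_samp(pairs):
    """SQL-standard COVAR_SAMP: if N is 1, the result is null
    (Hive previously returned 0)."""
    n = len(pairs)
    if n <= 1:
        return None
    sumx = sum(x for _, x in pairs)
    sumy = sum(y for y, _ in pairs)
    sumxy = sum(x * y for y, x in pairs)
    return (sumxy - sumx * sumy / n) / (n - 1)

# Constant x column: zero variance -> null, not NaN
print(corr([(1.0, 5.0), (2.0, 5.0), (3.0, 5.0)]))  # None
# Single row -> null, not 0
print(covar_samp([(1.0, 2.0)]))                     # None
```

With these semantics a zero-variance column yields SQL null rather than NaN, and a single-row input to covar_samp yields null rather than 0.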



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-21 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16178:
--
Labels: TODOC2.2  (was: )

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-16178.1.patch, HIVE-16178.2.patch
>
>
> h3. corr
> the standard defines corner cases in which it should return null, but the
> current result is NaN:
> If N * SUMX2 equals SUMX * SUMX, then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY, then the result is the null value.
> h3. covar_samp
> returns 0 instead of null when N is 1:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> the standard uses (y,x) order, and some of the function names also contain
> X and Y, so the order does matter. Currently at least corr uses (x,y) order,
> which is okay because it's symmetric; but it would be great to have the
> same order everywhere (check the others).





[jira] [Commented] (HIVE-16229) Wrong result for correlated scalar subquery with aggregate

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935772#comment-15935772
 ] 

Hive QA commented on HIVE-16229:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859826/HIVE-16229.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10497 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query1] (batchId=226)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query30] 
(batchId=226)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query6] (batchId=226)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query81] 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4281/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4281/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4281/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859826 - PreCommit-HIVE-Build

> Wrong result for correlated scalar subquery with aggregate
> --
>
> Key: HIVE-16229
> URL: https://issues.apache.org/jira/browse/HIVE-16229
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16229.1.patch
>
>
> Query:
> {code:SQL}
> select * from part where p_size > (select count(*) from part p where p.p_mfgr 
> = part.p_mfgr group by p_type);
> {code}
> Expected results:
> {code}
> ERROR: more than one row produced by subquery
> {code}
> Actual
> {code}
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 48427 almond antique violet mint lemonManufacturer#4  Brand#42
> PROMO POLISHED STEEL39  SM CASE 1375.42 hely ironic i
> 48427 almond antique violet mint lemonManufacturer#4  Brand#42
> PROMO POLISHED STEEL39  SM CASE 1375.42 hely ironic i
> 
> Time taken: 13.742 seconds, Fetched: 123 row(s)
> {code}
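The expected error above follows from scalar-subquery semantics: grouping by p_type within a single p_mfgr can produce several counts, and a scalar subquery must return at most one row. A toy Python sketch of that check (the table contents and names are made up for illustration):

```python
from collections import Counter

# Toy part table: (p_mfgr, p_type) rows; data is hypothetical.
part = [
    ("Manufacturer#4", "SMALL BRUSHED BRASS"),
    ("Manufacturer#4", "PROMO POLISHED STEEL"),
    ("Manufacturer#4", "PROMO POLISHED STEEL"),
]

def scalar_subquery(outer_mfgr):
    """select count(*) from part p where p.p_mfgr = outer_mfgr group by p_type.
    A scalar subquery must yield at most one row; producing more is a
    runtime error, which HIVE-16229 says Hive fails to raise."""
    counts = Counter(p_type for mfgr, p_type in part if mfgr == outer_mfgr)
    rows = list(counts.values())  # one row per p_type group
    if len(rows) > 1:
        raise RuntimeError("more than one row produced by subquery")
    return rows[0] if rows else None

try:
    scalar_subquery("Manufacturer#4")  # two p_type groups -> two rows
except RuntimeError as e:
    print(e)  # more than one row produced by subquery
```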





[jira] [Commented] (HIVE-16274) Support tuning of NDV of columns using lower/upper bounds

2017-03-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935760#comment-15935760
 ] 

Pengcheng Xiong commented on HIVE-16274:


ccing [~ashutoshc] and [~jdere]

> Support tuning of NDV of columns using lower/upper bounds
> -
>
> Key: HIVE-16274
> URL: https://issues.apache.org/jira/browse/HIVE-16274
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16274.01.patch
>
>






[jira] [Updated] (HIVE-16274) Support tuning of NDV of columns using lower/upper bounds

2017-03-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16274:
---
Status: Patch Available  (was: Open)

> Support tuning of NDV of columns using lower/upper bounds
> -
>
> Key: HIVE-16274
> URL: https://issues.apache.org/jira/browse/HIVE-16274
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16274.01.patch
>
>






[jira] [Updated] (HIVE-16274) Support tuning of NDV of columns using lower/upper bounds

2017-03-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16274:
---
Attachment: HIVE-16274.01.patch

> Support tuning of NDV of columns using lower/upper bounds
> -
>
> Key: HIVE-16274
> URL: https://issues.apache.org/jira/browse/HIVE-16274
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16274.01.patch
>
>






[jira] [Updated] (HIVE-16274) Support tuning of NDV of columns using lower/upper bounds

2017-03-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16274:
---
Summary: Support tuning of NDV of columns using lower/upper bounds  (was: 
Improve column stats computation using density function)

> Support tuning of NDV of columns using lower/upper bounds
> -
>
> Key: HIVE-16274
> URL: https://issues.apache.org/jira/browse/HIVE-16274
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16274.01.patch
>
>
> to take into consideration of row counts.





[jira] [Updated] (HIVE-16274) Support tuning of NDV of columns using lower/upper bounds

2017-03-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16274:
---
Description: (was: to take into consideration of row counts.)

> Support tuning of NDV of columns using lower/upper bounds
> -
>
> Key: HIVE-16274
> URL: https://issues.apache.org/jira/browse/HIVE-16274
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16274.01.patch
>
>






[jira] [Commented] (HIVE-16276) Add hadoop-aws as a dependency so Hive has out of the box support for running against S3

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935755#comment-15935755
 ] 

Hive QA commented on HIVE-16276:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859825/HIVE-16276.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 971 failed/errored test(s), 6610 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_custom_key2]
 (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_custom_key]
 (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_joins] 
(batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=222)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=222)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[drop_with_concurrency]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
 (batchId=43)

[jira] [Commented] (HIVE-16275) Vectorization: Add ReduceSink support for TopN (in specialized native classes)

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935734#comment-15935734
 ] 

Hive QA commented on HIVE-16275:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859817/HIVE-16275.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 133 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_2] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_mapjoin1] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_simple] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_coalesce_2] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_count] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_data_types] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_aggregate]
 (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_expressions]
 (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round_2] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_distinct_2] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_empty_where] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby4] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby6] 
(batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_3] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_if_expr] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_include_no_sel] 
(batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_1] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_arithmetic]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_non_string_partition]
 (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_orderby_5] 
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join1] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join2] 
(batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join3] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_outer_join4] 
(batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_reduce1] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_reduce2] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_reduce3] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_reduce_groupby_decimal]
 (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_string_concat] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_varchar_simple] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_when_case_null] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_7] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_8] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_div0] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_limit] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_offset_limit]
 (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_case] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_date_funcs] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_mapjoin2] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_timestamp] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_timestamp_funcs]
 (batchId=28)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_adaptor_usage_mode]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_auto_smb_mapjoin_14]
 (batchId=146)

[jira] [Resolved] (HIVE-14309) Fix naming of classes in orc module to not conflict with standalone orc

2017-03-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-14309.
--
  Resolution: Won't Fix
Target Version/s:   (was: 2.0.2, 2.1.2)

I've found a way to make hive-storage 2.2.1 and orc-core 1.3.3 work with Hive 
2.1. 

> Fix naming of classes in orc module to not conflict with standalone orc
> ---
>
> Key: HIVE-14309
> URL: https://issues.apache.org/jira/browse/HIVE-14309
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> The current Hive 2.0 and 2.1 releases have classes in the org.apache.orc 
> namespace that clash with the ORC project's classes. From Hive 2.2 onward, 
> the classes will only be on ORC, but we'll reduce the problems of classpath 
> issues if we rename the classes to org.apache.hive.orc.
> I've looked at a set of projects (pig, spark, oozie, flume, & storm) and 
> can't find any uses of Hive's versions of the org.apache.orc classes, so I 
> believe this is a safe change that will reduce the integration problems 
> downstream.





[jira] [Updated] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15691:
--
Attachment: HIVE-15691.5.patch

HIVE-15691.5.patch is the same as patch 4; trying to get the hive2 build going.

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691.4.patch, HIVE-15691.5.patch, 
> HIVE-15691-branch-1.2.patch, HIVE-15691-branch-1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> There is a dependency in Flume to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please see the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9





[jira] [Assigned] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15691:
-

Assignee: Kalyan  (was: Eugene Koifman)

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
>Priority: Critical
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691.4.patch, HIVE-15691.5.patch, 
> HIVE-15691-branch-1.2.patch, HIVE-15691-branch-1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> There is a dependency in Flume to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please see the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9





[jira] [Updated] (HIVE-15929) Fix HiveDecimalWritable to be compatible with Hive 2.1

2017-03-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15929:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

This was committed. 

> Fix HiveDecimalWritable to be compatible with Hive 2.1
> --
>
> Key: HIVE-15929
> URL: https://issues.apache.org/jira/browse/HIVE-15929
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-15929.patch
>
>
> HIVE-15335 broke compatibility with Hive 2.1 by making 
> HiveDecimalWritable.getInternalStorage() throw an exception when called on an 
> unset value. It is easy to instead return an empty array, which will allow 
> the old code to allocate a new array.
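The fix described above amounts to a defensive-accessor pattern. A hedged Python sketch of the idea (the real class is the Java HiveDecimalWritable in Hive's storage-api; the names here are illustrative):

```python
class DecimalWritable:
    """Sketch of the backward-compatible accessor pattern from HIVE-15929:
    for an unset value, return an empty byte array rather than raising,
    so older callers (Hive 2.1) can simply allocate a fresh buffer."""

    def __init__(self, data=None):
        self._data = data  # None means "unset"

    def get_internal_storage(self):
        if self._data is None:
            # Empty array instead of an exception: old code sees an
            # empty buffer and allocates a new one, as before HIVE-15335.
            return b""
        return self._data

w = DecimalWritable()
print(w.get_internal_storage())  # b''
```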





[jira] [Assigned] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15691:
-

Assignee: Eugene Koifman  (was: Kalyan)

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691.4.patch, HIVE-15691-branch-1.2.patch, 
> HIVE-15691-branch-1.patch, HIVE-15691.patch, HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> There is a dependency in Flume to commit:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please see the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9





[jira] [Resolved] (HIVE-5725) Separate out ql code from exec jar

2017-03-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-5725.
-
Resolution: Duplicate

This has already been done.

> Separate out ql code from exec jar
> --
>
> Key: HIVE-5725
> URL: https://issues.apache.org/jira/browse/HIVE-5725
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> We should publish our code independently from our dependencies. Since the 
> exec jar has to include the runtime dependencies, I'd propose that we make 
> two jars, a ql jar and an exec jar.





[jira] [Updated] (HIVE-15841) Upgrade Hive to ORC 1.3.3

2017-03-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15841:
-
Attachment: HIVE-15841.patch

Updated the last test case output.

> Upgrade Hive to ORC 1.3.3
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15841.patch, HIVE-15841.patch, HIVE-15841.patch
>
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.3 once it 
> releases.





[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935690#comment-15935690
 ] 

Hive QA commented on HIVE-15691:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859814/HIVE-15691-branch-1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 161 failed/errored test(s), 8085 tests 
executed
*Failed tests:*
{noformat}
TestAcidOnTez - did not produce a TEST-*.xml file (likely timed out) 
(batchId=376)
TestAdminUser - did not produce a TEST-*.xml file (likely timed out) 
(batchId=358)
TestAuthorizationPreEventListener - did not produce a TEST-*.xml file (likely 
timed out) (batchId=391)
TestAuthzApiEmbedAuthorizerInEmbed - did not produce a TEST-*.xml file (likely 
timed out) (batchId=368)
TestAuthzApiEmbedAuthorizerInRemote - did not produce a TEST-*.xml file (likely 
timed out) (batchId=374)
TestBeeLineWithArgs - did not produce a TEST-*.xml file (likely timed out) 
(batchId=398)
TestCLIAuthzSessionContext - did not produce a TEST-*.xml file (likely timed 
out) (batchId=416)
TestClearDanglingScratchDir - did not produce a TEST-*.xml file (likely timed 
out) (batchId=383)
TestClientSideAuthorizationProvider - did not produce a TEST-*.xml file (likely 
timed out) (batchId=390)
TestCompactor - did not produce a TEST-*.xml file (likely timed out) 
(batchId=379)
TestCreateUdfEntities - did not produce a TEST-*.xml file (likely timed out) 
(batchId=378)
TestCustomAuthentication - did not produce a TEST-*.xml file (likely timed out) 
(batchId=399)
TestDBTokenStore - did not produce a TEST-*.xml file (likely timed out) 
(batchId=342)
TestDDLWithRemoteMetastoreSecondNamenode - did not produce a TEST-*.xml file 
(likely timed out) (batchId=377)
TestDynamicSerDe - did not produce a TEST-*.xml file (likely timed out) 
(batchId=345)
TestEmbeddedHiveMetaStore - did not produce a TEST-*.xml file (likely timed 
out) (batchId=355)
TestEmbeddedThriftBinaryCLIService - did not produce a TEST-*.xml file (likely 
timed out) (batchId=402)
TestFilterHooks - did not produce a TEST-*.xml file (likely timed out) 
(batchId=350)
TestFolderPermissions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=385)
TestHS2AuthzContext - did not produce a TEST-*.xml file (likely timed out) 
(batchId=419)
TestHS2AuthzSessionContext - did not produce a TEST-*.xml file (likely timed 
out) (batchId=420)
TestHS2ClearDanglingScratchDir - did not produce a TEST-*.xml file (likely 
timed out) (batchId=406)
TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file (likely 
timed out) (batchId=407)
TestHiveAuthorizerCheckInvocation - did not produce a TEST-*.xml file (likely 
timed out) (batchId=394)
TestHiveAuthorizerShowFilters - did not produce a TEST-*.xml file (likely timed 
out) (batchId=393)
TestHiveHistory - did not produce a TEST-*.xml file (likely timed out) 
(batchId=396)
TestHiveMetaStoreTxns - did not produce a TEST-*.xml file (likely timed out) 
(batchId=370)
TestHiveMetaStoreWithEnvironmentContext - did not produce a TEST-*.xml file 
(likely timed out) (batchId=360)
TestHiveMetaTool - did not produce a TEST-*.xml file (likely timed out) 
(batchId=373)
TestHiveServer2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=422)
TestHiveServer2SessionTimeout - did not produce a TEST-*.xml file (likely timed 
out) (batchId=423)
TestHiveSessionImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=403)
TestHs2Hooks - did not produce a TEST-*.xml file (likely timed out) 
(batchId=375)
TestHs2HooksWithMiniKdc - did not produce a TEST-*.xml file (likely timed out) 
(batchId=451)
TestJdbcDriver2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=410)
TestJdbcMetadataApiAuth - did not produce a TEST-*.xml file (likely timed out) 
(batchId=421)
TestJdbcWithLocalClusterSpark - did not produce a TEST-*.xml file (likely timed 
out) (batchId=415)
TestJdbcWithMiniHS2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=412)
TestJdbcWithMiniKdc - did not produce a TEST-*.xml file (likely timed out) 
(batchId=448)
TestJdbcWithMiniKdcCookie - did not produce a TEST-*.xml file (likely timed 
out) (batchId=447)
TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file (likely 
timed out) (batchId=445)
TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file (likely 
timed out) (batchId=450)
TestJdbcWithMiniMr - did not produce a TEST-*.xml file (likely timed out) 
(batchId=411)
TestJdbcWithSQLAuthUDFBlacklist - did not produce a TEST-*.xml file (likely 
timed out) (batchId=417)
TestJdbcWithSQLAuthorization - did not produce a TEST-*.xml file (likely timed 
out) (batchId=418)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) 
(batchId=382)
TestMTQueries - did not produce a TEST-*.xml file (likely timed out) 

[jira] [Updated] (HIVE-16273) Vectorization: Make non-column key expressions work in MERGEPARTIAL mode

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16273:

Status: In Progress  (was: Patch Available)

> Vectorization: Make non-column key expressions work in MERGEPARTIAL mode
> 
>
> Key: HIVE-16273
> URL: https://issues.apache.org/jira/browse/HIVE-16273
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16273.01.patch, HIVE-16273.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16273) Vectorization: Make non-column key expressions work in MERGEPARTIAL mode

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16273:

Status: Patch Available  (was: In Progress)

> Vectorization: Make non-column key expressions work in MERGEPARTIAL mode
> 
>
> Key: HIVE-16273
> URL: https://issues.apache.org/jira/browse/HIVE-16273
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16273.01.patch, HIVE-16273.02.patch
>
>






[jira] [Updated] (HIVE-16273) Vectorization: Make non-column key expressions work in MERGEPARTIAL mode

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16273:

Attachment: HIVE-16273.02.patch

> Vectorization: Make non-column key expressions work in MERGEPARTIAL mode
> 
>
> Key: HIVE-16273
> URL: https://issues.apache.org/jira/browse/HIVE-16273
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16273.01.patch, HIVE-16273.02.patch
>
>






[jira] [Commented] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2017-03-21 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935686#comment-15935686
 ] 

Rui Li commented on HIVE-14919:
---

[~stakiar], [~kellyzly] we're using RDD with type like . That's in line with MR 
(and probably Tez too), and I think that's what Hive operators and SerDes expect.
I don't know much about the DataFrame/DataSet API, but from the discussion 
above it seems to require a lot of work. [~xuefuz] could you share your 
thoughts?

> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>
> In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
> BigBench [1] to benchmark Spark 2.0 over a 1 TB data set against Spark 1.6. 
> We see performance improvements of about 5.4% in general and 45% in the best 
> case. However, some queries don't show significant performance improvements. 
> This JIRA is the umbrella ticket addressing those performance issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench





[jira] [Updated] (HIVE-16278) LLAP: metadata cache may incorrectly decrease memory usage in mem manager

2017-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16278:

Status: Patch Available  (was: Open)

> LLAP: metadata cache may incorrectly decrease memory usage in mem manager
> -
>
> Key: HIVE-16278
> URL: https://issues.apache.org/jira/browse/HIVE-16278
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16278.patch
>
>






[jira] [Updated] (HIVE-16278) LLAP: metadata cache may incorrectly decrease memory usage in mem manager

2017-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16278:

Attachment: HIVE-16278.patch

[~gopalv] can you take a look?
The metadata cache mistakenly calls reserve with the no-wait flag (which is 
only there for unit tests...) and then doesn't check the result. So, if the 
first eviction fails, it will ignore the failure and put the object in the map 
anyway; then, if there's a collision in the map, it will release memory that 
has never been reserved.
Unfortunately I'm not sure that's non-exotic enough to cause the issue in your 
cluster. A full LLAP log would be helpful, to examine error callstacks. 
Elevator threads are never interrupted, and only check for stop between the 
calls to read one split; in addition, the location of reserve/release calls 
relative to allocate/deallocate should make such a situation (the memory 
manager thinks we have memory left and doesn't evict, but we are actually 
fully allocated, with plenty to evict) close to impossible; even if processing 
is interrupted somehow and we lose a buffer, it should be consistently wrong, 
with no mismatch between manager and allocator. It would be interesting to 
look at errors/interrupts to see what could have gone wrong, in case there's 
another issue besides what this patch fixes.
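The fix described above — check the result of a blocking-capable reserve() 
instead of calling it in no-wait mode and ignoring the returned boolean — can 
be illustrated with a toy sketch. The class and method names here 
(MemoryManager, reserve, release) are hypothetical stand-ins, not the actual 
Hive LLAP classes:

```java
// Illustrative sketch only: hypothetical names, not the Hive LLAP code.
// Shows why ignoring reserve()'s result corrupts the memory accounting.
import java.util.HashMap;
import java.util.Map;

public class ReserveCheckSketch {
    /** Toy memory manager: reserve() returns false when memory can't be freed. */
    static class MemoryManager {
        private long free;
        MemoryManager(long free) { this.free = free; }
        boolean reserve(long bytes) {
            if (bytes > free) return false;  // eviction failed / no memory left
            free -= bytes;
            return true;
        }
        void release(long bytes) { free += bytes; }
        long free() { return free; }
    }

    public static void main(String[] args) {
        MemoryManager mm = new MemoryManager(100);
        Map<String, Long> cache = new HashMap<>();

        // Buggy pattern: the result of reserve() is dropped on the floor.
        mm.reserve(200);              // fails, but nothing checks the boolean
        // cache.put("meta", 200L);   // would later release() 200 never reserved

        // Fixed pattern: only insert (and later release) when reserve succeeded.
        if (mm.reserve(80)) {
            cache.put("meta", 80L);
        }
        System.out.println("free=" + mm.free());  // only the successful reserve counted
    }
}
```

Running this prints free=20: the failed reservation never touches the 
accounting, so a later release can't decrease usage below reality.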

> LLAP: metadata cache may incorrectly decrease memory usage in mem manager
> -
>
> Key: HIVE-16278
> URL: https://issues.apache.org/jira/browse/HIVE-16278
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16278.patch
>
>






[jira] [Comment Edited] (HIVE-16278) LLAP: metadata cache may incorrectly decrease memory usage in mem manager

2017-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935682#comment-15935682
 ] 

Sergey Shelukhin edited comment on HIVE-16278 at 3/22/17 2:31 AM:
--

[~gopalv] can you take a look?
The metadata cache mistakenly calls reserve with the no-wait flag (which is 
only there for unit tests...) and then doesn't check the result. So, if the 
first eviction fails, reserve will fail, but the cache will ignore that and 
put the object in the map anyway; then, if there's a collision in the map, it 
will release memory that has never been reserved.
Unfortunately I'm not sure that's non-exotic enough to cause the issue in your 
cluster. A full LLAP log would be helpful, to examine error callstacks. 
Elevator threads are never interrupted, and only check for stop between the 
calls to read one split; in addition, the location of reserve/release calls 
relative to allocate/deallocate should make such a situation (the memory 
manager thinks we have memory left and doesn't evict, but we are actually 
fully allocated, with plenty to evict) close to impossible; even if processing 
is interrupted somehow and we lose a buffer, it should be consistently wrong, 
with no mismatch between manager and allocator. It would be interesting to 
look at errors/interrupts to see what could have gone wrong, in case there's 
another issue besides what this patch fixes.


was (Author: sershe):
[~gopalv] can you take a look?
The metadata cache mistakenly calls reserve with the no-wait flag (which is 
only there for unit tests...) and then doesn't check the result. So, if the 
first eviction fails, it will ignore the failure and put the object in the map 
anyway; then, if there's a collision in the map, it will release memory that 
has never been reserved.
Unfortunately I'm not sure that's non-exotic enough to cause the issue in your 
cluster. A full LLAP log would be helpful, to examine error callstacks. 
Elevator threads are never interrupted, and only check for stop between the 
calls to read one split; in addition, the location of reserve/release calls 
relative to allocate/deallocate should make such a situation (the memory 
manager thinks we have memory left and doesn't evict, but we are actually 
fully allocated, with plenty to evict) close to impossible; even if processing 
is interrupted somehow and we lose a buffer, it should be consistently wrong, 
with no mismatch between manager and allocator. It would be interesting to 
look at errors/interrupts to see what could have gone wrong, in case there's 
another issue besides what this patch fixes.

> LLAP: metadata cache may incorrectly decrease memory usage in mem manager
> -
>
> Key: HIVE-16278
> URL: https://issues.apache.org/jira/browse/HIVE-16278
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16278.patch
>
>






[jira] [Updated] (HIVE-15221) Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception

2017-03-21 Thread Fei Hui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HIVE-15221:
---
Description: 
i see in the current master version
{code:title=MapJoinMemoryExhaustionHandler.java|borderStyle=solid}
  public void checkMemoryStatus(long tableContainerSize, long numRows)
      throws MapJoinMemoryExhaustionException {
    long usedMemory = memoryMXBean.getHeapMemoryUsage().getUsed();
    double percentage = (double) usedMemory / (double) maxHeapSize;
    String msg = Utilities.now() + "\tProcessing rows:\t" + numRows
        + "\tHashtable size:\t" + tableContainerSize + "\tMemory usage:\t"
        + usedMemory + "\tpercentage:\t" + percentageNumberFormat.format(percentage);
    console.printInfo(msg);
    if (percentage > maxMemoryUsage) {
      throw new MapJoinMemoryExhaustionException(msg);
    }
  }
{code}

if {{percentage > maxMemoryUsage}}, then throw MapJoinMemoryExhaustionException.

In my opinion, keeping the query running is better than failing. Calling 
System.gc() first, and only then applying 'if percentage > maxMemoryUsage, 
throw MapJoinMemoryExhaustionException', may be better.

The original check also has a problem: 1) heavy memory consumption triggers a 
GC (e.g. a young GC), so the check after adding a row passes; 2) heavy memory 
consumption does not trigger a GC, so the check after adding rows throws the 
exception.

Sometimes case 2) occurs even though it holds fewer rows than case 1).

  was:
i see in the current master version

 percentage = (double) usedMemory / (double) maxHeapSize;

if percentage > maxMemoryUsage, then throw MapJoinMemoryExhaustionException.

In my opinion, keeping the query running is better than failing. Calling 
System.gc() first, and only then applying 'if percentage > maxMemoryUsage, 
throw MapJoinMemoryExhaustionException', may be better.

The original check also has a problem: 1) heavy memory consumption triggers a 
GC (e.g. a young GC), so the check after adding a row passes; 2) heavy memory 
consumption does not trigger a GC, so the check after adding rows throws the 
exception.

Sometimes case 2) occurs even though it holds fewer rows than case 1).


> Improvement for MapJoin checkMemoryStatus, adding gc before throwing Exception
> --
>
> Key: HIVE-15221
> URL: https://issues.apache.org/jira/browse/HIVE-15221
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15221.1.patch, stat_gc.png
>
>
> i see in the current master version
> {code:title=MapJoinMemoryExhaustionHandler.java|borderStyle=solid}
>   public void checkMemoryStatus(long tableContainerSize, long numRows)
>       throws MapJoinMemoryExhaustionException {
>     long usedMemory = memoryMXBean.getHeapMemoryUsage().getUsed();
>     double percentage = (double) usedMemory / (double) maxHeapSize;
>     String msg = Utilities.now() + "\tProcessing rows:\t" + numRows
>         + "\tHashtable size:\t" + tableContainerSize + "\tMemory usage:\t"
>         + usedMemory + "\tpercentage:\t" + percentageNumberFormat.format(percentage);
>     console.printInfo(msg);
>     if (percentage > maxMemoryUsage) {
>       throw new MapJoinMemoryExhaustionException(msg);
>     }
>   }
> {code}
> if {{percentage > maxMemoryUsage}}, then throw 
> MapJoinMemoryExhaustionException.
> In my opinion, keeping the query running is better than failing. Calling 
> System.gc() first, and only then applying 'if percentage > maxMemoryUsage, 
> throw MapJoinMemoryExhaustionException', may be better.
> The original check also has a problem: 1) heavy memory consumption triggers 
> a GC (e.g. a young GC), so the check after adding a row passes; 2) heavy 
> memory consumption does not trigger a GC, so the check after adding rows 
> throws the exception.
> Sometimes case 2) occurs even though it holds fewer rows than case 1).
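The gc-before-throw idea proposed above can be sketched as follows. This is a 
hypothetical standalone sketch, not the actual Hive patch: 
MapJoinMemoryExhaustionException is replaced by IllegalStateException, the 
handler's fields are simplified away, and only the standard 
java.lang.management MemoryMXBean API is relied on.

```java
// Hypothetical sketch of the proposal: trigger a GC and re-check heap
// usage before failing the MapJoin. MemoryMXBean is the real JDK API;
// everything else is simplified for illustration.
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class GcBeforeFail {
    private static final MemoryMXBean MEMORY_BEAN = ManagementFactory.getMemoryMXBean();

    /** Current heap usage as a fraction of the max heap size. */
    static double heapUsageFraction() {
        long used = MEMORY_BEAN.getHeapMemoryUsage().getUsed();
        long max = MEMORY_BEAN.getHeapMemoryUsage().getMax();
        return (double) used / (double) max;
    }

    /** Proposed check: only throw if usage is still high after a GC. */
    static void checkMemoryStatus(double maxMemoryUsage) {
        if (heapUsageFraction() > maxMemoryUsage) {
            System.gc();  // give the collector a chance before failing the query
            if (heapUsageFraction() > maxMemoryUsage) {
                throw new IllegalStateException("MapJoin memory exhausted");
            }
        }
    }

    public static void main(String[] args) {
        checkMemoryStatus(0.99);  // generous threshold: should pass in a fresh JVM
        System.out.println("ok");
    }
}
```

This addresses case 2) from the description: a query that merely hasn't been 
collected yet gets one explicit GC before the exception is raised.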





[jira] [Commented] (HIVE-16239) remove useless hiveserver

2017-03-21 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935680#comment-15935680
 ] 

Fei Hui commented on HIVE-16239:


CC [~vgumashta]

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> hiveserver does not exist, so we should remove hiveserver from the CLI on 
> branch-2.0. After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}





[jira] [Assigned] (HIVE-16278) LLAP: metadata cache may incorrectly decrease memory usage in mem manager

2017-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16278:
---


> LLAP: metadata cache may incorrectly decrease memory usage in mem manager
> -
>
> Key: HIVE-16278
> URL: https://issues.apache.org/jira/browse/HIVE-16278
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>






[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Attachment: HIVE-15784.06.patch

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch, HIVE-15784.04.patch, HIVE-15784.05.patch, 
> HIVE-15784.06.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Status: Patch Available  (was: In Progress)

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch, HIVE-15784.04.patch, HIVE-15784.05.patch, 
> HIVE-15784.06.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Updated] (HIVE-15784) Vectorization: Turn on text vectorization by default

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15784:

Status: In Progress  (was: Patch Available)

> Vectorization: Turn on text vectorization by default
> 
>
> Key: HIVE-15784
> URL: https://issues.apache.org/jira/browse/HIVE-15784
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15784.01.patch, HIVE-15784.02.patch, 
> HIVE-15784.03.patch, HIVE-15784.04.patch, HIVE-15784.05.patch
>
>
> *Turn ON text vectorization related variables* 
> hive.vectorized.use.vector.serde.deserialize and 
> hive.vectorized.use.row.serde.deserialize by default.





[jira] [Commented] (HIVE-15841) Upgrade Hive to ORC 1.3.3

2017-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935628#comment-15935628
 ] 

Prasanth Jayachandran commented on HIVE-15841:
--

union_fast_stats failure seems to be related. 

> Upgrade Hive to ORC 1.3.3
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15841.patch, HIVE-15841.patch
>
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.3 once it 
> releases.





[jira] [Commented] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935626#comment-15935626
 ] 

Hive QA commented on HIVE-16222:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859800/HIVE-16222.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mergejoin] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] 
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[structin] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[tez_join_hash] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_bucket] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_cast_constant] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_2] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_round] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby4] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby6] 
(batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_mapjoin] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mapjoin_reduce] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_mr_diff_schema_alias]
 (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_orderby_5] 
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_reduce_groupby_decimal]
 (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_string_concat] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_tablesample_rows] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_character_length]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf_octet_length] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_13] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_14] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_15] 
(batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_limit] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_date_funcs] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_shufflejoin] 
(batchId=68)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_opt_vectorization]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization2]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_hash]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_vector_dynpart_hashjoin_2]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_bucket]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_round]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_mapjoin]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_udf_character_length]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_udf_octet_length]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_bucketmapjoin1]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_join46]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=130)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4277/testReport
Console output: 

[jira] [Updated] (HIVE-16229) Wrong result for correlated scalar subquery with aggregate

2017-03-21 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16229:
---
Status: Patch Available  (was: Open)

> Wrong result for correlated scalar subquery with aggregate
> --
>
> Key: HIVE-16229
> URL: https://issues.apache.org/jira/browse/HIVE-16229
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16229.1.patch
>
>
> Query:
> {code:SQL}
> select * from part where p_size > (select count(*) from part p where p.p_mfgr 
> = part.p_mfgr group by p_type);
> {code}
> Expected results:
> {code}
> ERROR: more than one row produced by subquery
> {code}
> Actual
> {code}
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 48427 almond antique violet mint lemonManufacturer#4  Brand#42
> PROMO POLISHED STEEL39  SM CASE 1375.42 hely ironic i
> 48427 almond antique violet mint lemonManufacturer#4  Brand#42
> PROMO POLISHED STEEL39  SM CASE 1375.42 hely ironic i
> 
> Time taken: 13.742 seconds, Fetched: 123 row(s)
> {code}





[jira] [Updated] (HIVE-16229) Wrong result for correlated scalar subquery with aggregate

2017-03-21 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16229:
---
Attachment: HIVE-16229.1.patch

> Wrong result for correlated scalar subquery with aggregate
> --
>
> Key: HIVE-16229
> URL: https://issues.apache.org/jira/browse/HIVE-16229
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16229.1.patch
>
>
> Query:
> {code:SQL}
> select * from part where p_size > (select count(*) from part p where p.p_mfgr 
> = part.p_mfgr group by p_type);
> {code}
> Expected results:
> {code}
> ERROR: more than one row produced by subquery
> {code}
> Actual
> {code}
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 49671 almond antique gainsboro frosted violet Manufacturer#4  Brand#41
> SMALL BRUSHED BRASS 10  SM BOX  1620.67 ccounts run quick
> 48427 almond antique violet mint lemonManufacturer#4  Brand#42
> PROMO POLISHED STEEL39  SM CASE 1375.42 hely ironic i
> 48427 almond antique violet mint lemonManufacturer#4  Brand#42
> PROMO POLISHED STEEL39  SM CASE 1375.42 hely ironic i
> 
> Time taken: 13.742 seconds, Fetched: 123 row(s)
> {code}





[jira] [Assigned] (HIVE-16277) Exchange Partition between filesystems throws "IllegalArgumentException Wrong FS"

2017-03-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16277:
---

Assignee: Sahil Takiar

> Exchange Partition between filesystems throws "IllegalArgumentException Wrong 
> FS"
> -
>
> Key: HIVE-16277
> URL: https://issues.apache.org/jira/browse/HIVE-16277
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The following query: {{alter table s3_tbl exchange partition (country='USA') 
> with table hdfs_tbl}} fails with the following exception:
> {code}
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> java.lang.IllegalArgumentException Wrong FS: 
> s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:379)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:361)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Got exception: java.lang.IllegalArgumentException Wrong 
> FS: s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.exchangeTablePartitions(Hive.java:3553)
>   at 
> org.apache.hadoop.hive.ql.exec.DDLTask.exchangeTablePartition(DDLTask.java:4691)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:570)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2182)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1838)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1525)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1231)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254)
>   ... 11 more
> Caused by: MetaException(message:Got exception: 
> java.lang.IllegalArgumentException Wrong FS: 
> s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1387)
>   at 
> org.apache.hadoop.hive.metastore.Warehouse.renameDir(Warehouse.java:208)
>   at 
> org.apache.hadoop.hive.metastore.Warehouse.renameDir(Warehouse.java:200)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.exchange_partitions(HiveMetaStore.java:2967)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.exchange_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.exchange_partitions(HiveMetaStoreClient.java:690)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Updated] (HIVE-16276) Add hadoop-aws as a dependency so Hive has out of the box support for running against S3

2017-03-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16276:

Status: Patch Available  (was: Open)

> Add hadoop-aws as a dependency so Hive has out of the box support for running 
> against S3
> 
>
> Key: HIVE-16276
> URL: https://issues.apache.org/jira/browse/HIVE-16276
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16276.1.patch
>
>
> We should add hadoop-aws as a dependency so we have out of the box support 
> for running against S3.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16276) Add hadoop-aws as a dependency so Hive has out of the box support for running against S3

2017-03-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16276:

Attachment: HIVE-16276.1.patch

> Add hadoop-aws as a dependency so Hive has out of the box support for running 
> against S3
> 
>
> Key: HIVE-16276
> URL: https://issues.apache.org/jira/browse/HIVE-16276
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16276.1.patch
>
>
> We should add hadoop-aws as a dependency so we have out of the box support 
> for running against S3.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16276) Add hadoop-aws as a dependency so Hive has out of the box support for running against S3

2017-03-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16276:

Summary: Add hadoop-aws as a dependency so Hive has out of the box support 
for running against S3  (was: Add hadoop-aws as a dependency)

> Add hadoop-aws as a dependency so Hive has out of the box support for running 
> against S3
> 
>
> Key: HIVE-16276
> URL: https://issues.apache.org/jira/browse/HIVE-16276
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16276.1.patch
>
>
> We should add hadoop-aws as a dependency so we have out of the box support 
> for running against S3.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16276) Add hadoop-aws as a dependency

2017-03-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16276:
---


> Add hadoop-aws as a dependency
> --
>
> Key: HIVE-16276
> URL: https://issues.apache.org/jira/browse/HIVE-16276
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> We should add hadoop-aws as a dependency so we have out of the box support 
> for running against S3.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16275) Vectorization: Add ReduceSink support for TopN (in specialized native classes)

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16275:

Status: Patch Available  (was: Open)

> Vectorization: Add ReduceSink support for TopN (in specialized native classes)
> --
>
> Key: HIVE-16275
> URL: https://issues.apache.org/jira/browse/HIVE-16275
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16275.01.patch
>
>
> Currently, we don't specialize vectorization of ReduceSink when Top N is 
> planned.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16275) Vectorization: Add ReduceSink support for TopN (in specialized native classes)

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16275:

Attachment: HIVE-16275.01.patch

> Vectorization: Add ReduceSink support for TopN (in specialized native classes)
> --
>
> Key: HIVE-16275
> URL: https://issues.apache.org/jira/browse/HIVE-16275
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16275.01.patch
>
>
> Currently, we don't specialize vectorization of ReduceSink when Top N is 
> planned.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16275) Vectorization: Add ReduceSink support for TopN (in specialized native classes)

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-16275:
---


> Vectorization: Add ReduceSink support for TopN (in specialized native classes)
> --
>
> Key: HIVE-16275
> URL: https://issues.apache.org/jira/browse/HIVE-16275
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
>
> Currently, we don't specialize vectorization of ReduceSink when Top N is 
> planned.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935578#comment-15935578
 ] 

Hive QA commented on HIVE-16260:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859797/HIVE-16260.3.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4276/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4276/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4276/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859797 - PreCommit-HIVE-Build

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch, HIVE-16260.2.patch, 
> HIVE-16260.3.patch
>
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.
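The 1M-entry floor described above has a concrete cost. As a rough illustration only (the standard Bloom filter sizing formulas, not Hive's actual semijoin runtime code; the class and method names here are hypothetical):

```java
// Illustrative sketch of classic Bloom filter sizing math, assuming the
// standard formulas m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2.
// This is NOT Hive's actual semijoin implementation.
public final class BloomSizing {
    // Bits needed to hold n entries at false-positive probability p.
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions for n entries in m bits.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;           // the 1M-entry floor from the description
        long m = optimalBits(n, 0.05); // ~6.2M bits, under 1 MB at 5% FPP
        System.out.println(m + " bits, " + optimalHashes(n, m) + " hash functions");
    }
}
```

At a 5% false-positive rate, a 1M-entry filter costs well under a megabyte, which is why sizing for at least 1M entries is a cheap safety margin.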



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16107) JDBC: HttpClient should retry one more time on NoHttpResponseException

2017-03-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935575#comment-15935575
 ] 

Thejas M Nair commented on HIVE-16107:
--

+1
Thanks for adding the test as well.


> JDBC: HttpClient should retry one more time on NoHttpResponseException
> --
>
> Key: HIVE-16107
> URL: https://issues.apache.org/jira/browse/HIVE-16107
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-16107.1.patch
>
>
> Hive's JDBC client in HTTP transport mode doesn't retry on 
> NoHttpResponseException. We've seen the exception thrown to the JDBC end 
> user when Knox is used as the proxy: Knox upgraded its Jetty version, which 
> uses a smaller value for the Jetty connector idletimeout and as a result 
> closes the HTTP connection on the server side. The next JDBC query on the 
> client then throws a NoHttpResponseException. Subsequent queries reconnect, 
> but the JDBC driver should ideally handle this by retrying.
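The retry-once behavior described above can be sketched in isolation. This is a minimal illustration of the idea, not the actual patch: `callWithOneRetry` is a hypothetical helper, and a plain `Callable` stands in for the HTTP request the real driver would re-issue through Apache HttpClient's retry machinery.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative sketch (not Hive's actual JDBC code): when the server has
// silently closed an idle keep-alive connection, the first attempt fails
// with an I/O error, so we reconnect and retry exactly once.
public final class RetryOnce {
    public static <T> T callWithOneRetry(Callable<T> request) throws Exception {
        try {
            return request.call();
        } catch (IOException staleConnection) {
            // The connection was dropped server-side; retry a single time.
            return request.call();
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        String result = callWithOneRetry(() -> {
            if (attempts[0]++ == 0) {
                throw new IOException("server failed to respond");
            }
            return "ok";
        });
        System.out.println(result + " after " + attempts[0] + " attempts"); // ok after 2 attempts
    }
}
```

A second consecutive failure still propagates, so genuine outages are not masked; only the stale-connection case gets the extra attempt.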



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15841) Upgrade Hive to ORC 1.3.3

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935544#comment-15935544
 ] 

Hive QA commented on HIVE-15841:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859794/HIVE-15841.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2]
 (batchId=162)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4275/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4275/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4275/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859794 - PreCommit-HIVE-Build

> Upgrade Hive to ORC 1.3.3
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15841.patch, HIVE-15841.patch
>
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.3 once it 
> releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-21 Thread Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935535#comment-15935535
 ] 

Kalyan commented on HIVE-15691:
---

Hi [~ekoifman], now I understand the problem.

I have updated the patches as per the comments:

new patch (HIVE-15691-branch-1.patch) for branch-1

new patch (HIVE-15691.4.patch) for master

Thanks

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
>Priority: Critical
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691.4.patch, HIVE-15691-branch-1.2.patch, 
> HIVE-15691-branch-1.patch, HIVE-15691.patch, HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter already available in Hive.
> There is a dependency in Flume to commit first:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-21 Thread Kalyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kalyan updated HIVE-15691:
--
Attachment: HIVE-15691-branch-1.patch
HIVE-15691.4.patch

Hi [~ekoifman], now I understand the problem.

I have updated the patches as per the comments:

new patch (HIVE-15691-branch-1.patch) for branch-1

new patch (HIVE-15691.4.patch) for master

Thanks

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
>Priority: Critical
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691.4.patch, HIVE-15691-branch-1.2.patch, 
> HIVE-15691-branch-1.patch, HIVE-15691.patch, HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter already available in Hive.
> There is a dependency in Flume to commit first:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.03.patch

Rebased the patch

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: (was: HIVE-15665.03.patch)

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.03.patch

The same patch

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others

2017-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16222:

Attachment: HIVE-16222.01.patch

Renamed the list and moved it to config

> add a setting to disable row.serde for specific formats; enable for others
> --
>
> Key: HIVE-16222
> URL: https://issues.apache.org/jira/browse/HIVE-16222
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16222.01.patch, HIVE-16222.patch
>
>
> Per [~gopalv]
> {quote}
> row.serde = true ... breaks Parquet (they expect to get the same object back, 
> which means you can't buffer 1024 rows).
> {quote}
> We want to enable this and vector.serde for text vectorization. Need to turn 
> it off for specific formats.
>  
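The object-reuse hazard quoted above is easy to demonstrate in isolation. This is a generic illustration of why a reader that recycles one mutable row object breaks reference-based buffering, not the actual Parquet/Hive reader code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: if a reader hands back the SAME mutable object for
// every row, buffering rows by reference silently corrupts earlier rows,
// while buffering a copy of each value is safe.
public final class RowReuse {
    static final class Row { int value; }

    public static void main(String[] args) {
        Row reused = new Row();                      // one object, recycled per row
        List<Row> byReference = new ArrayList<>();
        List<Integer> byCopy = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            reused.value = i;        // the reader overwrites the same object
            byReference.add(reused); // buffering the reference is broken...
            byCopy.add(reused.value); // ...buffering a copy is safe
        }
        // Every buffered reference now points at the last row's contents.
        System.out.println(byReference.get(0).value); // 3
        System.out.println(byCopy.get(0));            // 1
    }
}
```

This is why buffering 1024 rows (as row.serde does) conflicts with a format that expects to get the same object back on each call.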



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-21 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16260:
--
Attachment: HIVE-16260.3.patch

Implemented recommendations on review board.

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch, HIVE-16260.2.patch, 
> HIVE-16260.3.patch
>
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15841) Upgrade Hive to ORC 1.3.3

2017-03-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15841:
-
Attachment: HIVE-15841.patch

Updated to ORC 1.3.3.

> Upgrade Hive to ORC 1.3.3
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15841.patch, HIVE-15841.patch
>
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.3 once it 
> releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16274) Improve column stats computation using density function

2017-03-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16274:
--


> Improve column stats computation using density function
> ---
>
> Key: HIVE-16274
> URL: https://issues.apache.org/jira/browse/HIVE-16274
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> to take into consideration of row counts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14879) integrate MM tables into ACID: replace MM metastore calls and structures with ACID ones

2017-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935394#comment-15935394
 ] 

Sergey Shelukhin commented on HIVE-14879:
-

Left some comments, mostly suggested improvements.
My main concern is that a lot of tests are commented out and results have 
changed :)

> integrate MM tables into ACID: replace MM metastore calls and structures with 
> ACID ones
> ---
>
> Key: HIVE-14879
> URL: https://issues.apache.org/jira/browse/HIVE-14879
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-14879.1.patch, HIVE-14879.2.patch, 
> HIVE-14879.3.patch, HIVE-14879.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15841) Upgrade Hive to ORC 1.3.3

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935380#comment-15935380
 ] 

Hive QA commented on HIVE-15841:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852928/HIVE-15841.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4274/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4274/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4274/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-03-21 21:38:36.156
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4274/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-03-21 21:38:36.158
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   4f45536..9f5a3e3  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 4f45536 HIVE-16246: Support auto gather column stats for columns 
with trailing white spaces (Pengcheng XXiong, reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 9f5a3e3 HIVE-16180 : LLAP: Native memory leak in EncodedReader 
(Sergey Shelukhin/Prasanth Jayachandran, reviewed by Prasanth 
Jayachandran/Sergey Shelukhin)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-03-21 21:38:37.431
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java:26
error: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java:
 patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852928 - PreCommit-HIVE-Build

> Upgrade Hive to ORC 1.3.3
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15841.patch
>
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.3 once it 
> releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16273) Vectorization: Make non-column key expressions work in MERGEPARTIAL mode

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935377#comment-15935377
 ] 

Hive QA commented on HIVE-16273:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859784/HIVE-16273.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_id3]
 (batchId=147)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4273/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4273/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4273/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859784 - PreCommit-HIVE-Build

> Vectorization: Make non-column key expressions work in MERGEPARTIAL mode
> 
>
> Key: HIVE-16273
> URL: https://issues.apache.org/jira/browse/HIVE-16273
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16273.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15841) Upgrade Hive to ORC 1.3.3

2017-03-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935369#comment-15935369
 ] 

Owen O'Malley commented on HIVE-15841:
--

Yes, I agree. Updating the patch now.

> Upgrade Hive to ORC 1.3.3
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15841.patch
>
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.3 once it 
> releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15841) Upgrade Hive to ORC 1.3.3

2017-03-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15841:
-
Summary: Upgrade Hive to ORC 1.3.3  (was: Upgrade Hive to ORC 1.3.2)
Description: Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 
1.3.3 once it releases.  (was: Hive needs ORC-141 and ORC-135, so we should 
upgrade to ORC 1.3.2 once it releases.)

> Upgrade Hive to ORC 1.3.3
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15841.patch
>
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.3 once it 
> releases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16225) Memory leak in Templeton service

2017-03-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16225:
--
Attachment: HIVE-16225.1.patch

There are quite a few UGI-related FileSystem cache leaks. Attaching a patch.

> Memory leak in Templeton service
> 
>
> Key: HIVE-16225
> URL: https://issues.apache.org/jira/browse/HIVE-16225
> Project: Hive
>  Issue Type: Bug
>Reporter: Subramanyam Pattipaka
>Assignee: Daniel Dai
> Attachments: HIVE-16225.1.patch, screenshot-1.png
>
>
> This is a known beast; here are the details.
> The problem seems similar to the one discussed in HIVE-13749: if we submit 
> a very large number of jobs (1000 to 2000), we can see an increase in the 
> Configuration object count.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15841) Upgrade Hive to ORC 1.3.2

2017-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935350#comment-15935350
 ] 

Prasanth Jayachandran commented on HIVE-15841:
--

Resubmitted for another test run. Should this be upgraded to 1.3.3 directly?

> Upgrade Hive to ORC 1.3.2
> -
>
> Key: HIVE-15841
> URL: https://issues.apache.org/jira/browse/HIVE-15841
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15841.patch
>
>
> Hive needs ORC-141 and ORC-135, so we should upgrade to ORC 1.3.2 once it 
> releases.





[jira] [Updated] (HIVE-16180) LLAP: Native memory leak in EncodedReader

2017-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16180:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> LLAP: Native memory leak in EncodedReader
> -
>
> Key: HIVE-16180
> URL: https://issues.apache.org/jira/browse/HIVE-16180
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: DirectCleaner.java, FullGC-15GB-cleanup.png, 
> Full-gc-native-mem-cleanup.png, HIVE-16180.03.patch, HIVE-16180.04.patch, 
> HIVE-16180.1.patch, HIVE-16180.2.patch, Native-mem-spike.png
>
>
> Observed this in an internal test run. There is a native memory leak in the 
> ORC EncodedReaderImpl that can cause the YARN pmem monitor to kill the 
> container running the daemon. Direct byte buffers are null'ed out, but their 
> native memory is not guaranteed to be reclaimed until the next full GC. To 
> show this issue, attaching a small test program that allocates 3x256MB direct 
> byte buffers. The first buffer is null'ed out, but its native memory is still 
> in use. The second buffer uses a Cleaner to clean up the native allocation. 
> The third buffer is also null'ed, but this time System.gc() is invoked, which 
> cleans up all native memory. Output from the test program is below:
> {code}
> Allocating 3x256MB direct memory..
> Native memory used: 786432000
> Native memory used after data1=null: 786432000
> Native memory used after data2.clean(): 524288000
> Native memory used after data3=null: 524288000
> Native memory used without gc: 524288000
> Native memory used after gc: 0
> {code}
> Longer term improvements/solutions:
> 1) Use DirectBufferPool from hadoop or netty's 
> https://netty.io/4.0/api/io/netty/buffer/PooledByteBufAllocator.html as 
> direct byte buffer allocations are expensive (System.gc() + 100ms thread 
> sleep).
> 2) Use HADOOP-12760 for proper cleaner invocation in JDK8 and JDK9
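The eager-cleanup idea behind the attached DirectCleaner.java (not shown in this thread) can be sketched in stand-alone Java. This is a hedged illustration, not the committed patch: it tries the pre-Java-9 reflective `cleaner()` call on a direct buffer and falls back to hinting a GC when the cleaner is inaccessible.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

public class DirectBufferCleaner {
    // Best-effort eager release of a direct buffer's native memory.
    // Returns true if the Cleaner was invoked, false otherwise.
    static boolean clean(ByteBuffer buf) {
        if (buf == null || !buf.isDirect()) {
            return false; // nothing to do for heap buffers
        }
        try {
            // JDK-8 style: DirectByteBuffer exposes a public cleaner() method.
            Method cleanerMethod = buf.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buf);
            Method cleanMethod = cleaner.getClass().getMethod("clean");
            cleanMethod.setAccessible(true);
            cleanMethod.invoke(cleaner);
            return true; // native memory released without waiting for a full GC
        } catch (Exception e) {
            // Cleaner unreachable (e.g. strong encapsulation on newer JDKs):
            // the native memory is only freed when the buffer object is GC'ed.
            System.gc();
            return false;
        }
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocateDirect(1 << 20); // 1 MB
        System.out.println("cleaned eagerly: " + clean(data));
    }
}
```

On JDK 8 the reflective path succeeds and the native allocation is freed immediately; on newer JDKs the fallback branch is taken, which is the situation HADOOP-12760 addresses properly.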





[jira] [Commented] (HIVE-16061) When hive.async.log.enabled is set to true, some output is not printed to the beeline console

2017-03-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935321#comment-15935321
 ] 

Aihua Xu commented on HIVE-16061:
-

Thanks [~prasanth_j] I will take a look.

> When hive.async.log.enabled is set to true, some output is not printed to the 
> beeline console
> -
>
> Key: HIVE-16061
> URL: https://issues.apache.org/jira/browse/HIVE-16061
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16061.1.patch, HIVE-16061.2.patch
>
>
> Run a HiveServer2 instance: "hive --service hiveserver2".
> Then, from another console, connect to HiveServer2: "beeline -u 
> "jdbc:hive2://localhost:1"
> When you run an MR job like "select t1.key from src t1 join src t2 on 
> t1.key=t2.key", some of the console logs, such as the MR job info, are not 
> printed to the beeline console; they are printed only to the HiveServer2 
> console.
> When hive.async.log.enabled is set to false and HiveServer2 is restarted, the 
> output is printed to the beeline console.
> The OperationLog implementation uses a ThreadLocal variable to store the 
> associated log file. When hive.async.log.enabled is set to true, the logs are 
> processed by a thread pool, and the actual pool threads that print the 
> messages cannot access the log file stored in the original thread.
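The ThreadLocal hand-off problem described above can be reproduced with a few lines of plain Java. This is an illustrative sketch, not Hive code: LOG_FILE is a hypothetical stand-in for the per-operation log file handle that OperationLog keeps in a ThreadLocal.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalPoolDemo {
    // Hypothetical stand-in for OperationLog's per-thread log file handle.
    static final ThreadLocal<String> LOG_FILE = new ThreadLocal<>();

    // Asks a pool thread what it sees in the ThreadLocal.
    static String valueSeenByPoolThread() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            return pool.submit(() -> LOG_FILE.get()).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        LOG_FILE.set("/tmp/operation_123.log"); // set in the request thread
        // The pool thread does not inherit the submitter's ThreadLocal, so an
        // async appender running on it cannot find the operation log file.
        System.out.println("request thread sees: " + LOG_FILE.get());
        System.out.println("pool thread sees: " + valueSeenByPoolThread());
    }
}
```

Running this prints the path for the request thread and null for the pool thread, which is why async logging loses the per-operation log file; an MDC, unlike a plain ThreadLocal, can be explicitly copied into pool threads.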





[jira] [Updated] (HIVE-14879) integrate MM tables into ACID: replace MM metastore calls and structures with ACID ones

2017-03-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14879:
-
Attachment: HIVE-14879.3.patch

Patch 3 fixes the issues in mm_conversions.q.

> integrate MM tables into ACID: replace MM metastore calls and structures with 
> ACID ones
> ---
>
> Key: HIVE-14879
> URL: https://issues.apache.org/jira/browse/HIVE-14879
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-14879.1.patch, HIVE-14879.2.patch, 
> HIVE-14879.3.patch, HIVE-14879.patch
>
>






[jira] [Commented] (HIVE-16061) When hive.async.log.enabled is set to true, some output is not printed to the beeline console

2017-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935315#comment-15935315
 ] 

Prasanth Jayachandran commented on HIVE-16061:
--

*Configuration:*
https://github.com/apache/hive/blob/master/llap-server/src/main/resources/llap-daemon-log4j2.properties#L79-L100

*MDC put for new threads:*
https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L108-L111

*MDC put/reset for threadpools (cache/reuse of threads):*
This is required for inheriting MDC when new threads are spawned.
https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java#L336

For the case where a threadpool already exists and the MDC has to be inherited 
(cached thread pools or core-thread reuse), we use a custom threadpool that 
copies the IDs into the MDC and clears them after executing the task. There is 
some reflection in the custom threadpool (NDC-to-MDC copying) which is LLAP 
specific and not required for HS2.
https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/StatsRecordingThreadPool.java#L108-L149

> When hive.async.log.enabled is set to true, some output is not printed to the 
> beeline console
> -
>
> Key: HIVE-16061
> URL: https://issues.apache.org/jira/browse/HIVE-16061
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16061.1.patch, HIVE-16061.2.patch
>
>
> Run a HiveServer2 instance: "hive --service hiveserver2".
> Then, from another console, connect to HiveServer2: "beeline -u 
> "jdbc:hive2://localhost:1"
> When you run an MR job like "select t1.key from src t1 join src t2 on 
> t1.key=t2.key", some of the console logs, such as the MR job info, are not 
> printed to the beeline console; they are printed only to the HiveServer2 
> console.
> When hive.async.log.enabled is set to false and HiveServer2 is restarted, the 
> output is printed to the beeline console.
> The OperationLog implementation uses a ThreadLocal variable to store the 
> associated log file. When hive.async.log.enabled is set to true, the logs are 
> processed by a thread pool, and the actual pool threads that print the 
> messages cannot access the log file stored in the original thread.





[jira] [Commented] (HIVE-16061) When hive.async.log.enabled is set to true, some output is not printed to the beeline console

2017-03-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935272#comment-15935272
 ] 

Aihua Xu commented on HIVE-16061:
-

Thanks [~prasanth_j], agreed. I was looking at the MDC and RoutingAppender as 
well. Can you point me to the code in LLAP that uses MDC-based routing?

> When hive.async.log.enabled is set to true, some output is not printed to the 
> beeline console
> -
>
> Key: HIVE-16061
> URL: https://issues.apache.org/jira/browse/HIVE-16061
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16061.1.patch, HIVE-16061.2.patch
>
>
> Run a HiveServer2 instance: "hive --service hiveserver2".
> Then, from another console, connect to HiveServer2: "beeline -u 
> "jdbc:hive2://localhost:1"
> When you run an MR job like "select t1.key from src t1 join src t2 on 
> t1.key=t2.key", some of the console logs, such as the MR job info, are not 
> printed to the beeline console; they are printed only to the HiveServer2 
> console.
> When hive.async.log.enabled is set to false and HiveServer2 is restarted, the 
> output is printed to the beeline console.
> The OperationLog implementation uses a ThreadLocal variable to store the 
> associated log file. When hive.async.log.enabled is set to true, the logs are 
> processed by a thread pool, and the actual pool threads that print the 
> messages cannot access the log file stored in the original thread.





[jira] [Updated] (HIVE-16273) Vectorization: Make non-column key expressions work in MERGEPARTIAL mode

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16273:

Attachment: HIVE-16273.01.patch

> Vectorization: Make non-column key expressions work in MERGEPARTIAL mode
> 
>
> Key: HIVE-16273
> URL: https://issues.apache.org/jira/browse/HIVE-16273
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16273.01.patch
>
>






[jira] [Updated] (HIVE-16273) Vectorization: Make non-column key expressions work in MERGEPARTIAL mode

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16273:

Status: Patch Available  (was: Open)

> Vectorization: Make non-column key expressions work in MERGEPARTIAL mode
> 
>
> Key: HIVE-16273
> URL: https://issues.apache.org/jira/browse/HIVE-16273
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16273.01.patch
>
>






[jira] [Commented] (HIVE-16273) Vectorization: Make non-column key expressions work in MERGEPARTIAL mode

2017-03-21 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935258#comment-15935258
 ] 

Matt McCline commented on HIVE-16273:
-

HIVE-16245 disabled vectorized non-column key expressions for MERGEPARTIAL 
mode.  This JIRA fixes VectorGroupByOperator to handle key expressions 
correctly.

> Vectorization: Make non-column key expressions work in MERGEPARTIAL mode
> 
>
> Key: HIVE-16273
> URL: https://issues.apache.org/jira/browse/HIVE-16273
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
>






[jira] [Commented] (HIVE-16061) When hive.async.log.enabled is set to true, some output is not printed to the beeline console

2017-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935256#comment-15935256
 ] 

Prasanth Jayachandran commented on HIVE-16061:
--

This was observed earlier in HIVE-14183. Ideally, we want to use MDC-based log 
redirection to the operation log files and move away from the current 
ThreadLocal because of the exact issue you mentioned. MDC-based routing is safe 
in the context of the async logger, and we currently use it in LLAP. 

> When hive.async.log.enabled is set to true, some output is not printed to the 
> beeline console
> -
>
> Key: HIVE-16061
> URL: https://issues.apache.org/jira/browse/HIVE-16061
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16061.1.patch, HIVE-16061.2.patch
>
>
> Run a HiveServer2 instance: "hive --service hiveserver2".
> Then, from another console, connect to HiveServer2: "beeline -u 
> "jdbc:hive2://localhost:1"
> When you run an MR job like "select t1.key from src t1 join src t2 on 
> t1.key=t2.key", some of the console logs, such as the MR job info, are not 
> printed to the beeline console; they are printed only to the HiveServer2 
> console.
> When hive.async.log.enabled is set to false and HiveServer2 is restarted, the 
> output is printed to the beeline console.
> The OperationLog implementation uses a ThreadLocal variable to store the 
> associated log file. When hive.async.log.enabled is set to true, the logs are 
> processed by a thread pool, and the actual pool threads that print the 
> messages cannot access the log file stored in the original thread.





[jira] [Assigned] (HIVE-16273) Vectorization: Make non-column key expressions work in MERGEPARTIAL mode

2017-03-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-16273:
---

Assignee: Matt McCline

> Vectorization: Make non-column key expressions work in MERGEPARTIAL mode
> 
>
> Key: HIVE-16273
> URL: https://issues.apache.org/jira/browse/HIVE-16273
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
>






[jira] [Commented] (HIVE-16271) add support for STRUCT in VectorSerializeRow/VectorDeserializeRow

2017-03-21 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935250#comment-15935250
 ] 

Matt McCline commented on HIVE-16271:
-

It is a moderately large in-progress change.

> add support for STRUCT in VectorSerializeRow/VectorDeserializeRow
> -
>
> Key: HIVE-16271
> URL: https://issues.apache.org/jira/browse/HIVE-16271
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>
> Add support for the STRUCT type by interleaving its fields.
> VectorizedRowBatch seems to be already capable of handling STRUCT.





[jira] [Commented] (HIVE-16180) LLAP: Native memory leak in EncodedReader

2017-03-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935232#comment-15935232
 ] 

Prasanth Jayachandran commented on HIVE-16180:
--

lgtm, +1

> LLAP: Native memory leak in EncodedReader
> -
>
> Key: HIVE-16180
> URL: https://issues.apache.org/jira/browse/HIVE-16180
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: DirectCleaner.java, FullGC-15GB-cleanup.png, 
> Full-gc-native-mem-cleanup.png, HIVE-16180.03.patch, HIVE-16180.04.patch, 
> HIVE-16180.1.patch, HIVE-16180.2.patch, Native-mem-spike.png
>
>
> Observed this in an internal test run. There is a native memory leak in the 
> ORC EncodedReaderImpl that can cause the YARN pmem monitor to kill the 
> container running the daemon. Direct byte buffers are null'ed out, but their 
> native memory is not guaranteed to be reclaimed until the next full GC. To 
> show this issue, attaching a small test program that allocates 3x256MB direct 
> byte buffers. The first buffer is null'ed out, but its native memory is still 
> in use. The second buffer uses a Cleaner to clean up the native allocation. 
> The third buffer is also null'ed, but this time System.gc() is invoked, which 
> cleans up all native memory. Output from the test program is below:
> {code}
> Allocating 3x256MB direct memory..
> Native memory used: 786432000
> Native memory used after data1=null: 786432000
> Native memory used after data2.clean(): 524288000
> Native memory used after data3=null: 524288000
> Native memory used without gc: 524288000
> Native memory used after gc: 0
> {code}
> Longer term improvements/solutions:
> 1) Use DirectBufferPool from hadoop or netty's 
> https://netty.io/4.0/api/io/netty/buffer/PooledByteBufAllocator.html as 
> direct byte buffer allocations are expensive (System.gc() + 100ms thread 
> sleep).
> 2) Use HADOOP-12760 for proper cleaner invocation in JDK8 and JDK9





[jira] [Commented] (HIVE-16061) When hive.async.log.enabled is set to true, some output is not printed to the beeline console

2017-03-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935217#comment-15935217
 ] 

Aihua Xu commented on HIVE-16061:
-

Please ignore the patches; as I understand the original design, we technically 
should not call OperationLog.write() directly. DivertLogAppender should handle 
appending the messages to the operation log file, which is then read by 
beeline.

+ [~prasanth_j], the log4j expert, who worked on log4j2 for Hive. 






> When hive.async.log.enabled is set to true, some output is not printed to the 
> beeline console
> -
>
> Key: HIVE-16061
> URL: https://issues.apache.org/jira/browse/HIVE-16061
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16061.1.patch, HIVE-16061.2.patch
>
>
> Run a HiveServer2 instance: "hive --service hiveserver2".
> Then, from another console, connect to HiveServer2: "beeline -u 
> "jdbc:hive2://localhost:1"
> When you run an MR job like "select t1.key from src t1 join src t2 on 
> t1.key=t2.key", some of the console logs, such as the MR job info, are not 
> printed to the beeline console; they are printed only to the HiveServer2 
> console.
> When hive.async.log.enabled is set to false and HiveServer2 is restarted, the 
> output is printed to the beeline console.
> The OperationLog implementation uses a ThreadLocal variable to store the 
> associated log file. When hive.async.log.enabled is set to true, the logs are 
> processed by a thread pool, and the actual pool threads that print the 
> messages cannot access the log file stored in the original thread.





[jira] [Updated] (HIVE-16061) When hive.async.log.enabled is set to true, some output is not printed to the beeline console

2017-03-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16061:

Description: 
Run a HiveServer2 instance: "hive --service hiveserver2".
Then, from another console, connect to HiveServer2: "beeline -u 
"jdbc:hive2://localhost:1"

When you run an MR job like "select t1.key from src t1 join src t2 on 
t1.key=t2.key", some of the console logs, such as the MR job info, are not 
printed to the beeline console; they are printed only to the HiveServer2 
console.

When hive.async.log.enabled is set to false and HiveServer2 is restarted, the 
output is printed to the beeline console.

The OperationLog implementation uses a ThreadLocal variable to store the 
associated log file. When hive.async.log.enabled is set to true, the logs are 
processed by a thread pool, and the actual pool threads that print the messages 
cannot access the log file stored in the original thread. 

  was:
Run a hiveserver2 instance "hive --service hiveserver2".
Then from another console, connect to hiveserver2 "beeline -u 
"jdbc:hive2://localhost:1"

When you run a MR job like "select t1.key from src t1 join src t2 on 
t1.key=t2.key", some of the console logs like MR job info are not printed to 
the console while it just print to the hiveserver2 console.




> When hive.async.log.enabled is set to true, some output is not printed to the 
> beeline console
> -
>
> Key: HIVE-16061
> URL: https://issues.apache.org/jira/browse/HIVE-16061
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16061.1.patch, HIVE-16061.2.patch
>
>
> Run a HiveServer2 instance: "hive --service hiveserver2".
> Then, from another console, connect to HiveServer2: "beeline -u 
> "jdbc:hive2://localhost:1"
> When you run an MR job like "select t1.key from src t1 join src t2 on 
> t1.key=t2.key", some of the console logs, such as the MR job info, are not 
> printed to the beeline console; they are printed only to the HiveServer2 
> console.
> When hive.async.log.enabled is set to false and HiveServer2 is restarted, the 
> output is printed to the beeline console.
> The OperationLog implementation uses a ThreadLocal variable to store the 
> associated log file. When hive.async.log.enabled is set to true, the logs are 
> processed by a thread pool, and the actual pool threads that print the 
> messages cannot access the log file stored in the original thread.





[jira] [Commented] (HIVE-16271) add support for STRUCT in VectorSerializeRow/VectorDeserializeRow

2017-03-21 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935190#comment-15935190
 ] 

Zoltan Haindrich commented on HIVE-16271:
-

thank you [~gopalv], I'll look into it.

> add support for STRUCT in VectorSerializeRow/VectorDeserializeRow
> -
>
> Key: HIVE-16271
> URL: https://issues.apache.org/jira/browse/HIVE-16271
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>
> Add support for the STRUCT type by interleaving its fields.
> VectorizedRowBatch seems to be already capable of handling STRUCT.





[jira] [Updated] (HIVE-16061) When hive.async.log.enabled is set to true, some output is not printed to the beeline console

2017-03-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16061:

Summary: When hive.async.log.enabled is set to true, some output is not 
printed to the beeline console  (was: Some of console output is not printed to 
the beeline console)

> When hive.async.log.enabled is set to true, some output is not printed to the 
> beeline console
> -
>
> Key: HIVE-16061
> URL: https://issues.apache.org/jira/browse/HIVE-16061
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16061.1.patch, HIVE-16061.2.patch
>
>
> Run a HiveServer2 instance: "hive --service hiveserver2".
> Then, from another console, connect to HiveServer2: "beeline -u 
> "jdbc:hive2://localhost:1"
> When you run an MR job like "select t1.key from src t1 join src t2 on 
> t1.key=t2.key", some of the console logs, such as the MR job info, are not 
> printed to the beeline console; they are printed only to the HiveServer2 
> console.





[jira] [Commented] (HIVE-16252) Vectorization: Cannot vectorize: Aggregation Function UDF avg

2017-03-21 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935186#comment-15935186
 ] 

Zoltan Haindrich commented on HIVE-16252:
-

Okay, the 'Bug' type misled me a bit. :)
I'm trying to address this and HIVE-16253 with HIVE-16264, but there are other 
missing pieces to the puzzle. I'll link these tickets to each other.

> Vectorization: Cannot vectorize: Aggregation Function UDF avg 
> --
>
> Key: HIVE-16252
> URL: https://issues.apache.org/jira/browse/HIVE-16252
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> {noformat}
> select 
> ss_store_sk, ss_item_sk, avg(ss_sales_price) as revenue
> from
> store_sales, date_dim
> where
> ss_sold_date_sk = d_date_sk
> and d_month_seq between 1212 and 1212 + 11
> group by ss_store_sk , ss_item_sk limit 100;
> 2017-03-20T00:59:49,526  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Validating ReduceWork...
> 2017-03-20T00:59:49,526 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Using reduce tag 0
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> lazybinary.LazyBinarySerDe: LazyBinarySerDe initialized with: 
> columnNames=[_col0] columnTypes=[struct]
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> vector.VectorizationContext: Input Expression = Column[KEY._col0], Vectorized 
> Expression = col 0
> ...
> ...
> 2017-03-20T00:59:49,528  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Cannot vectorize: Aggregation Function UDF avg parameter 
> expression for GROUPBY operator: Data type 
> struct of Column[VALUE._col0] not 
> supported
> {noformat}
> Env: Hive build from: commit 71f4930d95475e7e63b5acc55af3809aefcc71e0 (march 
> 16)





[jira] [Commented] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935183#comment-15935183
 ] 

Hive QA commented on HIVE-16206:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859752/HIVE-16206.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4272/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4272/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4272/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859752 - PreCommit-HIVE-Build

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.3.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers, i.e., handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of the 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters; the currently supported ones are Console, JMX, JSON file, 
> and the hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this conf-driven, just the way MetricsFactory 
> handles configurable metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES, that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (and, if the new 
> option is not present, look for the old option and instantiate accordingly), 
> i.e., make the code backward compatible.
> - Update and add new tests.
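The conf-driven loading proposed above can be sketched as follows. This is an assumption-laden illustration: the option name HIVE_CODAHALE_REPORTER_CLASSES comes from the ticket, but loadReporterClasses and its fallback contract are hypothetical, not the actual CodahaleMetrics implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ReporterLoader {
    // Hypothetical sketch of conf-driven reporter loading: parse a
    // comma-separated class list (the proposed HIVE_CODAHALE_REPORTER_CLASSES
    // value) and resolve each entry via reflection.
    static List<Class<?>> loadReporterClasses(String commaSeparated)
            throws ClassNotFoundException {
        List<Class<?>> classes = new ArrayList<>();
        if (commaSeparated == null || commaSeparated.trim().isEmpty()) {
            // The caller would fall back to the legacy HIVE_METRICS_REPORTER
            // option here, preserving backward compatibility.
            return classes;
        }
        for (String name : commaSeparated.split(",")) {
            classes.add(Class.forName(name.trim()));
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for real reporter class names.
        System.out.println(loadReporterClasses("java.util.ArrayList, java.util.HashMap"));
    }
}
```

Each resolved class would then be instantiated and registered with the Codahale MetricRegistry, mirroring how MetricsFactory instantiates the configured metrics class.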





[jira] [Commented] (HIVE-16180) LLAP: Native memory leak in EncodedReader

2017-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935180#comment-15935180
 ] 

Sergey Shelukhin commented on HIVE-16180:
-

Failure is unrelated. [~prasanth_j]  can you take a look?

> LLAP: Native memory leak in EncodedReader
> -
>
> Key: HIVE-16180
> URL: https://issues.apache.org/jira/browse/HIVE-16180
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: DirectCleaner.java, FullGC-15GB-cleanup.png, 
> Full-gc-native-mem-cleanup.png, HIVE-16180.03.patch, HIVE-16180.04.patch, 
> HIVE-16180.1.patch, HIVE-16180.2.patch, Native-mem-spike.png
>
>
> Observed this in an internal test run. There is a native memory leak in the 
> ORC EncodedReaderImpl that can cause the YARN pmem monitor to kill the 
> container running the daemon. Direct byte buffers are null'ed out, but their 
> native memory is not guaranteed to be reclaimed until the next full GC. To 
> show this issue, attaching a small test program that allocates 3x256MB direct 
> byte buffers. The first buffer is null'ed out, but its native memory is still 
> in use. The second buffer uses a Cleaner to clean up the native allocation. 
> The third buffer is also null'ed, but this time System.gc() is invoked, which 
> cleans up all native memory. Output from the test program is below:
> {code}
> Allocating 3x256MB direct memory..
> Native memory used: 786432000
> Native memory used after data1=null: 786432000
> Native memory used after data2.clean(): 524288000
> Native memory used after data3=null: 524288000
> Native memory used without gc: 524288000
> Native memory used after gc: 0
> {code}
> Longer term improvements/solutions:
> 1) Use DirectBufferPool from hadoop or netty's 
> https://netty.io/4.0/api/io/netty/buffer/PooledByteBufAllocator.html as 
> direct byte buffer allocations are expensive (System.gc() + 100ms thread 
> sleep).
> 2) Use HADOOP-12760 for proper cleaner invocation in JDK8 and JDK9





[jira] [Commented] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935091#comment-15935091
 ] 

Hive QA commented on HIVE-16231:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859763/HIVE-16231.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10498 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4271/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4271/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4271/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859763 - PreCommit-HIVE-Build

> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16231.01.patch, HIVE-16231.02.patch
>
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.





[jira] [Updated] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-21 Thread Sunitha Beeram (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunitha Beeram updated HIVE-16206:
--
Status: Patch Available  (was: Open)

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.3.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers - i.e., handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of the 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - currently supported ones are Console, JMX, JSON file 
> and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this configuration-driven, just the way 
> MetricsFactory handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES, that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option and, if the new 
> option is not present, look for the old option and instantiate accordingly - 
> i.e., make the code backward compatible.
> - Update and add new tests.
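The proposed conf-driven instantiation would likely follow the usual reflective pattern. A minimal sketch, assuming one no-arg constructor per reporter class; the `Reporter` interface below is a stand-in for Codahale's, and `HIVE_CODAHALE_REPORTER_CLASSES` is the proposed, not-yet-existing option:

```java
import java.util.ArrayList;
import java.util.List;

public class ReporterFactory {
    // Stand-in for com.codahale.metrics.Reporter; the real code would use that type.
    public interface Reporter {
        void start();
    }

    // Example pluggable reporter implementation.
    public static class Consoleish implements Reporter {
        public void start() { System.out.println("console reporter started"); }
    }

    // Instantiate each comma-separated class name via its no-arg constructor,
    // mirroring how MetricsFactory resolves its configurable Metrics class.
    public static List<Reporter> fromConf(String classNames) {
        List<Reporter> reporters = new ArrayList<>();
        for (String name : classNames.split(",")) {
            try {
                Class<?> clazz = Class.forName(name.trim());
                reporters.add((Reporter) clazz.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException("cannot instantiate reporter: " + name, e);
            }
        }
        return reporters;
    }

    public static void main(String[] args) {
        // e.g. the value of the hypothetical HIVE_CODAHALE_REPORTER_CLASSES property
        List<Reporter> rs = fromConf("ReporterFactory$Consoleish");
        rs.get(0).start(); // prints "console reporter started"
    }
}
```

With this shape, adding a new reporter means adding a class name to the configuration rather than editing CodahaleMetrics itself.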





[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934983#comment-15934983
 ] 

Eugene Koifman commented on HIVE-15691:
---

The link above shows the signatures with the StreamingConnection param in the 
master branch of Hive 1.  This is where Hive 1.3 will be cut from, so to be 
included in future 1.x line releases there must be signatures with 
StreamingConnection.  There is no need to provide the deprecated versions of 
the constructors w/o StreamingConnection parameters, since StrictRegexWriter 
is a new API.

The 1.2 version (Hive 1) doesn't have the StreamingConnection parameter - it 
was introduced in HIVE-14114.  If you'd like these changes to show up in the 
upcoming 1.2.2 release then HIVE-15691-branch-1.2.patch is fine, but we cannot 
check this into 1.2 w/o checking it into 1.3, which requires a patch with the 
appropriate signatures.

HIVE-15691.3.patch looks ok for Hive 2, but I don't see the build bot run for 
it.  Perhaps there was some glitch in the system and it was missed.  In such 
cases we usually resubmit the same patch under a new name (.4) to trigger 
another run.

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
>Priority: Critical
> Attachments: HIVE-15691.1.patch, HIVE-15691.2.patch, 
> HIVE-15691.3.patch, HIVE-15691-branch-1.2.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> A dependent change must be committed in Flume:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9





[jira] [Updated] (HIVE-16246) Support auto gather column stats for columns with trailing white spaces

2017-03-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16246:
---
Affects Version/s: 2.1.0

> Support auto gather column stats for columns with trailing white spaces
> ---
>
> Key: HIVE-16246
> URL: https://issues.apache.org/jira/browse/HIVE-16246
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-16246.01.patch
>
>






[jira] [Updated] (HIVE-16246) Support auto gather column stats for columns with trailing white spaces

2017-03-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16246:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Support auto gather column stats for columns with trailing white spaces
> ---
>
> Key: HIVE-16246
> URL: https://issues.apache.org/jira/browse/HIVE-16246
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-16246.01.patch
>
>






[jira] [Commented] (HIVE-16246) Support auto gather column stats for columns with trailing white spaces

2017-03-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934975#comment-15934975
 ] 

Pengcheng Xiong commented on HIVE-16246:


pushed to master. Thanks [~ashutoshc] for the review!

> Support auto gather column stats for columns with trailing white spaces
> ---
>
> Key: HIVE-16246
> URL: https://issues.apache.org/jira/browse/HIVE-16246
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-16246.01.patch
>
>






[jira] [Updated] (HIVE-16246) Support auto gather column stats for columns with trailing white spaces

2017-03-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16246:
---
Fix Version/s: 2.2.0

> Support auto gather column stats for columns with trailing white spaces
> ---
>
> Key: HIVE-16246
> URL: https://issues.apache.org/jira/browse/HIVE-16246
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-16246.01.patch
>
>






[jira] [Commented] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2017-03-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934965#comment-15934965
 ] 

Sahil Takiar commented on HIVE-14919:
-

[~kellyzly] my guess is that any Hive table should benefit from using the 
DataFrames API. I'm not a Spark expert, but I believe RDDs are just a 
distributed collection of objects without a defined schema, while a DataFrame 
is similar to a table in a database: it has a set of named columns. So 
naturally I would think Hive fits the DataFrames model better, since it works 
with tables that have a set of pre-defined columns.

According to some blog posts on the DataFrames API, it has a number of 
performance optimizations built in due to the fact that column types are known. 
These optimizations were not possible with RDDs because RDDs don't have a 
schema.

From a Databricks blog post:

{quote}
It can also perform lower level optimizations such as eliminating expensive 
object allocations and reducing virtual function calls. As a result, we expect 
performance improvements for existing Spark programs when they migrate to 
DataFrames.
{quote}

Sources:

https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html

> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>
> In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
> BigBench [1] to run benchmarks with Spark 2.0 over a 1 TB data set, comparing 
> against Spark 1.6. We can see performance improvements of about 5.4% in 
> general and 45% for the best case. However, some queries don't have 
> significant performance improvements.  This JIRA is the umbrella ticket 
> addressing those performance issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench





[jira] [Updated] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-21 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16231:
---
Attachment: HIVE-16231.02.patch

> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16231.01.patch, HIVE-16231.02.patch
>
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.





[jira] [Updated] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-21 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16231:
---
Status: Patch Available  (was: Open)

> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16231.01.patch, HIVE-16231.02.patch
>
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.





[jira] [Updated] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-21 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16231:
---
Attachment: (was: HIVE-16231.02.patch)

> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16231.01.patch
>
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.





[jira] [Updated] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-21 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16231:
---
Attachment: HIVE-16231.02.patch

Refactored the timezone check into a separate method.

> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16231.01.patch, HIVE-16231.02.patch
>
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.





[jira] [Updated] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-21 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16178:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

pushed to master, thank you Ashutosh for the review!

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-16178.1.patch, HIVE-16178.2.patch
>
>
> h3. corr
> the standard defines corner cases where it should return null - but the 
> current result is NaN:
> If N * SUMX2 equals SUMX * SUMX , then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY , then the result is the null value.
> h3. covar_samp
> returns 0 instead of the null value when N is 1:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> the standard uses (y,x) order, and some of the function names also 
> contain X and Y... so the order does matter. Currently at least corr uses 
> (x,y) order, which is okay - because it's symmetric; but it would be great to 
> have the same order everywhere (check the others)
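The standard's null corner cases can be sketched as below; this is an illustration of the rule, not Hive's actual GenericUDAFCorrelation code, and it uses the standard's (y, x) argument order:

```java
public class CorrDemo {
    // Pearson correlation per the SQL standard's definition: returns null
    // when N * SUMX2 equals SUMX * SUMX (zero variance in x), or likewise
    // for y, instead of letting the division produce NaN.
    public static Double corr(double[] y, double[] x) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumX2 = 0, sumY2 = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
            sumXY += x[i] * y[i];
        }
        double varX = n * sumX2 - sumX * sumX;
        double varY = n * sumY2 - sumY * sumY;
        if (varX == 0 || varY == 0) {
            return null; // the standard's corner case; the pre-fix behavior was NaN
        }
        return (n * sumXY - sumX * sumY) / Math.sqrt(varX * varY);
    }
}
```

For example, `corr(new double[]{1, 2, 3}, new double[]{5, 5, 5})` hits the zero-variance corner case and returns null rather than NaN.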





[jira] [Commented] (HIVE-16252) Vectorization: Cannot vectorize: Aggregation Function UDF avg

2017-03-21 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934887#comment-15934887
 ] 

Rajesh Balamohan commented on HIVE-16252:
-

Observed slow performance with this query as opposed to the sum function.

> Vectorization: Cannot vectorize: Aggregation Function UDF avg 
> --
>
> Key: HIVE-16252
> URL: https://issues.apache.org/jira/browse/HIVE-16252
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> {noformat}
> select 
> ss_store_sk, ss_item_sk, avg(ss_sales_price) as revenue
> from
> store_sales, date_dim
> where
> ss_sold_date_sk = d_date_sk
> and d_month_seq between 1212 and 1212 + 11
> group by ss_store_sk , ss_item_sk limit 100;
> 2017-03-20T00:59:49,526  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Validating ReduceWork...
> 2017-03-20T00:59:49,526 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Using reduce tag 0
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> lazybinary.LazyBinarySerDe: LazyBinarySerDe initialized with: 
> columnNames=[_col0] columnTypes=[struct]
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> vector.VectorizationContext: Input Expression = Column[KEY._col0], Vectorized 
> Expression = col 0
> ...
> ...
> 2017-03-20T00:59:49,528  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Cannot vectorize: Aggregation Function UDF avg parameter 
> expression for GROUPBY operator: Data type 
> struct of Column[VALUE._col0] not 
> supported
> {noformat}
> Env: Hive build from: commit 71f4930d95475e7e63b5acc55af3809aefcc71e0 (march 
> 16)





[jira] [Commented] (HIVE-15766) DBNotificationlistener leaks JDOPersistenceManager

2017-03-21 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934893#comment-15934893
 ] 

Mohit Sabharwal commented on HIVE-15766:


Thanks, [~vgumashta], latest patch LGTM. Sorry about the late response.

> DBNotificationlistener leaks JDOPersistenceManager
> --
>
> Key: HIVE-15766
> URL: https://issues.apache.org/jira/browse/HIVE-15766
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.0, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 2.2.0
>
> Attachments: HIVE-15766.1.patch, HIVE-15766.2.patch, 
> HIVE-15766.3.patch, HIVE-15766.4.patch, HIVE-15766.5.patch
>
>






[jira] [Updated] (HIVE-16007) When the query does not complie the LogRunnable never stops

2017-03-21 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16007:
--
Attachment: HIVE-16007.6.patch

The BeeLine test errors were relevant. The last lines of the log were not 
fetched. Removed the Exception instead.

> When the query does not complie the LogRunnable never stops
> ---
>
> Key: HIVE-16007
> URL: https://issues.apache.org/jira/browse/HIVE-16007
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16007.02.patch, HIVE-16007.2.patch, 
> HIVE-16007.3.patch, HIVE-16007.4.patch, HIVE-16007.5.patch, 
> HIVE-16007.6.patch, HIVE-16007.patch
>
>
> When issuing a SQL command which does not compile, the LogRunnable thread 
> is never closed.
> The issue can be easily detected when running Beeline with showWarnings=true.
> {code}
> $ ./beeline -u "jdbc:hive2://localhost:1 pvary pvary" --showWarnings=true
> [..]
> Connecting to jdbc:hive2://localhost:1
> Connected to: Apache Hive (version 2.2.0-SNAPSHOT)
> Driver: Hive JDBC (version 2.2.0-SNAPSHOT)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.2.0-SNAPSHOT by Apache Hive
> 0: jdbc:hive2://localhost:1> selekt;
> Warning: java.sql.SQLException: Method getQueryLog() failed. Because the 
> stmtHandle in HiveStatement is null and the statement execution might fail. 
> (state=,code=0)
> [..]
> Warning: java.sql.SQLException: Can't getQueryLog after statement has been 
> closed (state=,code=0)
> [..]
> {code}
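The runaway-logger behaviour can be pictured with a minimal polling loop; this is a sketch, not Beeline's actual Commands/HiveStatement code, with `fetchLog()` standing in for `getQueryLog()`. The fix amounts to breaking out of the loop once fetching the log fails because the statement is closed, instead of warning forever:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class LogPoller implements Runnable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    int polls = 0; // how many fetch attempts the loop has made

    // Stand-in for HiveStatement.getQueryLog(); throws once the statement is closed.
    private String fetchLog() throws Exception {
        if (closed.get()) throw new Exception("statement closed");
        return "log line";
    }

    @Override
    public void run() {
        while (!closed.get()) {
            try {
                polls++;
                System.out.println(fetchLog());
                Thread.sleep(50);
            } catch (Exception e) {
                break; // stop polling instead of retrying a closed statement
            }
        }
    }

    public void close() { closed.set(true); }

    public static void main(String[] args) throws Exception {
        LogPoller p = new LogPoller();
        Thread t = new Thread(p);
        t.start();
        Thread.sleep(120);
        p.close();        // simulate the failed compilation closing the statement
        t.join(1000);
        System.out.println("poller alive: " + t.isAlive()); // prints "poller alive: false"
    }
}
```

Without the `break`, the thread keeps calling `fetchLog()` against a closed statement, which matches the repeated `getQueryLog` warnings shown above.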





[jira] [Updated] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16166:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

[~mi...@cloudera.com] Thanks for your contribution. I committed this to master.

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Fix For: 2.2.0
>
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, 
> HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.
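The effect of String.intern() on duplicates is easy to demonstrate; this is a generic illustration, not the actual HS2 call sites:

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two strings built at runtime: equal contents, distinct heap objects.
        String a = new StringBuilder("hdfs://name").append("node:8020").toString();
        String b = new StringBuilder("hdfs://name").append("node:8020").toString();
        System.out.println(a == b);                   // false: two copies retained
        System.out.println(a.intern() == b.intern()); // true: one canonical copy
    }
}
```

Interning at the points where such strings are created and retained lets the duplicate copies be garbage-collected, which is where the reported ~15% → ~6% reduction comes from.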





[jira] [Updated] (HIVE-16231) Parquet timestamp may be stored differently since HIVE-12767

2017-03-21 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16231:
---
Status: Open  (was: Patch Available)

> Parquet timestamp may be stored differently since HIVE-12767
> 
>
> Key: HIVE-16231
> URL: https://issues.apache.org/jira/browse/HIVE-16231
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16231.01.patch
>
>
> If the parquet table is missing its timezone property then the timestamp will 
> be stored with an adjustment instead of without it. This will cause a 
> regression with other applications like Impala or Spark.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934827#comment-15934827
 ] 

Sergio Peña commented on HIVE-16166:


I found this JIRA, HIVE-15776, which mentions that vector_if_expr is a flaky test.


> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, 
> HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.




