[jira] [Commented] (HIVE-16232) Support stats computation for column in QuotedIdentifier

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932235#comment-15932235
 ] 

Lefty Leverenz commented on HIVE-16232:
---

Does this need to be documented in the wiki?

> Support stats computation for column in QuotedIdentifier 
> -
>
> Key: HIVE-16232
> URL: https://issues.apache.org/jira/browse/HIVE-16232
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-16232.01.patch
>
>
> Right now, if a column name contains double quotes ``, we cannot compute its stats.
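> A minimal sketch of the failing case (table and column names are hypothetical; a
> doubled backtick escapes a backtick inside a quoted identifier):
> {code}
> SET hive.support.quoted.identifiers=column;
> CREATE TABLE stats_demo (`col``name` int);
> -- computing column statistics on the quoted column is what fails today:
> ANALYZE TABLE stats_demo COMPUTE STATISTICS FOR COLUMNS `col``name`;
> {code}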



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15616) Improve contents of qfile test output

2017-03-20 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-15616:
---
Attachment: (was: HIVE-15616.5.patch)

> Improve contents of qfile test output
> -
>
> Key: HIVE-15616
> URL: https://issues.apache.org/jira/browse/HIVE-15616
> Project: Hive
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15616.1.patch, HIVE-15616.2.patch, 
> HIVE-15616.3.patch, HIVE-15616.4.patch, HIVE-15616.patch
>
>
> The current output of the failed qtests has a less than ideal signal-to-noise 
> ratio.
> We have duplicated stack traces and messages between the error message, stack 
> trace, and error output.
> For diff errors the actual difference is missing from the error message and 
> can be found only in the standard out.
> I would like to simplify this output by removing duplications and moving 
> relevant information to the top.





[jira] [Updated] (HIVE-15616) Improve contents of qfile test output

2017-03-20 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-15616:
---
Attachment: HIVE-15616.5.patch

> Improve contents of qfile test output
> -
>
> Key: HIVE-15616
> URL: https://issues.apache.org/jira/browse/HIVE-15616
> Project: Hive
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15616.1.patch, HIVE-15616.2.patch, 
> HIVE-15616.3.patch, HIVE-15616.4.patch, HIVE-15616.5.patch, HIVE-15616.patch
>
>
> The current output of the failed qtests has a less than ideal signal-to-noise 
> ratio.
> We have duplicated stack traces and messages between the error message, stack 
> trace, and error output.
> For diff errors the actual difference is missing from the error message and 
> can be found only in the standard out.
> I would like to simplify this output by removing duplications and moving 
> relevant information to the top.





[jira] [Updated] (HIVE-16219) metastore notification_log contains serialized message with non functional fields

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16219:
---
Attachment: HIVE-16219.1.patch

Reattaching so the build runs.

> metastore notification_log contains serialized message with  non functional 
> fields
> --
>
> Key: HIVE-16219
> URL: https://issues.apache.org/jira/browse/HIVE-16219
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Fix For: 2.2.0
>
> Attachments: HIVE-16219.1.patch, HIVE-16219.1.patch
>
>
> The event notification logs stored in the Hive metastore are JSON-serialized 
> messages in the NOTIFICATION_LOG table. These messages also embed the 
> serialized Thrift API objects; for example, for CREATE TABLE:
> {code}
> {
>   "eventType": "CREATE_TABLE",
>   "server": "",
>   "servicePrincipal": "",
>   "db": "default",
>   "table": "a",
>   "tableObjJson": 
> "{\"1\":{\"str\":\"a\"},\"2\":{\"str\":\"default\"},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1489552350},\"5\":{\"i32\":0},\"6\":{\"i32\":0},\"7\":{\"rec\":{\"1\":{\"lst\":[\"rec\",1,{\"1\":{\"str\":\"name\"},\"2\":{\"str\":\"string\"}}]},\"2\":{\"str\":\"file:/tmp/warehouse/a\"},\"3\":{\"str\":\"org.apache.hadoop.mapred.TextInputFormat\"},\"4\":{\"str\":\"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\"},\"5\":{\"tf\":0},\"6\":{\"i32\":-1},\"7\":{\"rec\":{\"2\":{\"str\":\"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe\"},\"3\":{\"map\":[\"str\",\"str\",2,{\"field.delim\":\"\\n\",\"serialization.format\":\"\\n\"}]}}},\"8\":{\"lst\":[\"str\",0]},\"9\":{\"lst\":[\"rec\",0]},\"10\":{\"map\":[\"str\",\"str\",0,{}]},\"11\":{\"rec\":{\"1\":{\"lst\":[\"str\",0]},\"2\":{\"lst\":[\"lst\",0]},\"3\":{\"map\":[\"lst\",\"str\",0,{}]}}},\"12\":{\"tf\":0}}},\"8\":{\"lst\":[\"rec\",0]},\"9\":{\"map\":[\"str\",\"str\",7,{\"totalSize\":\"0\",\"EXTERNAL\":\"TRUE\",\"numRows\":\"0\",\"rawDataSize\":\"0\",\"COLUMN_STATS_ACCURATE\":\"{\\\"BASIC_STATS\\\":\\\"true\\\"}\",\"numFiles\":\"0\",\"transient_lastDdlTime\":\"1489552350\"}]},\"12\":{\"str\":\"EXTERNAL_TABLE\"},\"13\":{\"rec\":{\"1\":{\"map\":[\"str\",\"lst\",1,{\"anagarwal\":[\"rec\",4,{\"1\":{\"str\":\"INSERT\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"SELECT\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"UPDATE\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}},{\"1\":{\"str\":\"DELETE\"},\"2\":{\"i32\":-1},\"3\":{\"str\":\"anagarwal\"},\"4\":{\"i32\":1},\"5\":{\"tf\":1}}]}]}}},\"14\":{\"tf\":0}}",
>   "timestamp": 1489552350,
>   "files": [],
>   "tableObj": {
> "tableName": "a",
> "dbName": "default",
> "owner": "anagarwal",
> "createTime": 1489552350,
> "lastAccessTime": 0,
> "retention": 0,
> "sd": {
>   "cols": [
> {
>   "name": "name",
>   "type": "string",
>   "comment": null,
>   "setName": true,
>   "setType": true,
>   "setComment": false
> }
>   ],
>   "location": "file:/tmp/warehouse/a",
>   "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
>   "outputFormat": 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
>   "compressed": false,
>   "numBuckets": -1,
>   "serdeInfo": {
> "name": null,
> "serializationLib": 
> "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
> "parameters": {
>   "serialization.format": "\n",
>   "field.delim": "\n"
> },
> "setName": false,
> "parametersSize": 2,
> "setParameters": true,
> "setSerializationLib": true
>   },
>   "bucketCols": [],
>   "sortCols": [],
>   "parameters": {},
>   "skewedInfo": {
> "skewedColNames": [],
> "skewedColValues": [],
> "skewedColValueLocationMaps": {},
> "setSkewedColNames": true,
> "setSkewedColValues": true,
> "setSkewedColValueLocationMaps": true,
> "skewedColNamesSize": 0,
> "skewedColNamesIterator": [],
> "skewedColValuesSize": 0,
> "skewedColValuesIterator": [],
> "skewedColValueLocationMapsSize": 0
>   },
>   "storedAsSubDirectories": false,
>   "setSkewedInfo": true,
>   "parametersSize": 0,
>   "colsSize": 1,
>   "setParameters": true,
>   "setLocation": true,
>   "setInputFormat": true,
>   "setCols": true,
>   "setOutputFormat": true,
>   "setSerdeInfo": true,
>   "setBucketCols": true,
>   "setSortCols": true,
>   "colsIterator": [
> {
>   "name":

[jira] [Updated] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-20 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16024:
---
Attachment: HIVE-16024.07.patch

Updated patch after RB comments. (Changed qtest)

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch, HIVE-16024.07.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we read partitions for a table to improve 
> performance. Unfortunately it uses PartitionPruner to load the partitions, 
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.
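> A minimal sketch of the failure mode (table name hypothetical):
> {code}
> SET hive.mapred.mode=strict;
> -- PartitionPruner, used since HIVE-13788 to load the partitions, rejects
> -- unrestricted partition access in strict mode, so the repair fails:
> MSCK REPAIR TABLE partitioned_demo;
> {code}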





[jira] [Assigned] (HIVE-16254) temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek reassigned HIVE-16254:
--


> temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> There is a temporary table created as values__tmp__table__[number], which is 
> also present in the dumped information; this should not be processed.





[jira] [Commented] (HIVE-15616) Improve contents of qfile test output

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932398#comment-15932398
 ] 

Hive QA commented on HIVE-15616:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859510/HIVE-15616.5.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10447 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=13)

[avro_joins.q,udf_divide.q,serde_reported_schema.q,input40.q,annotate_stats_join_pkfk.q,udf_unix_timestamp.q,union22.q,non_ascii_literal1.q,describe_comment_nonascii.q,orc_analyze.q,schema_evol_orc_acidvec_part_update.q,stats15.q,tez_join_result_complex.q,alter_numbuckets_partitioned_table2_h23.q,transform_ppr1.q,spark_vectorized_dynamic_partition_pruning.q,unionDistinct_2.q,udaf_histogram_numeric.q,authorization_index.q,auto_join26.q,vector_count.q,decimal_trailing.q,parquet_types_vectorization.q,notable_alias2.q,smb_mapjoin_22.q,vector_decimal_6.q,autoColumnStats_8.q,input5.q,constant_prop.q,sample1.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=233)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4243/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4243/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4243/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859510 - PreCommit-HIVE-Build

> Improve contents of qfile test output
> -
>
> Key: HIVE-15616
> URL: https://issues.apache.org/jira/browse/HIVE-15616
> Project: Hive
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15616.1.patch, HIVE-15616.2.patch, 
> HIVE-15616.3.patch, HIVE-15616.4.patch, HIVE-15616.5.patch, HIVE-15616.patch
>
>
> The current output of the failed qtests has a less than ideal signal-to-noise 
> ratio.
> We have duplicated stack traces and messages between the error message, stack 
> trace, and error output.
> For diff errors the actual difference is missing from the error message and 
> can be found only in the standard out.
> I would like to simplify this output by removing duplications and moving 
> relevant information to the top.





[jira] [Commented] (HIVE-15616) Improve contents of qfile test output

2017-03-20 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932431#comment-15932431
 ] 

Barna Zsombor Klara commented on HIVE-15616:


Failed tests are flaky:
- TestSparkNegativeCliDriver is a well-known flaky test (HIVE-15165)
- the comments test has been failing for the last 4 runs

> Improve contents of qfile test output
> -
>
> Key: HIVE-15616
> URL: https://issues.apache.org/jira/browse/HIVE-15616
> Project: Hive
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15616.1.patch, HIVE-15616.2.patch, 
> HIVE-15616.3.patch, HIVE-15616.4.patch, HIVE-15616.5.patch, HIVE-15616.patch
>
>
> The current output of the failed qtests has a less than ideal signal-to-noise 
> ratio.
> We have duplicated stack traces and messages between the error message, stack 
> trace, and error output.
> For diff errors the actual difference is missing from the error message and 
> can be found only in the standard out.
> I would like to simplify this output by removing duplications and moving 
> relevant information to the top.





[jira] [Commented] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932473#comment-15932473
 ] 

Hive QA commented on HIVE-16024:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859516/HIVE-16024.07.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10449 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=12)

[auto_join18.q,partition_coltype_literals.q,input1_limit.q,load_dyn_part3.q,autoColumnStats_4.q,correlationoptimizer8.q,auto_sortmerge_join_14.q,udf_array_contains.q,bucket_map_join_tez2.q,sample_islocalmode_hook.q,literal_decimal.q,constprog2.q,parquet_external_time.q,mapjoin_hook.q,schema_evol_orc_nonvec_table.q,cbo_rp_subq_in.q,authorization_view_disable_cbo_4.q,list_bucket_dml_2.q,input20.q,smb_join_partition_key.q,union_remove_14.q,non_ascii_literal2.q,udf_if.q,input38.q,load_fs_overwrite.q,input21.q,join_reorder.q,groupby_cube_multi_gby.q,bucketmapjoin8.q,union34.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4244/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4244/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4244/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859516 - PreCommit-HIVE-Build

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch, HIVE-16024.07.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we read partitions for a table to improve 
> performance. Unfortunately it uses PartitionPruner to load the partitions, 
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.





[jira] [Commented] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-20 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932477#comment-15932477
 ] 

Barna Zsombor Klara commented on HIVE-16024:


The test "comments" has been failing for 5 builds.

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch, HIVE-16024.07.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we read partitions for a table to improve 
> performance. Unfortunately it uses PartitionPruner to load the partitions, 
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.





[jira] [Updated] (HIVE-12860) Add WITH HEADER option to INSERT OVERWRITE DIRECTORY

2017-03-20 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-12860:
---
Target Version/s: 2.2.0  (was: 1.3.0)

> Add WITH HEADER option to INSERT OVERWRITE DIRECTORY
> 
>
> Key: HIVE-12860
> URL: https://issues.apache.org/jira/browse/HIVE-12860
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Elliot West
>Assignee: Elliot West
>
> _As a Hive user_
> _I'd like the option to seamlessly write out a header row to file system 
> based result sets_
> _So that I can generate reports with a specification that mandates a header 
> row._
> h3. Motivations
> There is a significant use-case where Hive is used to construct a scheduled 
> data processing pipeline that generates a report in HDFS for consumption by 
> some third party (internal or external). This report may then be transferred 
> out of the system for consumption by other tools or processes. It is not 
> uncommon for the third party to specify that the report includes a header row 
> at the start of the file. The current options for adding headers are 
> difficult to use effectively and elegantly.
> h3. Acceptance criteria
> * {{INSERT OVERWRITE DIRECTORY}} commands can be invoked with an option to 
> include a header row at the start of the result set file.
> * The header row will contain the column names derived from the accompanying 
> {{SELECT}} query.
> * It will likely be the case that multiple tasks will be writing the final 
> file of the query result set. In this event only the task writing the first 
> chunk of the file should emit the header row.
> h3. Proposed HQL changes
> {code}
> 1.  INSERT OVERWRITE [LOCAL] DIRECTORY directory1
> 2.[ROW FORMAT row_format] [STORED AS file_format]
> 3.[WITH HEADER]
> 4.SELECT ... FROM ...
> {code}
> It is proposed that the {{WITH HEADER}} stanza at line 3 be introduced to 
> enable this feature.
> h3. Current workarounds
> * It is usually suggested that users set the CLI option 
> {{hive.cli.print.header=true}} and capture the result set from standard out. 
> However, this does not work well in scheduled, headless environments such as 
> the Oozie Hive action. This can also push the file handling into shell 
> scripts and complicate the process of getting the report into HDFS.
> * To keep report processing entirely within the domain of Hive, some users 
> {{UNION}} the result of their query with a tiny table of a single row 
> containing the header names. A synthesised rank column is used with an 
> {{ORDER BY}} to ensure that the header is written to the very start of the 
> file. See [this example on Stack 
> Overflow|http://stackoverflow.com/questions/15139561/adding-column-headers-to-hive-result-set/25214480#25214480].
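> The workaround above can be sketched roughly as follows (table and column 
> names are hypothetical, dual_row is a single-row table, and exact syntax 
> varies by Hive version):
> {code}
> INSERT OVERWRITE DIRECTORY '/tmp/report'
> SELECT name, age FROM (
>   SELECT 'name' AS name, 'age' AS age, 0 AS sort_key FROM dual_row  -- header
>   UNION ALL
>   SELECT name, CAST(age AS STRING), 1 AS sort_key FROM people       -- data
> ) t
> ORDER BY sort_key;  -- forces the header row to the start of the file
> {code}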
> h3. References
> * HIVE-138: Original request for header functionality.
> * [Hive Wiki: writing data into the file system from 
> queries|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries].





[jira] [Updated] (HIVE-16007) When the query does not compile the LogRunnable never stops

2017-03-20 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16007:
--
Attachment: HIVE-16007.4.patch

Trying again; pre-commit Jenkins was not working.

> When the query does not compile the LogRunnable never stops
> ---
>
> Key: HIVE-16007
> URL: https://issues.apache.org/jira/browse/HIVE-16007
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16007.02.patch, HIVE-16007.2.patch, 
> HIVE-16007.3.patch, HIVE-16007.4.patch, HIVE-16007.patch
>
>
> When issuing a SQL command that does not compile, the LogRunnable thread 
> is never closed.
> The issue can be easily detected when running beeline with showWarnings=true.
> {code}
> $ ./beeline -u "jdbc:hive2://localhost:1 pvary pvary" --showWarnings=true
> [..]
> Connecting to jdbc:hive2://localhost:1
> Connected to: Apache Hive (version 2.2.0-SNAPSHOT)
> Driver: Hive JDBC (version 2.2.0-SNAPSHOT)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.2.0-SNAPSHOT by Apache Hive
> 0: jdbc:hive2://localhost:1> selekt;
> Warning: java.sql.SQLException: Method getQueryLog() failed. Because the 
> stmtHandle in HiveStatement is null and the statement execution might fail. 
> (state=,code=0)
> [..]
> Warning: java.sql.SQLException: Can't getQueryLog after statement has been 
> closed (state=,code=0)
> [..]
> {code}





[jira] [Updated] (HIVE-16254) temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16254:
---
Description: 
create table a (age int);
insert into table a values (34),(4);
repl dump default;

There is a temporary table created as values__tmp__table__[number], which is 
also present in the dumped information with only metadata; this should not be 
processed.



  was:
create table a (age int);
insert into table a values (34),(4);
repl dump default;

there is a temporary table create values__tmp__table__[nmber], which is also 
present in the dumped information, this should not be processed.




> temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> There is a temporary table created as values__tmp__table__[number], which is 
> also present in the dumped information with only metadata; this should not be 
> processed.





[jira] [Updated] (HIVE-16254) values temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16254:
---
Summary: values temporary tables for INSERT's are getting replicated  (was: 
temporary tables for INSERT's are getting replicated)

> values temporary tables for INSERT's are getting replicated
> ---
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> There is a temporary table created as values__tmp__table__[number], which is 
> also present in the dumped information with only metadata; this should not be 
> processed.





[jira] [Updated] (HIVE-16254) metadata for values temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16254:
---
Summary: metadata for values temporary tables for INSERT's are getting 
replicated  (was: values temporary tables for INSERT's are getting replicated)

> metadata for values temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> There is a temporary table created as values__tmp__table__[number], which is 
> also present in the dumped information with only metadata; this should not be 
> processed.





[jira] [Updated] (HIVE-12703) CLI agnostic HQL import command implementation

2017-03-20 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-12703:
---
Target Version/s: 2.2.0  (was: 1.3.0)

> CLI agnostic HQL import command implementation
> --
>
> Key: HIVE-12703
> URL: https://issues.apache.org/jira/browse/HIVE-12703
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: hql
>
> _As an HQL developer_
> _I'd like a single command to import HQL script files, that works across all 
> Hive CLIs and shells_
> _So that I can compose larger scripts from smaller components irrespective of 
> my Hive execution environment._
> h3. Motivation
> Current Hive CLIs include commands that allow the user to effectively import 
> and execute HQL scripts from a file. The {{hive}} CLI provides the {{SOURCE}} 
> command and {{beeline}} provides the {{!run}} command. This allows HQL 
> developers to decompose complex HQL processes into multiple HQL scripts 
> files. These can be individually executed or tested, and in the case of 
> {{MACROs}}, imported in a manner similar to a function library. These 
> 'source' commands allow HQL developers to compose these smaller modules 
> purely in the domain of Hive (i.e. no external shell such as {{bash}} is 
> needed).
> However, this seems to be a feature of the individual CLIs and not part of 
> the core HQL language. Consequently this can lead to the development of Hive 
> processes that are not portable across different execution contexts, even for 
> the same version of Hive.
> h3. Proposal
> The ability to compose and encapsulate logic is a fundamental building block 
> of any scalable language and therefore I believe that this functionality 
> should be available in the core HQL language and not just implemented in the 
> CLIs. I propose that the {{SOURCE}} command be a first-class citizen of the 
> HQL language, available consistently in all execution contexts.
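> A sketch of how the proposed first-class {{SOURCE}} command would be used 
> (file path and macro name are hypothetical):
> {code}
> -- macros.hql defines shared MACROs; import it irrespective of the CLI in use
> SOURCE /path/to/macros.hql;
> SELECT shared_macro(col) FROM some_table;
> {code}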
> h3. References
> * [{{SOURCE}} implementation in the {{hive}} 
> CLI.|https://github.com/apache/hive/blob/0ae374a320d1cae523ba2b434800e97692507db8/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L137]
> * [{{SOURCE}} 
> documentation.|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands]
> * [{{!run}} implementation in 
> {{beeline}}.|https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Commands.java#L1535]





[jira] [Commented] (HIVE-16242) Run BeeLine tests parallel

2017-03-20 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932503#comment-15932503
 ] 

Peter Vary commented on HIVE-16242:
---

Thanks [~cwsteinbach] for the info!

I have seen in HIVE-14443 ("Improve ide&testing support") that the BeeLine 
driver was disabled at the time. Since BeeLine is the official client at the 
moment, I think it would be good to have it re-enabled and running again.
I have created a series of patches to address the issues:
- HIVE-14459 TestBeeLineDriver - migration and re-enable
- HIVE-16127 Separate database initialization from actual query run in 
TestBeeLineDriver
- HIVE-16146 If possible find a better way to filter the TestBeeLineDriver 
output
- HIVE-16152 TestBeeLineDriver logging improvements
- HIVE-16242 Run BeeLine tests parallel

I have several more in my mind after the ones above are committed:
- Create a transformer method to run CLI .q files on BeeLine without changing 
the .q and .q.out files. I might have to separate this into multiple changes 
if it proves to be more difficult.
- Create multiple BeeLine driver versions to match the CLI driver versions 
(Negative, Compare, HBase)
- Make it possible to run the BeeLine tests against real clusters, not just 
MiniHS2

Any insight from a veteran Hive developer would be very much appreciated.

Thanks,
Peter

> Run BeeLine tests parallel
> --
>
> Key: HIVE-16242
> URL: https://issues.apache.org/jira/browse/HIVE-16242
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> Provide the ability for BeeLine tests to run in parallel against the MiniHS2 
> cluster





[jira] [Updated] (HIVE-16242) Run BeeLine tests parallel

2017-03-20 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16242:
--
Attachment: HIVE-16242.patch

Added a parallelized and parameterized JUnit runner.

> Run BeeLine tests parallel
> --
>
> Key: HIVE-16242
> URL: https://issues.apache.org/jira/browse/HIVE-16242
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16242.patch
>
>
> Provide the ability for BeeLine tests to run in parallel against the MiniHS2 
> cluster





[jira] [Updated] (HIVE-16242) Run BeeLine tests parallel

2017-03-20 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16242:
--
Status: Patch Available  (was: Open)

> Run BeeLine tests parallel
> --
>
> Key: HIVE-16242
> URL: https://issues.apache.org/jira/browse/HIVE-16242
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16242.patch
>
>
> Provide the ability for BeeLine tests to run in parallel against the MiniHS2 
> cluster





[jira] [Updated] (HIVE-15316) CTAS STORED AS AVRO: AvroTypeException Found default.record_0, expecting union

2017-03-20 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-15316:
---
Labels: avro avrostorage  (was: )

> CTAS STORED AS AVRO: AvroTypeException Found default.record_0, expecting union
> --
>
> Key: HIVE-15316
> URL: https://issues.apache.org/jira/browse/HIVE-15316
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: David Maughan
>Priority: Minor
>  Labels: avro, avrostorage
>
> There's an issue when querying a table that has been created as Avro via CTAS 
> when the target struct is at least 2 struct-levels deep. It can be replicated 
> with the following steps:
> {code}
> CREATE TABLE a
>   STORED AS AVRO
>   AS
> SELECT named_struct('c', named_struct('d', 1)) as b;
> SELECT b FROM a;
> org.apache.avro.AvroTypeException: Found default.record_0, expecting union
> {code}
> The reason for this is that during table creation, the Avro schema is 
> generated from the Hive columns in {{AvroSerDe}} and then passed through the 
> Avro Schema Parser: {{new Schema.Parser().parse(schema.toString())}}. For the 
> above example, this creates the below schema in the Avro file. Note that the 
> lowest level struct, {{record_0}} has {{"namespace": "default"}}.
> {code}
> {
>   "type": "record",
>   "name": "a",
>   "namespace": "default",
>   "fields": [
> {
>   "name": "b",
>   "type": [
> "null",
> {
>   "type": "record",
>   "name": "record_1",
>   "namespace": "",
>   "doc": "struct>",
>   "fields": [
> {
>   "name": "c",
>   "type": [
> "null",
> {
>   "type": "record",
>   "name": "record_0",
>   "namespace": "default",
>   "doc": "struct",
>   "fields": [
> {
>   "name": "d",
>   "type": [ "null", "int" ],
>   "doc": "int",
>   "default": null
> }
>   ]
> }
>   ],
>   "doc": "struct",
>   "default": null
> }
>   ]
> }
>   ],
>   "default": null
> }
>   ]
> }
> {code}
> On a subsequent select query, the Avro schema is again generated from the 
> Hive columns. However, this time it is not passed through the Avro Schema 
> Parser and the {{namespace}} attribute is not present in {{record_0}}. The 
> actual error message _"Found default.record_0, expecting union"_ is slightly 
> misleading. Although it is expecting a union, it is specifically expecting a 
> null or a record named {{record_0}} but it finds {{default.record_0}}.
> I believe this is a bug in Avro. I'm not sure whether the correct behaviour 
> is to cascade the namespace down or not but it is definitely an inconsistency 
> between creating a schema via the builders and parser. I've created 
> [AVRO-1965|https://issues.apache.org/jira/browse/AVRO-1965] for this. 
> However, I believe that defensively passing the schema through the Avro 
> Schema Parser on a select query would fix this issue in Hive without an Avro 
> fix and version bump in Hive.
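The name-resolution mismatch can be illustrated without Avro itself. Under Avro's rule, an unqualified record name is resolved against the nearest enclosing namespace, while an explicit "namespace" attribute overrides it; a stdlib-only sketch (this mimics the rule for illustration - it is not the Avro API):

```python
import json

def full_name(record, enclosing_ns=""):
    # Avro resolves an unqualified record name against the nearest
    # enclosing namespace; an explicit "namespace" attribute overrides it.
    ns = record.get("namespace", enclosing_ns)
    return record["name"] if ns == "" else ns + "." + record["name"]

# record_0 as written to the file (explicit namespace) vs. as rebuilt
# from the Hive columns at read time (no namespace attribute).
writer_record_0 = json.loads(
    '{"type": "record", "name": "record_0", "namespace": "default", "fields": []}')
reader_record_0 = json.loads(
    '{"type": "record", "name": "record_0", "fields": []}')

# record_0 sits inside record_1, whose namespace is "" in the file schema.
print(full_name(writer_record_0, ""))  # default.record_0
print(full_name(reader_record_0, ""))  # record_0
```

The two effective full names differ, which is exactly the "Found default.record_0, expecting union" mismatch described above.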





[jira] [Updated] (HIVE-15328) Inconsistent/incorrect handling of NULL in nested structs

2017-03-20 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-15328:
---
Target Version/s: 2.2.0

> Inconsistent/incorrect handling of NULL in nested structs
> -
>
> Key: HIVE-15328
> URL: https://issues.apache.org/jira/browse/HIVE-15328
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: David Maughan
>
> h2. Overview
> Performing {{IS NULL}} checks against a null struct that is generated as part 
> of a UDF correctly returns {{true}}. However, the same check against the same 
> null struct that has been persisted to a table incorrectly returns {{false}}. 
> Additionally, when a child field of the null struct is inspected in the same 
> query, the result of the null check on the struct itself reverses to 
> {{true}}.
> The issue does not appear to be dependent on the storage format of the table 
> as the same result is repeated with TEXTFILE, PARQUET, ORC and AVRO.
> h2. Example
> In this example I have used {{if(1=1, null, named_struct('c', 1))}} as a 
> crude method of generating a simple null struct.
> h4. 'b' is correctly reported as {{true}}.
> {code}
> hive> select
> >   b is null,
> >   b
> > from (
> >   select
> > if(1=1, null, named_struct('c', 1)) as b
> >   ) as a;
> OK
> true  NULL
> {code}
> h4. 'b' is correctly reported as {{true}} when also inspecting 'b.c'.
> {code}
> hive>
> > select
> >   b is null,
> >   b.c is null,
> >   b
> > from (
> >   select
> > if(1=1, null, named_struct('c', 1)) as b
> >   ) as a;
> OK
> true  true  NULL
> {code}
> h4. Persist the data to a table
> {code}
> hive>
> > create table a
> >   as
> > select
> >   if(1=1, null, named_struct('c', 1)) as b;
> OK
> {code}
> h4. 'b' is incorrectly reported as {{false}}.
> {code}
> hive>
> > select
> >   b is null,
> >   b
> > from a;
> OK
> false NULL
> {code}
> h4. 'b' is now correctly reported as {{true}} when also inspecting 'b.c'.
> {code}
> hive>
> > select
> >   b is null,
> >   b.c is null,
> >   b
> > from a;
> OK
> true  true  NULL
> {code}





[jira] [Updated] (HIVE-15965) Metastore incorrectly re-uses a broken database connection

2017-03-20 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-15965:
---
Affects Version/s: (was: storage-2.2.0)
   2.1.1
 Target Version/s: 2.2.0

> Metastore incorrectly re-uses a broken database connection
> --
>
> Key: HIVE-15965
> URL: https://issues.apache.org/jira/browse/HIVE-15965
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Mass Dosage
> Attachments: hive.log
>
>
> *Background*
> In our setup we have a shared standalone MetaStore server running on EMR that 
> is accessed by various clients (Hive CLI, HiveServer2, Spark etc.) and 
> connects to an external MariaDB database for the MetaStore DB. It came to our 
> attention that MetaStore (or rather the underlying DataNucleus / BoneCP 
> combo) will keep re-using the same DB connections even when those get 
> suddenly closed for a reason that renders them unusable.
> For instance, due to a bug in the MariaDB JDBC driver v1.3.6 (see 
> https://jira.mariadb.org/browse/CONJ-270), a huge query including over 8 
> thousand parameter placeholders (e.g. partition IDs in case of a 
> {{get_partitions_by_expr}} function call)
> will yield a {{java.nio.BufferOverflowException}} and cause the SQL 
> connection to be closed by the driver itself.
> This will ultimately result in all further MetaStore Thrift calls being 
> aborted due to the failure of {{bonecp.ConnectionHandle.prepareStatement()}}.
> Such scenarios will then be caught by DataNucleus and translated to an 
> appropriate {{JDOException}}, only to be "ignored" by the MetaStore. 
> {{RetryingHMSHandler}} will, of course, continue retrying the 
> failing operation, but this is already pointless by that time since they will 
> invariably fail as long as the SQL connection remains closed. Please see the 
> attached MetaStore log [^hive.log] for details
> (captured from Hive 2.1.1 running on Windows in Eclipse IDE).
>  *Proposed behavior*
> We suggest that MetaStore should automatically renew the DB connection 
> whenever:
> * The connection gets closed by one of the underlying frameworks 
> (DataNucleus, BoneCP, JDBC driver); or
> * Query timeout is detected.
> This feature should be optional and configurable (disabled by default for 
> backward compatibility). Reconnection failures could probably be treated as 
> fatal errors and cause the immediate termination of MetaStore.
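A rough sketch of the proposed renewal behaviour (Python; connect() and the exception type are hypothetical stand-ins for the BoneCP/DataNucleus machinery): on a failure that signals a dead connection, discard the handle and retry once on a fresh connection rather than retrying against the closed one:

```python
class ConnectionClosedError(Exception):
    pass

def with_reconnect(connect, operation, retries=1):
    # Run operation(conn); if the connection turns out to be dead,
    # discard it and retry on a fresh connection instead of reusing
    # the broken handle (which would fail invariably).
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return operation(conn)
        except ConnectionClosedError:
            if attempt == retries:
                raise  # reconnection didn't help: treat as fatal
            conn = connect()

# Toy stand-ins: the first connection is "broken", the second works.
state = {"n": 0}
def connect():
    state["n"] += 1
    return state["n"]

def operation(conn):
    if conn == 1:
        raise ConnectionClosedError("connection already closed")
    return "ok"
```

This is only the retry skeleton; the real change would also need to distinguish dead-connection errors from ordinary query failures and honor the proposed config switch.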





[jira] [Updated] (HIVE-16254) metadata for values temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16254:
---
Attachment: HIVE-16254.1.patch

> metadata for values temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16254.1.patch
>
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> there is a temporary table created as values__tmp__table__[number], which is 
> also present in the dumped information with only metadata; this should not be 
> processed.
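The fix essentially needs a filter in the dump path that skips these generated tables. A sketch of the name check (Python for illustration; the prefix comes from the description above, the helper name is hypothetical):

```python
import re

# Temporary tables backing INSERT ... VALUES are named
# values__tmp__table__<number>; they should be skipped by REPL DUMP.
VALUES_TMP_TABLE = re.compile(r"^values__tmp__table__\d+$")

def should_dump(table_name):
    return VALUES_TMP_TABLE.match(table_name.lower()) is None

tables = ["a", "values__tmp__table__1", "b"]
dumped = [t for t in tables if should_dump(t)]
```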





[jira] [Commented] (HIVE-16254) metadata for values temporary tables for INSERT's are getting replicated

2017-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932540#comment-15932540
 ] 

ASF GitHub Bot commented on HIVE-16254:
---

GitHub user anishek opened a pull request:

https://github.com/apache/hive/pull/162

HIVE-16254 : metadata for values temporary tables for INSERT's are getting 
replicated



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anishek/hive HIVE-16254

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #162


commit 7746e8cac7c5b2a93f638441771fa29699732950
Author: Anishek Agarwal 
Date:   2017-03-20T11:02:01Z

HIVE-16254 : metadata for values temporary tables for INSERT's are getting 
replicated




> metadata for values temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16254.1.patch
>
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> there is a temporary table created as values__tmp__table__[number], which is 
> also present in the dumped information with only metadata; this should not be 
> processed.





[jira] [Commented] (HIVE-16254) metadata for values temporary tables for INSERT's are getting replicated

2017-03-20 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932542#comment-15932542
 ] 

anishek commented on HIVE-16254:


[~thejas]/[~vgumashta]/[~daijy] Please review.

> metadata for values temporary tables for INSERT's are getting replicated
> 
>
> Key: HIVE-16254
> URL: https://issues.apache.org/jira/browse/HIVE-16254
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
> Attachments: HIVE-16254.1.patch
>
>
> create table a (age int);
> insert into table a values (34),(4);
> repl dump default;
> there is a temporary table created as values__tmp__table__[number], which is 
> also present in the dumped information with only metadata; this should not be 
> processed.





[jira] [Updated] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-20 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16178:

Status: Patch Available  (was: Open)

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16178.1.patch
>
>
> h3. corr
> the standard defines corner cases when it should return null - but the 
> current result is NaN.
> If N * SUMX2 equals SUMX * SUMX , then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY , then the result is the null value.
> h3. covar_samp
> returns 0 instead of null:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> the standard uses (y,x) order; and some of the function names also 
> contain X and Y... so the order does matter. Currently at least corr uses 
> (x,y) order, which is okay because it is symmetric; but it would be great to 
> have the same order everywhere (check the others)
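The standard's corner cases can be written directly in terms of the running sums. A sketch (Python, with None playing the SQL null) of the guards the UDAFs should apply before dividing - illustrative only, not the Hive implementation:

```python
from math import sqrt

def corr(pairs):
    # Pearson correlation with the standard's null corner cases:
    # null when N*SUMX2 equals SUMX*SUMX (or the Y analogue),
    # instead of producing NaN by dividing by zero.
    n = len(pairs)
    sx = sum(x for x, _ in pairs); sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs); syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    if n * sxx == sx * sx or n * syy == sy * sy:
        return None
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

def covar_samp(pairs):
    # Sample covariance: null when N is 1 (division by N-1 undefined).
    n = len(pairs)
    if n == 1:
        return None
    sx = sum(x for x, _ in pairs); sy = sum(y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    return (sxy - sx * sy / n) / (n - 1)
```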





[jira] [Commented] (HIVE-14735) Build Infra: Spark artifacts download takes a long time

2017-03-20 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932639#comment-15932639
 ] 

Zoltan Haindrich commented on HIVE-14735:
-

[~spena] I've asked the spark developers about this: 
http://apache-spark-developers-list.1001551.n3.nabble.com/spark-without-hive-assembly-for-hive-build-development-purposes-td21188.html

I didn't get a clear answer to my question... beyond a "why do we use that" 
and a reference to HIVE-15302. 
What should we do now?


> Build Infra: Spark artifacts download takes a long time
> ---
>
> Key: HIVE-14735
> URL: https://issues.apache.org/jira/browse/HIVE-14735
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Vaibhav Gumashta
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14735.1.patch, HIVE-14735.1.patch, 
> HIVE-14735.1.patch, HIVE-14735.1.patch, HIVE-14735.2.patch, 
> HIVE-14735.3.patch, HIVE-14735.4.patch, HIVE-14735.4.patch, HIVE-14735.5.patch
>
>
> In particular this command:
> {{curl -Sso ./../thirdparty/spark-1.6.0-bin-hadoop2-without-hive.tgz 
> http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.6.0-bin-hadoop2-without-hive.tgz}}





[jira] [Commented] (HIVE-6905) Implement Auto increment, primary-foreign Key, not null constraints and default value in Hive Table columns

2017-03-20 Thread Richard Lloyd (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932644#comment-15932644
 ] 

Richard Lloyd commented on HIVE-6905:
-

1-up from me as this is a nice-to-have.  I agree there are workarounds in cases 
where uniqueness is the requirement, though.

> Implement  Auto increment, primary-foreign Key, not null constraints and 
> default value in Hive Table columns
> 
>
> Key: HIVE-6905
> URL: https://issues.apache.org/jira/browse/HIVE-6905
> Project: Hive
>  Issue Type: New Feature
>  Components: Database/Schema
>Affects Versions: 0.14.0
>Reporter: Pardeep Kumar
>
> For Hive to replace a modern data warehouse based on an RDBMS, it must have 
> support for keys, constraints, auto-increment values, surrogate keys, NOT 
> NULL features, etc. Many customers do not move their EDW to Hive for these 
> reasons, as these have been challenging to maintain in Hive.
> This must be implemented once https://issues.apache.org/jira/browse/HIVE-5317 
> for Updates, Deletes and Inserts is done in Hive. This should be the next step 
> in Hive's enhancement, taking it closer to very wide mainstream adoption.





[jira] [Commented] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-03-20 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932646#comment-15932646
 ] 

Sushanth Sowmyan commented on HIVE-15642:
-

Hi [~vgumashta], HIVE-15478 is in, do you have any tests to add to this, or 
should that be taken up as a follow-up task?

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478





[jira] [Commented] (HIVE-16186) REPL DUMP shows last event ID of the database even if we use LIMIT option.

2017-03-20 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932702#comment-15932702
 ] 

Sushanth Sowmyan commented on HIVE-16186:
-

Good spot, [~sankarh]. I'd suggest a further change, though. Instead of this bit:

{noformat}
//Set the current last repl ID
404 eventTo = lastReplId;
{noformat}

where you change eventTo to be the last replicated id, I'd suggest leaving 
eventTo alone, as that is user-specified, and instead changing the dmd output 
to use the last replicated id as its range, i.e.:

{noformat}
407 writeOutput(
408 Arrays.asList("incremental", String.valueOf(eventFrom), 
String.valueOf(eventTo)),
409 dmd.getDumpFilePath());
410 dmd.setDump(DUMPTYPE.INCREMENTAL, eventFrom, eventTo, cmRoot);
411 dmd.write();
{noformat}

->

{noformat}
407 writeOutput(
408 Arrays.asList("incremental", String.valueOf(eventFrom), 
String.valueOf(lastReplId)),
409 dmd.getDumpFilePath());
410 dmd.setDump(DUMPTYPE.INCREMENTAL, eventFrom, lastReplId, 
cmRoot);
411 dmd.write();
{noformat}

I think this makes the intent much easier to read when the code is revisited 
later on, rather than leaving a reader wondering why eventTo was changed.

(Also, looks like the buildbot didn't upload test results for this patch 
either, so having a new patch might trigger it to run again)

> REPL DUMP shows last event ID of the database even if we use LIMIT option.
> --
>
> Key: HIVE-16186
> URL: https://issues.apache.org/jira/browse/HIVE-16186
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR
> Attachments: HIVE-16186.01.patch
>
>
> Looks like LIMIT option doesn't work well with REPL DUMP.
> 0: jdbc:hive2://localhost:10001/default> REPL DUMP default FROM 170 LIMIT 1;
> +--+---+
> | dump_dir | last_repl_id  |
> +--+---+
> | /tmp/dump/1489395053411  | 195   |
> +--+---+





[jira] [Commented] (HIVE-4095) Add exchange partition in Hive

2017-03-20 Thread Rick Moritz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932825#comment-15932825
 ] 

Rick Moritz commented on HIVE-4095:
---

Thanks [~erwaman] -- that should fix it.

> Add exchange partition in Hive
> --
>
> Key: HIVE-4095
> URL: https://issues.apache.org/jira/browse/HIVE-4095
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Dheeraj Kumar Singh
> Fix For: 0.12.0
>
> Attachments: hive.4095.1.patch, HIVE-4095.D10155.1.patch, 
> HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, 
> HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt, 
> hive.4095.refresh.patch, hive.4095.svn.thrift.patch, 
> hive.4095.svn.thrift.patch.refresh
>
>






[jira] [Commented] (HIVE-16252) Vectorization: Cannot vectorize: Aggregation Function UDF avg

2017-03-20 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932826#comment-15932826
 ] 

Zoltan Haindrich commented on HIVE-16252:
-

this might cause quite a few vectorization problems (I think)

{{git grep notVectorizedReason}} returned a lot of occurrences of this.

there is type-filtering logic - based on the type's string - at:

https://github.com/apache/hive/blob/27f27219a2b965958f850a92bf581d7b9c3ddfb0/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2278

which rejects {{struct<...>}} - the actual value of 'type' in this case.

so generally I would say (based on what I've seen so far): vectorization is 
currently unsupported for any aggregator when {{group by}} is used.
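The effect of that string-based filter can be sketched in a few lines (names are illustrative, not Hive's): an intermediate avg() result typed as a struct fails the check even though its leaf fields would individually be supported:

```python
SUPPORTED = {"int", "bigint", "double", "string", "timestamp", "decimal"}

def is_vectorizable_type(type_name):
    # String-based check in the spirit of Vectorizer's filter: a
    # struct<...> intermediate (e.g. avg's partial count/sum pair
    # shuffled to the reducer under GROUP BY) is rejected outright.
    return type_name.split("<", 1)[0].strip().lower() in SUPPORTED

print(is_vectorizable_type("double"))                           # True
print(is_vectorizable_type("struct<count:bigint,sum:double>"))  # False
```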

> Vectorization: Cannot vectorize: Aggregation Function UDF avg 
> --
>
> Key: HIVE-16252
> URL: https://issues.apache.org/jira/browse/HIVE-16252
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> {noformat}
> select 
> ss_store_sk, ss_item_sk, avg(ss_sales_price) as revenue
> from
> store_sales, date_dim
> where
> ss_sold_date_sk = d_date_sk
> and d_month_seq between 1212 and 1212 + 11
> group by ss_store_sk , ss_item_sk limit 100;
> 2017-03-20T00:59:49,526  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Validating ReduceWork...
> 2017-03-20T00:59:49,526 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Using reduce tag 0
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> lazybinary.LazyBinarySerDe: LazyBinarySerDe initialized with: 
> columnNames=[_col0] columnTypes=[struct]
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> vector.VectorizationContext: Input Expression = Column[KEY._col0], Vectorized 
> Expression = col 0
> ...
> ...
> 2017-03-20T00:59:49,528  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Cannot vectorize: Aggregation Function UDF avg parameter 
> expression for GROUPBY operator: Data type 
> struct of Column[VALUE._col0] not 
> supported
> {noformat}
> Env: Hive build from: commit 71f4930d95475e7e63b5acc55af3809aefcc71e0 (march 
> 16)





[jira] [Updated] (HIVE-16205) Improving type safety in Objectstore

2017-03-20 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16205:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master.
Thanks [~vihangk1] for your contribution.

> Improving type safety in Objectstore
> 
>
> Key: HIVE-16205
> URL: https://issues.apache.org/jira/browse/HIVE-16205
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-16205.01.patch, HIVE-16205.02.patch, 
> HIVE-16205.03.patch
>
>
> Modify the queries in ObjectStore for better type safety





[jira] [Commented] (HIVE-16242) Run BeeLine tests parallel

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932870#comment-15932870
 ] 

Hive QA commented on HIVE-16242:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859546/HIVE-16242.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10476 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4245/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4245/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4245/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859546 - PreCommit-HIVE-Build

> Run BeeLine tests parallel
> --
>
> Key: HIVE-16242
> URL: https://issues.apache.org/jira/browse/HIVE-16242
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16242.patch
>
>
> Provide the ability for BeeLine tests to run in parallel against the MiniHS2 
> cluster





[jira] [Commented] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-20 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932874#comment-15932874
 ] 

Sergio Peña commented on HIVE-16024:


Thanks [~zsombor.klara].

+1

Are those failing flaky tests already reported on another JIRA?

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch, HIVE-16024.07.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict
> HIVE-13788 modified the way we read up partitions for a table to improve 
> performance. Unfortunately it is using PartitionPruner to load the partitions 
> which in turn is checking hive.mapred.mode.
> The previous code did not check hive.mapred.mode.





[jira] [Updated] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-20 Thread Sunitha Beeram (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunitha Beeram updated HIVE-16206:
--
Attachment: HIVE-16206.2.patch

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers - i.e., handlers 
> that provide interfaces for metrics collection as well as 
> reporting; one of the 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - currently supported ones are Console, JMX, JSON file 
> and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this conf driven just the way MetricsFactory 
> handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (if the new option 
> is not present, look for the old option and instantiate accordingly) - i.e., 
> make the code backward compatible.
> - Update and add new tests.
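The conf-driven part amounts to reflective instantiation of a configured class list. A minimal Python analogue (the config keys and behaviour here are illustrative, not Hive's actual API) of preferring the new option and falling back to the legacy one:

```python
import importlib

def load_classes(conf, new_key, legacy_key):
    # Prefer the new option; fall back to the old one for backward
    # compatibility, mirroring the proposed CodahaleMetrics behaviour.
    value = conf.get(new_key) or conf.get(legacy_key, "")
    classes = []
    for name in filter(None, (s.strip() for s in value.split(","))):
        module, _, cls = name.rpartition(".")
        classes.append(getattr(importlib.import_module(module), cls))
    return classes

# Stdlib classes stand in for reporter implementations in this sketch.
conf = {"reporter.classes": "collections.Counter, collections.OrderedDict"}
reporters = load_classes(conf, "reporter.classes", "legacy.reporter")
```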





[jira] [Updated] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-20 Thread Sunitha Beeram (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunitha Beeram updated HIVE-16206:
--
Status: Patch Available  (was: Open)

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers - i.e., handlers 
> that provide interfaces for metrics collection as well as 
> reporting; one of the 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - currently supported ones are Console, JMX, JSON file 
> and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this conf driven just the way MetricsFactory 
> handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (if the new option 
> is not present, look for the old option and instantiate accordingly) - i.e., 
> make the code backward compatible.
> - Update and add new tests.





[jira] [Commented] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-20 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932907#comment-15932907
 ] 

Barna Zsombor Klara commented on HIVE-16024:


TestCliDriver[comments] didn't have a JIRA yet. I raised HIVE-16256 to cover it, 
but based on the diff I'm not sure if this is the fault of the test itself. I 
checked locally and cannot reproduce it, so it's not just that someone forgot 
to update the baseline.

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch, HIVE-16024.07.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we read partitions for a table to improve 
> performance. Unfortunately it uses PartitionPruner to load the partitions, 
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.





[jira] [Updated] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16024:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

I committed this patch to master.
Thanks [~zsombor.klara] for your contribution. I see the test failures are not 
related, and a JIRA has been reported.

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch, HIVE-16024.07.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we read partitions for a table to improve 
> performance. Unfortunately it uses PartitionPruner to load the partitions, 
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.





[jira] [Commented] (HIVE-16252) Vectorization: Cannot vectorize: Aggregation Function UDF avg

2017-03-20 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932949#comment-15932949
 ] 

Zoltan Haindrich commented on HIVE-16252:
-

Hmm... it seems like even the following is enough to reproduce it, but the 
problem doesn't appear for min/max:
{code}
explain vectorization select 
avg(ss_sales_price) as revenue
from
store_sales;
{code}

> Vectorization: Cannot vectorize: Aggregation Function UDF avg 
> --
>
> Key: HIVE-16252
> URL: https://issues.apache.org/jira/browse/HIVE-16252
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> {noformat}
> select 
> ss_store_sk, ss_item_sk, avg(ss_sales_price) as revenue
> from
> store_sales, date_dim
> where
> ss_sold_date_sk = d_date_sk
> and d_month_seq between 1212 and 1212 + 11
> group by ss_store_sk , ss_item_sk limit 100;
> 2017-03-20T00:59:49,526  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Validating ReduceWork...
> 2017-03-20T00:59:49,526 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Using reduce tag 0
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> lazybinary.LazyBinarySerDe: LazyBinarySerDe initialized with: 
> columnNames=[_col0] columnTypes=[struct]
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> vector.VectorizationContext: Input Expression = Column[KEY._col0], Vectorized 
> Expression = col 0
> ...
> ...
> 2017-03-20T00:59:49,528  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Cannot vectorize: Aggregation Function UDF avg parameter 
> expression for GROUPBY operator: Data type 
> struct of Column[VALUE._col0] not 
> supported
> {noformat}
> Env: Hive build from: commit 71f4930d95475e7e63b5acc55af3809aefcc71e0 (march 
> 16)





[jira] [Commented] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933004#comment-15933004
 ] 

Hive QA commented on HIVE-16178:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859183/HIVE-16178.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10494 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_corr] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_covar_samp] 
(batchId=6)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_windowing_2]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[windowing] 
(batchId=150)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[windowing] 
(batchId=118)
org.apache.hadoop.hive.ql.udf.generic.TestGenericUDAFCorrelation.testCorr 
(batchId=245)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4246/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4246/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4246/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859183 - PreCommit-HIVE-Build

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16178.1.patch
>
>
> h3. corr
> the standard defines corner cases in which it should return null - but the 
> current result is NaN.
> If N * SUMX2 equals SUMX * SUMX , then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY , then the result is the null value.
> h3. covar_samp
> returns 0 instead of null:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> the standard uses (y,x) order; and some of the function names also 
> contain X and Y... so the order does matter. Currently at least corr uses 
> (x,y) order, which is okay because it is symmetric; but it would be great to 
> have the same order everywhere (check the others)





[jira] [Updated] (HIVE-16242) Run BeeLine tests parallel

2017-03-20 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16242:
--
Attachment: HIVE-16242.2.patch

The parallelized execution bit me, as it should :)

The regexp was not correct for removing messages indicating that the thread is 
waiting for a lock. Fixed now.

> Run BeeLine tests parallel
> --
>
> Key: HIVE-16242
> URL: https://issues.apache.org/jira/browse/HIVE-16242
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16242.2.patch, HIVE-16242.patch
>
>
> Provide the ability for BeeLine tests to run parallel against the MiniHS2 
> cluster





[jira] [Updated] (HIVE-15978) Support regr_* functions

2017-03-20 Thread Carter Shanklin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter Shanklin updated HIVE-15978:
---
Labels:   (was: TODOC2.2)

> Support regr_* functions
> 
>
> Key: HIVE-15978
> URL: https://issues.apache.org/jira/browse/HIVE-15978
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-15978.1.patch, HIVE-15978.2.patch, 
> HIVE-15978.2.patch, HIVE-15978.3.patch
>
>
> Support the standard regr_* functions, regr_slope, regr_intercept, regr_r2, 
> regr_sxx, regr_syy, regr_sxy, regr_avgx, regr_avgy, regr_count. SQL reference 
> section 10.9





[jira] [Commented] (HIVE-16227) GenMRFileSink1.java should refer to its nearest MR task

2017-03-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933069#comment-15933069
 ] 

Ashutosh Chauhan commented on HIVE-16227:
-

+1

> GenMRFileSink1.java should refer to its nearest MR task
> ---
>
> Key: HIVE-16227
> URL: https://issues.apache.org/jira/browse/HIVE-16227
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16227.01.patch, HIVE-16227.02.patch
>
>
> exposed by setting hive.stats.column.autogather=true and running parallel.q





[jira] [Commented] (HIVE-16072) LLAP: Add some additional jvm metrics for hadoop-metrics2

2017-03-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933122#comment-15933122
 ] 

Prasanth Jayachandran commented on HIVE-16072:
--

[~leftylev] Thanks for pointing that out! Yes, I think the metrics added for 
LLAP should be documented. Will update the docs shortly.

> LLAP: Add some additional jvm metrics for hadoop-metrics2 
> --
>
> Key: HIVE-16072
> URL: https://issues.apache.org/jira/browse/HIVE-16072
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-16072.1.patch, HIVE-16072.2.patch
>
>
> It will be helpful for debugging to expose some metrics, like buffer pool and 
> file descriptors, that are not exposed via Hadoop's JvmMetrics. We already 
> have a /jmx endpoint that gives out this info, but we don't know the 
> timestamp of allocations or the number of file descriptors to correlate with 
> the logs. This will be better suited for graphing tools.
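For context, the JDK already exposes the buffer-pool and file-descriptor numbers mentioned here through platform MXBeans, so a hadoop-metrics2 source could publish values obtained roughly like this (an illustration, not the patch's code):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.List;

public class JvmExtraMetrics {
    public static void main(String[] args) {
        // NIO buffer pools ("direct" and "mapped"): count and bytes used.
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.println(pool.getName()
                    + " count=" + pool.getCount()
                    + " used=" + pool.getMemoryUsed());
        }
        // Open file descriptor count is exposed on Unix-like platforms only.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            long fds = ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
            System.out.println("openFileDescriptors=" + fds);
        }
    }
}
```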





[jira] [Commented] (HIVE-16230) Enable CBO in presence of hints

2017-03-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933142#comment-15933142
 ] 

Pengcheng Xiong commented on HIVE-16230:


+1

> Enable CBO in presence of hints
> ---
>
> Key: HIVE-16230
> URL: https://issues.apache.org/jira/browse/HIVE-16230
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-16230.1.patch, HIVE-16230.2.patch, 
> HIVE-16230.3.patch, HIVE-16230.patch
>
>






[jira] [Commented] (HIVE-16007) When the query does not compile the LogRunnable never stops

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933160#comment-15933160
 ] 

Hive QA commented on HIVE-16007:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859545/HIVE-16007.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10478 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[drop_with_concurrency]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hive.beeline.TestBeeLineWithArgs.testEmbeddedBeelineOutputs 
(batchId=214)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4247/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4247/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4247/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859545 - PreCommit-HIVE-Build

> When the query does not compile the LogRunnable never stops
> ---
>
> Key: HIVE-16007
> URL: https://issues.apache.org/jira/browse/HIVE-16007
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16007.02.patch, HIVE-16007.2.patch, 
> HIVE-16007.3.patch, HIVE-16007.4.patch, HIVE-16007.patch
>
>
> When issuing a SQL command that does not compile, the LogRunnable thread 
> is never closed.
> The issue can be easily detected when running beeline with showWarnings=true.
> {code}
> $ ./beeline -u "jdbc:hive2://localhost:1 pvary pvary" --showWarnings=true
> [..]
> Connecting to jdbc:hive2://localhost:1
> Connected to: Apache Hive (version 2.2.0-SNAPSHOT)
> Driver: Hive JDBC (version 2.2.0-SNAPSHOT)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.2.0-SNAPSHOT by Apache Hive
> 0: jdbc:hive2://localhost:1> selekt;
> Warning: java.sql.SQLException: Method getQueryLog() failed. Because the 
> stmtHandle in HiveStatement is null and the statement execution might fail. 
> (state=,code=0)
> [..]
> Warning: java.sql.SQLException: Can't getQueryLog after statement has been 
> closed (state=,code=0)
> [..]
> {code}





[jira] [Updated] (HIVE-16257) Intermittent issue with incorrect resultset with Spark

2017-03-20 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-16257:
-
Description: 
This issue is highly intermittent and only seems to occur with the Spark engine 
when the query has a GROUP BY clause. The following is the testcase.
{code}
drop table if exists test_hos_sample;
create table test_hos_sample (name string, val1 decimal(18,2), val2 
decimal(20,3));
insert into test_hos_sample values 
('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);

set hive.execution.engine=spark;
select  name, val1,val2 from test_hos_sample group by name, val1, val2;
{code}

Expected Results:
{code}
name    val1    val2
test5   105.52  105.567
test3   103.52  102.345
test1   101.12  102.123
test4   104.52  104.456
test2   102.12  103.234
{code}

Incorrect results once in a while:
{code}
name    val1    val2
test5   105.52  105.567
test3   103.52  102.345
test1   104.52  102.123
test4   104.52  104.456
test2   102.12  103.234
{code}

1) Not reproducible with HoMR.
2) Not an issue when running from spark-shell.
3) Not reproducible when the column data type is String or double. Only 
reproducible with decimal data types. Also works fine for the decimal datatype 
if you cast the decimal as string on read and cast it back to decimal on select.
4) Occurs with parquet and text file formats as well (haven't tried other 
formats).
5) Occurs both when the table data is within an encryption zone and outside.
6) Even in clusters where this is reproducible, this occurs only about once in 
20 or more runs.
7) Occurs with both Beeline and the Hive CLI.
8) Reproducible only when there is a GROUP BY clause.


  was:
This issue is highly intermittent and only seems to occur with the Spark 
engine. The following is the testcase.
{code}
drop table if exists test_hos_sample;
create table test_hos_sample (name string, val1 decimal(18,2), val2 
decimal(20,3));
insert into test_hos_sample values 
('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);

set hive.execution.engine=spark;
select  name, val1,val2 from test_hos_sample group by name, val1, val2;
{code}

Expected Results:
{code}
name    val1    val2
test5   105.52  105.567
test3   103.52  102.345
test1   101.12  102.123
test4   104.52  104.456
test2   102.12  103.234
{code}

Incorrect results once in a while:
{code}
name    val1    val2
test5   105.52  105.567
test3   103.52  102.345
test1   104.52  102.123
test4   104.52  104.456
test2   102.12  103.234
{code}

1) Not reproducible with HoMR.
2) Not an issue when running from spark-shell.
3) Occurs with parquet and text file formats as well (haven't tried other 
formats).
4) Occurs both when the table data is within an encryption zone and outside.
5) Even in clusters where this is reproducible, this occurs only about once in 
20 or more runs.
6) Occurs with both Beeline and the Hive CLI.



> Intermittent issue with incorrect resultset with Spark
> --
>
> Key: HIVE-16257
> URL: https://issues.apache.org/jira/browse/HIVE-16257
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>
> This issue is highly intermittent and only seems to occur with the Spark 
> engine when the query has a GROUP BY clause. The following is the testcase.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2 
> decimal(20,3));
> insert into test_hos_sample values 
> ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select  name, val1,val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   101.12  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> Incorrect results once in a while:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.5

[jira] [Commented] (HIVE-16257) Intermittent issue with incorrect resultset with Spark

2017-03-20 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933187#comment-15933187
 ] 

Naveen Gangam commented on HIVE-16257:
--

[~xuefuz] [~szehon] Any clues on where this could be originating? When the 
problem does occur, the incorrect column value always seems to match a value 
from another row, as shown above.
I have ruled out any Beeline display issue with the output, because it is 
reproducible from the CLI too.
Although this is not reproducible with spark-shell, I have not ruled out a 
Spark issue, because the set of transformations used by spark-shell could be 
different from the transformations used by Hive.

What code should we instrument to confirm or eliminate Hive as the source of 
the problem? Any help is appreciated. Thank you.

> Intermittent issue with incorrect resultset with Spark
> --
>
> Key: HIVE-16257
> URL: https://issues.apache.org/jira/browse/HIVE-16257
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>
> This issue is highly intermittent and only seems to occur with the Spark 
> engine when the query has a GROUP BY clause. The following is the testcase.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2 
> decimal(20,3));
> insert into test_hos_sample values 
> ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select  name, val1,val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   101.12  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> Incorrect results once in a while:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   104.52  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> 1) Not reproducible with HoMR.
> 2) Not an issue when running from spark-shell.
> 3) Not reproducible when the column data type is String or double. Only 
> reproducible with decimal data types. Also works fine for the decimal 
> datatype if you cast the decimal as string on read and cast it back to 
> decimal on select.
> 4) Occurs with parquet and text file formats as well (haven't tried other 
> formats).
> 5) Occurs both when the table data is within an encryption zone and outside.
> 6) Even in clusters where this is reproducible, this occurs only about once 
> in 20 or more runs.
> 7) Occurs with both Beeline and the Hive CLI.
> 8) Reproducible only when there is a GROUP BY clause.





[jira] [Updated] (HIVE-16227) GenMRFileSink1.java may refer to a wrong MR task in multi-insert case

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16227:
---
Summary: GenMRFileSink1.java may refer to a wrong MR task in multi-insert 
case  (was: GenMRFileSink1.java should refer to its nearest MR task)

> GenMRFileSink1.java may refer to a wrong MR task in multi-insert case
> -
>
> Key: HIVE-16227
> URL: https://issues.apache.org/jira/browse/HIVE-16227
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16227.01.patch, HIVE-16227.02.patch
>
>
> exposed by setting hive.stats.column.autogather=true and running parallel.q





[jira] [Commented] (HIVE-16227) GenMRFileSink1.java may refer to a wrong MR task in multi-insert case

2017-03-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933210#comment-15933210
 ] 

Pengcheng Xiong commented on HIVE-16227:


I took a careful look at those test cases. It seems that the current golden 
file is wrong. I updated all the files and pushed the patch to master. Thanks 
[~ashutoshc] for the review.

> GenMRFileSink1.java may refer to a wrong MR task in multi-insert case
> -
>
> Key: HIVE-16227
> URL: https://issues.apache.org/jira/browse/HIVE-16227
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-16227.01.patch, HIVE-16227.02.patch
>
>
> exposed by setting hive.stats.column.autogather=true and running parallel.q





[jira] [Updated] (HIVE-16227) GenMRFileSink1.java may refer to a wrong MR task in multi-insert case

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16227:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> GenMRFileSink1.java may refer to a wrong MR task in multi-insert case
> -
>
> Key: HIVE-16227
> URL: https://issues.apache.org/jira/browse/HIVE-16227
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16227.01.patch, HIVE-16227.02.patch
>
>
> exposed by setting hive.stats.column.autogather=true and running parallel.q





[jira] [Updated] (HIVE-16227) GenMRFileSink1.java may refer to a wrong MR task in multi-insert case

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16227:
---
Fix Version/s: 2.2.0

> GenMRFileSink1.java may refer to a wrong MR task in multi-insert case
> -
>
> Key: HIVE-16227
> URL: https://issues.apache.org/jira/browse/HIVE-16227
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-16227.01.patch, HIVE-16227.02.patch
>
>
> exposed by setting hive.stats.column.autogather=true and running parallel.q





[jira] [Updated] (HIVE-16227) GenMRFileSink1.java may refer to a wrong MR task in multi-insert case

2017-03-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16227:
---
Affects Version/s: 1.0.0
   1.2.0
   2.0.0
   2.1.0

> GenMRFileSink1.java may refer to a wrong MR task in multi-insert case
> -
>
> Key: HIVE-16227
> URL: https://issues.apache.org/jira/browse/HIVE-16227
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.0.0, 1.2.0, 2.0.0, 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-16227.01.patch, HIVE-16227.02.patch
>
>
> exposed by setting hive.stats.column.autogather=true and running parallel.q





[jira] [Commented] (HIVE-16246) Support auto gather column stats for columns with trailing white spaces

2017-03-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933241#comment-15933241
 ] 

Pengcheng Xiong commented on HIVE-16246:


[~ashutoshc], could you take a look at this too? I will update the golden file 
of column_names_with_leading_and_trailing_spaces; all the other failures are 
unrelated. Thanks.

> Support auto gather column stats for columns with trailing white spaces
> ---
>
> Key: HIVE-16246
> URL: https://issues.apache.org/jira/browse/HIVE-16246
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16246.01.patch
>
>






[jira] [Commented] (HIVE-16057) SchemaTool ignores --passWord argument if hadoop.security.credential.provider.path is configured

2017-03-20 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933256#comment-15933256
 ] 

Aihua Xu commented on HIVE-16057:
-

+1 on the new patch. 

> SchemaTool ignores --passWord argument if 
> hadoop.security.credential.provider.path is configured
> 
>
> Key: HIVE-16057
> URL: https://issues.apache.org/jira/browse/HIVE-16057
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Fix For: 2.2.0
>
> Attachments: HIVE-16057.02.patch, HIVE-16057.03.patch, 
> HIVE-16057.patch
>
>
> If {{hadoop.security.credential.provider.path}} is defined on the command 
> line but the correct {{HADOOP_CREDSTORE_PASSWORD}} is not provided, the 
> SchemaTool fails, even if the correct metastore password is provided with 
> {{--passWord}}.
> Could be reproduced if the hive-site.xml contains the following:
> {code}
>   
> hadoop.security.credential.provider.path
> 
> localjceks://file//Users/petervary/tmp/conf/creds.localjceks
>   
> {code}
> {code}
> $ ../schemaTool --dbType=mysql --info --passWord=pwd
> Metastore connection URL:  
> jdbc:mysql://localhost:3306/hive?useUnicode=true&characterEncoding=UTF-8
> Metastore Connection Driver :  com.mysql.jdbc.Driver
> Metastore connection User: hive
> org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema 
> version.
> *** schemaTool failed ***
> {code}
> The {{--passWord}} argument should take precedence over errors from the 
> credential provider.





[jira] [Commented] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933271#comment-15933271
 ] 

Hive QA commented on HIVE-16206:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859579/HIVE-16206.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10478 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4248/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4248/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4248/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859579 - PreCommit-HIVE-Build

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers - ie, handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of the 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - currently supported ones are Console, JMX, JSON file 
> and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this conf driven just the way MetricsFactory 
> handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option and, if the new 
> option is not present, look for the old option and instantiate accordingly - 
> i.e., make the code backward compatible.
> - Update and add new tests.





[jira] [Updated] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-20 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-16166:
--
Attachment: HIVE-16166.02.patch

Fixed the problem with a List not providing the proper Iterator.set() 
functionality as explained above, i.e. by catching and ignoring 
UnsupportedOperationException.
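A minimal sketch of that approach (illustrative only, not the actual HIVE-16166 patch): intern strings in place through a ListIterator, and skip lists whose iterators reject set():

```java
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class InternList {
    // Replace each string in the list with its interned copy. Some List
    // implementations return iterators that do not support set(), so the
    // UnsupportedOperationException is caught and the list is left as-is.
    static void internStringsInPlace(List<String> list) {
        try {
            for (ListIterator<String> it = list.listIterator(); it.hasNext(); ) {
                String s = it.next();
                if (s != null) {
                    it.set(s.intern());
                }
            }
        } catch (UnsupportedOperationException ignored) {
            // Immutable or fixed-function list: skip interning rather than fail.
        }
    }

    public static void main(String[] args) {
        // Build a non-interned copy of "id", then intern the list in place.
        List<String> cols = Arrays.asList(new String(new char[]{'i', 'd'}), "name");
        internStringsInPlace(cols);
        // After interning, the element is the canonical pool instance of "id".
        System.out.println(cols.get(0) == "id");
    }
}
```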

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, 
> HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.





[jira] [Commented] (HIVE-16180) LLAP: Native memory leak in EncodedReader

2017-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933299#comment-15933299
 ] 

Sergey Shelukhin commented on HIVE-16180:
-

Tests at https://builds.apache.org/job/PreCommit-HIVE-Build/4234/testReport/ 
(HiveQA still fails to post). The failures need to be looked at.

> LLAP: Native memory leak in EncodedReader
> -
>
> Key: HIVE-16180
> URL: https://issues.apache.org/jira/browse/HIVE-16180
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: DirectCleaner.java, FullGC-15GB-cleanup.png, 
> Full-gc-native-mem-cleanup.png, HIVE-16180.03.patch, HIVE-16180.1.patch, 
> HIVE-16180.2.patch, Native-mem-spike.png
>
>
> Observed this in an internal test run. There is a native memory leak in Orc 
> EncodedReaderImpl that can cause the YARN pmem monitor to kill the container 
> running the daemon. Direct byte buffers are null'ed out, but their native memory 
> is not guaranteed to be freed until the next full GC. To show this issue, attaching a 
> small test program that allocates 3x256MB direct byte buffers. The first buffer 
> is null'ed out, but its native memory remains in use. The second buffer uses a Cleaner to 
> release its native allocation. The third buffer is also null'ed, but this time 
> System.gc() is invoked, which cleans up all remaining native memory. Output from the 
> test program is below:
> {code}
> Allocating 3x256MB direct memory..
> Native memory used: 786432000
> Native memory used after data1=null: 786432000
> Native memory used after data2.clean(): 524288000
> Native memory used after data3=null: 524288000
> Native memory used without gc: 524288000
> Native memory used after gc: 0
> {code}
> Longer term improvements/solutions:
> 1) Use DirectBufferPool from hadoop or netty's 
> https://netty.io/4.0/api/io/netty/buffer/PooledByteBufAllocator.html as 
> direct byte buffer allocations are expensive (System.gc() + 100ms thread 
> sleep).
> 2) Use HADOOP-12760 for proper cleaner invocation in JDK8 and JDK9
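For reference, the eager cleaner invocation mentioned in option 2 can be sketched reflectively. This is a hedged, JDK-8-style sketch; on newer JDKs the reflective access may be denied by the module system, in which case it silently falls back to GC-driven cleanup:

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

class EagerDirectCleanup {
    // Try to invoke the direct buffer's hidden cleaner so its native
    // allocation is freed immediately instead of at the next full GC.
    // Returns false (and relies on GC) if reflective access is denied.
    static boolean tryClean(ByteBuffer buf) {
        if (!buf.isDirect()) {
            return false;
        }
        try {
            Method cleanerMethod = buf.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buf);
            Method cleanMethod = cleaner.getClass().getMethod("clean");
            cleanMethod.setAccessible(true);
            cleanMethod.invoke(cleaner);
            return true;
        } catch (Exception e) {
            return false; // access denied or no cleaner: rely on GC
        }
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocateDirect(256 * 1024);
        System.out.println("attempted eager cleanup, freed=" + tryClean(data));
        data = null; // the buffer must never be touched again after cleaning
    }
}
```

A pooled allocator (as in option 1) avoids this dance entirely by reusing buffers instead of freeing them.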





[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933353#comment-15933353
 ] 

Sergio Peña commented on HIVE-16166:


Thanks. As you mentioned, this is a performance optimization, and if it does 
not work for some reason, then we can still rely on the normal way without 
using interned strings.

+1

Let's wait for HiveQA.

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, 
> HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.





[jira] [Commented] (HIVE-16049) upgrade to jetty 9

2017-03-20 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933362#comment-15933362
 ] 

Aihua Xu commented on HIVE-16049:
-

[~spena], [~mohitsabharwal], [~ctang.ma] can you guys also help review this 
jetty upgrade as well? Thanks.

> upgrade to jetty 9
> --
>
> Key: HIVE-16049
> URL: https://issues.apache.org/jira/browse/HIVE-16049
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Aihua Xu
> Attachments: HIVE-16049.0.patch, HIVE-16049.1.patch, 
> HIVE-16049.2.patch
>
>
> Jetty 7 has been deprecated for a couple of years now. Hadoop and HBase have 
> both updated to Jetty 9 for their next major releases, which will complicate 
> classpath concerns.
> Proactively update to Jetty 9 in the few places we use a web server.





[jira] [Commented] (HIVE-16242) Run BeeLine tests parallel

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933386#comment-15933386
 ] 

Hive QA commented on HIVE-16242:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859590/HIVE-16242.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10480 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hive.hcatalog.templeton.TestConcurrentJobRequestsThreadsAndTimeout.ConcurrentListJobsVerifyExceptions
 (batchId=173)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4249/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4249/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4249/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859590 - PreCommit-HIVE-Build

> Run BeeLine tests parallel
> --
>
> Key: HIVE-16242
> URL: https://issues.apache.org/jira/browse/HIVE-16242
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16242.2.patch, HIVE-16242.patch
>
>
> Provide the ability for BeeLine tests to run parallel against the MiniHS2 
> cluster





[jira] [Commented] (HIVE-16049) upgrade to jetty 9

2017-03-20 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933416#comment-15933416
 ] 

Sean Busbey commented on HIVE-16049:


Change looks reasonable to me. Was the update to exclude stuff in 
{{hcatalog/webhcat/svr}} the only difference?

Any particular testing folks would like to see?

> upgrade to jetty 9
> --
>
> Key: HIVE-16049
> URL: https://issues.apache.org/jira/browse/HIVE-16049
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Aihua Xu
> Attachments: HIVE-16049.0.patch, HIVE-16049.1.patch, 
> HIVE-16049.2.patch
>
>
> Jetty 7 has been deprecated for a couple of years now. Hadoop and HBase have 
> both updated to Jetty 9 for their next major releases, which will complicate 
> classpath concerns.
> Proactively update to Jetty 9 in the few places we use a web server.





[jira] [Commented] (HIVE-16257) Intermittent issue with incorrect resultset with Spark

2017-03-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933423#comment-15933423
 ] 

Edward Capriolo commented on HIVE-16257:


Now THAT is how to file a bug report! Tried it 8 different ways!

> Intermittent issue with incorrect resultset with Spark
> --
>
> Key: HIVE-16257
> URL: https://issues.apache.org/jira/browse/HIVE-16257
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>
> This issue is highly intermittent and only seems to occur with the Spark engine 
> when the query has a GROUP BY clause. The following is the test case.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2 
> decimal(20,3));
> insert into test_hos_sample values 
> ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select  name, val1,val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> nameval1val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   101.12  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> Incorrect results once in a while:
> {code}
> nameval1val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   104.52  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> 1) Not reproducible with HoMR.
> 2) Not an issue when running from spark-shell.
> 3) Not reproducible when the column data type is String or double. Only 
> reproducible with decimal data types. Also works fine for the decimal data type if 
> you cast the decimal to string on read and cast it back to decimal on select.
> 4) Occurs with the parquet and text file formats as well (haven't tried other 
> formats).
> 5) Occurs both when the table data is within an encryption zone and when it is 
> outside.
> 6) Even in clusters where this is reproducible, it occurs only about once in 20 
> runs or more.
> 7) Occurs with both Beeline and the Hive CLI.
> 8) Reproducible only when there is a GROUP BY clause.





[jira] [Commented] (HIVE-16049) upgrade to jetty 9

2017-03-20 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933420#comment-15933420
 ] 

Aihua Xu commented on HIVE-16049:
-

[~busbey] Thanks for the review. Yes. The exclude stuff is the only main change 
(of course, I removed the code you commented out...)

> upgrade to jetty 9
> --
>
> Key: HIVE-16049
> URL: https://issues.apache.org/jira/browse/HIVE-16049
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Aihua Xu
> Attachments: HIVE-16049.0.patch, HIVE-16049.1.patch, 
> HIVE-16049.2.patch
>
>
> Jetty 7 has been deprecated for a couple of years now. Hadoop and HBase have 
> both updated to Jetty 9 for their next major releases, which will complicate 
> classpath concerns.
> Proactively update to Jetty 9 in the few places we use a web server.





[jira] [Commented] (HIVE-16257) Intermittent issue with incorrect resultset with Spark

2017-03-20 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933437#comment-15933437
 ] 

Naveen Gangam commented on HIVE-16257:
--

Thanks [~appodictic] I try to add as many details as possible. It helps others 
understand the issue as well. Heck, even myself. A couple of months from now, I 
would be wondering what this issue was. :)

> Intermittent issue with incorrect resultset with Spark
> --
>
> Key: HIVE-16257
> URL: https://issues.apache.org/jira/browse/HIVE-16257
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>
> This issue is highly intermittent and only seems to occur with the Spark engine 
> when the query has a GROUP BY clause. The following is the test case.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2 
> decimal(20,3));
> insert into test_hos_sample values 
> ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select  name, val1,val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> nameval1val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   101.12  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> Incorrect results once in a while:
> {code}
> nameval1val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   104.52  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> 1) Not reproducible with HoMR.
> 2) Not an issue when running from spark-shell.
> 3) Not reproducible when the column data type is String or double. Only 
> reproducible with decimal data types. Also works fine for the decimal data type if 
> you cast the decimal to string on read and cast it back to decimal on select.
> 4) Occurs with the parquet and text file formats as well (haven't tried other 
> formats).
> 5) Occurs both when the table data is within an encryption zone and when it is 
> outside.
> 6) Even in clusters where this is reproducible, it occurs only about once in 20 
> runs or more.
> 7) Occurs with both Beeline and the Hive CLI.
> 8) Reproducible only when there is a GROUP BY clause.





[jira] [Commented] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2017-03-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933440#comment-15933440
 ] 

Sahil Takiar commented on HIVE-14919:
-

[~lirui], [~Ferd] would it make sense to add HoS integration with Spark's 
DataFrame / DataSets API? From the Spark docs it sounds like the DataFrame / 
DataSets API could improve performance since the APIs require specifying column 
types.

> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>
> In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
> BigBench[1] to run benchmarks with Spark 2.0 over a 1 TB data set, comparing with 
> Spark 1.6. We see performance improvements of about 5.4% in general and 45% 
> in the best case. However, some queries don't show significant performance 
> improvements. This JIRA is the umbrella ticket addressing those performance 
> issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench





[jira] [Commented] (HIVE-16206) Make Codahale metrics reporters pluggable

2017-03-20 Thread Sunitha Beeram (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933458#comment-15933458
 ] 

Sunitha Beeram commented on HIVE-16206:
---

Test failure unrelated to updated code: 
https://builds.apache.org/job/PreCommit-HIVE-Build/4248/testReport/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_comments_/
The test has failed for the past 9 builds.

> Make Codahale metrics reporters pluggable
> -
>
> Key: HIVE-16206
> URL: https://issues.apache.org/jira/browse/HIVE-16206
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.2
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16206.2.patch, HIVE-16206.patch
>
>
> Hive metrics code currently allows pluggable metrics handlers - i.e., handlers 
> that take care of providing interfaces for metrics collection as well as 
> reporting; one of the 'handlers' is CodahaleMetrics. Codahale can work with 
> different reporters - currently supported ones are Console, JMX, JSON file 
> and hadoop2 sink. However, adding a new reporter involves changing that 
> class. We would like to make this conf driven just the way MetricsFactory 
> handles configurable Metrics classes.
> Scope of work:
> - Provide a new configuration option, HIVE_CODAHALE_REPORTER_CLASSES that 
> enumerates classes (like HIVE_METRICS_CLASS and unlike HIVE_METRICS_REPORTER).
> - Move JsonFileReporter into its own class.
> - Update CodahaleMetrics.java to read the new config option (and if the new option 
> is not present, look for the old option and instantiate accordingly) - i.e., 
> make the code backward compatible.
> - Update and add new tests.





[jira] [Resolved] (HIVE-15134) Branch-1.2: Investigate failure of TestMiniTezCliDriver#vector_auto_smb_mapjoin_14

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-15134.
-
Resolution: Not A Problem

> Branch-1.2: Investigate failure of 
> TestMiniTezCliDriver#vector_auto_smb_mapjoin_14
> --
>
> Key: HIVE-15134
> URL: https://issues.apache.org/jira/browse/HIVE-15134
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
>






[jira] [Commented] (HIVE-15134) Branch-1.2: Investigate failure of TestMiniTezCliDriver#vector_auto_smb_mapjoin_14

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933472#comment-15933472
 ] 

Vaibhav Gumashta commented on HIVE-15134:
-

Looked at this and it seems to be a diff-only issue with different explain 
output. This fails on master as well when I run it locally (note that on master 
vector_auto_smb_mapjoin_14.q is now run through MiniLLAP, so the MiniTez 
output likely isn't updated anymore). Closing this.

> Branch-1.2: Investigate failure of 
> TestMiniTezCliDriver#vector_auto_smb_mapjoin_14
> --
>
> Key: HIVE-15134
> URL: https://issues.apache.org/jira/browse/HIVE-15134
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
>






[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933491#comment-15933491
 ] 

Hive QA commented on HIVE-16166:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859609/HIVE-16166.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10480 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=141)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4250/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4250/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4250/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859609 - PreCommit-HIVE-Build

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, 
> HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.





[jira] [Assigned] (HIVE-16259) Eclipse formatter for Hive code

2017-03-20 Thread Sunitha Beeram (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunitha Beeram reassigned HIVE-16259:
-


> Eclipse formatter for Hive code
> ---
>
> Key: HIVE-16259
> URL: https://issues.apache.org/jira/browse/HIVE-16259
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
>Priority: Trivial
>
> Provide a source code formatter to use with Hive project for popular IDEs.





[jira] [Updated] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-20 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16178:

Attachment: HIVE-16178.2.patch

patch#2)  fixed q.out files

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16178.1.patch, HIVE-16178.2.patch
>
>
> h3. corr
> the standard defines corner cases when it should return null - but the 
> current result is NaN.
> If N * SUMX2 equals SUMX * SUMX , then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY , then the result is the null value.
> h3. covar_samp
> returns 0 instead of the null value:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> the standard uses (y,x) order, and some of the function names also 
> contain X and Y, so the order does matter. Currently at least corr uses 
> (x,y) order, which is okay because it's symmetric; but it would be great to 
> have the same order everywhere (check the others).
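The corr corner cases above can be illustrated with a small standalone sketch (not Hive's GenericUDAF code): when N * SUMX2 equals SUMX * SUMX (or the same for Y), the denominator is zero, so the standard-compliant result is null rather than NaN:

```java
class CorrNullCase {
    // Standard-compliant corr: returns null when either variance term
    // (N*SUMX2 - SUMX^2 or N*SUMY2 - SUMY^2) is zero, instead of the
    // NaN that a naive division would produce.
    static Double corr(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
        }
        double dx = n * sxx - sx * sx;
        double dy = n * syy - sy * sy;
        if (dx == 0 || dy == 0) {
            return null; // corner case from the standard
        }
        return (n * sxy - sx * sy) / Math.sqrt(dx * dy);
    }

    public static void main(String[] args) {
        // Constant x: N*SUMX2 == SUMX*SUMX, so the result is null.
        System.out.println(corr(new double[]{1, 1, 1}, new double[]{1, 2, 3}));
        // Perfectly correlated data: corr is 1.0.
        System.out.println(corr(new double[]{1, 2, 3}, new double[]{2, 4, 6}));
    }
}
```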





[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933512#comment-15933512
 ] 

Sergio Peña commented on HIVE-16166:


The {{testCliDriver[comments]}} test fails on other patches as well. This is 
not related to your patch, and is already reported.

Could you check whether {{testCliDriver[vector_if_expr]}} is related? I don't 
see it failing in previous Jenkins builds.

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, 
> HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.





[jira] [Resolved] (HIVE-16259) Eclipse formatter for Hive code

2017-03-20 Thread Sunitha Beeram (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunitha Beeram resolved HIVE-16259.
---
Resolution: Duplicate

> Eclipse formatter for Hive code
> ---
>
> Key: HIVE-16259
> URL: https://issues.apache.org/jira/browse/HIVE-16259
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
>Priority: Trivial
>
> Provide a source code formatter to use with Hive project for popular IDEs.





[jira] [Commented] (HIVE-16259) Eclipse formatter for Hive code

2017-03-20 Thread Sunitha Beeram (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933513#comment-15933513
 ] 

Sunitha Beeram commented on HIVE-16259:
---

Nevermind, found one here: 
https://github.com/apache/hive/blob/master/dev-support/eclipse-styles.xml

> Eclipse formatter for Hive code
> ---
>
> Key: HIVE-16259
> URL: https://issues.apache.org/jira/browse/HIVE-16259
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
>Priority: Trivial
>
> Provide a source code formatter to use with Hive project for popular IDEs.





[jira] [Assigned] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-16260:
-


> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.
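As a rough illustration of the 1M-entry minimum (this sizing helper is hypothetical, not Hive's actual bloom filter code), the standard Bloom filter bound m = -n * ln(p) / (ln 2)^2 gives the bit count needed for n entries at false-positive rate p:

```java
class BloomFilterSizing {
    // Standard Bloom filter sizing bound: m = -n * ln(p) / (ln 2)^2 bits
    // for n expected entries at false-positive probability p.
    // Hypothetical helper for illustration only.
    static long bitsFor(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    public static void main(String[] args) {
        long bits = bitsFor(1_000_000L, 0.05);
        // About 6.2M bits, i.e. well under a megabyte, for 1M entries at 5% FPP,
        // which is why a 1M-entry floor is cheap to enforce.
        System.out.println(bits / 8 / 1024 + " KiB");
    }
}
```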





[jira] [Work started] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16260 started by Deepak Jaiswal.
-
> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> Remove parallel edges of semijoin with map joins as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries and 
> the semijoin is disabled if the big table has less than 1M rows.
> Both these features are configurable.





[jira] [Commented] (HIVE-15127) Branch-1.2: Investigate failure of TestMinimrCliDriver.exchgpartition2lel.q

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933556#comment-15933556
 ] 

Vaibhav Gumashta commented on HIVE-15127:
-

I'll work on a fix for branch-1 and post it on HIVE-11554. Closing this.

> Branch-1.2: Investigate failure of TestMinimrCliDriver.exchgpartition2lel.q
> ---
>
> Key: HIVE-15127
> URL: https://issues.apache.org/jira/browse/HIVE-15127
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
>
> HIVE-12215, HIVE-12865, HIVE-11554 seem to be related.





[jira] [Updated] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11554:

Priority: Critical  (was: Major)
Target Version/s: 1.2.2

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.





[jira] [Resolved] (HIVE-15127) Branch-1.2: Investigate failure of TestMinimrCliDriver.exchgpartition2lel.q

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-15127.
-
Resolution: Fixed

> Branch-1.2: Investigate failure of TestMinimrCliDriver.exchgpartition2lel.q
> ---
>
> Key: HIVE-15127
> URL: https://issues.apache.org/jira/browse/HIVE-15127
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
>
> HIVE-12215, HIVE-12865, HIVE-11554 seem to be related.





[jira] [Assigned] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta reassigned HIVE-11554:
---

Assignee: Vaibhav Gumashta

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.





[jira] [Updated] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11554:

Target Version/s: 1.3.0  (was: 1.2.2)

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.





[jira] [Updated] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11554:

Target Version/s: 1.3.0, 1.2.2  (was: 1.3.0)

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.





[jira] [Commented] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933568#comment-15933568
 ] 

Vaibhav Gumashta commented on HIVE-11554:
-

Modifying target version to 1.3.0 since this is not a regression and 1.2.2 is a 
minor version update. If I'm able to commit this before the 1.2.2 RC is cut, 
I'll also commit it to the 1.2 branch.

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.





[jira] [Comment Edited] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933568#comment-15933568
 ] 

Vaibhav Gumashta edited comment on HIVE-11554 at 3/20/17 9:09 PM:
--

Modifying target version to add 1.3.0 since this is not a regression and 1.2.2 
is a minor version update. If I'm able to commit this before the 1.2.2 RC is 
cut, I'll also commit it to the 1.2 branch.


was (Author: vgumashta):
Modifying target version to 1.3.0 since this is not a regression and 1.2.2 is a 
minor version update. If i'm able to put commit this before 1.2.2 RC is cut, 
I'll commit this to 1.2 branch

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.





[jira] [Commented] (HIVE-16166) HS2 may still waste up to 15% of memory on duplicate strings

2017-03-20 Thread Misha Dmitriev (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933583#comment-15933583
 ] 

Misha Dmitriev commented on HIVE-16166:
---

I ran 'mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_if_expr.q' 
locally, and it passed.

I then checked the Hive log at 
http://104.198.109.242/logs/PreCommit-HIVE-Build-4250/failed/141-TestMiniLlapLocalCliDriver-skewjoinopt15.q-vector_coalesce.q-orc_ppd_decimal.q-and-27-more/logs/hive.log
It does have a bunch of exception stack traces, but they don't look related to 
my changes. At least I don't see 'StringInternUtils' (my class, where an NPE or 
similar would be most likely to occur), and the NPEs scattered across this log 
are all of the same type and show no traces of the code I've modified. I can't 
tell where in this log the problematic test (vector_if_expr) starts; or do all 
the tests run in parallel?

> HS2 may still waste up to 15% of memory on duplicate strings
> 
>
> Key: HIVE-16166
> URL: https://issues.apache.org/jira/browse/HIVE-16166
> Project: Hive
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: ch_2_excerpt.txt, HIVE-16166.01.patch, 
> HIVE-16166.02.patch
>
>
> A heap dump obtained from one of our users shows that 15% of memory is wasted 
> on duplicate strings, despite the recent optimizations that I made. The 
> problematic strings just come from different sources this time. See the 
> excerpt from the jxray (www.jxray.com) analysis attached.
> Adding String.intern() calls in the appropriate places reduces the overhead 
> of duplicate strings with this workload to ~6%. The remaining duplicates come 
> mostly from JDK internal and MapReduce data structures, and thus are more 
> difficult to fix.
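As a hedged illustration of the interning technique the issue describes (this is standalone demo code, not Hive's actual changes, and the sample string is made up): two equal strings built at runtime are distinct heap objects until interned, after which they share one canonical instance.

```java
public class InternDemo {
    public static void main(String[] args) {
        // Built at runtime, so these are two distinct String objects with
        // identical contents -- the kind of duplication a heap dump reports.
        String a = new String("hdfs://nn/warehouse/t/part=1");
        String b = new String("hdfs://nn/warehouse/t/part=1");
        System.out.println(a == b);                   // false: two heap copies
        System.out.println(a.intern() == b.intern()); // true: one canonical copy
    }
}
```

Calling intern() in the places where such strings are created lets all duplicates be garbage-collected in favor of the single canonical copy.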





[jira] [Commented] (HIVE-16252) Vectorization: Cannot vectorize: Aggregation Function UDF avg

2017-03-20 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933581#comment-15933581
 ] 

Zoltan Haindrich commented on HIVE-16252:
-

I think this is by design... the avg udaf uses an internal temporary type to 
communicate its state after the PARTIAL1 phase; {{struct}} is that 
format. Since it would be tricky to process this format in vectorized mode, it 
leaves that work to the standard udaf; however, the message is a bit 
misleading: it tries to apply the check to an aggregate in FINAL mode, which 
doesn't seem right.

[~rajesh.balamohan] did it cause any trouble?
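To illustrate the two-phase shape of avg referred to above, here is a hedged, self-contained sketch (the class and method names are assumptions for illustration; Hive's actual GenericUDAF classes differ): the PARTIAL1 side accumulates a (count, sum) pair, which is the kind of intermediate struct the vectorizer declines to process, and the FINAL side merges those pairs.

```java
// Hypothetical sketch of a two-phase average; not Hive's UDAF code.
final class AvgPartial {
    long count;   // number of rows seen by this task
    double sum;   // running sum of the input column

    // PARTIAL1: fold one input row into the partial state
    void iterate(double v) { count++; sum += v; }

    // FINAL: combine the partial state from another task
    void merge(AvgPartial other) {
        count += other.count;
        sum += other.sum;
    }

    // Produce the final average from the merged state
    double terminate() { return count == 0 ? Double.NaN : sum / count; }
}
```

The (count, sum) pair must travel between tasks as a struct-typed column, which is why the plain-row reducer handles this stage instead of the vectorized one.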

> Vectorization: Cannot vectorize: Aggregation Function UDF avg 
> --
>
> Key: HIVE-16252
> URL: https://issues.apache.org/jira/browse/HIVE-16252
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>
> {noformat}
> select 
> ss_store_sk, ss_item_sk, avg(ss_sales_price) as revenue
> from
> store_sales, date_dim
> where
> ss_sold_date_sk = d_date_sk
> and d_month_seq between 1212 and 1212 + 11
> group by ss_store_sk , ss_item_sk limit 100;
> 2017-03-20T00:59:49,526  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Validating ReduceWork...
> 2017-03-20T00:59:49,526 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Using reduce tag 0
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> lazybinary.LazyBinarySerDe: LazyBinarySerDe initialized with: 
> columnNames=[_col0] columnTypes=[struct]
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> vector.VectorizationContext: Input Expression = Column[KEY._col0], Vectorized 
> Expression = col 0
> ...
> ...
> 2017-03-20T00:59:49,528  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] 
> physical.Vectorizer: Cannot vectorize: Aggregation Function UDF avg parameter 
> expression for GROUPBY operator: Data type 
> struct of Column[VALUE._col0] not 
> supported
> {noformat}
> Env: Hive build from: commit 71f4930d95475e7e63b5acc55af3809aefcc71e0 (march 
> 16)





[jira] [Updated] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14348:

Affects Version/s: (was: 2.1.0)
   2.1.1

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>






[jira] [Resolved] (HIVE-11554) Exchange partition does not properly populate fields for post/pre execute hooks

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-11554.
-
Resolution: Duplicate

> Exchange partition does not properly populate fields for post/pre execute 
> hooks
> ---
>
> Key: HIVE-11554
> URL: https://issues.apache.org/jira/browse/HIVE-11554
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0
>Reporter: Paul Yang
>Assignee: Vaibhav Gumashta
>Priority: Critical
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: null
> {noformat}
> The post hook should not say null.





[jira] [Updated] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14348:

Target Version/s: 1.3.0, 1.2.2, 2.2.0

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>






[jira] [Updated] (HIVE-16258) Suggesting a non-standard extension to MERGE

2017-03-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16258:
--
Component/s: Transactions

> Suggesting a non-standard extension to MERGE
> 
>
> Key: HIVE-16258
> URL: https://issues.apache.org/jira/browse/HIVE-16258
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Carter Shanklin
>
> Some common data maintenance strategies, especially the Type 2 SCD update, 
> would become substantially easier with a small extension to the SQL standard 
> for MERGE, specifically the ability to say "when matched then insert". Per 
> the standard, matched records can only be updated or deleted.
> In the Type 2 SCD, when a new record comes in you update the old version of 
> the record and insert the new version of the same record. If this extension 
> were supported, sample Type 2 SCD code would look as follows:
> {code}
> merge into customer
> using new_customer_stage stage
> on stage.source_pk = customer.source_pk
> when not matched then insert values/* Insert a net new record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null)
> when matched then update set   /* Update an old record to mark it as 
> out-of-date */
>   is_current = false, end_date = current_date()
> when matched then insert values/* Insert a new current record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null);
> {code}
> Without this support, the user needs to devise some sort of workaround. A 
> common approach is to first left join the staging table against the table to 
> be updated, then to join these results to a helper table that will spit out 
> two records for each match and one record for each miss. One of the matching 
> records needs to have a join key that can never occur in the source data so 
> this requires precise knowledge of the source dataset.
> An example of this:
> {code}
> merge into customer
> using (
>   select
> *,
> coalesce(invalid_key, source_pk) as join_key
>   from (
> select
>   stage.source_pk, stage.name, stage.state,
>   case when customer.source_pk is null then 1
>   when stage.name <> customer.name or stage.state <> customer.state then 2
>   else 0 end as scd_row_type
> from
>   new_customer_stage stage
> left join
>   customer
> on (stage.source_pk = customer.source_pk and customer.is_current = true)
>   ) updates
>   join scd_types on scd_types.type = scd_row_type
> ) sub
> on sub.join_key = customer.source_pk
> when matched then update set
>   is_current = false,
>   end_date = current_date()
> when not matched then insert values
>   (sub.source_pk, upper(substr(sub.name, 0, 3)), sub.name, sub.state, true, 
> null);
> select * from customer order by source_pk;
> {code}
> This code is very complicated and will fail if the "invalid" key ever shows 
> up in the source dataset. This simple extension provides a lot of value and 
> likely very little maintenance overhead.
> /cc [~ekoifman]
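The proposed "when matched then insert" semantics for the Type 2 SCD can be sketched in memory as follows (a hedged illustration in Java rather than HiveQL; the Row fields and method names are assumptions, not Hive code): a match closes out the current row and appends the staged version; a miss simply appends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the proposed MERGE extension.
final class Scd2 {
    static final class Row {
        final int sourcePk;
        final String name;
        boolean isCurrent;
        String endDate;
        Row(int pk, String name, boolean current, String endDate) {
            this.sourcePk = pk; this.name = name;
            this.isCurrent = current; this.endDate = endDate;
        }
    }

    // For each staged row: if a current match exists, close it out
    // ("when matched then update") and append the staged version
    // ("when matched then insert"); otherwise just append
    // ("when not matched then insert").
    static void merge(List<Row> customer, List<Row> stage, String today) {
        for (Row s : stage) {
            for (Row c : customer) {
                if (c.sourcePk == s.sourcePk && c.isCurrent) {
                    c.isCurrent = false;   // mark the old version out-of-date
                    c.endDate = today;
                    break;
                }
            }
            customer.add(new Row(s.sourcePk, s.name, true, null));
        }
    }
}
```

Expressing this directly in MERGE avoids the helper-table trick and its reserved "invalid" join key.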





[jira] [Updated] (HIVE-16258) Suggesting a non-standard extension to MERGE

2017-03-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16258:
--
Affects Version/s: 2.2.0

> Suggesting a non-standard extension to MERGE
> 
>
> Key: HIVE-16258
> URL: https://issues.apache.org/jira/browse/HIVE-16258
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>
> Some common data maintenance strategies, especially the Type 2 SCD update, 
> would become substantially easier with a small extension to the SQL standard 
> for MERGE, specifically the ability to say "when matched then insert". Per 
> the standard, matched records can only be updated or deleted.
> In the Type 2 SCD, when a new record comes in you update the old version of 
> the record and insert the new version of the same record. If this extension 
> were supported, sample Type 2 SCD code would look as follows:
> {code}
> merge into customer
> using new_customer_stage stage
> on stage.source_pk = customer.source_pk
> when not matched then insert values/* Insert a net new record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null)
> when matched then update set   /* Update an old record to mark it as 
> out-of-date */
>   is_current = false, end_date = current_date()
> when matched then insert values/* Insert a new current record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null);
> {code}
> Without this support, the user needs to devise some sort of workaround. A 
> common approach is to first left join the staging table against the table to 
> be updated, then to join these results to a helper table that will spit out 
> two records for each match and one record for each miss. One of the matching 
> records needs to have a join key that can never occur in the source data so 
> this requires precise knowledge of the source dataset.
> An example of this:
> {code}
> merge into customer
> using (
>   select
> *,
> coalesce(invalid_key, source_pk) as join_key
>   from (
> select
>   stage.source_pk, stage.name, stage.state,
>   case when customer.source_pk is null then 1
>   when stage.name <> customer.name or stage.state <> customer.state then 2
>   else 0 end as scd_row_type
> from
>   new_customer_stage stage
> left join
>   customer
> on (stage.source_pk = customer.source_pk and customer.is_current = true)
>   ) updates
>   join scd_types on scd_types.type = scd_row_type
> ) sub
> on sub.join_key = customer.source_pk
> when matched then update set
>   is_current = false,
>   end_date = current_date()
> when not matched then insert values
>   (sub.source_pk, upper(substr(sub.name, 0, 3)), sub.name, sub.state, true, 
> null);
> select * from customer order by source_pk;
> {code}
> This code is very complicated and will fail if the "invalid" key ever shows 
> up in the source dataset. This simple extension provides a lot of value and 
> likely very little maintenance overhead.
> /cc [~ekoifman]





[jira] [Commented] (HIVE-16178) corr/covar_samp UDAF standard compliance

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933639#comment-15933639
 ] 

Hive QA commented on HIVE-16178:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12859631/HIVE-16178.2.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10496 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4251/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4251/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4251/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12859631 - PreCommit-HIVE-Build

> corr/covar_samp UDAF standard compliance
> 
>
> Key: HIVE-16178
> URL: https://issues.apache.org/jira/browse/HIVE-16178
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16178.1.patch, HIVE-16178.2.patch
>
>
> h3. corr
> the standard defines corner cases in which it should return null, but the 
> current result is NaN:
> If N * SUMX2 equals SUMX * SUMX, then the result is the null value.
> and
> If N * SUMY2 equals SUMY * SUMY, then the result is the null value.
> h3. covar_samp
> returns 0 instead of null when N is 1:
> `If N is 1 (one), then the result is the null value.`
> h3. check (x,y) vs (y,x) args in docs
> the standard uses (y,x) order, and some of the function names also contain 
> X and Y, so the order does matter. Currently at least corr uses (x,y) order, 
> which is okay because it is symmetric; but it would be great to have the 
> same order everywhere (check the others)
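A hedged sketch of those corner-case rules (illustrative code, not Hive's UDAF implementation; `Double` stands in for a nullable SQL value, and inputs are the running sums an aggregator would carry):

```java
// Hypothetical helpers showing the SQL-standard corner cases.
final class StdAgg {
    // covar_samp: "If N is 1 (one), then the result is the null value."
    static Double covarSamp(long n, double sumX, double sumY, double sumXY) {
        if (n <= 1) return null;                 // standard says null, not 0
        return (sumXY - sumX * sumY / n) / (n - 1);
    }

    // corr: null when N*SUMX2 == SUMX*SUMX or N*SUMY2 == SUMY*SUMY,
    // i.e. when either input column has zero variance.
    static Double corr(long n, double sumX, double sumX2,
                       double sumY, double sumY2, double sumXY) {
        double varX = n * sumX2 - sumX * sumX;
        double varY = n * sumY2 - sumY * sumY;
        if (varX == 0 || varY == 0) return null; // standard says null, not NaN
        return (n * sumXY - sumX * sumY) / Math.sqrt(varX * varY);
    }
}
```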





[jira] [Updated] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14348:

Priority: Blocker  (was: Major)

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>






[jira] [Commented] (HIVE-14348) Add tests for alter table exchange partition

2017-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933643#comment-15933643
 ] 

Hive QA commented on HIVE-14348:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12820687/HIVE-14348.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4252/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4252/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4252/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-03-20 21:46:07.990
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4252/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-03-20 21:46:07.993
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 7ea85a0 HIVE-16227: GenMRFileSink1.java may refer to a wrong MR 
task in multi-insert case (Pengcheng Xiong, reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 7ea85a0 HIVE-16227: GenMRFileSink1.java may refer to a wrong MR 
task in multi-insert case (Pengcheng Xiong, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-03-20 21:46:09.085
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java: 
No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java: No 
such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java: No such 
file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/Operation2Privilege.java:
 No such file or directory
error: a/ql/src/test/results/clientnegative/exchange_partition.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/exchange_partition.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/exchange_partition2.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/exchange_partition3.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/exchgpartition2lel.q.out: No such 
file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12820687 - PreCommit-HIVE-Build

> Add tests for alter table exchange partition
> 
>
> Key: HIVE-14348
> URL: https://issues.apache.org/jira/browse/HIVE-14348
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Blocker
> Attachments: HIVE-14348.1.patch, HIVE-14348.2.patch, 
> HIVE-14348.3.patch, HIVE-14348.4.patch
>
>






[jira] [Commented] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933647#comment-15933647
 ] 

Lefty Leverenz commented on HIVE-16024:
---

Does this need any documentation in the wiki?  It looks like a simple bug fix, 
but here are the relevant doc links just in case:

* [Recover Partitions (MSCK REPAIR TABLE) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)]
* [Configuration Properties -- hive.mapred.mode | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapred.mode]

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 2.2.0
>
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch, HIVE-16024.05.patch, 
> HIVE-16024.06.patch, HIVE-16024.07.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict
> HIVE-13788 modified the way we read up partitions for a table to improve 
> performance. Unfortunately it is using PartitionPruner to load the partitions 
> which in turn is checking hive.mapred.mode.
> The previous code did not check hive.mapred.mode.





[jira] [Updated] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16260:
--
Status: Patch Available  (was: In Progress)

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch
>
>
> Remove parallel edges of semijoin with map joins, as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries, and 
> that the semijoin is disabled if the big table has fewer than 1M rows.
> Both of these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
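The two thresholds mentioned in HIVE-16260 (size bloom filters for at least 1M entries; skip the semijoin reduction when the big table is small) can be sketched as follows. This is an illustrative approximation using the standard bloom filter sizing formulas, not Hive's actual implementation; the function names, the 5% false-positive rate, and the parameter names are assumptions.

```python
import math

def bloom_params(expected_entries, fpp=0.05, min_entries=1_000_000):
    # Clamp sizing to at least min_entries, mirroring the patch's intent
    # that bloom filters handle no fewer than 1M entries.
    n = max(expected_entries, min_entries)
    # Standard formulas: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
    m = math.ceil(-n * math.log(fpp) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k

def semijoin_enabled(big_table_rows, min_rows=1_000_000):
    # Heuristic: a semijoin reduction on a small big-table side costs more
    # than it saves, so disable it below the row threshold.
    return big_table_rows >= min_rows
```

For example, `bloom_params(10)` and `bloom_params(1_000_000)` yield the same filter size, since the expected-entry count is clamped up to the 1M floor.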


[jira] [Updated] (HIVE-16260) Remove parallel edges of semijoin with map joins.

2017-03-20 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16260:
--
Attachment: HIVE-16260.1.patch

> Remove parallel edges of semijoin with map joins.
> -
>
> Key: HIVE-16260
> URL: https://issues.apache.org/jira/browse/HIVE-16260
> Project: Hive
>  Issue Type: Task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16260.1.patch
>
>
> Remove parallel edges of semijoin with map joins, as they don't give any 
> benefit to the query.
> Also, ensure that bloom filters are created to handle at least 1M entries, and 
> that the semijoin is disabled if the big table has fewer than 1M rows.
> Both of these features are configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16209) Vectorization: Add support for complex types to VectorExtractRow and VectorAssignRow

2017-03-20 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16209:

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-15468

> Vectorization: Add support for complex types to VectorExtractRow and 
> VectorAssignRow
> 
>
> Key: HIVE-16209
> URL: https://issues.apache.org/jira/browse/HIVE-16209
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> Supports complex types in non-native VectorReduceSink, row mode Text 
> Vectorization, and some cases of Vectorized Schema Evolution.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

