[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15422:

Attachment: HIVE-15422.3.patch

There is one additional place, in {{AbstractMapOperator::getNominalPath}}, where 
path checks can be optimized when a large number of partitions is present. 
Attaching the .3 version. 
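
A minimal sketch of the caching pattern involved, with hypothetical names (not 
the actual patch): resolve the nominal path once per distinct input path string 
instead of re-running the comparison-heavy lookup on every call.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only -- class and method names are hypothetical.
public class NominalPathCache {
  private final Map<String, String> cache = new HashMap<>();

  public String getNominalPath(String inputPath) {
    // The comparison-heavy resolution runs once per distinct path string.
    return cache.computeIfAbsent(inputPath, this::resolveNominalPath);
  }

  // Stand-in for the existing lookup that compares the input path against
  // every partition path registered for the operator.
  private String resolveNominalPath(String inputPath) {
    return inputPath; // placeholder for the real matching logic
  }
}
{code}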

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, 
> HIVE-15422.3.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When the following query was executed in LLAP (single instance) on a 5-node 
> cluster, a lot of GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed a large 
> number of objects created just in path comparisons in HiveInputFormat.  
> HIVE-15405 reduces the number of path comparisons in FileUtils, but Hive still 
> ends up doing lots of comparisons in HiveInputFormat::pushProjectionsAndFilters.
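
For illustration, a hedged sketch of the contrast (hypothetical paths, not 
Hive's actual code): comparing each split path against every partition path 
allocates and compares O(splits x partitions) objects, while a precomputed set 
of normalized path strings makes each check a single hash lookup.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PathComparisonSketch {
  public static void main(String[] args) {
    List<String> partitionPaths = Arrays.asList(
        "s3a://bucket/flights/ds=2016-01-01",
        "s3a://bucket/flights/ds=2016-01-02");

    // Normalize once, instead of per-comparison object churn.
    Set<String> partitionSet = new HashSet<>(partitionPaths);

    String splitPath = "s3a://bucket/flights/ds=2016-01-02";
    System.out.println(partitionSet.contains(splitPath)); // true
  }
}
{code}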





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: In Progress  (was: Patch Available)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
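
For illustration, a toy contrast of the allocation behavior (not the actual 
fast-decimal code): each BigDecimal.add() allocates a new object, while a 
mutable accumulator in the spirit of the mutable* calls above updates in place.

{code:java}
import java.math.BigDecimal;

public class MutableDecimalSketch {
  private long unscaled; // toy representation; the real patch is far richer

  public void mutableAdd(long otherUnscaled) {
    unscaled += otherUnscaled; // no allocation per operation
  }

  public static void main(String[] args) {
    BigDecimal total = BigDecimal.ZERO;
    for (int i = 0; i < 1000; i++) {
      total = total.add(BigDecimal.ONE); // 1000 short-lived objects
    }

    MutableDecimalSketch acc = new MutableDecimalSketch();
    for (int i = 0; i < 1000; i++) {
      acc.mutableAdd(1);                 // zero per-iteration allocation
    }
    System.out.println(total + " vs " + acc.unscaled);
  }
}
{code}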





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: Patch Available  (was: In Progress)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Attachment: HIVE-15335.09.patch

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Commented] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747683#comment-15747683
 ] 

Hive QA commented on HIVE-15192:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843172/HIVE-15192.9.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10815 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_subq_exists] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_subq_in] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_subq_not_in] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2571/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2571/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2571/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843172 - PreCommit-HIVE-Build

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15192.2.patch, HIVE-15192.3.patch, 
> HIVE-15192.4.patch, HIVE-15192.5.patch, HIVE-15192.6.patch, 
> HIVE-15192.7.patch, HIVE-15192.8.patch, HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into SEMI JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before generating the logical plan. These 
> transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations cannot handle many subqueries; as a result, 
> Hive imposes various restrictions on the type of queries it can handle, e.g. 
> Hive disallows nested subqueries. All current restrictions are detailed in the 
> above linked document.
> This patch is the first phase of getting rid of these transformations and 
> leveraging Calcite's functionality to plan such queries. 
> Subsequent phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (the LHS 
> of a SubQuery must have all its column references qualified).
> Known issues with this patch are:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>  * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix this and re-enable it by default.
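
For illustration, a hypothetical nested subquery of the kind the AST-level 
rewrite disallowed but Calcite-based planning can de-correlate:

{code:sql}
SELECT *
FROM part p
WHERE p.p_size IN (SELECT min(p2.p_size)
                   FROM part p2
                   WHERE p2.p_type IN (SELECT p3.p_type FROM part p3));
{code}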





[jira] [Commented] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views

2016-12-14 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747748#comment-15747748
 ] 

Rajesh Balamohan commented on HIVE-15420:
-

+1

> LLAP UI: Relativize resources to allow proxied/secured views 
> -
>
> Key: HIVE-15420
> URL: https://issues.apache.org/jira/browse/HIVE-15420
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Web UI
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-15420.1.patch
>
>
> If the UI is secured behind a gateway/firewall instance, this change allows 
> the UI to function with a base URL like http:///proxy/
> NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-15397) metadata-only queries may return incorrect results with empty tables

2016-12-14 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15397:
--
Labels: TODOC2.2  (was: )

> metadata-only queries may return incorrect results with empty tables
> 
>
> Key: HIVE-15397
> URL: https://issues.apache.org/jira/browse/HIVE-15397
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15397.01.patch, HIVE-15397.patch
>
>
> Queries like {{select 1=1 from t group by 1=1}} may return rows, based on 
> OneNullRowInputFormat, even if the source table is empty. For now, add some 
> basic detection of empty tables and turn this off by default (since we can't 
> know whether a table is empty just from the presence of some files, without 
> reading them).





[jira] [Commented] (HIVE-15397) metadata-only queries may return incorrect results with empty tables

2016-12-14 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747826#comment-15747826
 ] 

Lefty Leverenz commented on HIVE-15397:
---

Doc note:  This changes the default value of *hive.optimize.metadataonly* and 
gives it a description.  It was created in 0.8.0 by HIVE-1003 and isn't 
documented in the wiki yet.  Note that the wiki description will need editing 
to cover the earlier default value of true.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Added a TODOC2.2 label.
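
For the wiki write-up, usage is a one-line toggle (a sketch; per HIVE-15397 the 
default becomes false in release 2.2.0):

{code}
set hive.optimize.metadataonly=true;
{code}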

> metadata-only queries may return incorrect results with empty tables
> 
>
> Key: HIVE-15397
> URL: https://issues.apache.org/jira/browse/HIVE-15397
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15397.01.patch, HIVE-15397.patch
>
>
> Queries like {{select 1=1 from t group by 1=1}} may return rows, based on 
> OneNullRowInputFormat, even if the source table is empty. For now, add some 
> basic detection of empty tables and turn this off by default (since we can't 
> know whether a table is empty just from the presence of some files, without 
> reading them).





[jira] [Commented] (HIVE-1003) optimize metadata only queries

2016-12-14 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747837#comment-15747837
 ] 

Lefty Leverenz commented on HIVE-1003:
--

Doc note:  This adds *hive.optimize.metadataonly* to HiveConf.java with a 
default value of true in release 0.8.0.

HIVE-15397 changes the default to false in release 2.2.0.

When it is documented in the wiki, this link will work:

* [Configuration Properties -- hive.optimize.metadataonly | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.metadataonly]


> optimize metadata only queries
> --
>
> Key: HIVE-1003
> URL: https://issues.apache.org/jira/browse/HIVE-1003
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Marcin Kurczych
> Fix For: 0.8.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--D105.1.patch, 
> ASF.LICENSE.NOT.GRANTED--D105.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-1003.D105.3.patch, HIVE-1003.1.patch, 
> hive.1003.2.patch, hive.1003.3.patch, hive.1003.4.patch
>
>
> Queries like {{select max(ds) from T}}, where {{ds}} is a partitioning 
> column, should be optimized.
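
For illustration, a hypothetical partitioned table where the answer can come 
from partition metadata alone, without scanning any data files:

{code:sql}
CREATE TABLE T (x INT) PARTITIONED BY (ds STRING);
-- max(ds) is resolvable from the partition list in the metastore
SELECT max(ds) FROM T;
{code}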





[jira] [Commented] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747847#comment-15747847
 ] 

Hive QA commented on HIVE-15422:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843175/HIVE-15422.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10796 tests 
executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=161)

[scriptfile1.q,vector_outer_join5.q,file_with_header_footer.q,bucket4.q,input16_cc.q,bucket5.q,infer_bucket_sort_merge.q,constprog_partitioner.q,orc_merge2.q,reduce_deduplicate.q,schemeAuthority2.q,load_fs2.q,orc_merge8.q,orc_merge_incompat2.q,infer_bucket_sort_bucketed_table.q,vector_outer_join4.q,disable_merge_for_bucketing.q,vector_inner_join.q,orc_merge7.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2572/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2572/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2572/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843175 - PreCommit-HIVE-Build

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, 
> HIVE-15422.3.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When the following query was executed in LLAP (single instance) on a 5-node 
> cluster, a lot of GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed a large 
> number of objects created just in path comparisons in HiveInputFormat.  
> HIVE-15405 reduces the number of path comparisons in FileUtils, but Hive still 
> ends up doing lots of comparisons in HiveInputFormat::pushProjectionsAndFilters.





[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-14 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747890#comment-15747890
 ] 

Jesus Camacho Rodriguez commented on HIVE-15277:


[~bslim], there are some failures related to this patch. We need to fix them 
before checking it in.

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select ..., `metric2`>;
> {code}
> This statement stores the results of the given select query in a Druid 
> datasource named 'datasourcename'. One of the columns of the query needs to 
> be the time dimension, which is mandatory in Druid. In particular, we use the 
> same convention used by Druid: there needs to be a column named '__time' in 
> the result of the executed query, which will act as the time dimension column 
> in Druid. Currently, the time dimension column needs to be of 'timestamp' type.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and metrics; 
> therefore, if you have a column in Hive that contains numbers and needs to be 
> presented as a dimension, use the cast operator to cast it to string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; users need to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid
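
For illustration, a hedged end-to-end example (hypothetical source table and 
columns) of the cast convention for numeric dimensions described above:

{code:sql}
CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "datasourcename")
AS
SELECT `ts` AS `__time`,                       -- mandatory 'timestamp' time column
       CAST(`zipcode` AS string) AS `zipcode`, -- number presented as a dimension
       `metric1`, `metric2`                    -- long/double/float metrics
FROM source_table;
{code}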





[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747971#comment-15747971
 ] 

Hive QA commented on HIVE-15335:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843179/HIVE-15335.09.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 10899 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_boolean] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_char] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_data_types] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_reduce_groupby_decimal]
 (batchId=29)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_ppd_date]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_ppd_decimal]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_ppd_timestamp]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_ppd_varchar]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_data_types]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce_groupby_decimal]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_data_types] 
(batchId=127)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2573/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2573/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2573/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843179 - PreCommit-HIVE-Build

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Commented] (HIVE-15427) Hadoop3 support

2016-12-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748141#comment-15748141
 ] 

Peter Vary commented on HIVE-15427:
---

[~spena] did some work on this. It might be worth taking a look at this jira: 
HIVE-15016

> Hadoop3 support
> ---
>
> Key: HIVE-15427
> URL: https://issues.apache.org/jira/browse/HIVE-15427
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15427.WIP.patch
>
>
> Need to start working on Hadoop 3 support at some point.





[jira] [Commented] (HIVE-15427) Hadoop3 support

2016-12-14 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748155#comment-15748155
 ] 

Peter Vary commented on HIVE-15427:
---

I do not know whether he has looked at these things lately; I just thought it 
would be a good idea to connect the two jiras :) 

> Hadoop3 support
> ---
>
> Key: HIVE-15427
> URL: https://issues.apache.org/jira/browse/HIVE-15427
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15427.WIP.patch
>
>
> Need to start working on Hadoop 3 support at some point.





[jira] [Assigned] (HIVE-15182) Move 'clause' rules from IdentifierParser to a different file

2016-12-14 Thread Daniel Voros (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Voros reassigned HIVE-15182:
---

Assignee: Daniel Voros  (was: Zoltan Haindrich)

> Move 'clause' rules from IdentifierParser to a different file
> -
>
> Key: HIVE-15182
> URL: https://issues.apache.org/jira/browse/HIVE-15182
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Daniel Voros
>
> I'm hitting antlr "code too large" errors, and these rules belong to a 
> different class than the others.
> Moving them to a separate file greatly reduces the generated IdentifierParser 
> size.





[jira] [Assigned] (HIVE-15409) Add support for GROUPING function with grouping sets

2016-12-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-15409:
--

Assignee: Jesus Camacho Rodriguez

> Add support for GROUPING function with grouping sets
> 
>
> Key: HIVE-15409
> URL: https://issues.apache.org/jira/browse/HIVE-15409
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> The _grouping(col_expr)_ function indicates whether a given column is 
> aggregated in each row.





[jira] [Work started] (HIVE-15409) Add support for GROUPING function with grouping sets

2016-12-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15409 started by Jesus Camacho Rodriguez.
--
> Add support for GROUPING function with grouping sets
> 
>
> Key: HIVE-15409
> URL: https://issues.apache.org/jira/browse/HIVE-15409
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> The _grouping(col_expr)_ function indicates whether a given column is 
> aggregated in each row.





[jira] [Updated] (HIVE-15409) Add support for GROUPING function with grouping sets

2016-12-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15409:
---
Status: Patch Available  (was: In Progress)

> Add support for GROUPING function with grouping sets
> 
>
> Key: HIVE-15409
> URL: https://issues.apache.org/jira/browse/HIVE-15409
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> The _grouping(col_expr)_ function indicates whether a given column is 
> aggregated in each row.





[jira] [Updated] (HIVE-15409) Add support for GROUPING function with grouping sets

2016-12-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15409:
---
Attachment: HIVE-15409.patch

Initial patch; expecting a few q-file changes due to the type change for the 
GROUPING__ID column.

> Add support for GROUPING function with grouping sets
> 
>
> Key: HIVE-15409
> URL: https://issues.apache.org/jira/browse/HIVE-15409
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15409.patch
>
>
> The _grouping(col_expr)_ function indicates whether a given column is 
> aggregated in each row.





[jira] [Issue Comment Deleted] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-13278:
---
Comment: was deleted

(was: [~csun], please feel free to address the MR case first if we need more 
time for the HoS case. Thanks. )

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch
>
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully. So mark it 
> as Minor currently.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: In Progress  (was: Patch Available)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Attachment: HIVE-15335.091.patch

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: Patch Available  (was: In Progress)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Assigned] (HIVE-15182) Move 'clause' rules from IdentifierParser to a different file

2016-12-14 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-15182:
---

Assignee: Zoltan Haindrich  (was: Daniel Voros)

> Move 'clause' rules from IdentifierParser to a different file
> -
>
> Key: HIVE-15182
> URL: https://issues.apache.org/jira/browse/HIVE-15182
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>
> I'm hitting antlr "code too large" errors, and these rules belong to a 
> different class than the others.
> Moving them to a separate file greatly reduces the generated IdentifierParser 
> size.





[jira] [Reopened] (HIVE-15182) Move 'clause' rules from IdentifierParser to a different file

2016-12-14 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reopened HIVE-15182:
-

> Move 'clause' rules from IdentifierParser to a different file
> -
>
> Key: HIVE-15182
> URL: https://issues.apache.org/jira/browse/HIVE-15182
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>
> I'm hitting antlr "code too large" errors, and these rules belong to a 
> different class than the others.
> Moving them to a separate file greatly reduces the generated IdentifierParser 
> size.





[jira] [Resolved] (HIVE-15182) Move 'clause' rules from IdentifierParser to a different file

2016-12-14 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-15182.
-
Resolution: Invalid

> Move 'clause' rules from IdentifierParser to a different file
> -
>
> Key: HIVE-15182
> URL: https://issues.apache.org/jira/browse/HIVE-15182
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>
> I'm hitting antlr "code too large" errors, and these rules belong to a 
> different class than the others.
> Moving them to a separate file greatly reduces the generated IdentifierParser 
> size.





[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748570#comment-15748570
 ] 

Hive QA commented on HIVE-15335:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843231/HIVE-15335.091.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10899 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2574/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2574/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2574/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843231 - PreCommit-HIVE-Build

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-14 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15425:
-
Status: Open  (was: Patch Available)

Pre-commit build hasn't been triggered, hence canceling the patch.

> Eliminate conflicting output from schematool's table validator.
> ---
>
> Key: HIVE-15425
> URL: https://issues.apache.org/jira/browse/HIVE-15425
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15425.patch
>
>
> Running the schemaTool's validate command against Derby DB yields the 
> following output.
> {code}
> Validating tables in the schema for version 2.2.0
> Expected (from schema definition) 57 tables, Found (from HMS metastore) 58 
> tables
> Schema table validation successful
> {code}
> The output above creates some confusion when there are extra tables (not part 
> of the hive schema) in the database. The intention was to report the total 
> number of tables found; the validator did not expect the schema namespace to 
> contain additional tables. Even though the validation is successful, the 
> output is confusing.
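
A hedged sketch of one way to make the report unambiguous (illustrative, not 
the actual patch): diff the table sets instead of printing raw counts, so extra 
non-Hive tables in the database do not read as a mismatch.

{code:java}
import java.util.HashSet;
import java.util.Set;

public class TableValidatorSketch {
  public static String report(Set<String> expected, Set<String> found) {
    Set<String> missing = new HashSet<>(expected);
    missing.removeAll(found);           // required tables that are not present
    return missing.isEmpty()
        ? "Schema table validation successful"
        : "Schema table validation failed, missing tables: " + missing;
  }
}
{code}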





[jira] [Updated] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-14 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15425:
-
Attachment: HIVE-15425.patch

> Eliminate conflicting output from schematool's table validator.
> ---
>
> Key: HIVE-15425
> URL: https://issues.apache.org/jira/browse/HIVE-15425
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15425.patch, HIVE-15425.patch
>
>
> Running the schemaTool's validate command against Derby DB yields the 
> following output.
> {code}
> Validating tables in the schema for version 2.2.0
> Expected (from schema definition) 57 tables, Found (from HMS metastore) 58 
> tables
> Schema table validation successful
> {code}
> The output above creates some confusion when there are extra tables (not part 
> of the hive schema) in the database. The intention was to report the total 
> number of tables found; the validator did not expect the schema namespace to 
> contain additional tables. Even though the validation is successful, the 
> output is confusing.





[jira] [Updated] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-14 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15425:
-
Status: Patch Available  (was: Open)

Re-attaching the patch from last night.

> Eliminate conflicting output from schematool's table validator.
> ---
>
> Key: HIVE-15425
> URL: https://issues.apache.org/jira/browse/HIVE-15425
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15425.patch, HIVE-15425.patch
>
>
> Running the schemaTool's validate command against Derby DB yields the 
> following output.
> {code}
> Validating tables in the schema for version 2.2.0
> Expected (from schema definition) 57 tables, Found (from HMS metastore) 58 
> tables
> Schema table validation successful
> {code}
> The output above creates some confusion when there are extra tables (not part 
> of the hive schema) in the database. The intention was to report the total 
> number of tables found; the validator did not expect the schema namespace to 
> contain additional tables. Even though the validation is successful, the 
> output is confusing.





[jira] [Commented] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-14 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748595#comment-15748595
 ] 

Naveen Gangam commented on HIVE-15425:
--

Sorry, the pre-commit build was executed. None of the failures appear related 
to the patch. [~aihuaxu], can you review? Thanks

> Eliminate conflicting output from schematool's table validator.
> ---
>
> Key: HIVE-15425
> URL: https://issues.apache.org/jira/browse/HIVE-15425
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15425.patch, HIVE-15425.patch
>
>
> Running the schemaTool's validate command against Derby DB yields the 
> following output.
> {code}
> Validating tables in the schema for version 2.2.0
> Expected (from schema definition) 57 tables, Found (from HMS metastore) 58 
> tables
> Schema table validation successful
> {code}
> The output above creates some confusion when there are extra tables (not part 
> of the hive schema) in the database. The intention was to report the total 
> number of tables found; the validator did not expect the schema namespace to 
> contain additional tables. Even though the validation is successful, the 
> output is confusing.





[jira] [Commented] (HIVE-15428) HoS DPP doesn't remove cyclic dependency

2016-12-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748659#comment-15748659
 ] 

Rui Li commented on HIVE-15428:
---

The reason we didn't see this before is related to how we generate the 
GenericUDFIn. DPP is created for a UDF like {noformat}GenericUDFIn(Column[ds], 
RS[12]){noformat} and is not created if the joining column is resolved to a 
constant, like {noformat}GenericUDFIn(Const string 2008-04-08, RS[11]){noformat}. 
Previously we had one UDF resolved like that, so we only created one DPP.
Anyway, our code needs to be robust enough to handle both cases. I'll upload a 
patch to add cycle detection.
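
A minimal sketch of such cycle detection, assuming a generic adjacency-map 
representation of the dependencies (illustrative, not the actual patch): a DFS 
that tracks the current path reports a cycle when it revisits a node already on 
the stack.

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CycleDetector {
  public static boolean hasCycle(Map<String, List<String>> edges) {
    Set<String> visiting = new HashSet<>(), done = new HashSet<>();
    for (String node : edges.keySet()) {
      if (dfs(node, edges, visiting, done)) return true;
    }
    return false;
  }

  private static boolean dfs(String n, Map<String, List<String>> edges,
                             Set<String> visiting, Set<String> done) {
    if (visiting.contains(n)) return true;   // back edge -> cycle
    if (done.contains(n)) return false;      // already fully explored
    visiting.add(n);
    for (String next : edges.getOrDefault(n, List.of())) {
      if (dfs(next, edges, visiting, done)) return true;
    }
    visiting.remove(n);
    done.add(n);
    return false;
  }
}
{code}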

> HoS DPP doesn't remove cyclic dependency
> 
>
> Key: HIVE-15428
> URL: https://issues.apache.org/jira/browse/HIVE-15428
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>
> More details in HIVE-15357





[jira] [Commented] (HIVE-15297) Hive should not split semicolon within quoted string literals

2016-12-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748697#comment-15748697
 ] 

Ashutosh Chauhan commented on HIVE-15297:
-

OK..+1

> Hive should not split semicolon within quoted string literals
> -
>
> Key: HIVE-15297
> URL: https://issues.apache.org/jira/browse/HIVE-15297
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15297.01.patch, HIVE-15297.02.patch, 
> HIVE-15297.03.patch
>
>
> String literals in a query cannot contain reserved symbols. The same set of 
> queries works fine in MySQL and PostgreSQL. 
> {code}
> hive> CREATE TABLE ts(s varchar(550));
> OK
> Time taken: 0.075 seconds
> hive> INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');
> MismatchedTokenException(14!=326)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:7271)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:7370)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:7510)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:51854)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45432)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44578)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:8)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1694)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1176)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:31 mismatched input '/' expecting ) near 
> 'Mozilla' in value row constructor
> hive>
> {code}
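
A hedged sketch of the quote-aware splitting the fix calls for (illustrative 
only, not the actual patch; escaped-quote handling omitted): semicolons 
separate statements only when we are outside single and double quotes.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
  public static List<String> split(String line) {
    List<String> statements = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    char quote = 0;                       // 0 = not inside a quoted literal
    for (char c : line.toCharArray()) {
      if (quote == 0 && (c == '\'' || c == '"')) {
        quote = c;                        // entering a literal
      } else if (c == quote) {
        quote = 0;                        // leaving the literal
      }
      if (c == ';' && quote == 0) {
        statements.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    if (current.length() > 0) statements.add(current.toString());
    return statements;
  }

  public static void main(String[] args) {
    // The ';' inside the string literal is not treated as a separator.
    System.out.println(split("INSERT INTO ts VALUES ('a;b'); SELECT 1;"));
  }
}
{code}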





[jira] [Commented] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator

2016-12-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748709#comment-15748709
 ] 

Ashutosh Chauhan commented on HIVE-15339:
-

[~jcamachorodriguez] Can you please review this one?

> Batch metastore calls to get column stats for fields needed in 
> FilterSelectivityEstimator
> -
>
> Key: HIVE-15339
> URL: https://issues.apache.org/jira/browse/HIVE-15339
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch
>
>
> Based on the query pattern, {{FilterSelectivityEstimator}} gets column 
> statistics from the metastore in multiple calls. For instance, in the 
> following query, it ends up getting individual column statistics for flights 
> multiple times.
> When the table has a large number of partitions, getting statistics for 
> columns via multiple calls can be very expensive. This adversely impacts the 
> overall compilation time. The following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = 
> `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = 
> `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may be helpful to club all columns that need statistics and fetch these 
> details in a single remote call.
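
A minimal sketch of the batching idea (the client interface below is 
hypothetical, not the actual metastore API): collect every column that needs 
statistics first, then issue one remote call instead of one per column.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BatchedStatsFetcher {
  // Hypothetical stand-in for the remote statistics endpoint.
  interface StatsClient {
    Map<String, Object> getColumnStatistics(String table, List<String> columns);
  }

  public static Map<String, Object> fetch(StatsClient client, String table,
                                          List<String> neededColumns) {
    // Single round trip for all columns needed by the selectivity estimator.
    return client.getColumnStatistics(table, new ArrayList<>(neededColumns));
  }
}
{code}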





[jira] [Commented] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748708#comment-15748708
 ] 

Hive QA commented on HIVE-15425:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843243/HIVE-15425.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10814 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hive.hcatalog.api.TestHCatClientNotification.createTable 
(batchId=220)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2575/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2575/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2575/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843243 - PreCommit-HIVE-Build

> Eliminate conflicting output from schematool's table validator.
> ---
>
> Key: HIVE-15425
> URL: https://issues.apache.org/jira/browse/HIVE-15425
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15425.patch, HIVE-15425.patch
>
>
> Running the schemaTool's validate command against a Derby DB yields the 
> following output.
> {code}
> Validating tables in the schema for version 2.2.0
> Expected (from schema definition) 57 tables, Found (from HMS metastore) 58 
> tables
> Schema table validation successful
> {code}
> The output above creates some confusion when there are extra tables (not part 
> of the Hive schema) in the database. The intention was to report the total 
> number of tables found; it was not expected that the schema namespace would 
> contain additional tables. Even though the validation is successful, the 
> output is confusing.
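
For illustration only, a minimal sketch of output that reports extra tables 
separately from the schema total; the method and variable names are 
hypothetical, not the actual schematool code.

{code:java}
import java.util.HashSet;
import java.util.Set;

public class TableValidationReport {
  static boolean reportTableValidation(Set<String> schemaTables, Set<String> dbTables) {
    Set<String> missing = new HashSet<>(schemaTables);
    missing.removeAll(dbTables);
    if (!missing.isEmpty()) {
      System.out.println("Schema table validation failed; missing tables: " + missing);
      return false;
    }
    // Report non-schema tables separately instead of folding them into the expected count.
    int extra = dbTables.size() - schemaTables.size();
    System.out.println("Schema table validation successful (" + schemaTables.size()
        + " schema tables present"
        + (extra > 0 ? ", " + extra + " non-schema table(s) ignored" : "") + ")");
    return true;
  }
}
{code}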



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15395) Don't try to intern strings from empty map

2016-12-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748714#comment-15748714
 ] 

Ashutosh Chauhan commented on HIVE-15395:
-

[~rajesh.balamohan] Can you please review this one?

> Don't try to intern strings from empty map
> --
>
> Key: HIVE-15395
> URL: https://issues.apache.org/jira/browse/HIVE-15395
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15395.patch
>
>
> Otherwise it unnecessarily creates another map object.
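
For illustration, a minimal sketch of the guard described above; 
{{internStringsInMap}} is a hypothetical helper, not necessarily the method the 
patch touches.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class InternUtil {
  static Map<String, String> internStringsInMap(Map<String, String> input) {
    if (input == null || input.isEmpty()) {
      return input; // nothing to intern, so avoid allocating a new map
    }
    Map<String, String> interned = new HashMap<>(input.size());
    for (Map.Entry<String, String> e : input.entrySet()) {
      interned.put(e.getKey().intern(),
          e.getValue() == null ? null : e.getValue().intern());
    }
    return interned;
  }
}
{code}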



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator

2016-12-14 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748718#comment-15748718
 ] 

Jesus Camacho Rodriguez commented on HIVE-15339:


[~rajesh.balamohan], I have taken a look at the latest patch and I have a 
couple of comments.

* We might end up retrieving stats for more columns than we actually need, 
e.g., if folding leads to new pruning opportunities. If most of the time is 
spent in the round trip to gather the stats, that should still be fine. But is 
this always the case? Could this harm us for some queries?
* Further, your latest patch does the consolidation at column projection time 
and thus assumes that we have projected some columns from the table. If we bail 
out in L167 (no columns are projected), we will not consolidate the calls. This 
might introduce some more variability in the time to plan the query.

> Batch metastore calls to get column stats for fields needed in 
> FilterSelectivityEstimator
> -
>
> Key: HIVE-15339
> URL: https://issues.apache.org/jira/browse/HIVE-15339
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch
>
>
> Depending on the query pattern, {{FilterSelectivityEstimator}} gets column 
> statistics from the metastore in multiple calls. For instance, in the 
> following query, it ends up fetching individual column statistics for the 
> flights table multiple times.
> When the table has a large number of partitions, getting column statistics 
> via multiple calls can be very expensive and adversely impacts the overall 
> compilation time. The following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = 
> `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = 
> `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may be helpful to batch all the columns that need statistics and fetch 
> them in a single remote call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14998) Fix and update test: TestPluggableHiveSessionImpl

2016-12-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14998:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, [~kgyrtkirk]

> Fix and update test: TestPluggableHiveSessionImpl
> -
>
> Key: HIVE-14998
> URL: https://issues.apache.org/jira/browse/HIVE-14998
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-14998.1.patch, HIVE-14998.2.patch
>
>
> this test either prints an exception to stdout ... or not - in its current 
> form it isn't really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14948) properly handle special characters in identifiers

2016-12-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14948:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

committed to master
thanks Alan for the review

> properly handle special characters in identifiers
> -
>
> Key: HIVE-14948
> URL: https://issues.apache.org/jira/browse/HIVE-14948
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 2.2.0
>
> Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch, 
> HIVE-14948.03.patch
>
>
> The treatment of quoted identifiers in HIVE-14943 is inconsistent. We need to 
> clean this up and, if possible, only quote those identifiers that need to be 
> quoted in the generated SQL statement.
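
For illustration, a minimal sketch of the "quote only when needed" idea, 
assuming plain identifiers match a simple pattern; the method name and regex 
are illustrative, not the committed change.

{code:java}
import java.util.regex.Pattern;

public class IdentifierQuoting {
  // Identifiers that match this pattern can appear unquoted in generated SQL.
  private static final Pattern PLAIN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  static String maybeQuote(String identifier) {
    if (PLAIN.matcher(identifier).matches()) {
      return identifier;
    }
    // Hive quotes identifiers with backticks; a literal backtick is doubled.
    return "`" + identifier.replace("`", "``") + "`";
  }
}
{code}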



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15378) clean up HADOOP_USER_CLASSPATH_FIRST in bin scripts

2016-12-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748859#comment-15748859
 ] 

Sergio Peña commented on HIVE-15378:


Thanks.
+1

> clean up HADOOP_USER_CLASSPATH_FIRST in bin scripts
> ---
>
> Key: HIVE-15378
> URL: https://issues.apache.org/jira/browse/HIVE-15378
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15378.1.patch
>
>
> beeline, hive, and hplsql have this statement:
> export HADOOP_USER_CLASSPATH_FIRST=true
> beeline and hplsql start via 'hive --service', so it is useless in beeline 
> and hplsql.
> Also, add export HADOOP_USER_CLASSPATH_FIRST=true to hive.cmd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15378) clean up HADOOP_USER_CLASSPATH_FIRST in bin scripts

2016-12-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15378:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~ferhui]. I committed to master.

> clean up HADOOP_USER_CLASSPATH_FIRST in bin scripts
> ---
>
> Key: HIVE-15378
> URL: https://issues.apache.org/jira/browse/HIVE-15378
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Fix For: 2.2.0
>
> Attachments: HIVE-15378.1.patch
>
>
> beeline, hive, and hplsql have this statement:
> export HADOOP_USER_CLASSPATH_FIRST=true
> beeline and hplsql start via 'hive --service', so it is useless in beeline 
> and hplsql.
> Also, add export HADOOP_USER_CLASSPATH_FIRST=true to hive.cmd.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-14 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748950#comment-15748950
 ] 

Chao Sun commented on HIVE-13278:
-

Thanks [~lirui]. I think in Hive 2.x {{SparkPlanGenerator}} is the only likely 
place where the FileNotFound issue could be triggered, while in older versions 
of Hive (e.g., 1.1.0) it could be triggered in some other places.
I've attached a new patch, which I checked with some simple MR & Spark queries. 
Didn't see any FileNotFound message with the patch applied. Also, I don't think 
the test failures are related.
[~xuefuz], [~lirui], [~stakiar], could you take a look if you have time? Thanks.

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> This doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15409) Add support for GROUPING function with grouping sets

2016-12-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15409:
---
Attachment: HIVE-15409.01.patch

> Add support for GROUPING function with grouping sets
> 
>
> Key: HIVE-15409
> URL: https://issues.apache.org/jira/browse/HIVE-15409
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15409.01.patch, HIVE-15409.patch
>
>
> The _grouping(col_expr)_ function indicates whether a given column is 
> aggregated in each row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749097#comment-15749097
 ] 

Xuefu Zhang commented on HIVE-13278:


I like the new patch, which takes a simpler, easy-to-understand approach. On 
the other hand, this approach requires that the planner set the flags correctly 
in order to avoid the FNF problem, which shouldn't be an issue.

Though not essential, I noticed there are a few cases where planners such as 
PartialScanTask.execute() set 0 as the number of reducers (no reducers); it 
might be better to set the flag there as well. However, I think we can have a 
separate JIRA for those.

+1 on my side. Please also provide your review feedback, [~lirui]. Thanks.

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> This doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15409) Add support for GROUPING function with grouping sets

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749182#comment-15749182
 ] 

Hive QA commented on HIVE-15409:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843265/HIVE-15409.01.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 10816 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby2] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_cube1] 
(batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_cube_multi_gby] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets2] 
(batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets3] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets4] 
(batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets5] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets6] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets_limit]
 (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_window] 
(batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_rollup1] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_grouping_operators]
 (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[limit_pushdown2] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] 
(batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_grouping_sets]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct]
 (batchId=92)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query18] 
(batchId=222)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query80] 
(batchId=222)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_cube1] 
(batchId=95)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_rollup1] 
(batchId=108)
org.apache.hive.hcatalog.api.TestHCatClientNotification.createTable 
(batchId=220)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2576/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2576/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2576/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 33 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843265 - PreCommit-HIVE-Build

> Add support for GROUPING function with grouping sets
> 
>
> Key: HIVE-15409
> URL: https://issues.apache.org/jira/browse/HIVE-15409
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15409.01.patch, HIVE-15409.patch
>
>
> The _grouping(col_expr)_ function indicates whether a given column is 
> aggregated in each row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-14 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15277:
--
Attachment: HIVE-15277.patch

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named '__time' 
> in the result of the executed query, which will act as the time dimension 
> column in Druid. Currently, the time dimension column needs to be a 
> 'timestamp' type column.
> Metrics can be of type long, double, and float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as 
> string.
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid. The user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-14 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: HIVE-13278.3.patch

Thanks [~xuefuz], I think you're right: for MR, we need to set the flag at any 
place where {{submitJob}} is called. Otherwise the control flow will go to 
{{checkOutputSpecs -> getMapRedWork()}} and trigger FNF.
I think I missed two places: {{PartialScanTask}} and {{ColumnTruncateTask}}. 
Attaching patch v3 to address that.
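
For context, a minimal sketch of the idea in this thread: record up front 
whether a reduce plan exists so that {{checkOutputSpecs -> getMapRedWork()}} 
does not probe for a reduce.xml that was never written. The configuration key 
below is hypothetical and stands in for whatever flag the patch actually sets.

{code:java}
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitWithPlanFlag {
  static RunningJob submit(JobConf job, boolean hasReduceWork) throws Exception {
    // Hypothetical key: every submitJob call site would set this before submitting.
    job.setBoolean("hive.exec.plan.has.reduce.work", hasReduceWork);
    return new JobClient(job).submitJob(job);
  }
}
{code}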

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> This doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-14 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15277:
--
Attachment: HIVE-15277.patch

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named '__time' 
> in the result of the executed query, which will act as the time dimension 
> column in Druid. Currently, the time dimension column needs to be a 
> 'timestamp' type column.
> Metrics can be of type long, double, and float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as 
> string.
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid. The user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15016) Run tests with Hadoop 3.0.0-alpha1

2016-12-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749299#comment-15749299
 ] 

Sergey Shelukhin commented on HIVE-15016:
-

I tried to build yesterday with Hadoop 3 (see HIVE-15427); I was not aware of 
this JIRA. It seems like some extra changes are needed to build.
I also wonder about, e.g., the pausemonitor change in this patch... it would 
mean that Hive will no longer work with Hadoop 2. I think we'd need to go back 
to the shim model and 2 builds :(

> Run tests with Hadoop 3.0.0-alpha1
> --
>
> Key: HIVE-15016
> URL: https://issues.apache.org/jira/browse/HIVE-15016
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: Hadoop3Upstream.patch
>
>
> Hadoop 3.0.0-alpha1 was released back in Sep/16 to allow other components to 
> run tests against this new version before GA.
> We should start running tests with Hive to validate compatibility against 
> Hadoop 3.0.
> NOTE: The patch used to test must not be committed to Hive until Hadoop 3.0 
> GA is released.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749407#comment-15749407
 ] 

Hive QA commented on HIVE-15277:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843276/HIVE-15277.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10782 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] 
(batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] 
(batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_location] 
(batchId=85)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2577/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2577/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2577/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843276 - PreCommit-HIVE-Build

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named '__time' 
> in the result of the executed query, which will act as the time dimension 
> column in Druid. Currently, the time dimension column needs to be a 
> 'timestamp' type column.
> Metrics can be of type long, double, and float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as 
> string.
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid. The user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-14 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15277:
--
Attachment: HIVE-15277.patch

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`>;
> {code}
> This statement stores the results of the query in a Druid datasource named 
> 'datasourcename'. One of the columns of the query needs to be the time 
> dimension, which is mandatory in Druid. In particular, we use the same 
> convention that is used for Druid: there needs to be a column named '__time' 
> in the result of the executed query, which will act as the time dimension 
> column in Druid. Currently, the time dimension column needs to be a 
> 'timestamp' type column.
> Metrics can be of type long, double, and float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a column in Hive that contains numbers and 
> needs to be presented as a dimension, use the cast operator to cast it as 
> string.
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid. The user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749455#comment-15749455
 ] 

Matt McCline commented on HIVE-15335:
-

Patch 091 is a good test run (all test failures are old unrelated ones).

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch
>
>
> Replace the HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has 
> new mutable* calls (e.g., mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
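
As an illustration of the proposed mutable API, a minimal sketch that 
accumulates a sum without per-row allocations; {{mutableAdd}} and 
{{mutableEnforcePrecisionScale}} are the names proposed in the description 
above, and the committed signatures may differ.

{code:java}
import org.apache.hadoop.hive.common.type.HiveDecimal;
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;

public class FastDecimalSum {
  static HiveDecimalWritable sum(HiveDecimalWritable[] values, int precision, int scale) {
    HiveDecimalWritable acc = new HiveDecimalWritable(HiveDecimal.ZERO);
    for (HiveDecimalWritable v : values) {
      acc.mutableAdd(v); // accumulate in place instead of allocating a new decimal per row
    }
    acc.mutableEnforcePrecisionScale(precision, scale);
    return acc;
  }
}
{code}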



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15410) WebHCat supports get/set table property with its name containing period and hyphen

2016-12-14 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-15410:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to 2.2.0. Thanks [~thejas] for reviewing the patch.

> WebHCat supports get/set table property with its name containing period and 
> hyphen
> --
>
> Key: HIVE-15410
> URL: https://issues.apache.org/jira/browse/HIVE-15410
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15410.1.patch, HIVE-15410.patch
>
>
> Hive table properties can have a period (.) or hyphen (-) in their names; 
> auto.purge is one example. But the WebHCat APIs support neither setting nor 
> getting these properties, and they throw the error message "Invalid DDL 
> identifier :property". For example:
> {code}
> [root@ctang-1 ~]# curl -s 
> 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key1?user.name=hiveuser'
> {"error":"Invalid DDL identifier :property"}
> [root@ctang-1 ~]# curl -s -X PUT -HContent-type:application/json -d '{ 
> "value": "true" }' 
> 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key2?user.name=hiveuser/'
> {"error":"Invalid DDL identifier :property"}
> {code}
> This patch adds support for property names containing a period and/or 
> hyphen.
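
For illustration, a minimal sketch of relaxing the DDL identifier check so 
property names may contain '.' and '-'; the pattern and method name are 
illustrative, not the actual patch.

{code:java}
import java.util.regex.Pattern;

public class PropertyNameCheck {
  // Word characters, plus '.' and '-' after the first character.
  private static final Pattern PROPERTY_NAME = Pattern.compile("\\w[\\w.\\-]*");

  static void validatePropertyName(String name) {
    if (name == null || !PROPERTY_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException("Invalid DDL identifier :property");
    }
  }
}
{code}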



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14707) ACID: Insert shuffle sort-merges on blank KEY

2016-12-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-14707:
-

Assignee: Eugene Koifman

> ACID: Insert shuffle sort-merges on blank KEY
> -
>
> Key: HIVE-14707
> URL: https://issues.apache.org/jira/browse/HIVE-14707
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Eugene Koifman
>
> The ACID insert codepath uses a sorted shuffle, while the key used for the 
> shuffle is always 0 bytes long.
> {code}
> hive (sales_acid)> explain insert into sales values(1, 2, 
> '3400---009', 1, null);
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: gopal_20160906172626_80261c4c-79cc-4e02-87fe-3133be404e55:2
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> ...
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: values__tmp__table__2
>   Statistics: Num rows: 1 Data size: 28 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: tmp_values_col1 (type: string), 
> tmp_values_col2 (type: string), tmp_values_col3 (type: string), 
> tmp_values_col4 (type: string), tmp_values_col5 (type: string)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4
> Statistics: Num rows: 1 Data size: 28 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order: 
>   Map-reduce partition columns: UDFToLong(_col1) (type: 
> bigint)
>   Statistics: Num rows: 1 Data size: 28 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: string), _col1 (type: 
> string), _col2 (type: string), _col3 (type: string), _col4 (type: string)
> Execution mode: vectorized, llap
> LLAP IO: no inputs
> {code}
> Note the missing "+" / "-" in the Sort Order fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749512#comment-15749512
 ] 

Hive QA commented on HIVE-13278:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843278/HIVE-13278.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10813 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2578/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2578/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2578/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843278 - PreCommit-HIVE-Build

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> This doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749533#comment-15749533
 ] 

Xuefu Zhang commented on HIVE-13278:


+1 on patch #3.

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> This doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15395) Don't try to intern strings from empty map

2016-12-14 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749647#comment-15749647
 ] 

Rajesh Balamohan commented on HIVE-15395:
-

+1. Thanks [~ashutoshc].

> Don't try to intern strings from empty map
> --
>
> Key: HIVE-15395
> URL: https://issues.apache.org/jira/browse/HIVE-15395
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-15395.patch
>
>
> Otherwise it unnecessarily creates another map object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749650#comment-15749650
 ] 

Hive QA commented on HIVE-15277:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843292/HIVE-15277.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10813 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] 
(batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] 
(batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2579/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2579/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2579/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843292 - PreCommit-HIVE-Build

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, ..., `metric1`, `metric2`>;
> {code}
> This statement stores the results of the query <select query> in a Druid 
> datasource named 'datasourcename'. One of the columns of the query needs to 
> be the time dimension, which is mandatory in Druid. In particular, we use the 
> same convention used by Druid: there needs to be a column named '__time' in 
> the result of the executed query, which will act as the time dimension column 
> in Druid. Currently, the time dimension column needs to be of 'timestamp' type.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if a Hive column contains numbers but needs to be 
> presented as a dimension, use the cast operator to cast it to string (see the 
> sketch below). 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid
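
For concreteness, here is a minimal Java/JDBC sketch of issuing such a CTAS, including the string cast for a numeric column meant to be a dimension. The host, credentials, and table/column names ({{src_table}}, {{ts}}, {{zipcode}}) are placeholders, and passing the metadata settings through the JDBC URL's conf list is assumed to be interchangeable with the --hiveconf flags above.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DruidCtasSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder host/credentials; the ?conf-list mirrors the --hiveconf flags.
    String url = "jdbc:hive2://hs2-host:10000/default"
        + "?hive.druid.metadata.username=druid"
        + ";hive.druid.metadata.password=XXX"
        + ";hive.druid.metadata.uri=jdbc:mysql://host/druid";
    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement stmt = conn.createStatement()) {
      // `ts` supplies the mandatory '__time' column; `zipcode` is numeric in
      // Hive but wanted as a Druid dimension, so it is cast to string.
      stmt.execute("CREATE TABLE druid_table_1 "
          + "STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' "
          + "TBLPROPERTIES (\"druid.datasource\" = \"datasourcename\") "
          + "AS SELECT `ts` AS `__time`, CAST(`zipcode` AS string) AS `zipcode`, "
          + "`metric1`, `metric2` FROM src_table");
    }
  }
}
{code}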



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14007) Replace ORC module with ORC release

2016-12-14 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14007:
-
Attachment: HIVE-14007.patch

Fix one more case of the ORC writer version changing.

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, 
> HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15383) Add additional info to 'desc function extended' output

2016-12-14 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15383:

Attachment: HIVE-15383.3.patch

patch-3: fix unit-test-related change.

> Add additional info to 'desc function extended' output
> --
>
> Key: HIVE-15383
> URL: https://issues.apache.org/jira/browse/HIVE-15383
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch, 
> HIVE-15383.3.patch
>
>
> Add additional info to the output of 'desc function extended'. The resources 
> would help the user check which jars are referenced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749719#comment-15749719
 ] 

Owen O'Malley commented on HIVE-15335:
--

Please create a pull request.

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749741#comment-15749741
 ] 

Owen O'Malley commented on HIVE-15335:
--

* This breaks API compatibility of DecimalColumnVector, which is a public API. I 
suspect you'll need to create a new class (maybe FastDecimalColumnVector?). 
* TypeDescription.createBatch will need an option for which kind of batch to 
create (see the sketch below).
* The ORC reader and writer need to support both formats.
* You'll need to port this code to the ORC project as well.
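
A purely hypothetical sketch of the kind of option being requested -- neither {{BatchKind}} nor this factory exists in ORC's {{TypeDescription}}; it only illustrates the shape of the choice:

{code:java}
import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector;

// Hypothetical: an option selecting which kind of decimal batch to create.
enum BatchKind { ORIGINAL, FAST_DECIMAL }

class DecimalBatchFactory {
  static ColumnVector createDecimalVector(BatchKind kind, int maxSize,
                                          int precision, int scale) {
    switch (kind) {
      case ORIGINAL:
        // The existing vector, backed by the old HiveDecimalWritable.
        return new DecimalColumnVector(maxSize, precision, scale);
      case FAST_DECIMAL:
      default:
        // Stand-in for the new class floated above (FastDecimalColumnVector?).
        throw new UnsupportedOperationException("not part of this sketch");
    }
  }
}
{code}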

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-14 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Summary: Avoid FileNotFoundException when map/reduce.xml is not available  
(was: Many redundant 'File not found' messages appeared in container log during 
query execution with Hive on Spark)

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully, so it is 
> marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749770#comment-15749770
 ] 

Hive QA commented on HIVE-15277:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843292/HIVE-15277.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10813 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] 
(batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] 
(batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2580/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2580/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2580/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843292 - PreCommit-HIVE-Build

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> HIVE-15277.patch, HIVE-15277.patch, HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, ..., `metric1`, `metric2`>;
> {code}
> This statement stores the results of the query <select query> in a Druid 
> datasource named 'datasourcename'. One of the columns of the query needs to 
> be the time dimension, which is mandatory in Druid. In particular, we use the 
> same convention used by Druid: there needs to be a column named '__time' in 
> the result of the executed query, which will act as the time dimension column 
> in Druid. Currently, the time dimension column needs to be of 'timestamp' type.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if a Hive column contains numbers but needs to be 
> presented as a dimension, use the cast operator to cast it to string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; the user needs to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: In Progress  (was: Patch Available)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Attachment: HIVE-15335.092.patch

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749825#comment-15749825
 ] 

Sergey Shelukhin commented on HIVE-15422:
-

+1 on updated patch

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, 
> HIVE-15422.3.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) in a 5 node 
> cluster, lots of GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has more than 7,000 partitions in S3. Profiling revealed a 
> large number of objects created just in path comparisons in HiveInputFormat. 
> HIVE-15405 reduces the number of path comparisons in FileUtils, but it still 
> ends up doing lots of comparisons in HiveInputFormat::pushProjectionsAndFilters.
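
As a rough sketch of the general idea (not the actual patch): derive the comparable string form of each Path once and reuse it, instead of materializing new Path/URI objects for every alias comparison.

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.Path;

// Sketch only: memoize the scheme-less string form of each Path so repeated
// comparisons across thousands of partitions stop allocating fresh objects.
public class PathCompareSketch {
  private final Map<Path, String> cache = new HashMap<>();

  private String asComparableString(Path p) {
    return cache.computeIfAbsent(p, k -> k.toUri().getPath());
  }

  public boolean splitUnderPartition(Path splitPath, Path partitionPath) {
    // One cached String per Path; startsWith avoids per-call Path creation.
    return asComparableString(splitPath)
        .startsWith(asComparableString(partitionPath));
  }
}
{code}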



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749845#comment-15749845
 ] 

Matt McCline commented on HIVE-15335:
-

I haven't created a pull request before. But do you mean a pull request for 
the ORC GitHub repo, with just the changes that affect the orc and storage-api 
directories?

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749883#comment-15749883
 ] 

Hive QA commented on HIVE-14007:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843303/HIVE-14007.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 9991 tests 
executed
*Failed tests:*
{noformat}
TestBitFieldReader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestBitPack - did not produce a TEST-*.xml file (likely timed out) (batchId=237)
TestColumnStatistics - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestColumnStatisticsImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestDataReaderProperties - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestDynamicArray - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestFileDump - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestInStream - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestIntegerCompressionReader - did not produce a TEST-*.xml file (likely timed 
out) (batchId=236)
TestJsonFileDump - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestMemoryManager - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=133)

[mapreduce2.q,orc_llap_counters1.q,bucket6.q,insert_into1.q,empty_dir_in_table.q,orc_merge1.q,script_env_var1.q,orc_merge_diff_fs.q,llapdecider.q,load_hdfs_file_with_space_in_the_name.q,llap_nullscan.q,orc_ppd_basic.q,transform_ppr1.q,rcfile_merge4.q,orc_merge3.q]
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=137)

[orc_merge2.q,insert_into2.q,reduce_deduplicate.q,orc_llap_counters.q,cte_4.q,schemeAuthority2.q,file_with_header_footer.q,rcfile_merge3.q]
TestNewIntegerEncoding - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestOrcNullOptimization - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestOrcTimezone1 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestOrcTimezone2 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestOrcTimezone3 - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestOrcWideTable - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestOutStream - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestRLEv2 - did not produce a TEST-*.xml file (likely timed out) (batchId=236)
TestReaderImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestRecordReaderImpl - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestRunLengthByteReader - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestRunLengthIntegerReader - did not produce a TEST-*.xml file (likely timed 
out) (batchId=237)
TestSchemaEvolution - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestSerializationUtils - did not produce a TEST-*.xml file (likely timed out) 
(batchId=237)
TestStreamName - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestStringDictionary - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestStringRedBlackTree - did not produce a TEST-*.xml file (likely timed out) 
(batchId=236)
TestTypeDescription - did not produce a TEST-*.xml file (likely timed out) 
(batchId=238)
TestUnrolledBitPack - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestVectorOrcFile - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
TestZlib - did not produce a TEST-*.xml file (likely timed out) (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2581/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2581/console
Test logs: http://104.198.109.

[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15422:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review [~sershe].

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, 
> HIVE-15422.3.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) in a 5 node 
> cluster, lots of GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has more than 7,000 partitions in S3. Profiling revealed a 
> large number of objects created just in path comparisons in HiveInputFormat. 
> HIVE-15405 reduces the number of path comparisons in FileUtils, but it still 
> ends up doing lots of comparisons in HiveInputFormat::pushProjectionsAndFilters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749966#comment-15749966
 ] 

Matt McCline commented on HIVE-15335:
-

I have great difficulty accepting that DecimalColumnVector is now a public API. 
 Gunther will need to take that up with you.

I made quite a number of changes to HiveDecimal and HiveDecimalWritable.  Not 
just the internals but to the interfaces.  For example, the current 
HiveDecimalWritable is very slow because it internally represents decimals as 
BigInteger binary bytes. It exposes the binary bytes through 
getInternalStorage().  I zapped that immediately.  The compatibility I 
designed for was serialization/deserialization of binary bits and text, and 
decimal execution behavior -- not code compatibility.  Binary bit compatibility 
ensures ORC will be able to read/write the same information.  The 
TestHiveDecimal class verifies binary bit compatibility with 
SerializationUtils (ORC's serialization), BigInteger binary bit compatibility 
(LazyBinary, Avro, Parquet), and same behavior as 
OldHiveDecimal/OldHiveDecimalWritable (the original 
HiveDecimal/HiveDecimalWritable, renamed).  I needed to be able to make major 
code changes (the core fast decimal implementation class is 9,000 lines) to get 
good performance with ORC serialization/deserialization of decimals and with 
all other decimal operations (except division/remainder).  Matching the 
semantics of Hive decimals and BigDecimal while still executing quickly is 
quite challenging.

I need to be able to take a hammer to the code in the future to get good 
performance.  I've done some experiments improving the performance of 
HiveChar/HiveVarchar and their writables.  Very little of the original code will 
survive -- just like with fast decimals.
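
As a hedged illustration of the allocation-free style the new writable enables (the mutable* calls are the ones this issue introduces; the values are made up):

{code:java}
import org.apache.hadoop.hive.common.type.HiveDecimal;
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;

// Sketch: sum a stream of decimals in place, with no per-row
// BigDecimal/BigInteger allocation (contrast with the old writable, which
// round-tripped through BigInteger bytes on every operation).
public class MutableDecimalSumSketch {
  public static void main(String[] args) {
    HiveDecimalWritable sum = new HiveDecimalWritable(HiveDecimal.ZERO);
    HiveDecimalWritable row = new HiveDecimalWritable();
    for (String v : new String[] {"1.25", "2.50", "3.75"}) {
      row.set(HiveDecimal.create(v));
      sum.mutableAdd(row);          // accumulates into 'sum' in place
    }
    System.out.println(sum.getHiveDecimal()); // 7.5
  }
}
{code}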

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15411) ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES

2016-12-14 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15411:
---
Attachment: HIVE-15411.1.patch

> ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES
> ---
>
> Key: HIVE-15411
> URL: https://issues.apache.org/jira/browse/HIVE-15411
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15411.1.patch
>
>
> Currently, {{ALTER TABLE ... ADD PARTITION}} only lets you set the 
> partition's LOCATION but not its FILEFORMAT or SERDEPROPERTIES. In order to 
> change the FILEFORMAT or SERDEPROPERTIES, you have to issue two additional 
> calls to {{ALTER TABLE ... PARTITION ... SET FILEFORMAT}} and {{ALTER TABLE 
> ... PARTITION ... SET SERDEPROPERTIES}}. This is not atomic, and queries that 
> interleave the ALTER TABLE commands may fail.
> We should extend the grammar to support setting FILEFORMAT and 
> SERDEPROPERTIES atomically as part of the ADD PARTITION command.
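
Pending the grammar change, the closest atomic equivalent is a direct metastore call. The following hedged sketch (database, table, formats, and delimiter are placeholders) creates the partition with its location, file format, and SerDe properties set in a single {{add_partition}} call:

{code:java}
import java.util.Arrays;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.api.Table;

public class AddPartitionSketch {
  public static void main(String[] args) throws Exception {
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    Table t = client.getTable("default", "events");          // placeholder table
    StorageDescriptor sd = t.getSd().deepCopy();
    sd.setLocation(t.getSd().getLocation() + "/dt=2016-12-14");
    sd.setInputFormat("org.apache.hadoop.mapred.TextInputFormat");
    sd.setOutputFormat("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat");
    sd.getSerdeInfo().getParameters().put("field.delim", ",");
    // One metastore call: location, file format, and serde props land together.
    client.add_partition(new Partition(Arrays.asList("2016-12-14"),
        "default", "events", 0, 0, sd, null));
    client.close();
  }
}
{code}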



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15411) ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES

2016-12-14 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15411:
---
Status: Patch Available  (was: Open)

Uploaded patch. Also created RB here: https://reviews.apache.org/r/54765/

> ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES
> ---
>
> Key: HIVE-15411
> URL: https://issues.apache.org/jira/browse/HIVE-15411
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15411.1.patch
>
>
> Currently, {{ALTER TABLE ... ADD PARTITION}} only lets you set the 
> partition's LOCATION but not its FILEFORMAT or SERDEPROPERTIES. In order to 
> change the FILEFORMAT or SERDEPROPERTIES, you have to issue two additional 
> calls to {{ALTER TABLE ... PARTITION ... SET FILEFORMAT}} and {{ALTER TABLE 
> ... PARTITION ... SET SERDEPROPERTIES}}. This is not atomic, and queries that 
> interleave the ALTER TABLE commands may fail.
> We should extend the grammar to support setting FILEFORMAT and 
> SERDEPROPERTIES atomically as part of the ADD PARTITION command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15383) Add additional info to 'desc function extended' output

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1574#comment-1574
 ] 

Hive QA commented on HIVE-15383:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843305/HIVE-15383.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10799 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=124)

[table_access_keys_stats.q,bucketmapjoin11.q,auto_join4.q,join34.q,nullgroup.q,mergejoins_mixed.q,sort.q,join_nullsafe.q,stats8.q,auto_join28.q,join17.q,union17.q,skewjoinopt11.q,groupby1_map.q,load_dyn_part11.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2582/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2582/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2582/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843305 - PreCommit-HIVE-Build

> Add additional info to 'desc function extended' output
> --
>
> Key: HIVE-15383
> URL: https://issues.apache.org/jira/browse/HIVE-15383
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15383.1.patch, HIVE-15383.2.patch, 
> HIVE-15383.3.patch
>
>
> Add additional info to the output of 'desc function extended'. The resources 
> would help the user check which jars are referenced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15335) Fast Decimal

2016-12-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749966#comment-15749966
 ] 

Matt McCline edited comment on HIVE-15335 at 12/15/16 1:08 AM:
---

I have great difficulty accepting that DecimalColumnVector is now a public API. 
 I haven't even begun to think of all the problems this will create.

I made quite a number of changes to HiveDecimal and HiveDecimalWritable.  Not 
just the internals but to the interfaces.  For example, the current 
HiveDecimalWritable is very slow because it internally represents decimals as 
BigInteger binary bytes. It exposes the binary bytes through 
getInternalStorage().  I zapped that immediately.  The compatibility I 
designed for was serialization/deserialization of binary bits and text, and 
decimal execution behavior -- not code compatibility.  Binary bit compatibility 
ensures ORC will be able to read/write the same information.  The 
TestHiveDecimal class verifies binary bit compatibility with 
SerializationUtils (ORC's serialization), BigInteger binary bit compatibility 
(LazyBinary, Avro, Parquet), and same behavior as 
OldHiveDecimal/OldHiveDecimalWritable (the original 
HiveDecimal/HiveDecimalWritable, renamed).  I needed to be able to make major 
code changes (the core fast decimal implementation class is 9,000 lines) to get 
good performance with ORC serialization/deserialization of decimals and with 
all other decimal operations (except division/remainder).  Matching the 
semantics of Hive decimals and BigDecimal while still executing quickly is 
quite challenging.

I need to be able to take a hammer to the code in the future to get good 
performance.  I've done some experiments improving the performance of 
HiveChar/HiveVarchar and their writables.  Very little of the original code will 
survive -- just like with fast decimals.


was (Author: mmccline):
I have great difficulty accepting that DecimalColumnVector is now a public API. 
 Gunther will need to take that up with you.

I made quite a number of changes to HiveDecimal and HiveDecimalWritable.  Not 
just the internals but to the interfaces.  For example, the current 
HiveDecimalWritable is very slow because it internally represents decimals as 
BigInteger binary bytes. It exposes the binary bytes through 
getInternalStorage().  I zapped that immediately.  The compatibility I 
designed for was serialization/deserialization of binary bits and text, and 
decimal execution behavior -- not code compatibility.  Binary bit compatibility 
ensures ORC will be able to read/write the same information.  The 
TestHiveDecimal class verifies binary bit compatibility with 
SerializationUtils (ORC's serialization), BigInteger binary bit compatibility 
(LazyBinary, Avro, Parquet), and same behavior as 
OldHiveDecimal/OldHiveDecimalWritable (the original 
HiveDecimal/HiveDecimalWritable, renamed).  I needed to be able to make major 
code changes (the core fast decimal implementation class is 9,000 lines) to get 
good performance with ORC serialization/deserialization of decimals and with 
all other decimal operations (except division/remainder).  Matching the 
semantics of Hive decimals and BigDecimal while still executing quickly is 
quite challenging.

I need to be able to take a hammer to the code in the future to get good 
performance.  I've done some experiments improving the performance of 
HiveChar/HiveVarchar and their writables.  Very little of the original code will 
survive -- just like with fast decimals.

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination

2016-12-14 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15294:

Attachment: HIVE-15294.2.patch

> Capture additional metadata to replicate a simple insert at destination
> ---
>
> Key: HIVE-15294
> URL: https://issues.apache.org/jira/browse/HIVE-15294
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15294.1.patch, HIVE-15294.2.patch
>
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will 
> need to capture the newly added files in the notification message to be able 
> to replicate the event at the destination. 
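
For context, a hedged sketch of the insert-event plumbing this metadata would ride on -- {{InsertEventRequestData}} already carries a list of added files; the path, database, and table names below are placeholders:

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FireEventRequest;
import org.apache.hadoop.hive.metastore.api.FireEventRequestData;
import org.apache.hadoop.hive.metastore.api.InsertEventRequestData;

public class InsertEventSketch {
  public static void main(String[] args) throws Exception {
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    InsertEventRequestData insertData = new InsertEventRequestData();
    // The newly written files the destination needs to re-create the insert.
    insertData.addToFilesAdded("hdfs://nn:8020/warehouse/t/000000_0"); // placeholder
    FireEventRequestData data = new FireEventRequestData();
    data.setInsertData(insertData);
    FireEventRequest req = new FireEventRequest(true, data);
    req.setDbName("default");
    req.setTableName("t");
    client.fireListenerEvent(req);
    client.close();
  }
}
{code}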



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination

2016-12-14 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-15294:

Attachment: HIVE-15294.2-nothrift.patch

> Capture additional metadata to replicate a simple insert at destination
> ---
>
> Key: HIVE-15294
> URL: https://issues.apache.org/jira/browse/HIVE-15294
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15294.1.patch, HIVE-15294.2-nothrift.patch, 
> HIVE-15294.2.patch
>
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will 
> need to capture the newly added files in the notification message to be able 
> to replicate the event at the destination. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15411) ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES

2016-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750131#comment-15750131
 ] 

Hive QA commented on HIVE-15411:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843332/HIVE-15411.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10786 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2583/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2583/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2583/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843332 - PreCommit-HIVE-Build

> ADD PARTITION should support setting FILEFORMAT and SERDEPROPERTIES
> ---
>
> Key: HIVE-15411
> URL: https://issues.apache.org/jira/browse/HIVE-15411
> Project: Hive
>  Issue Type: Improvement
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15411.1.patch
>
>
> Currently, {{ALTER TABLE ... ADD PARTITION}} only lets you set the 
> partition's LOCATION but not its FILEFORMAT or SERDEPROPERTIES. In order to 
> change the FILEFORMAT or SERDEPROPERTIES, you have to issue two additional 
> calls to {{ALTER TABLE ... PARTITION ... SET FILEFORMAT}} and {{ALTER TABLE 
> ... PARTITION ... SET SERDEPROPERTIES}}. This is not atomic, and queries that 
> interleave the ALTER TABLE commands may fail.
> We should extend the grammar to support setting FILEFORMAT and 
> SERDEPROPERTIES atomically as part of the ADD PARTITION command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15395) Don't try to intern strings from empty map

2016-12-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15395:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. 

> Don't try to intern strings from empty map
> --
>
> Key: HIVE-15395
> URL: https://issues.apache.org/jira/browse/HIVE-15395
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-15395.patch
>
>
> Otherwise it unnecessarily creates another map object.
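
A minimal sketch of the guard (the helper name is hypothetical; the actual change lives in the metastore's string-interning utilities):

{code:java}
import java.util.HashMap;
import java.util.Map;

public final class InternSketch {
  // Hypothetical helper: intern map values, but never allocate for an
  // empty (or null) input -- just hand the caller's map back.
  public static Map<String, String> internValues(Map<String, String> map) {
    if (map == null || map.isEmpty()) {
      return map;
    }
    Map<String, String> interned = new HashMap<>(map.size());
    for (Map.Entry<String, String> e : map.entrySet()) {
      String v = e.getValue();
      interned.put(e.getKey(), v == null ? null : v.intern());
    }
    return interned;
  }
}
{code}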



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750452#comment-15750452
 ] 

Rui Li commented on HIVE-13278:
---

Sorry about the delay. I have a concern about using the flag: it seems difficult 
to make it exhaustive and maintainable. What about the solution Xuefu mentioned:
bq. Following your idea, can we first check if the mapwork ends in a RS and use 
this to determine if reduce.xml is expected?
Will this be cleaner and more reliable?

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully, so it is 
> marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750533#comment-15750533
 ] 

Xuefu Zhang commented on HIVE-13278:


[~lirui], the concern is valid and shared, but on the other hand the current 
approach is simple and easy to understand. At least, there could be new cases 
where the problem may appear, but this doesn't make things worse, and we don't 
expect too many such cases now or in the future.

Further thoughts?

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully, so it is 
> marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator

2016-12-14 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750688#comment-15750688
 ] 

Rajesh Balamohan commented on HIVE-15339:
-

Thank you for the comments [~jcamachorodriguez].

If the col stats are already present in the cache, these calls would be very 
fast. Otherwise, it ends up fetching col stats per column, and the multiple 
queries to the DB slow things down, especially with a larger set of partitions. 
I checked by running the TPC-DS dataset and haven't observed any regression with 
this. I will check with more queries.

If it bails out at L167, it would still be fine, as it would follow the old code 
path. That is, it wouldn't be worse than the current runtime.

> Batch metastore calls to get column stats for fields needed in 
> FilterSelectivityEstimator
> -
>
> Key: HIVE-15339
> URL: https://issues.apache.org/jira/browse/HIVE-15339
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch
>
>
> Based on the query pattern, {{FilterSelectivityEstimator}} gets column 
> statistics from the metastore in multiple calls. For instance, in the following 
> query, it ends up getting individual column statistics for flights multiple 
> times.
> When the table has a large number of partitions, getting column statistics via 
> multiple calls can be very expensive. This would adversely impact the 
> overall compilation time. The following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = 
> `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = 
> `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may be helpful to club together all columns that need statistics and fetch 
> these details in a single remote call (see the sketch below).
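
A hedged sketch of such a batched fetch using the existing metastore client call; the database, table, and column names are taken from the example query:

{code:java}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;

public class BatchedStatsSketch {
  public static void main(String[] args) throws Exception {
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    // One remote call for every column the estimator will need, instead of
    // one call per column per lookup.
    List<String> cols = Arrays.asList("flightnum", "dateofflight",
        "uniquecarrier", "origin", "dest");
    List<ColumnStatisticsObj> stats =
        client.getTableColumnStatistics("default", "flights", cols);
    stats.forEach(s -> System.out.println(s.getColName()));
    client.close();
  }
}
{code}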



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750696#comment-15750696
 ] 

Rui Li commented on HIVE-13278:
---

Hi [~xuefuz], I just think it'll be even simpler to go the RS-checking way - we 
can constrain the fix to just one method, 
{{HiveOutputFormatImpl.checkOutputSpecs}}, rather than making changes to all 
these different tasks. Besides, with the flag it seems we're adding an extra 
burden on ourselves to keep the logic consistent during plan generation.

On the other hand, if we decide to add the flag, I also have one suggestion. We 
can make {{has.map/reduce.work}} default to false, and set them to true 
respectively in {{Utilities::setMapWork/setReduceWork}}. The logic behind this 
is that if you haven't set a work on the JobConf, you shouldn't try to get one 
from it. Does this make sense?
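
A minimal sketch of the flag variant, with hypothetical key and class names:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch: record at plan-setup time whether map/reduce work exists, so
// checkOutputSpecs never probes HDFS for a reduce.xml that was never written.
public final class PlanPresenceFlags {
  public static final String HAS_MAP_WORK = "hive.exec.has.map.work";       // hypothetical
  public static final String HAS_REDUCE_WORK = "hive.exec.has.reduce.work"; // hypothetical

  public static void markReduceWork(Configuration conf, boolean present) {
    conf.setBoolean(HAS_REDUCE_WORK, present);  // would be set in setReduceWork()
  }

  public static boolean shouldLoadReduceWork(Configuration conf) {
    // Defaults to false: if no reduce work was ever set, don't try to fetch one.
    return conf.getBoolean(HAS_REDUCE_WORK, false);
  }
}
{code}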

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during 
> query execution with Hive on Spark.
> Certainly, it doesn't prevent the query from running successfully, so it is 
> marked as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)