[jira] [Commented] (HIVE-21327) Predicate is not pushed to Parquet if hive.parquet.timestamp.skip.conversion=true

Hive QA (JIRA) Thu, 28 Feb 2019 19:14:10 -0800


    [ https://issues.apache.org/jira/browse/HIVE-21327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781231#comment-16781231 ]


Hive QA commented on HIVE-21327:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12960627/HIVE-21327.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15824 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] (batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamptz_2] (batchId=86)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16301/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16301/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16301/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12960627 - PreCommit-HIVE-Build

> Predicate is not pushed to Parquet if 
> hive.parquet.timestamp.skip.conversion=true
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-21327
>                 URL: https://issues.apache.org/jira/browse/HIVE-21327
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: HIVE-21327.1.patch
>
>
> The Parquet FilterPredicate is created and set in the configuration by the 
> ParquetRecordReaderBase.setFilter method. This method is called from the 
> ParquetRecordReaderWrapper constructor through the 
> ParquetRecordReaderBase.getSplit method and expects a JobConf parameter, 
> on which it sets the created filter predicate. The ParquetRecordReaderWrapper 
> constructor uses multiple JobConf objects:
> {noformat}
>     jobConf = oldJobConf;
>     final ParquetInputSplit split = getSplit(oldSplit, jobConf);
>     TaskAttemptID taskAttemptID = TaskAttemptID.forName(jobConf.get(IOConstants.MAPRED_TASK_ID));
>     if (taskAttemptID == null) {
>       taskAttemptID = new TaskAttemptID();
>     }
>     // create a TaskInputOutputContext
>     Configuration conf = jobConf;
>     if (skipTimestampConversion ^ HiveConf.getBoolVar(
>         conf, HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION)) {
>       conf = new JobConf(oldJobConf);
>       HiveConf.setBoolVar(conf,
>         HiveConf.ConfVars.HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION, skipTimestampConversion);
>     }
>     final TaskAttemptContext taskContext = ContextUtil.newTaskAttemptContext(conf, taskAttemptID);
> {noformat}
> The snippet therefore involves three objects: jobConf, oldJobConf and conf. 
> getSplit is called with jobConf, so the filter predicate is set on that 
> object. Read in isolation, the code suggests that jobConf and oldJobConf are 
> still the same reference inside the if statement, so the newly created conf 
> should also contain the filter predicate. However, in the getSplit method the 
> value of jobConf is changed by the projectionPusher.pushProjectionsAndFilters 
> method, so inside the if statement jobConf and oldJobConf are actually 
> different references. The filter predicate is set on jobConf, but when the if 
> condition is true, conf is created from oldJobConf and therefore does not 
> contain the filter predicate.
> For reference, this behavior was introduced in 
> [HIVE-9873|https://issues.apache.org/jira/browse/HIVE-9873]. 
> Since the only purpose of the if statement is to update the 
> HIVE_PARQUET_TIMESTAMP_SKIP_CONVERSION property in the configuration, it 
> should copy from jobConf, where the filter predicate is correctly set.
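
The reference mix-up described in the issue can be reproduced without Hadoop on the classpath. The following is a minimal sketch under stated assumptions, not the Hive code: plain java.util.Map objects stand in for JobConf, and the names ConfReferenceSketch, Reader, FILTER_KEY, the predicate string and the copyFromJobConf flag are hypothetical, introduced only for illustration. Only the order of assignments mirrors the constructor quoted above.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfReferenceSketch {
    // Hypothetical key under which the sketch stores the filter predicate.
    static final String FILTER_KEY = "sketch.filter.predicate";
    // Property name taken from the issue title.
    static final String SKIP_KEY = "hive.parquet.timestamp.skip.conversion";

    // Stand-in for the record reader; Map replaces JobConf in this sketch.
    static class Reader {
        Map<String, String> jobConf;

        // Models getSplit: the jobConf field is replaced by a new object
        // (as the issue describes for pushProjectionsAndFilters), and the
        // filter predicate is set on that new object only.
        void getSplit(Map<String, String> conf) {
            jobConf = new HashMap<>(conf);
            jobConf.put(FILTER_KEY, "eq(c, 1)");
        }

        // Mirrors the constructor's assignment order. copyFromJobConf=false
        // models the reported behavior; true models the proposed change.
        Map<String, String> buildTaskConf(Map<String, String> oldJobConf,
                                          boolean skipTimestampConversion,
                                          boolean copyFromJobConf) {
            jobConf = oldJobConf;
            getSplit(jobConf); // jobConf no longer refers to oldJobConf after this call
            Map<String, String> conf = jobConf;
            boolean current = Boolean.parseBoolean(conf.getOrDefault(SKIP_KEY, "false"));
            if (skipTimestampConversion ^ current) {
                // Copying oldJobConf drops the filter; copying jobConf keeps it.
                conf = new HashMap<>(copyFromJobConf ? jobConf : oldJobConf);
                conf.put(SKIP_KEY, Boolean.toString(skipTimestampConversion));
            }
            return conf;
        }
    }

    public static void main(String[] args) {
        Map<String, String> oldJobConf = new HashMap<>();
        oldJobConf.put(SKIP_KEY, "true");

        Map<String, String> reported = new Reader().buildTaskConf(oldJobConf, false, false);
        Map<String, String> proposed = new Reader().buildTaskConf(oldJobConf, false, true);

        System.out.println("filter present when copying oldJobConf: "
                + reported.containsKey(FILTER_KEY));
        System.out.println("filter present when copying jobConf: "
                + proposed.containsKey(FILTER_KEY));
    }
}
```

In this sketch the copy made from oldJobConf has no filter entry, while the copy made from jobConf keeps it. When the XOR condition is false, no copy is made and conf is simply jobConf, so the filter is present in either variant.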



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
