Partition pruner skips pruning when a non-deterministic function is present in 
the filtering expression, but only if the query contains joins
--------------------------------------------------------------------------------------------------------------------------------

                 Key: HIVE-1173
                 URL: https://issues.apache.org/jira/browse/HIVE-1173
             Project: Hadoop Hive
          Issue Type: Bug
    Affects Versions: 0.4.1, 0.4.0
            Reporter: Vladimir Klimontovich


Brief description:

case 1) a non-deterministic function is present in the partition condition and 
joins are present in the query => the partition pruner doesn't filter 
partitions based on the condition
case 2) a non-deterministic function is present in the partition condition and 
joins aren't present in the query => the partition pruner does filter 
partitions based on the condition
It is illogical for pruning to depend on the presence of joins in the query.
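
To make the expected behaviour concrete, here is a toy model of it. This is 
not Hive's actual pruner code: the names PrunerSketch, Conjunct and prune are 
hypothetical, and the sketch assumes the pruner should treat a 
non-deterministic conjunct as "unknown" while still applying the deterministic 
conjuncts, whether or not the query has joins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model only; hypothetical names, not part of Hive.
public class PrunerSketch {

    /** One conjunct of the WHERE clause, tagged as deterministic or not. */
    static final class Conjunct {
        final boolean deterministic;
        final Predicate<String> test; // evaluated against a partition value

        Conjunct(boolean deterministic, Predicate<String> test) {
            this.deterministic = deterministic;
            this.test = test;
        }
    }

    /** Keeps every partition that no deterministic conjunct rules out. */
    static List<String> prune(List<String> partitions, List<Conjunct> conjuncts) {
        List<String> kept = new ArrayList<>();
        for (String partition : partitions) {
            boolean keep = true;
            for (Conjunct c : conjuncts) {
                // A non-deterministic conjunct cannot be trusted at compile
                // time, so it is skipped rather than used to cancel pruning.
                if (c.deterministic && !c.test.test(partition)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                kept.add(partition);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Conjunct> where = Arrays.asList(
                // part > UDF2('part0'), non-deterministic
                new Conjunct(false, p -> p.compareTo("part0") > 0),
                // part = 'part1', deterministic
                new Conjunct(true, p -> p.equals("part1")));
        System.out.println(prune(Arrays.asList("part1", "part2"), where));
    }
}
```

Under this assumption only part1 survives for the condition used in the 
example queries, which matches what the pruner already does in the query 
without a join.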

Example:

Consider the following sequence of Hive queries:

1) Create a non-deterministic function:

create temporary function UDF2 as 'UDF2';
{{
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

@UDFType(deterministic=false)
public class UDF2 extends UDF {
        public String evaluate(String val) {
                return val;
        }
}
}}

2) Create tables:

CREATE TABLE Main (
        a STRING,
        b INT
)

PARTITIONED BY(part STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '10'
STORED AS TEXTFILE;

ALTER TABLE Main ADD PARTITION (part="part1") LOCATION "/hive-join-test/part1/";
ALTER TABLE Main ADD PARTITION (part="part2") LOCATION "/hive-join-test/part2/";

CREATE TABLE Joined (
        a STRING,
        f STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '10'
STORED AS TEXTFILE
LOCATION '/hive-join-test/join/';

3) Run the first query:

select 
        m.a,
        m.b
from Main m
where
        part > UDF2('part0') AND part = 'part1';

The pruner will work for this query: 
mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1

4) Run the second query (with a join):

select 
        m.a,
        j.a,
        m.b
from Main m
join Joined j on
        j.a=m.a
where
        part > UDF2('part0') AND part = 'part1';

The pruner doesn't work: 
mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2,hdfs://localhost:9000/hive-join-test/join

5) Also, let's try running the query with a MAPJOIN hint:

select /*+MAPJOIN(j)*/ 
        m.a,
        j.a,
        m.b
from Main m
join Joined j on
        j.a=m.a
where
        part > UDF2('part0') AND part = 'part1';

The result is the same; the pruner doesn't work: 
mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
