[jira] [Updated] (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query
[ https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1173:
    Status: Open (was: Patch Available)

Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query

Key: HIVE-1173
URL: https://issues.apache.org/jira/browse/HIVE-1173
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.4.1, 0.4.0, 0.10.0
Reporter: Vladimir Klimontovich
Assignee: Navis

Brief description:

case 1) A non-deterministic function is present in the partition condition and joins are present in the query = the partition pruner doesn't filter partitions based on the condition
case 2) A non-deterministic function is present in the partition condition and joins aren't present in the query = the partition pruner does filter partitions based on the condition

It's quite illogical for pruning to depend on the presence of joins in the query.

Example:

Let's consider the following sequence of Hive queries:

1) Create a non-deterministic function:

create temporary function UDF2 as 'UDF2';

{{
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

@UDFType(deterministic=false)
public class UDF2 extends UDF {
    public String evaluate(String val) {
        return val;
    }
}
}}

2) Create tables

CREATE TABLE Main (
    a STRING,
    b INT
)
PARTITIONED BY(part STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '10'
STORED AS TEXTFILE;

ALTER TABLE Main ADD PARTITION (part='part1') LOCATION '/hive-join-test/part1/';
ALTER TABLE Main ADD PARTITION (part='part2') LOCATION '/hive-join-test/part2/';

CREATE TABLE Joined (
    a STRING,
    f STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '10'
STORED AS TEXTFILE
LOCATION '/hive-join-test/join/';

3) Run the first query:

select m.a, m.b from Main m where part UDF2('part0') AND part = 'part1';

The pruner works for this query:
mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1

4) Run the second query (with join):

select m.a, j.a, m.b from Main m join Joined j on j.a=m.a where part UDF2('part0') AND part = 'part1';

The pruner doesn't work:
mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2,hdfs://localhost:9000/hive-join-test/join

5) Also, let's try running the query with a MAPJOIN hint:

select /*+MAPJOIN(j)*/ m.a, j.a, m.b from Main m join Joined j on j.a=m.a where part UDF2('part0') AND part = 'part1';

The result is the same; the pruner doesn't work:
mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
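The behaviour the report asks for (prune on the deterministic part of the condition whether or not a join is present) can be illustrated with a minimal sketch. This is not Hive's partition pruner code: the class PruneSketch, the Conjunct type and the prune method are hypothetical names made up for illustration, and the non-deterministic conjunct is a stand-in predicate rather than the UDF2 comparison from the queries above. The idea is only that a partition is dropped when a deterministic conjunct such as part = 'part1' rejects it, while non-deterministic conjuncts are skipped at pruning time and left to the per-row filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; this is NOT Hive's partition pruner.
final class PruneSketch {

    /** One conjunct of an AND-ed WHERE clause over the partition column. */
    static final class Conjunct {
        final boolean deterministic;
        final Predicate<String> test;

        Conjunct(boolean deterministic, Predicate<String> test) {
            this.deterministic = deterministic;
            this.test = test;
        }
    }

    /**
     * Keeps a partition unless some deterministic conjunct rejects it.
     * Non-deterministic conjuncts are ignored here (treated as "unknown"),
     * on the assumption that they are evaluated later, row by row.
     */
    static List<String> prune(List<String> partitions, List<Conjunct> conjuncts) {
        List<String> kept = new ArrayList<>();
        for (String partition : partitions) {
            boolean keep = true;
            for (Conjunct c : conjuncts) {
                if (c.deterministic && !c.test.test(partition)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                kept.add(partition);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Conjunct> where = Arrays.asList(
            // Stand-in for a non-deterministic condition; never used for pruning.
            new Conjunct(false, p -> false),
            // Deterministic condition: part = 'part1'.
            new Conjunct(true, p -> p.equals("part1")));
        System.out.println(prune(Arrays.asList("part1", "part2"), where)); // prints [part1]
    }
}
```

With the two partitions from the example, only part1 survives, which matches the mapred.input.dir reported for the query without a join.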
[jira] [Updated] (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query
[ https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-1173:
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query
[ https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1173:
    Status: Open (was: Patch Available)

comments on phabricator
[jira] [Updated] (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query
[ https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-1173:
    Status: Patch Available (was: Open)

Passed all tests
[jira] [Updated] (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query
[ https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-1173:
    Affects Version/s: 0.10.0
    Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query
[ https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-1173:
    Status: Patch Available (was: Open)

https://reviews.facebook.net/D4503