Re: nested join issue
Found that turning off hive.optimize.remove.identity.project ( ref: HIVE-8435 https://issues.apache.org/jira/browse/HIVE-8435 ) fixes the issue. This gives us a workaround, but dunno the performance degradation this impacts yet. Thanks! -Gautam. On Fri, Jun 12, 2015 at 6:02 PM, Gautam gautamkows...@gmail.com wrote: Done. https://issues.apache.org/jira/browse/HIVE-10996 On Fri, Jun 12, 2015 at 1:47 PM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi Thanks for investigating.. Trying to locate the patch that fixes this between 1.1 and 2.0.0-SNAPSHOT. Any leads on what Jira this fix was part of? Or what part of the code the patch is likely to be on? git bisect is the only way usually to identify these things. But before you hunt into the patches I suggest trying combinations of constant propogation, null-scan and identity projection remover optimizations to see if there¹s a workaround in there. An explain of the query added to a new JIRA would be good, to continue the analysis. Cheers, Gopal -- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers... -- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers...
Re: nested join issue
Hi Thanks for investigating.. Trying to locate the patch that fixes this between 1.1 and 2.0.0-SNAPSHOT. Any leads on what Jira this fix was part of? Or what part of the code the patch is likely to be on? git bisect is the only way usually to identify these things. But before you hunt into the patches I suggest trying combinations of constant propogation, null-scan and identity projection remover optimizations to see if there¹s a workaround in there. An explain of the query added to a new JIRA would be good, to continue the analysis. Cheers, Gopal
Re: nested join issue
Done. https://issues.apache.org/jira/browse/HIVE-10996 On Fri, Jun 12, 2015 at 1:47 PM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi Thanks for investigating.. Trying to locate the patch that fixes this between 1.1 and 2.0.0-SNAPSHOT. Any leads on what Jira this fix was part of? Or what part of the code the patch is likely to be on? git bisect is the only way usually to identify these things. But before you hunt into the patches I suggest trying combinations of constant propogation, null-scan and identity projection remover optimizations to see if there¹s a workaround in there. An explain of the query added to a new JIRA would be good, to continue the analysis. Cheers, Gopal -- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers...
Re: nested join issue
Thanks for investigating.. Trying to locate the patch that fixes this between 1.1 and 2.0.0-SNAPSHOT. Any leads on what Jira this fix was part of? Or what part of the code the patch is likely to be on? -Gautam. On Thu, Jun 11, 2015 at 8:35 PM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi, I'm running into a peculiar issue with nested joins and outer select. I see this error on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. ... create table events (s string, st2 string, n int, timestamp int); The issue does not seem to be happening in hive-2.0.0-SNAPSHOT, which means it has already been fixed possibly can be backported easily to 1.2.1. Your test-cases threw a parse-exception when run as-is - naming a column ³timestamp² will kill you when you upgrade to the next version. Cheers, Gopal -- If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers...
nested join issue
I'm running into a peculiar issue with nested joins and outer select. I see this error on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query produces no results: select sfrom ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; While this one does produce results select *from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff 1234 The setup to test this is: create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); I realize select * and select s are not all that interesting in this context but what lead me to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. -- Slava Markeyev | Engineering | Upsight Find me on LinkedIn http://www.linkedin.com/in/slavamarkeyev http://www.linkedin.com/in/slavamarkeyev
Re: nested join issue
Hi, I'm running into a peculiar issue with nested joins and outer select. I see this error on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. ... create table events (s string, st2 string, n int, timestamp int); The issue does not seem to be happening in hive-2.0.0-SNAPSHOT, which means it has already been fixed possibly can be backported easily to 1.2.1. Your test-cases threw a parse-exception when run as-is - naming a column ³timestamp² will kill you when you upgrade to the next version. Cheers, Gopal