Re: nested join issue

2015-06-13 Thread Gautam
Found that turning off hive.optimize.remove.identity.project ( ref:
HIVE-8435 https://issues.apache.org/jira/browse/HIVE-8435 ) fixes the
issue.

This gives us a workaround, but I don't know yet how much performance
degradation it causes.
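
A minimal sketch of applying the workaround for a single session, assuming the property is set from the Hive CLI or Beeline before the failing query is run. Setting it per session rather than in hive-site.xml keeps any performance impact limited to the affected queries.

```sql
-- Session-level workaround: disable the identity projection remover
-- (the optimization referenced in HIVE-8435).
set hive.optimize.remove.identity.project=false;

-- Print the current value to confirm the setting took effect,
-- then re-run the failing query in this same session.
set hive.optimize.remove.identity.project;
```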

Thanks!
-Gautam.


-- 
If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers...


Re: nested join issue

2015-06-12 Thread Gopal Vijayaraghavan
Hi

> Thanks for investigating. Trying to locate the patch that fixes this
> between 1.1 and 2.0.0-SNAPSHOT. Any leads on what Jira this fix was part
> of? Or what part of the code the patch is likely to be on?

git bisect is usually the only way to identify these things.

But before you hunt through the patches, I suggest trying combinations of
the constant propagation, null-scan and identity projection remover
optimizations to see if there's a workaround in there.

An explain of the query added to a new JIRA would be good, to continue the
analysis.
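
A rough sketch of that experiment, assuming the three optimizations map to the HiveConf properties named below (check them against your Hive version) and that the comparison lost from the archived query is ">", which matches the output posted in the original report. Toggle one property at a time, re-run the query, and attach the explain output to the JIRA.

```sql
-- Disable one optimization at a time and re-run the failing query after each.
set hive.optimize.constant.propagation=false;
-- set hive.optimize.null.scan=false;
-- set hive.optimize.remove.identity.project=false;

-- Capture the query plan for the JIRA.
explain
select s from (
  select last.*, action.st2, action.n
  from (
    select purchase.s, purchase.`timestamp`,
           max(mevt.`timestamp`) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.`timestamp` > mevt.`timestamp`  -- operator assumed
    group by purchase.s, purchase.`timestamp`
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.`timestamp`
) list;
```

Comparing the plan with and without each setting should show which optimizer rewrite drops the projection.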

Cheers,
Gopal




Re: nested join issue

2015-06-12 Thread Gautam
Done. https://issues.apache.org/jira/browse/HIVE-10996



Re: nested join issue

2015-06-11 Thread Gautam
Thanks for investigating. Trying to locate the patch that fixes this
between 1.1 and 2.0.0-SNAPSHOT. Any leads on what Jira this fix was part
of? Or what part of the code the patch is likely to be on?

-Gautam.



nested join issue

2015-06-11 Thread Slava Markeyev
I'm running into a peculiar issue with nested joins and outer select. I see
this error on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression.

The following query produces no results:

select s from (
  select last.*, action.st2, action.n
  from (
    select purchase.s, purchase.timestamp,
           max(mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;

While this one does produce results:

select * from (
  select last.*, action.st2, action.n
  from (
    select purchase.s, purchase.timestamp,
           max(mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;

1 21 20 Bob 1234
1 31 30 Bob 1234
3 51 50 Jeff 1234

The setup to test this is:

create table purchase_history (s string, product string, price double, timestamp int);
insert into purchase_history values ('1', 'Belt', 20.00, 21);
insert into purchase_history values ('1', 'Socks', 3.50, 31);
insert into purchase_history values ('3', 'Belt', 20.00, 51);
insert into purchase_history values ('4', 'Shirt', 15.50, 59);

create table cart_history (s string, cart_id int, timestamp int);
insert into cart_history values ('1', 1, 10);
insert into cart_history values ('1', 2, 20);
insert into cart_history values ('1', 3, 30);
insert into cart_history values ('1', 4, 40);
insert into cart_history values ('3', 5, 50);
insert into cart_history values ('4', 6, 60);

create table events (s string, st2 string, n int, timestamp int);
insert into events values ('1', 'Bob', 1234, 20);
insert into events values ('1', 'Bob', 1234, 30);
insert into events values ('1', 'Bob', 1234, 25);
insert into events values ('2', 'Sam', 1234, 30);
insert into events values ('3', 'Jeff', 1234, 50);
insert into events values ('4', 'Ted', 1234, 60);

I realize select * and select s are not all that interesting in this
context, but what led me to this issue was that select count(distinct s) was
not returning results. The above queries are the simplified queries that
reproduce the issue. I will note that if I convert the inner join to a table
and select from that, the issue does not appear.
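
For reference, one reading of "convert the inner join to a table" is sketched below. The table name last_stage is hypothetical, the ">" comparison is an assumption (the operator is missing from the archived query, and ">" matches the output above), and the backticks around timestamp are only there so the statements also parse on versions that treat it as a reserved word.

```sql
-- Materialize the inner join and aggregation as its own table.
-- last_stage is a hypothetical name, not from the original queries.
create table last_stage as
select purchase.s, purchase.`timestamp`,
       max(mevt.`timestamp`) as last_stage_timestamp
from (select * from purchase_history) purchase
join (select * from cart_history) mevt
on purchase.s = mevt.s
where purchase.`timestamp` > mevt.`timestamp`  -- operator assumed
group by purchase.s, purchase.`timestamp`;

-- Run the outer query against the materialized table instead of the subquery.
select count(distinct s) from (
  select last.*, action.st2, action.n
  from last_stage last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.`timestamp`
) list;
```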

-- 

Slava Markeyev | Engineering | Upsight

Find me on LinkedIn http://www.linkedin.com/in/slavamarkeyev


Re: nested join issue

2015-06-11 Thread Gopal Vijayaraghavan
Hi,

> I'm running into a peculiar issue with nested joins and outer select. I
> see this error on 1.1.0 and 1.2.0 but not 0.13 which seems like a
> regression.
> ...
> create table events (s string, st2 string, n int, timestamp int);


The issue does not seem to be happening in hive-2.0.0-SNAPSHOT, which
means it has already been fixed and possibly can be backported easily to
1.2.1.

Your test-cases threw a parse-exception when run as-is - naming a column
"timestamp" will kill you when you upgrade to the next version.
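
A sketch of two ways around the column-name problem, assuming backtick-quoted identifiers are enabled (hive.support.quoted.identifiers=column, which I believe is the default from Hive 0.13 on). The table name events_renamed and column name ts are hypothetical.

```sql
-- Option 1: keep the column name but quote it with backticks everywhere it is used.
create table events (s string, st2 string, n int, `timestamp` int);
select s, `timestamp` from events;

-- Option 2: avoid the reserved word entirely (hypothetical names).
create table events_renamed (s string, st2 string, n int, ts int);
select s, ts from events_renamed;
```

Renaming is the more durable choice, since every query that touches the column would otherwise need the backticks.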

Cheers,
Gopal