Searching my computer, I find Namit quoting: "ansi sql semantics are that the filter is executed after the join."
So there u go .. In the same mail he suggested putting the filter condition for the table inside the ON clause for execution before the join. So I guess u might want to try: SELECT communications.sender_id, conversions.actor_id, communications.version, conversions.reference_id FROM communications LEFT OUTER JOIN conversions ON (communications.reference_id = conversions.reference_id AND conversions.application='reference') WHERE communications.hour >= 343013 AND communications.hour < 343014 ________________________________ From: Josh Ferguson [mailto:j...@besquared.net] Sent: Monday, February 16, 2009 10:53 PM To: hive-user@hadoop.apache.org Subject: Re: Yet another join issue The thing I want the WHERE clause to do here is filter the two sets *before* I join them not after. This is what Prasad's query was doing unfortunately it wasn't working properly inside of a SELECT TRANSFORM FROM(...) block. Anyone know the SQL standard involving WHERE clauses and joins? Josh Ferguson On Feb 16, 2009, at 10:28 PM, Zheng Shao wrote: The following will also work. In the case of left outer join, the value of columns from the missing table will be NULL, so checking whether the value IS NULL should solve the problem. SELECT communications.sender_id, conversions.actor_id, communications.version, conversions.reference_id FROM communications LEFT OUTER JOIN conversions ON communications.reference_id = conversions.reference_id WHERE communications.hour >= 343013 AND communications.hour < 343014 AND ( conversions.application = 'reference' OR conversions.application IS NULL); I am not sure whether SQL standard says something different from what Hive implements. It will be great if you could point us to some references. Zheng On Mon, Feb 16, 2009 at 10:21 PM, Josh Ferguson <j...@besquared.net<mailto:j...@besquared.net>> wrote: Will this work if this select statement is already nested in a FROM? Josh Ferguson On Feb 16, 2009, at 10:14 PM, Prasad Chakka wrote: Try below, SELECT a.sender_id, b.actor_id, a.version, b.reference_id FROM communications a LEFT OUTER JOIN (select conversions.actor_id, conversions.reference_id from conversions where conversions.application='reference') b ON a.reference_id = b.reference_id WHERE a.hour >= 343013 AND a.hour < 343014 ________________________________ From: Josh Ferguson <j...@besquared.net<http://j...@besquared.net>> Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>> Date: Mon, 16 Feb 2009 22:04:14 -0800 To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org>> Subject: Yet another join issue Ok so this one I just ran into. It appears when doing a join of two tables that, when using a LEFT OUTER JOIN and specifying a WHERE condition on both tables, the resulting join is an INNER JOIN. Example follows: LEFT OUTER JOIN with WHERE condition on the LEFT table SELECT communications.sender_id, conversions.actor_id, communications.version, conversions.reference_id FROM communications LEFT OUTER JOIN conversions ON communications.reference_id = conversions.reference_id WHERE communications.hour >= 343013 AND communications.hour < 343014 1 NULL A NULL 1 2 A 2 2 NULL A NULL 3 NULL B NULL correct results, as expected LEFT OUTER JOIN with WHERE conditions on both tables SELECT communications.sender_id, conversions.actor_id, communications.version, conversions.reference_id FROM communications LEFT OUTER JOIN conversions ON communications.reference_id = conversions.reference_id WHERE communications.hour >= 343013 AND communications.hour < 343014 AND conversions.application = 'reference' 1 2 A 2 incorrect results, it simply displays the row that was matched in both tables. Any ideas what's going on here? Is this intended? How can I get a LEFT OUTER JOIN and put conditions on both tables? Josh Ferguson -- Yours, Zheng