Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Peter Chu
Hi, I am trying to write a hive query to find the equivalent of NOT IN / NOT EXIST in SQL. However, Hive does not support this. It does have Left Semi Join which serves as IN but NOT does not support. I am wondering if there is any way to do this without resorting to using left outer join and

Re: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Michael Malak
--- On Sun, 5/5/13, Peter Chu pete@outlook.com wrote: I am wondering if there is any way to do this without resorting to using left outer join and finding nulls. I have found this to be an acceptable substitute. Is it not working for you?

RE: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Peter Chu
It works but it takes a very long time because the subqueries in NOT IN contains 400 million rows (the message table in the example) and the feed table contains 3 million rows. SELECT uuid from feed f WHERE f.uuid NOT IN (SELECT uuid FROM message); Date: Sun, 5 May 2013 20:25:15 -0700 From:

Re: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Stephen Boesch
@Peter Does the query plan demonstrate that the 3Meg row table is being map-joined and the 400M table streamed through? That is what you want: but you might either need to fiddle with hints to get it to happen Details: Read uuids s of feed into in-memory map on all nodes (mapjoin)

RE: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Peter Chu
Thanks, Stephen, I do not quite understand what you mean by Stream, specifically Stream the 400M message records through the in-memory maps.Can you please elaborate. Also, can you use MAPJOIN on left outer join? Peter Date: Sun, 5 May 2013 21:44:37 -0700 Subject: Re: Hive QL - NOT IN, NOT EXIST

Re: HIVE-3979 in Hive 0.11

2013-05-05 Thread Carl Steinbach
Hi John, This is a mistake in the release notes. It will be fixed in the next 0.11 release candidate. Thanks. Carl On Sat, May 4, 2013 at 6:18 AM, John Omernik j...@omernik.com wrote: I see in the release notes for HIVE -3979 [HIVE-3979 https://issues.apache.org/jira/browse/HIVE-3979] -

Re: [VOTE] Apache Hive 0.11.0 Release Candidate 1

2013-05-05 Thread Carl Steinbach
I took a quick look. Here's a list of things I noticed: * NOTICE ** Copyright is out of date. * README.txt ** Add one more '=' to the title underline. ** Requirements section only lists Hadoop 0.20.x ** Will it work with Java 1.7? * RELEASE_NOTES.txt ** The 0.11 section lists tickets that