Hi, I am trying to write a hive query to find the equivalent of NOT IN / NOT
EXIST in SQL.
However, Hive does not support this. It does have Left Semi Join which serves
as IN but NOT does not support.
I am wondering if there is any way to do this without resorting to using left
outer join and
--- On Sun, 5/5/13, Peter Chu pete@outlook.com wrote:
I am wondering if there is any way to do this without resorting to
using left outer join and finding nulls.
I have found this to be an acceptable substitute. Is it not working for you?
It works but it takes a very long time because the subqueries in NOT IN
contains 400 million rows (the message table in the example) and the feed table
contains 3 million rows.
SELECT uuid from feed f WHERE f.uuid NOT IN (SELECT uuid FROM message);
Date: Sun, 5 May 2013 20:25:15 -0700
From:
@Peter Does the query plan demonstrate that the 3Meg row table is being
map-joined and the 400M table streamed through? That is what you want: but
you might either need to fiddle with hints to get it to happen
Details:
Read uuids s of feed into in-memory map on all nodes (mapjoin)
Thanks, Stephen,
I do not quite understand what you mean by Stream, specifically Stream the
400M message records through the in-memory maps.Can you please elaborate.
Also, can you use MAPJOIN on left outer join?
Peter
Date: Sun, 5 May 2013 21:44:37 -0700
Subject: Re: Hive QL - NOT IN, NOT EXIST
Hi John,
This is a mistake in the release notes. It will be fixed in the next 0.11
release candidate.
Thanks.
Carl
On Sat, May 4, 2013 at 6:18 AM, John Omernik j...@omernik.com wrote:
I see in the release notes for HIVE -3979
[HIVE-3979 https://issues.apache.org/jira/browse/HIVE-3979] -
I took a quick look. Here's a list of things I noticed:
* NOTICE
** Copyright is out of date.
* README.txt
** Add one more '=' to the title underline.
** Requirements section only lists Hadoop 0.20.x
** Will it work with Java 1.7?
* RELEASE_NOTES.txt
** The 0.11 section lists tickets that