I believe you meant:

SELECT * FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid
WHERE LOG2.recordid IS NULL;

This returns the records that are present in LOG1 but not in LOG2.

In Pig, the left outer join alone is not enough: we have to add a FILTER
with an "is null" condition on the LOG2 key after the join.
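
A rough sketch of how that could look. The input paths, the tab-delimited
layout, and the "payload" field are assumptions for illustration only; adjust
the LOAD statements to your actual log format or loader.

```pig
-- Hypothetical inputs: tab-delimited files with the record id in the first column.
log1 = LOAD 'log1' USING PigStorage('\t') AS (recordid:chararray, payload:chararray);
log2 = LOAD 'log2' USING PigStorage('\t') AS (recordid:chararray, payload:chararray);

-- Keep every LOG1 row; the LOG2 side is null where no match exists.
joined = JOIN log1 BY recordid LEFT OUTER, log2 BY recordid;

-- Rows with a null LOG2 key are the records missing from the second log.
missing = FILTER joined BY log2::recordid IS NULL;

-- Project only the LOG1 columns for the report.
result = FOREACH missing GENERATE log1::recordid, log1::payload;
DUMP result;
```

One caveat: rows in LOG1 whose recordid is itself null never match anything in
the join, so they will also show up in the output.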

~Rajesh.B

On Mon, Jun 27, 2011 at 6:34 AM, Bharath Mundlapudi
<bharathw...@yahoo.com> wrote:

> SQL:
>
> SELECT * FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid;
>
>
> PIG:
> data = JOIN LOG1 BY recordid LEFT OUTER, LOG2 BY recordid;
> DUMP data;
>
>
> If you need more Pig help, please post to the Pig mailing list.
>
> -Bharath
>
>
> ________________________________
> From: Mark Kerzner <markkerz...@gmail.com>
> To: common-user@hadoop.apache.org; Bharath Mundlapudi <
> bharathw...@yahoo.com>
> Sent: Sunday, June 26, 2011 5:50 PM
> Subject: Re: Comparing two logs, finding missing records
>
>
> Bharath,
>
> what would a Pig query look like?
>
> Thank you,
> Mark
>
>
> On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi <bharathw...@yahoo.com>
> wrote:
>
> >If you have a SerDe or a Pig loader for your log format, Pig or Hive will
> >probably be a quicker solution with the join.
> >
> >-Bharath
> >
> >
> >
> >________________________________
> >From: Mark Kerzner <markkerz...@gmail.com>
> >To: Hadoop Discussion Group <core-u...@hadoop.apache.org>
> >Sent: Saturday, June 25, 2011 9:39 PM
> >Subject: Comparing two logs, finding missing records
> >
> >
> >Hi,
> >
> >I have two logs which should have all the records for the same record_id,
> >in other words, if this record_id is found in the first log, it should also
> >be found in the second one. However, I suspect that the second log is
> >filtered out, and I need to find the missing records. Anything is allowed:
> >MapReduce job, Hive, Pig, and even a NoSQL database.
> >
> >Thank you.
> >
> >It is also a good time to express my thanks to all the members of the
> >group who are always very helpful.
> >
> >Sincerely,
> >Mark
>



-- 
~Rajesh.B
