Yes, Ayan, your approach will work.

Alternatively, use Spark and write a Scala/Java function which
implements logic similar to what you would put in your Pig UDF.

Both approaches look similar.

Personally, I would go with the Spark solution: it will be slightly faster,
and easier if you already have a Spark cluster set up on top of the Hadoop
cluster in your infrastructure.
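
For the Spark route, something along these lines should do it. This is only a
rough, untested sketch: I am assuming an HBase 1.x client, a comma-delimited
input file whose first field is the row key, and a table named "records" with
column family "cf" and column "col1" (all placeholder names, adjust to your
schema). Note that it opens one HBase connection per partition instead of per
record, which also addresses your question 3:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseUpsert {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-upsert"))

    sc.textFile("hdfs:///path/to/input.csv")      // 1. read the batch file
      .map(_.split(","))
      .foreachPartition { rows =>
        // one HBase connection per partition, reused for every record in it
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("records"))
        rows.foreach { fields =>
          val rowKey   = Bytes.toBytes(fields(0))
          val incoming = fields(1)
          // 2. look up the existing record (null if the column is not there)
          val stored = table.get(new Get(rowKey))
                            .getValue(Bytes.toBytes("cf"), Bytes.toBytes("col1"))
          // 3. write only when the record is missing or the value differs
          if (stored == null || Bytes.toString(stored) != incoming) {
            val put = new Put(rowKey)
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"),
                          Bytes.toBytes(incoming))
            table.put(put)
          }
        }
        table.close()
        conn.close()
      }

    sc.stop()
  }
}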

Cheers,
Tao


On Thu, Sep 3, 2015 at 1:15 AM, ayan guha <guha.a...@gmail.com> wrote:

> Thanks for your info. I am planning to implement a Pig UDF to do record
> lookups. Kindly let me know if this is a good idea.
>
> Best
> Ayan
>
> On Thu, Sep 3, 2015 at 2:55 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>>
>> You may check if it makes sense to write a coprocessor doing the upsert
>> for you, if one does not exist already. Phoenix for HBase may support
>> this already.
>>
>> Another alternative, if the records do not have a unique id, is to put
>> them into a text index engine such as Solr or Elasticsearch, which in that
>> case does fast matching with relevancy scores.
>>
>>
>> You can also use Spark and Pig there. However, I am not sure whether Spark
>> is suitable for these one-row lookups. The same holds for Pig.
>>
>>
>> On Wed, Sep 2, 2015 at 11:53 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>> Hello group
>>
>> I am trying to use Pig or Spark in order to achieve the following:
>>
>> 1. Write a batch process which will read from a file.
>> 2. Look up HBase to see if the record exists. If so, compare the incoming
>> values with HBase and update the fields which do not match; else create a
>> new record.
>>
>> My questions:
>> 1. Is this a good use case for Pig or Spark?
>> 2. Is there any way to read HBase for each incoming record in Pig without
>> writing MapReduce code?
>> 3. In the case of Spark, I think we have to connect to HBase for every
>> record. Is there any other way?
>> 4. What is the best connector for HBase which gives this functionality?
>>
>> Best
>>
>> Ayan
>>
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>



-- 
------------------------------------------------
Thanks!
Tao
