You can write a map that processes the entire table, discarding
uninteresting rows, and the scheduler will make a best-effort attempt
at scheduling locality. You will want to set up rack awareness to
ensure this is as effective as possible.
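
To make the trade-off concrete, here is a rough sketch in plain Java. It
does not use the HBase or Hadoop APIs: the TreeMap is only a stand-in for
a sorted table, and SingleRowSketch, scanAndFilter and directGet are
made-up names for illustration. It contrasts a map-style pass that visits
every row and discards the uninteresting ones with simply asking for the
one row you already know.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted table (row key -> value).
// This is NOT the HBase API, only an illustration of the two access patterns.
class SingleRowSketch {

    // Map-style pass: visit every row and keep only the wanted one.
    static List<String> scanAndFilter(TreeMap<String, String> table, String wantedRow) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            if (!e.getKey().equals(wantedRow)) {
                continue; // discard uninteresting rows
            }
            out.add(e.getValue());
        }
        return out;
    }

    // Direct lookup: go straight to the known row, no pass over the table.
    static String directGet(TreeMap<String, String> table, String wantedRow) {
        return table.get(wantedRow);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row-a", "1");
        table.put("row-b", "2");
        table.put("row-c", "3");

        // Both calls find the same value; the first touches every row to do so.
        System.out.println(scanAndFilter(table, "row-b"));
        System.out.println(directGet(table, "row-b"));
    }
}
```

Both paths return the same data for a known row; the scan just pays for
every other row along the way.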

But how big are these rows? Rows that are bigger than the Xmx of a VM
don't really work right now (see: 0.21 roadmap). And for isolated
queries, locality really doesn't buy you as much as you think it might:
you save maybe 0.1 ms (ping time on a modern LAN) or less.

-ryan

On Tue, Aug 11, 2009 at 9:07 AM, Alex Spodinets <spodin...@gmail.com> wrote:
> I do know the row. I want MR job to be run on the closest server to where
> data is. So this MR job will process only data for this one row.
>
> Thanks,
> Alex.
>
> On Tue, Aug 11, 2009 at 6:50 PM, stack <st...@duboce.net> wrote:
>
>> On Tue, Aug 11, 2009 at 7:35 AM, Alex Spodinets <spodin...@gmail.com>
>> wrote:
>>
>> > Hello,
>> >
>> > Is it possible to run a Map\Reduce job for only one row in table? Thus
>> > skipping the unnecessary cycling through other rows by ignoring them
>> > manually or via "skip mode".
>> >
>> > The idea behind it is to use Map\Reduce more like an application server
>> > with data location awareness vs batch\parallel processing system.
>> >
>>
>> Please add more description.  I'm having trouble understanding what you are
>> asking.
>>
>> + If you know the row you want, just ask hbase -- you don't have to go via
>> MR.
>> + MR is usually offline/batch operations but when you say things like
>> 'application server' I get the sense you are talking about real-time
>> lookups?
>>
>> Thanks,
>> St.Ack
>>
>
