Re: Couchbase Sqoop Data Locality question

Corey Nolet Thu, 13 Mar 2014 04:43:20 -0700

It appears that getServerByIndex(int k) only returns the server at index k 
in the server array, not the server that owns vbucket k. Is there any way to 
find out which server is responsible for a given vbucket?
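
A rough sketch of the grouping being asked about, in plain Java. It assumes 
the vbucket-to-master mapping is already available as an int array of indexes 
into the server list (the cluster's bucket config exposes something of this 
shape as vBucketServerMap, with serverList and vBucketMap). How the connector 
would obtain that mapping is left open here. The class and method names are 
hypothetical and are not part of the Couchbase client or the Hadoop connector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Couchbase library.
final class VBucketLocalitySketch {

    /**
     * Groups vbucket ids by the server that holds their active copy.
     *
     * @param servers              server addresses (assumed input)
     * @param masterIndexByVBucket masterIndexByVBucket[v] is the index into
     *                             servers of the node owning vbucket v,
     *                             or -1 if no owner is known (assumed input)
     */
    static Map<String, List<Integer>> groupByMaster(
            List<String> servers, int[] masterIndexByVBucket) {
        Map<String, List<Integer>> byServer = new LinkedHashMap<>();
        for (int vb = 0; vb < masterIndexByVBucket.length; vb++) {
            int idx = masterIndexByVBucket[vb];
            if (idx < 0 || idx >= servers.size()) {
                continue; // no known owner; skip rather than guess
            }
            byServer.computeIfAbsent(servers.get(idx), s -> new ArrayList<>())
                    .add(vb);
        }
        return byServer;
    }
}
```

Each resulting group could then back one split (or several, if there are more 
mappers than servers) whose getLocations() reports that server's host, which 
is how Hadoop learns where a split would prefer to run.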


On Thursday, March 13, 2014 12:57:29 AM UTC-4, Corey Nolet wrote:
>
> Hello,
>
> I'm looking through the source code on GitHub for the Couchbase Hadoop 
> connector. If I'm understanding correctly, the code that generates the 
> splits takes all the possible vbuckets and breaks them up into groups based 
> on the expected number of mappers set by Sqoop. This means that even if a 
> mapper is scheduled on a Couchbase node, the reads from the dump are ALWAYS 
> going to be sent over the network instead of possibly being pulled from the 
> local node's memory and funneled into the mapper sitting on that node.
>
> Looking further into the code in the Couchbase Java client, I'm seeing a 
> class called "VBucketNodeLocator" which has a method getServerByIndex(int 
> k). If I understand this method, it lets me look up the server that holds 
> vbucket number k. Is this correct? If so, would it make sense to use it in 
> the getSplits() method of CouchbaseInputFormat so that the splits can be 
> grouped by the server on which their vbuckets live? I agree that it may not 
> make sense for the many who run their Couchbase cluster separately from 
> their Hadoop cluster, but it's a SIGNIFICANT optimization for those who 
> have the two co-located.
>
> Any thoughts?
>
>
> Thanks!
>
>

