It appears that getServerByIndex(int k) only returns the server at index k in the server array, not the server that owns vbucket k. Is there any way to find which server is responsible for a given vbucket?
On Thursday, March 13, 2014 12:57:29 AM UTC-4, Corey Nolet wrote:
> Hello,
>
> I'm looking through the source code on github for the couchbase hadoop
> connector. If I'm understanding correctly, the code that generates the
> splits takes all the possible VBuckets and breaks them up into groups based
> on the expected number of mappers set by Sqoop. This means that no matter
> what, even if a mapper is scheduled on a couchbase node, the reads from the
> dump are ALWAYS going to be sent over the network instead of possibly
> pulled from the local node's memory and just funneled into the mapper
> sitting on that local node.
>
> Looking further into the code in the java couchbase client, I'm seeing a
> class called "VBucketNodeLocator" which has a method getServerByIndex(int
> k). If I understand this method, it's allowing me to look up the server
> that holds the vbucket number k. Is this correct? If it is correct, would
> it make sense for this to be used in the getSplits() method in the
> CouchbaseInputFormat so that the splits for the vbuckets can be grouped by
> the server in which they live? I agree that it may not make sense for many
> who have their couchbase cluster separate from their hadoop cluster.. but
> it's a SIGNIFICANT optimization for those who have the two co-located.
>
> Any thoughts?
>
> Thanks!
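
To make the grouping idea concrete, here is a rough sketch of the two-step lookup it would need: vbucket number to master index, then master index to server. It assumes the mapping is available in the shape the cluster's bucket config publishes it (a serverList plus a vBucketMap whose first entry per vbucket is the index of the active node). The class and method names (VBucketSplitSketch, groupByMaster) are made up for illustration and are not part of the couchbase client or the hadoop connector. How to get these two structures out of the java client is exactly the open question, so that part is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Couchbase library.
class VBucketSplitSketch {

    /**
     * Groups vbucket numbers by the server holding their active copy.
     *
     * serverList: "host:port" entries, in config order.
     * vBucketMap: vBucketMap[v][0] is assumed to be the index into
     *             serverList of the active node for vbucket v.
     */
    static Map<String, List<Integer>> groupByMaster(List<String> serverList,
                                                    int[][] vBucketMap) {
        Map<String, List<Integer>> byServer = new LinkedHashMap<String, List<Integer>>();
        for (String server : serverList) {
            byServer.put(server, new ArrayList<Integer>());
        }
        for (int vb = 0; vb < vBucketMap.length; vb++) {
            int master = vBucketMap[vb][0];
            // Skip vbuckets whose index does not point at a listed server
            // (for example a negative value); a real getSplits() would
            // still have to assign these to some split.
            if (master < 0 || master >= serverList.size()) {
                continue;
            }
            byServer.get(serverList.get(master)).add(vb);
        }
        return byServer;
    }
}
```

Each map entry could then become one split (or several) whose preferred location is that server's host, so a mapper scheduled on the same machine would read its vbuckets locally. Whether the connector's split class can carry a location hint like that is something I have not checked.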