What's the status of this package? Is it mature enough? I am using it in my project: I tried out the write method yesterday and am going to incorporate the read method tomorrow.
On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <alex.barano...@gmail.com> wrote:

> > The start/end rows may be written twice.
>
> Yeah, I know. I meant that the size of the startRow+stopRow data is
> "bearable" in the attribute value no matter how long they (the keys) are,
> since we are already OK with transferring them initially (i.e. we should be
> OK with transferring 2x more).
>
> So, what about the suggestion for the sourceScan attribute value I
> mentioned? If you can tell why it isn't sufficient in your case, I'd have
> more info to think about a better suggestion ;)
>
> > It is Okay to keep all versions of your patch in the JIRA.
> > Maybe the second should be named HBASE-3811-v2.patch
> > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
>
> np. Can do that. Just thought that they (patches) can be sorted by date to
> find out the final one (aka "convention over naming-rules").
>
> Alex.
>
> On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > >> Though it might be ok, since we anyways "transfer" start/stop rows
> > >> with Scan object.
> >
> > In the write() method, we now have:
> >
> >   Bytes.writeByteArray(out, this.startRow);
> >   Bytes.writeByteArray(out, this.stopRow);
> >   ...
> >   for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> >     WritableUtils.writeString(out, attr.getKey());
> >     Bytes.writeByteArray(out, attr.getValue());
> >   }
> >
> > The start/end rows may be written twice.
> >
> > Of course, you have full control over how to generate the unique ID for
> > the "sourceScan" attribute.
> >
> > It is Okay to keep all versions of your patch in the JIRA. Maybe the
> > second should be named HBASE-3811-v2.patch
> > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> >
> > Thanks
> >
> > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <alex.barano...@gmail.com> wrote:
> >
> >> > Can you remove the first version ?
> >>
> >> Isn't it ok to keep it in the JIRA issue?
> >>
> >> > In HBaseWD, can you use reflection to detect whether Scan supports
> >> > setAttribute() ?
> >> > If it does, can you encode start row and end row as "sourceScan"
> >> > attribute ?
> >>
> >> Yeah, smth like this is going to be implemented. Though I'd still want to
> >> hear from the devs the story about the Scan version.
> >>
> >> > One consideration is that start row or end row may be quite long.
> >>
> >> Yeah, that was my thought too at first. Though it might be ok, since we
> >> anyways "transfer" start/stop rows with the Scan object.
> >>
> >> > What do you think ?
> >>
> >> I'd love to hear from you whether the variant I mentioned is what we are
> >> looking at here:
> >>
> >> > From what I understand, you want to distinguish scans fired by the same
> >> > distributed scan. I.e. group scans which were fired by a single
> >> > distributed scan. If that's what you want, the distributed scan can
> >> > generate a unique ID and set, say, a "sourceScan" attribute to its
> >> > value. This way we'll have <# of distinct "sourceScan" attribute values>
> >> > = <number of distributed scans invoked by the client side>, and two
> >> > scans on the server side will have the same "sourceScan" attribute iff
> >> > they "belong" to the same distributed scan.
> >>
> >> Alex Baranau
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
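As an illustration of the option being weighed above (encoding the start and stop rows into a single "sourceScan" attribute value), here is a minimal sketch. It is not code from the thread or from the HBASE-3811 patch; the class name and the length-prefixed layout are assumptions. It also makes the "written twice" concern concrete: Scan.write() would serialize the rows once as fields and once inside this value.

  import org.apache.hadoop.hbase.util.Bytes;

  public class SourceScanValueSketch {
    // Packs startRow and stopRow into one attribute value, length-prefixed so
    // the two keys can be split apart again on the receiving side.
    public static byte[] encode(byte[] startRow, byte[] stopRow) {
      return Bytes.add(Bytes.toBytes(startRow.length), startRow, stopRow);
    }

    // Splits the value produced by encode() back into { startRow, stopRow }.
    public static byte[][] decode(byte[] value) {
      int startLen = Bytes.toInt(value, 0);
      byte[] startRow = new byte[startLen];
      byte[] stopRow = new byte[value.length - Bytes.SIZEOF_INT - startLen];
      System.arraycopy(value, Bytes.SIZEOF_INT, startRow, 0, startLen);
      System.arraycopy(value, Bytes.SIZEOF_INT + startLen, stopRow, 0, stopRow.length);
      return new byte[][] { startRow, stopRow };
    }
  }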
> >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >>> Alex:
> >>> Your second patch looks good.
> >>> Can you remove the first version ?
> >>>
> >>> In HBaseWD, can you use reflection to detect whether Scan supports
> >>> setAttribute() ?
> >>> If it does, can you encode start row and end row as "sourceScan"
> >>> attribute ?
> >>>
> >>> One consideration is that start row or end row may be quite long.
> >>> Ideally we should store the hash code of the source Scan object as the
> >>> "sourceScan" attribute. But Scan doesn't implement hashCode(). We can add
> >>> it; that would require running all Scan related tests.
> >>>
> >>> What do you think ?
> >>>
> >>> Thanks
> >>>
> >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <alex.barano...@gmail.com> wrote:
> >>>
> >>>> Sorry for the delay in response (public holidays here).
> >>>>
> >>>> This depends on what info you are looking for on the server side.
> >>>>
> >>>> From what I understand, you want to distinguish scans fired by the same
> >>>> distributed scan. I.e. group scans which were fired by a single
> >>>> distributed scan. If that's what you want, the distributed scan can
> >>>> generate a unique ID and set, say, a "sourceScan" attribute to its
> >>>> value. This way we'll have <# of distinct "sourceScan" attribute values>
> >>>> = <number of distributed scans invoked by the client side>, and two
> >>>> scans on the server side will have the same "sourceScan" attribute iff
> >>>> they "belong" to the same distributed scan.
> >>>>
> >>>> Is this what you are looking for?
> >>>>
> >>>> Alex Baranau
> >>>>
> >>>> P.S. attached patch for HBASE-3811
> >>>> <https://issues.apache.org/jira/browse/HBASE-3811>.
> >>>> P.S-2. should this conversation be moved to the dev list?
> >>>>
> >>>> ----
> >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>>>
> >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>
> >>>>> Alex:
> >>>>> What type of identification should we put in the map of the Scan object ?
> >>>>> I am thinking of using the Id of RowKeyDistributor. But the user can
> >>>>> use the same distributor on multiple scans.
> >>>>>
> >>>>> Please share your thoughts.
> >>>>>
> >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <alex.barano...@gmail.com> wrote:
> >>>>>
> >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> >>>>>>
> >>>>>> Alex Baranau
> >>>>>> ----
> >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>>>>>
> >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>>
> >>>>>> > My plan was to make regions that have active scanners more stable -
> >>>>>> > trying not to move them when balancing.
> >>>>>> > I prefer the second approach - adding custom attribute(s) to Scan so
> >>>>>> > that the Scans created by the method below can be 'grouped'.
> >>>>>> >
> >>>>>> > If you can file a JIRA, that would be great.
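Ted's reflection question above could be handled along these lines. This is a sketch rather than HBaseWD code, and the helper class name is made up: it probes once for setAttribute(String, byte[]) and only tags the scan when the running HBase client actually has that method (i.e. carries the HBASE-3811 change).

  import java.lang.reflect.Method;

  import org.apache.hadoop.hbase.client.Scan;

  public final class ScanAttributeSupport {
    // null when the client jar predates attribute support on Scan
    private static final Method SET_ATTRIBUTE = lookupSetAttribute();

    private static Method lookupSetAttribute() {
      try {
        return Scan.class.getMethod("setAttribute", String.class, byte[].class);
      } catch (NoSuchMethodException e) {
        return null; // older HBase: Scan has no attributes
      }
    }

    // Sets the attribute if the API exists; returns false otherwise so the
    // caller can fall back to not tagging the scan.
    public static boolean trySetAttribute(Scan scan, String name, byte[] value) {
      if (SET_ATTRIBUTE == null) {
        return false;
      }
      try {
        SET_ATTRIBUTE.invoke(scan, name, value);
        return true;
      } catch (Exception e) {
        throw new IllegalStateException("Scan.setAttribute invocation failed", e);
      }
    }
  }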
> >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <alex.barano...@gmail.com> wrote:
> >>>>>> >
> >>>>>> > > Aha, so you want to "count" it as a single scan (or just
> >>>>>> > > differently) when determining the load?
> >>>>>> > >
> >>>>>> > > The current code looks like this:
> >>>>>> > >
> >>>>>> > > class DistributedScanner:
> >>>>>> > >
> >>>>>> > >   public static DistributedScanner create(HTable hTable, Scan original,
> >>>>>> > >       AbstractRowKeyDistributor keyDistributor) throws IOException {
> >>>>>> > >     byte[][] startKeys = keyDistributor.getAllDistributedKeys(original.getStartRow());
> >>>>>> > >     byte[][] stopKeys = keyDistributor.getAllDistributedKeys(original.getStopRow());
> >>>>>> > >     Scan[] scans = new Scan[startKeys.length];
> >>>>>> > >     for (byte i = 0; i < startKeys.length; i++) {
> >>>>>> > >       scans[i] = new Scan(original);
> >>>>>> > >       scans[i].setStartRow(startKeys[i]);
> >>>>>> > >       scans[i].setStopRow(stopKeys[i]);
> >>>>>> > >     }
> >>>>>> > >
> >>>>>> > >     ResultScanner[] rss = new ResultScanner[startKeys.length];
> >>>>>> > >     for (byte i = 0; i < scans.length; i++) {
> >>>>>> > >       rss[i] = hTable.getScanner(scans[i]);
> >>>>>> > >     }
> >>>>>> > >
> >>>>>> > >     return new DistributedScanner(rss);
> >>>>>> > >   }
> >>>>>> > >
> >>>>>> > > This is client code. To make these scans "identifiable" we need to
> >>>>>> > > either use some different (derived from Scan) class or add some
> >>>>>> > > attribute to them. There's no API for doing the latter. We can do
> >>>>>> > > the former, but I don't really like the idea of creating an extra
> >>>>>> > > class (with no extra functionality) just to distinguish it from the
> >>>>>> > > base one.
> >>>>>> > >
> >>>>>> > > If you can share why/how you want to treat them differently on the
> >>>>>> > > server side, that would be helpful.
> >>>>>> > >
> >>>>>> > > Alex Baranau
> >>>>>> > > ----
> >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>>>>> > >
> >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>> > >
> >>>>>> > > > My request would be to make the distributed scan identifiable
> >>>>>> > > > from the server side.
> >>>>>> > > > :-)
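To connect this to the client code quoted above: if the attribute API from HBASE-3811 becomes available, the per-bucket loop could stamp every Scan with the same generated ID, which is the grouping Ted asks for. This is a rough sketch only; the "sourceScan" name, the UUID-based ID, and the helper class are illustrative, not HBaseWD code.

  import java.io.IOException;
  import java.util.UUID;

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class TaggedScansSketch {
    // Builds the per-bucket scans the same way DistributedScanner.create()
    // does above, then marks them all with one shared "sourceScan" ID so the
    // server side can group them (assumes Scan.setAttribute(String, byte[])).
    public static Scan[] createTaggedScans(Scan original, byte[][] startKeys,
        byte[][] stopKeys) throws IOException {
      byte[] sourceScanId = Bytes.toBytes(UUID.randomUUID().toString());
      Scan[] scans = new Scan[startKeys.length];
      for (int i = 0; i < startKeys.length; i++) {
        scans[i] = new Scan(original);
        scans[i].setStartRow(startKeys[i]);
        scans[i].setStopRow(stopKeys[i]);
        scans[i].setAttribute("sourceScan", sourceScanId); // same value for the whole group
      }
      return scans;
    }
  }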
> >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <alex.barano...@gmail.com> wrote:
> >>>>>> > > >
> >>>>>> > > > > > Basically bucketsCount may not equal number of regions for
> >>>>>> > > > > > the underlying table.
> >>>>>> > > > >
> >>>>>> > > > > True: e.g. when there's only one region that holds data for the
> >>>>>> > > > > whole table (not many records in the table yet), a distributed
> >>>>>> > > > > scan will fire N scans against the same region. On the other
> >>>>>> > > > > hand, in case there is a huge number of regions for a single
> >>>>>> > > > > table, each scan can span multiple regions.
> >>>>>> > > > >
> >>>>>> > > > > > I need to deal with normal scan and "distributed scan" at
> >>>>>> > > > > > server side.
> >>>>>> > > > >
> >>>>>> > > > > With the current implementation a "distributed" scan won't be
> >>>>>> > > > > recognized as something special on the server side. It will be
> >>>>>> > > > > an ordinary scan. Though the number of scans will increase,
> >>>>>> > > > > given that the typical situation is "many regions for a single
> >>>>>> > > > > table", the scans of the same "distributed scan" are likely not
> >>>>>> > > > > to hit the same region.
> >>>>>> > > > >
> >>>>>> > > > > Not sure if I answered your questions here. Feel free to ask more ;)
> >>>>>> > > > >
> >>>>>> > > > > Alex Baranau
> >>>>>> > > > > ----
> >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>>>>> > > > >
> >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>> > > > >
> >>>>>> > > > > > Alex:
> >>>>>> > > > > > If you read this, you would know why I asked:
> >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> >>>>>> > > > > >
> >>>>>> > > > > > I need to deal with normal scan and "distributed scan" at
> >>>>>> > > > > > server side. Basically bucketsCount may not equal number of
> >>>>>> > > > > > regions for the underlying table.
> >>>>>> > > > > >
> >>>>>> > > > > > Cheers
> >>>>>> > > > > >
> >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <alex.barano...@gmail.com> wrote:
> >>>>>> > > > > >
> >>>>>> > > > > > > Hi Ted,
> >>>>>> > > > > > >
> >>>>>> > > > > > > We currently use this tool in the scenario where data is
> >>>>>> > > > > > > consumed by MapReduce jobs, so we haven't tested the
> >>>>>> > > > > > > performance of a pure "distributed scan" (i.e. N scans
> >>>>>> > > > > > > instead of 1) a lot. I expect it to be close to simple scan
> >>>>>> > > > > > > performance, or maybe sometimes even faster depending on
> >>>>>> > > > > > > your data access patterns. E.g. in case you write
> >>>>>> > > > > > > time-series data (sequential), which is written into a
> >>>>>> > > > > > > single region at a time, then if you access the delta for
> >>>>>> > > > > > > further processing/analysis (esp. if not from a single
> >>>>>> > > > > > > client), these scans are likely to hit the same region or a
> >>>>>> > > > > > > couple of regions at a time, which may perform worse
> >>>>>> > > > > > > compared to many scans hitting data that is much better
> >>>>>> > > > > > > spread over the region servers.
> >>>>>> > > > > > >
> >>>>>> > > > > > > As for a map-reduce job, the approach should not affect
> >>>>>> > > > > > > reading performance at all: it's just that there are
> >>>>>> > > > > > > bucketsCount times more splits and hence bucketsCount times
> >>>>>> > > > > > > more Map tasks. In many cases this even improves the
> >>>>>> > > > > > > overall performance of the MR job since work is better
> >>>>>> > > > > > > distributed over the cluster (esp. in situations where the
> >>>>>> > > > > > > aim is to constantly process the incoming delta, which
> >>>>>> > > > > > > usually resides in one or just a couple of regions
> >>>>>> > > > > > > depending on processing frequency).
> >>>>>> > > > > > >
> >>>>>> > > > > > > If you can share details on your case, that will help to
> >>>>>> > > > > > > understand what effect(s) to expect from using this approach.
> >>>>>> > > > > > >
> >>>>>> > > > > > > Alex Baranau
> >>>>>> > > > > > > ----
> >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>>>>> > > > > > >
> >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>> > > > > > >
> >>>>>> > > > > > > > Interesting project, Alex.
> >>>>>> > > > > > > > Since there're bucketsCount scanners compared to one
> >>>>>> > > > > > > > scanner originally, have you performed load testing to
> >>>>>> > > > > > > > see the impact ?
> >>>>>> > > > > > > >
> >>>>>> > > > > > > > Thanks
> >>>>>> > > > > > > >
> >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <alex.barano...@gmail.com> wrote:
> >>>>>> > > > > > > >
> >>>>>> > > > > > > > > Hello guys,
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > I'd like to introduce a new small Java project/lib
> >>>>>> > > > > > > > > around HBase: HBaseWD. It is aimed to help with
> >>>>>> > > > > > > > > distribution of the load (across regionservers) when
> >>>>>> > > > > > > > > writing sequential (because of the row key nature)
> >>>>>> > > > > > > > > records. It implements the solution which was discussed
> >>>>>> > > > > > > > > several times on this mailing list (e.g. here:
> >>>>>> > > > > > > > > http://search-hadoop.com/m/gNRA82No5Wk).
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Please find the sources at
> >>>>>> > > > > > > > > https://github.com/sematext/HBaseWD (there's also a jar
> >>>>>> > > > > > > > > of the current version for convenience). It is very
> >>>>>> > > > > > > > > easy to make use of it: e.g. I added it to one existing
> >>>>>> > > > > > > > > project with 1+2 lines of code (one where I write to
> >>>>>> > > > > > > > > HBase and 2 for configuring the MapReduce job).
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Any feedback is highly appreciated!
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Please find below the short intro to the lib [1].
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Alex Baranau
> >>>>>> > > > > > > > > ----
> >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > [1]
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Description:
> >>>>>> > > > > > > > > ------------
> >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential) Writes. It
> >>>>>> > > > > > > > > was inspired by discussions on HBase mailing lists
> >>>>>> > > > > > > > > around the problem of choosing between:
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > * writing records with sequential row keys (e.g.
> >>>>>> > > > > > > > >   time-series data with a row key built based on ts)
> >>>>>> > > > > > > > > * using random unique IDs for records
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > The first approach makes it possible to perform fast
> >>>>>> > > > > > > > > range scans with the help of setting start/stop keys on
> >>>>>> > > > > > > > > the Scanner, but creates a single region server
> >>>>>> > > > > > > > > hot-spotting problem upon writing data (as row keys go
> >>>>>> > > > > > > > > in sequence, all records end up written into a single
> >>>>>> > > > > > > > > region at a time).
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > The second approach aims for the fastest writing
> >>>>>> > > > > > > > > performance by distributing new records over random
> >>>>>> > > > > > > > > regions, but makes it impossible to do fast range scans
> >>>>>> > > > > > > > > against the written data.
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > The suggested approach stays in the middle of the two
> >>>>>> > > > > > > > > above and proved to perform well by distributing
> >>>>>> > > > > > > > > records over the cluster during data writing while
> >>>>>> > > > > > > > > allowing range scans over it. HBaseWD provides a very
> >>>>>> > > > > > > > > simple API to work with, which makes it perfect to use
> >>>>>> > > > > > > > > with existing code.
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Please refer to the unit-tests for lib usage info, as
> >>>>>> > > > > > > > > they are aimed to act as examples.
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Brief Usage Info (Examples):
> >>>>>> > > > > > > > > ----------------------------
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Distributing records with sequential keys which are
> >>>>>> > > > > > > > > being written in up to Byte.MAX_VALUE buckets:
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >   byte bucketsCount = (byte) 32; // distributing into 32 buckets
> >>>>>> > > > > > > > >   RowKeyDistributor keyDistributor =
> >>>>>> > > > > > > > >       new RowKeyDistributorByOneBytePrefix(bucketsCount);
> >>>>>> > > > > > > > >   for (int i = 0; i < 100; i++) {
> >>>>>> > > > > > > > >     Put put = new Put(keyDistributor.getDistributedKey(originalKey));
> >>>>>> > > > > > > > >     ... // add values
> >>>>>> > > > > > > > >     hTable.put(put);
> >>>>>> > > > > > > > >   }
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Performing a range scan over written data (internally
> >>>>>> > > > > > > > > <bucketsCount> scanners are executed):
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >   Scan scan = new Scan(startKey, stopKey);
> >>>>>> > > > > > > > >   ResultScanner rs = DistributedScanner.create(hTable, scan, keyDistributor);
> >>>>>> > > > > > > > >   for (Result current : rs) {
> >>>>>> > > > > > > > >     ...
> >>>>>> > > > > > > > >   }
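One note on reading results back, as a sketch only: the thread does not say whether DistributedScanner strips the one-byte bucket prefix from returned rows. If it does not, the original key can be recovered with the distributor's getOriginalKey() (declared in the abstract class shown further below); rs and keyDistributor here are the ones from the quoted example above.

  // Sketch: recovering the original (un-prefixed) row key while iterating the
  // distributed scan; skip this if the scanner already returns original keys.
  for (Result current : rs) {
    byte[] originalKey = keyDistributor.getOriginalKey(current.getRow());
    // ... process 'current' using originalKey
  }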
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Performing a MapReduce job over a written data chunk specified by a Scan:
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >   Configuration conf = HBaseConfiguration.create();
> >>>>>> > > > > > > > >   Job job = new Job(conf, "testMapreduceJob");
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >   Scan scan = new Scan(startKey, stopKey);
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >   TableMapReduceUtil.initTableMapperJob("table", scan,
> >>>>>> > > > > > > > >       RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >   // Substituting standard TableInputFormat which was set in
> >>>>>> > > > > > > > >   // TableMapReduceUtil.initTableMapperJob(...)
> >>>>>> > > > > > > > >   job.setInputFormatClass(WdTableInputFormat.class);
> >>>>>> > > > > > > > >   keyDistributor.addInfo(job.getConfiguration());
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> >>>>>> > > > > > > > > -----------------------------------------
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to support
> >>>>>> > > > > > > > > custom row key distribution approaches. To define
> >>>>>> > > > > > > > > custom row key distributing logic, just implement the
> >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class, which is
> >>>>>> > > > > > > > > really very simple:
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >   public abstract class AbstractRowKeyDistributor implements Parametrizable {
> >>>>>> > > > > > > > >     public abstract byte[] getDistributedKey(byte[] originalKey);
> >>>>>> > > > > > > > >     public abstract byte[] getOriginalKey(byte[] adjustedKey);
> >>>>>> > > > > > > > >     public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
> >>>>>> > > > > > > > >     ... // some utility methods
> >>>>>> > > > > > > > >   }
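The section above invites implementing AbstractRowKeyDistributor, so here is a rough sketch of what custom distribution logic could look like: a one-byte prefix derived from a hash of the original key. It deliberately does not extend the abstract class, because the Parametrizable methods and the utility methods are not shown in this thread; the class name and logic are illustrative, not part of HBaseWD.

  import java.util.Arrays;

  public class HashPrefixKeyDistribution {
    private final byte bucketsCount;

    public HashPrefixKeyDistribution(byte bucketsCount) {
      this.bucketsCount = bucketsCount;
    }

    // Same original key always lands in the same bucket.
    public byte[] getDistributedKey(byte[] originalKey) {
      byte prefix = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
      return prepend(prefix, originalKey);
    }

    // Strips the one-byte prefix added by getDistributedKey().
    public byte[] getOriginalKey(byte[] adjustedKey) {
      return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    // Every possible bucket prefix combined with the key; used to fan a range
    // scan out into one scan per bucket.
    public byte[][] getAllDistributedKeys(byte[] originalKey) {
      byte[][] keys = new byte[bucketsCount][];
      for (byte i = 0; i < bucketsCount; i++) {
        keys[i] = prepend(i, originalKey);
      }
      return keys;
    }

    private static byte[] prepend(byte prefix, byte[] key) {
      byte[] result = new byte[key.length + 1];
      result[0] = prefix;
      System.arraycopy(key, 0, result, 1, key.length);
      return result;
    }
  }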