[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650544#comment-14650544 ]

Ian Michael Gumby commented on HBASE-12853:
-------------------------------------------

Wow. Rather than try to stay focused on the issue of the Jira, you talk about contributing to open source.

I can tell you the answer, I can even explain it to you, but you still wouldn't get it.

> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>
>                 Key: HBASE-12853
>                 URL: https://issues.apache.org/jira/browse/HBASE-12853
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Michael Segel
>             Fix For: 2.0.0
>
>
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is
> that while 'salting' alleviated regional hot spotting, it increased the
> complexity required to utilize the data.
> Through the use of coprocessors, it should be possible to offer a method
> which distributes the data on write across the cluster and then manages
> reading the data returning a sort ordered result set, abstracting the
> underlying process.
> On table creation, a flag is set to indicate that this is a parallel table.
> On insert into the table, if the flag is set to true then a prefix is added
> to the key. e.g. - or server # is an integer between 1 and the number of region servers defined.
> On read (scan) for each region server defined, a separate scan is created
> adding the prefix. Since each scan will be in sort order, it's possible to
> strip the prefix and return the lowest value key from each of the subsets.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649476#comment-14649476 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Andrew,

As you point out, it was a trivial solution and that was the point I was trying to make, that you took the time to work on it.

As I've said repeatedly, I can't provide patches because the risks outweigh the benefits. (Let's leave it at that.)

I guess at the time I wrote this enhancement request, I could have raised this issue with a certain vendor's support team, then suggested that a certain person call a certain person to ask that this get done... but that would have been a waste of calling in a favor.

Again, either the committers or community sees the benefits and merits in doing this... or you don't. It was a five minute thought that wasn't worth the effort of diagramming out on a white board that solved a problem.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649455#comment-14649455 ]

Lars Hofhansl commented on HBASE-12853:
---------------------------------------

Most committers have well paying jobs and won't risk leaving them either. The employer also would be exposed to the very same risk (amplified, because there's more money to make).

I have personally had many discussions with our legal team(s) about this. So I do know what I am talking about. Most people fail to calculate the cost of legal risk and assume it to be infinite.

I get consulting gigs offered all the time _because_ I commit to open source (since I am employed I cannot accept such gigs, but that's not the point here). It's all about how you set it up with your customers.

Sorry you feel this way. Contributing is what makes open source work. If everybody thought like you there would be no open source.

In any case this is not the right place to discuss this.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649304#comment-14649304 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Lars,

If I respond, I'll be called argumentative. If I don't respond, it will leave readers with the incorrect perception.

Again, Apache does not indemnify the contributor, leaving you with risk. You need to balance that risk against the benefits of contributing.

It's a lot simpler to say "Apache won't indemnify me..." than to continually have to write out long paragraphs as to why and what that really means.

Either you get it or you don't. Most of the committers here don't run their own shop or have to deal with the business side of software.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648174#comment-14648174 ]

Andrew Purtell commented on HBASE-12853:
----------------------------------------

bq. Either you find value to the suggestion or not. That is your call. But please note that Andrew P. worked on https://issues.apache.org/jira/browse/HBASE-13044. (Also relatively trivial)

Not sure I understand the relevance.

For the record, I filed that issue after a brief encounter with Jim Scott of MapR over on the OpenTSDB list. He spoke of customers implementing coprocessors that exist solely to prevent loading of any other coprocessors, so I thought we could do something simple to make that unnecessary and volunteered time to do it. Strictly speaking, I didn't have to but the conversation was respectful and interesting and I felt like volunteering some of my evening that evening rather than spend it with family.

The committer role at Apache is not about requiring individuals to implement unfunded mandates from random folks. On the other hand, we are expected to try and assess all contributions in the form of a patch in the most impartial manner possible. If for whatever reason you are not in a position to provide a patch, that's fine, but understand you are speaking to a community of volunteers who have work and personal lives and are already being super generous just for showing up here from time to time. You'll have to find a way to convince them they should volunteer their time to help you. Sometimes under the best of circumstances that just won't happen.

An abrasive communication style - for example, repeated comments about "lack[ing] the patience to suffer fools" - dooms you to failure out of the gate. Don't be surprised at your lack of results.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647980#comment-14647980 ]

Lars Hofhansl commented on HBASE-12853:
---------------------------------------

bq. As I have stated repeatedly, I am unable to contribute to certain Apache projects unless Apache is willing to indemnify me. (Which they are not.)

Don't be ridiculous. It is always your task to clear with all possible IP owners before you contribute anything under any license.

If you have something to contribute show us the code or even just a spec, otherwise it's just useless noise; if not just leave it instead of now blaming the committers with specious excuses why you can't do it.

I'm going to close this.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646617#comment-14646617 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

@Anoop,

Yes, that is correct. It was my misunderstanding on the client/server break. (I program to the APIs and don't look at the source code.) I believe I did mention this after your last post correcting my mistake.

Again, this is pretty simple... you're overloading the scan() so that it first does a check to see if the underlying table is bucketed or not. A simple way to do this is to check the number of buckets. If it's 0, then it's not bucketed and you just run the scan like normal. If it is a positive integer, you would then parallelize the scan.

You would then need to wait until all of the result sets return before you can funnel the data into a single result set to be returned to the user. Of course I'm assuming that each result set will start to send back results prior to completion of the ensuing scan.

Note too that these will be range scans. One other side effect is that if the scan is a full table scan... things will get a bit messy. (Well, maybe not...)
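
The scan() check described in this comment might look roughly like the sketch below. It is only an illustration, not HBase client code: the table is modelled as an in-memory list of stored keys, the bucket prefix is assumed to be a single leading byte, and the names `range_scan`, `strip_prefix` and `scan` are hypothetical. `heapq.merge` stands in for the funnel step; it pulls rows lazily, so each per-bucket scan only has to have started returning rows, not finished.

```python
import heapq


def range_scan(stored_keys, start, stop=None):
    """Stand-in for a range scan: yields stored keys in [start, stop) in sorted order."""
    for key in sorted(stored_keys):
        if key >= start and (stop is None or key < stop):
            yield key


def strip_prefix(rows):
    """Drop the assumed one-byte bucket prefix from each stored key."""
    for key in rows:
        yield key[1:]


def scan(stored_keys, start_row, num_buckets):
    """Plain scan when num_buckets is 0, otherwise one range scan per bucket, merged."""
    if num_buckets == 0:
        # Not bucketed: run the scan like normal.
        return range_scan(stored_keys, start_row)
    per_bucket = []
    for b in range(num_buckets):
        # Confine each scan to the key range owned by bucket b.
        stop = bytes([b + 1]) if b < 255 else None
        rows = range_scan(stored_keys, bytes([b]) + start_row, stop)
        per_bucket.append(strip_prefix(rows))
    # Funnel the sorted per-bucket streams into one sorted stream.
    return heapq.merge(*per_bucket)
```

Calling `scan(keys, b"b", 0)` on an unbucketed key list and `scan(keys, b"b", 3)` on a three-bucket key list goes through the same entry point, which is the abstraction the comment is after. The sketch runs the per-bucket scans sequentially; real parallelism (threads or async calls) is left out.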
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646411#comment-14646411 ]

Anoop Sam John commented on HBASE-12853:
----------------------------------------

As per the discussion in the Jira comments, we cannot do this as a server side feature. This will be a client side thing.

The priority can be marked minor or major; that is not the main thing IMHO. What matters is a small doc about the approach and a patch. Many of us will be happy to review that when it comes. As long as a feature adds value for the team, we are all open to it.

Are you going to work on this? If not there is no point in keeping it open. We can see whether anyone else is willing to take this up. If no one is, better to close it as later/won't implement.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646347#comment-14646347 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

@Sean,

As I have said before... Apache doesn't indemnify committers (actually it's the reverse) and there is no upside for me to offset the risk.

In a nutshell it would be pointless in having a discussion on why I used the term trivial and why I rated this as a low priority.

BTW, there are 11 watchers... why don't you ask those watchers who are also committers and leaders of the HBase project, why they didn't raise the priority?

I don't wish to seem rude, but if you're going to lecture someone, you had better realize that some will ignore you, others will mock you...

To your point, this was the first JIRA that I raised. I assumed that those who volunteer their time would also take the time to assess the value of the suggestion. Clearly not. That was my mistake.

To be honest, I lack the patience to suffer fools...
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633793#comment-14633793 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Nothing new? Seriously? This is a trivial feature.

> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>
>                 Key: HBASE-12853
>                 URL: https://issues.apache.org/jira/browse/HBASE-12853
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Michael Segel
>            Priority: Minor
>
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is
> that while 'salting' alleviated regional hot spotting, it increased the
> complexity required to utilize the data.
> Through the use of coprocessors, it should be possible to offer a method
> which distributes the data on write across the cluster and then manages
> reading the data returning a sort ordered result set, abstracting the
> underlying process.
> On table creation, a flag is set to indicate that this is a parallel table.
> On insert into the table, if the flag is set to true then a prefix is added
> to the key. e.g. - or server # is an integer between 1 and the number of region servers defined.
> On read (scan) for each region server defined, a separate scan is created
> adding the prefix. Since each scan will be in sort order, it's possible to
> strip the prefix and return the lowest value key from each of the subsets.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392726#comment-14392726 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Sorry, I thought that the HM had the META table cached in memory. Didn't think that the META was too large.

Ok, so then it looks like what I want to do is all client side then.

The design is pretty straightforward. The number of buckets is fixed at the time of table creation. The row key is a composite key of bucket_id | rowkey and the bucket_id is derived from taking the modulus N of the first byte of the row key. (Giving you 0xFF (255) max buckets.)

Then when you want to fetch a single row given the rowkey, you can find the bucket and fetch the single row.

If you need to do a scan, given the start row, you can then create N parallel threads and within each thread, start the scan by prepending the bucket_id | to the start rowkey. When returning the result set, you can then strip off the bucket_id | and take the MIN(value(n)), where value(n) is the next row from each scanner, popping it off the stack. This will give you a single result that is guaranteed to still be within sort order.

It's all client side and it abstracts the bucketing from the user/client code so that the same code will run against either table without any changes.
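
The key layout and merged scan in this comment can be sketched as follows. This is a toy model under stated assumptions, not an HBase API: the table is a sorted in-memory list of stored keys, the bucket_id is stored as a single leading byte, row keys are non-empty, and `NUM_BUCKETS`, `bucket_id`, `to_stored_key`, `scan_bucket` and `bucketed_scan` are all hypothetical names. The N per-bucket scans run one after another here rather than in parallel threads; `heapq.merge` plays the role of the MIN step.

```python
import heapq
from bisect import bisect_left

NUM_BUCKETS = 4  # hypothetical value; fixed at table creation


def bucket_id(rowkey: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Bucket derived from the first byte of the (non-empty) row key, modulo N."""
    return rowkey[0] % num_buckets


def to_stored_key(rowkey: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Composite key: one-byte bucket_id followed by the original row key."""
    return bytes([bucket_id(rowkey, num_buckets)]) + rowkey


def scan_bucket(sorted_keys, bucket, start_row):
    """Stand-in for one range scan: yields a bucket's original row keys in order."""
    prefix = bytes([bucket])
    i = bisect_left(sorted_keys, prefix + start_row)
    while i < len(sorted_keys) and sorted_keys[i][:1] == prefix:
        yield sorted_keys[i][1:]  # strip the bucket prefix
        i += 1


def bucketed_scan(sorted_keys, start_row, num_buckets=NUM_BUCKETS):
    """Merge the per-bucket scans, always emitting the smallest next row key."""
    scans = [scan_bucket(sorted_keys, b, start_row) for b in range(num_buckets)]
    return heapq.merge(*scans)
```

A single-row fetch needs no merge: `to_stored_key(rowkey)` gives the one stored key to look up. For a scan, building the table with `sorted(to_stored_key(r) for r in rows)` and iterating `bucketed_scan(table, start_row)` returns the original row keys in sort order, which is the property the comment relies on.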
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392727#comment-14392727 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Sorry, the value '(' n ')' gets translated into a downvote (n).
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385112#comment-14385112 ]

Anoop Sam John commented on HBASE-12853:
----------------------------------------

Just one correction. The client side has to contact the META (single) region to determine the regions and their location for the scan. So not HM. (If HM is acting as another RS and holding the META region, then yes it goes to HM.) So where the META region sits matters. Hope I am making it clear now.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384111#comment-14384111 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

To add to my comment in response to Anoop, I wanted to abstract the concept of a scan from the application. Normally you'd do this on the server side of the client/server split, however... w.r.t. HBase, this becomes a bit more difficult.

Ok... So if I understand this. Client passes scanner object to instance of table in the table.scan(scanner) call.

So still on the client side, the client's table object will then connect with the HMaster and determine which region and region server is required to start the table scan, and then the client connects directly to the region and starts the scan?

What do you call the scanner object that's running on the region?
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384102#comment-14384102 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Ok,

So then when a scanner object is passed from the client to the server the client will ask the HMaster for the region(s) that satisfy the scan, or just the first region?

This would imply that when running an m/r, the m/r program will ask the HMaster for the regions and then will create a split for each region in the list and then each mapper task will initiate its own scan over a specific region?

Ok... on one level for m/r that makes sense because you wouldn't want 1000 mappers trying to coordinate queries with the HMaster at the same time because it could become a bottleneck.

On the other side, if you're using HBase as a database outside of Map/Reduce, you'd want to have a query engine that would abstract the underlying workings of a scan from the client.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363448#comment-14363448 ]

Anoop Sam John commented on HBASE-12853:
----------------------------------------

Please note that on reads the HBase Master does not do any coordination. The client talks directly to the RS where the intended region resides.
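To illustrate the point about reads bypassing the master: a client can resolve the region for a row key from a cached, sorted list of region start keys and then contact that region server directly. The sketch below is a simplified model of that lookup, not HBase client code; the region map, the server names, and the `locate_region` helper are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical cached region map: (start_key, region_server), sorted by start key.
# The first region starts at the empty key and covers everything below b"g".
REGIONS = [
    (b"", "rs1.example.com"),
    (b"g", "rs2.example.com"),
    (b"p", "rs3.example.com"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate_region(row_key: bytes) -> str:
    """Return the server hosting the region whose key range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before it is the one that holds the row.
    idx = bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]
```

A scan that crosses a region boundary simply repeats the lookup with the next region's start key, so no central coordinator is involved in the read path of this model.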
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329576#comment-14329576 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

I should also add that this is one area where one must take care in the design, because if it is not done properly and cleanly, it will kill performance.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329574#comment-14329574 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

The design seems straightforward, at least as a starting point. (YMMV)

The client creates a reference to a table and then instantiates a scanner object along with any associated filters. The client then passes this object to the server, expecting a result set to be returned.

On the server side, it seems that the (active) HBase Master gets the scan request and then starts to do the heavy lifting. By providing more intelligence to this process, it's possible to do more than just let bucketed tables abstract the buckets and act as if they were regular tables.

The key question is how best to redesign this initial entry point to allow for such extensibility.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291765#comment-14291765 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Before we go into a design, I need a bit more information.

As a practice, I don't review HBase source code; I work from the exposed APIs. Of course, looking at the HBase API these days is a bit of a CF, since most of the APIs are deprecated and refer to other deprecated classes/interfaces, etc., not to mention that there are a couple of different releases...

So we start with a Connection instance, from which we get an instance of class Table for the given table. Ignoring put() for a moment, we have the get() and getScanner() methods. What happens on the server side of the connection when the client calls getScanner() or get()?

Part of the issue is that a simple scanner won't work right unless you end up preprocessing it and treating it as a scanner with a default (blank) set of filters.

So while I can walk you through the logic and give you a resulting diagram, I need a committer who's familiar with the server-side workings. Then it should be a pretty straightforward thing to implement.

-Mike
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282337#comment-14282337 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Sure... Just a couple of things:
1) I would like to make sure that the split between client and server in HBase works the way I think it does.
2) I need to get some free time. (Day job, conference talks, R&D, ...)

This is one issue that is specific to HBase and doesn't conflict with any prior work I may have done.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281074#comment-14281074 ]

Lars Hofhansl commented on HBASE-12853:
---------------------------------------

Let's see a design :)
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280461#comment-14280461 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

"An implemented one is OneBytePrefixKeySalter, where the prefix is hash(RowKey)%buckets"

That's fine. But now if I have another client, I have to know that the table is bucketed. (Yes, I am refusing to use the term salt when talking about this... :-) And not only do you need to know that the table is bucketed, you need to know the number of buckets.

You are also assuming that the individual is using a Java application to query the data, and that they've got the Intel library. What happens if they are not?

If it's done server side, all of that goes away.
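A minimal sketch of the quoted `hash(RowKey) % buckets` scheme may help show why every client has to agree on both the hash and the bucket count. CRC32 is an assumption here, standing in for whatever hash the library actually uses, and the helper names are hypothetical.

```python
import zlib


def bucket_prefix(row_key: bytes, buckets: int) -> bytes:
    """One-byte prefix derived from the row key (CRC32 is an assumed hash)."""
    if not 1 <= buckets <= 256:
        raise ValueError("a one-byte prefix supports 1 to 256 buckets")
    return bytes([zlib.crc32(row_key) % buckets])


def bucketed_key(row_key: bytes, buckets: int) -> bytes:
    """Key as it would be stored: prefix byte followed by the logical row key."""
    return bucket_prefix(row_key, buckets) + row_key
```

Because the stored key depends on `buckets`, a reader that assumes a different bucket count (or a different hash) may compute a different prefix and miss the row, which is the coupling the comment objects to.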
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280440#comment-14280440 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

First, let's get away from using the term salted. Salt has a specific meaning, and it's associated with cryptography. While we're clearly not talking about cryptography, it implies that the prefix is orthogonal to the data set and that the number of salted values is bound by the width of the prefix. The term bucketing would be more appropriate because, in this example, you're assigning a prefix using a round-robin approach.

I have to apologize, I don't play with HBase that much these days... my work is client driven.

With respect to client/server, the delineation between client and server appears to be a bit different from what I would expect from other databases. In HBase, the client creates a scan, and then the HMaster manages the scan and returns a pointer to the result set?

With respect to the client-side code... you're missing the point. You want to abstract the bucketing from the client, so that the same scan will run against a bucketed table and an un-bucketed table. The only exposed difference is that the metadata for the table will specify the number of buckets, which defaults to 1 (no bucketing).
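One way to picture "the same scan against a bucketed and an un-bucketed table" is a function that expands a logical key range into one physical range per bucket. This is only a sketch under assumptions not stated in the thread: a one-byte prefix, at most 256 buckets, and a hypothetical `expand_scan` helper; with the default of 1 bucket the range is passed through unchanged.

```python
from typing import List, Optional, Tuple

KeyRange = Tuple[bytes, Optional[bytes]]  # (start, stop); stop=None means open-ended


def expand_scan(start: bytes, stop: Optional[bytes], buckets: int = 1) -> List[KeyRange]:
    """Translate one logical range into the physical ranges to scan."""
    if buckets == 1:
        # Un-bucketed table: keys carry no prefix, so nothing changes.
        return [(start, stop)]
    ranges: List[KeyRange] = []
    for b in range(buckets):
        prefix = bytes([b])
        if stop is not None:
            bucket_stop: Optional[bytes] = prefix + stop
        elif b < 255:
            # Open-ended scan: stop where the next prefix byte begins.
            bucket_stop = bytes([b + 1])
        else:
            bucket_stop = None
        ranges.append((prefix + start, bucket_stop))
    return ranges
```

For example, `expand_scan(b"a", b"m", 3)` yields three ranges, one under each of the prefixes `\x00`, `\x01`, and `\x02`, while `expand_scan(b"a", b"m")` returns the single original range.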
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278320#comment-14278320 ]

Liu Shaohui commented on HBASE-12853:
-------------------------------------

The Intel Hadoop team has open-sourced a salted table implementation at https://github.com/intel-hadoop/SaltedHTable. It is also a client-side library and the code is very clean.

Personally, I think a client-side library is simple enough for salted tables.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277488#comment-14277488 ]

Lars Hofhansl commented on HBASE-12853:
---------------------------------------

The coprocessors are per region, and you want the "salting" for spreading across regions. So you mean to have some region server contact other region servers in order to execute a portion of a scan there?

Phoenix does the parallelization on the client and then farms out the work to the various region servers, which then execute the requests with the help of per-region coprocessors.

It would be nice to completely hide this. We might have to invent something new for that.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277413#comment-14277413 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Lars,

No, it will all be server side. That's the beauty of it. The client won't know anything about the underlying differences.

Today, you can easily do this client side, but then you have the responsibility for managing the N scanners and merging the result set(s). The idea is to do this server side so that clients won't need to know any of the details.

Again, Phoenix implies that it does something like this. However, a tighter coupling to HBase would mean that there are no client-side changes. Clients would have one API to get data from a regular table or one that uses buckets. The only difference would be in the table definition and parameters for the table.

Does that make sense?
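The "manage N scanners and merge the result sets" step is a k-way merge: each per-bucket scan is already sorted, so stripping the prefix and repeatedly taking the lowest key yields one sorted stream. The sketch below models each bucket scan as an iterable of `(key, value)` pairs and assumes a one-byte prefix; it says nothing about where in HBase such a merge would run, and the function names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[bytes, bytes]  # (row key, value)


def strip_prefix(rows: Iterable[Row], prefix_len: int = 1) -> Iterator[Row]:
    """Drop the bucket prefix so callers see only the logical row key."""
    for key, value in rows:
        yield key[prefix_len:], value


def merged_scan(bucket_scans: List[Iterable[Row]]) -> Iterator[Row]:
    """Lazily merge per-bucket sorted scans into one stream sorted by logical key."""
    stripped = [strip_prefix(scan) for scan in bucket_scans]
    return heapq.merge(*stripped, key=lambda row: row[0])
```

Because `heapq.merge` is lazy, only one pending row per bucket is held in memory at a time, which matters when the per-bucket scans are large.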
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277341#comment-14277341 ]

Lars Hofhansl commented on HBASE-12853:
---------------------------------------

Thanks [~msegel]. It would mostly be client-side code, right? I.e. prefixing keys before issuing the writes and performing the right fanning out upon scanning.

I don't think that would need any server-side logic (a.k.a. coprocessors), but I might be wrong.
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277024#comment-14277024 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

Note that some of this may actually be in Phoenix, so it could be redundant...

http://phoenix.apache.org/salted.html

implies some of this... but does not go into detail...
[jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276897#comment-14276897 ]

Michael Segel commented on HBASE-12853:
---------------------------------------

On second thought, rather than trying to automate the number of regions to be used in the prefix, it may be easier to define a parameter that contains the number of parallel buckets. (Apologies for the very loose terminology.) We could say buckets or parallelization factor.

We may have 100 RS but only want to use a parallel factor of 10, which could be enough to alleviate the hot spotting. It also makes things easier if the size of the cluster is relatively dynamic, with RS being added and removed.

Also, apologies if this concept has already been raised.
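As a rough illustration of a parallel factor that is independent of the region server count, the sketch below assigns prefixes round-robin, so a run of sequential writes lands evenly across the buckets. The `RoundRobinBucketer` class is hypothetical and keeps its counter in a single process, which a real deployment would not be able to assume.

```python
from itertools import count


class RoundRobinBucketer:
    """Hands out bucket numbers 0..parallel_factor-1 in rotation."""

    def __init__(self, parallel_factor: int) -> None:
        if parallel_factor < 1:
            raise ValueError("parallel_factor must be at least 1")
        self.parallel_factor = parallel_factor
        self._writes = count()  # number of prefixes handed out so far

    def prefix_for_next_write(self) -> int:
        """Bucket number for the next write, regardless of its row key."""
        return next(self._writes) % self.parallel_factor
```

With a parallel factor of 10, 1000 consecutive writes put exactly 100 rows in each bucket. One trade-off of round-robin assignment is that the prefix cannot be recomputed from the row key, so a point get has to probe every bucket; a prefix derived from a hash of the key avoids that at the cost of a less exact spread.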