[jira] [Commented] (HBASE-7667) Support stripe compaction

2014-01-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885673#comment-13885673
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

the docs were committed to the book as part of HBASE-9854; sorry for late reply

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.98.0, 0.99.0

 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf, 
 Using stripe compactions.pdf, stripe-cdf.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2014-01-27 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883479#comment-13883479
 ] 

Andrew Purtell commented on HBASE-7667:
---

Any doc updates available? 0.98.0RC1 is open.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.98.0, 0.99.0

 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf, 
 Using stripe compactions.pdf, stripe-cdf.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-12-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843340#comment-13843340
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

With 98 coming so soon, probably not.[~stack] wdyt?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.98.0, 0.99.0

 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf, 
 Using stripe compactions.pdf, stripe-cdf.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843355#comment-13843355
 ] 

stack commented on HBASE-7667:
--

Yeah.  Too late for 0.96.  We are trying to get back on to a bugs-only in 
point releases praxis -- unless there a citizen revolt.  Also need reason for 
folks to upgrade to 0.98!

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.98.0, 0.99.0

 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf, 
 Using stripe compactions.pdf, stripe-cdf.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-12-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13842713#comment-13842713
 ] 

Otis Gospodnetic commented on HBASE-7667:
-

Btw. is this going to get into any 0.96.x releases by any chance?  Thanks.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.98.0, 0.99.0

 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf, 
 Using stripe compactions.pdf, stripe-cdf.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808236#comment-13808236
 ] 

stack commented on HBASE-7667:
--

bq. Yes, unless there's non-uniform key access there will be more, but smaller, 
compactions. 

As per [~srivas] supposition above I suppose.

bq. You probably don't want to turn it on by default, as it makes sense either 
for non-uniform data or for large regions.

So for what cases should we turn it on?

bq. The documents, one of them targeted at users (ans all of them out of date), 
are attached to this very JIRA

Pardon me.  I've reviewed a bunch of this feature -- docs and code -- and am 
just having trouble quantifying the benefit this slew of new code brings in.  
Sorry if I am being thick.

If I read the attached user doc, will it be clear (though it is out of date?)   
Let me try it.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808248#comment-13808248
 ] 

stack commented on HBASE-7667:
--

If I read the user doc., it says:

This improves  readperformance in common scenarios and greatly reduces 
variability, by avoiding large and/orinefficient compactions

If I read further, there is no clear message on when enabling stripe 
compactions makes sense.  Doc is missing a section on when NOT to use stripe 
compactions.

The doc. has detail but seems like a bunch no longer applies after recent 
reworkings (as you say above).

Do you have some stats on how it can improve life under certain workloads and 
what those workloads are?

My concern is that a bunch of code will go into hbase and it will sit there 
unused.  We have enough of that already.  I'd like to have some clear messaging 
around this feature, both how it can benefit, and also how a user could enable 
it and see the effects of its workings.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808404#comment-13808404
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

[~stack] I understand your concern. I will get to documenting it this or next 
week, so it should be able to get at least into 98.
As far as I know, there are some people who wanted to try it out,and also 
there's timestamp compaction jira which might be obviated... let's see if as 
experimental feature it can get adoption. It's pretty well isolated now, so it 
should be easy to remove later, or move into separate module out of the way.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806903#comment-13806903
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

I have run the maven tests and rebased all patches (no changes except a tiny 
one in compator)... if no objections I will commit HBASE-7679, HBASE-7680, 
HBASE-7967, HBASE-8000 to trunk today afternoon if there are no objections by 
then.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-28 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807290#comment-13807290
 ] 

stack commented on HBASE-7667:
--

So, stripe compactions does more i/o unless it is the time series use case?   I 
cannot turn this on by default?  Where do I go to read on benefits of this new 
addition?  Thanks [~sershe]

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807397#comment-13807397
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

Yes, unless there's non-uniform key access there will be more, but smaller, 
compactions. 
HBASE-8541 makes for less IO amplification, Enis is reviewing it, it will 
probably follow quickly.

You probably don't want to turn it on by default, as it makes sense either for 
non-uniform data or for large regions.

The documents, one of them targeted at users (ans all of them out of date), are 
attached to this very JIRA ;) 
The configuration was simplified quite a bit compared to the state of the doc.
Let me file a JIRA to document things in e.g. the book.
For the first release it will be positioned as an experimental feature...


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690757#comment-13690757
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

Size-based scheme works by splitting stripes when they grow big. This splitting 
is good for sequential sharded keys, because lower part of the split is written 
as one file, and doesn't receive new data (or doesn't receive a lot of it 
anyway), so it doesn't have to participate in compactions. If you have uniform 
data, splitting result in more rewriting and both stripes keep growing after 
the split.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690761#comment-13690761
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

So it's easier to configure (just say I want 500Mb-1Gb-... stripes) but in the 
net, results in more I/O during initial data population before region reaches 
stable size.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688945#comment-13688945
 ] 

stack commented on HBASE-7667:
--

Rereading the design doc and how-to-use.  They are very nice.  Can go into the 
book.

High-level, and I think you have suggested this yourself elsewhere, it'd be 
coolio if user didn't have to choose between size and count -- if it'd just 
figure itself based off incoming load.

I've seen case where a compaction produces a zero-length file (all deletes) so 
would that mess w/ this invariant: Compaction   mustproduce at  least  
 one file(seeHBASE-6059). or ...No stripe  can everbe 
 leftwith0   files...

I almost asked a few questions you'd already answered above in my previous read 
of the doc (smile).

How would region merge work?  We'd just drop all files into L0?  Sounds like 
we'd have to drop references if we are not to break snapshotting.

You think this true? stripescheme  useslarger  number  of  
files   than
default to  ensure  all compactions are small,  which   can 
affect  verywidescans.  Any measure of how much?

Should stripe be on by default?  Or have it as experimental for now until we 
get more data?

How to use doc is excellent (though too many configs).  Will review patch again 
next.



 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689954#comment-13689954
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

I'm looking at how to merge policies. Unfortunately splitting uses more I/O 
than not splitting (who would have though... :)), resulting in worse perf. 
Also, the system cannot really predict future data patterns, no more than 
region splitting can do it (at least not with a lot of complexity added), so 
hint flag for how to split would need to be provided.

W.r.t. producing files to contain metadata, that is unfortunately necessary. 
These files shouldn't have effect. Stripes with only expired files can be 
merged. I've taken a stab at auto-detecting stripes from file metadata, in 
general case it's very complex, in simplified realistic case it's just complex.

Merge will drop everything into L0, yes. This could be improved, but has to be 
done now anyway due to references, same with split, so no need to do it now.

On-by-default would require smart default settings.

Let me comment tomorrow on HBASE-7680, if I can make a size-count hybrid 
quickly I will post final patch without a lot of logic changes there, and 
hopefully we can commit 3 initial patches and build on top of that.


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690015#comment-13690015
 ] 

stack commented on HBASE-7667:
--

bq. splitting uses more I/O than not splitting

Sorry.  You mean stripes uses more i/o because we L0 first then rewrite into 
stripes?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-05-10 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13654572#comment-13654572
 ] 

Jonathan Hsieh commented on HBASE-7667:
---

[~sershe], thanks for the graph.  Seeing that, I think a avg/std deviation 
could be an even simpler way of showing where this compaction approach 
demonstrates a win.  It looks like the variance of times will be significantly 
higher with default and it seems that avg time would be about the same.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-05-07 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651556#comment-13651556
 ] 

Elliott Clark commented on HBASE-7667:
--

Thanks for the doc.  Reading this tonight.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-05-05 Thread Raymond (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649312#comment-13649312
 ] 

Raymond commented on HBASE-7667:


great, more region lead to better load balance and good compaction effect, less 
region lead to easy management and fast failover, I think stripe (or 
sub-region) is a good trade-off. 
And I think stripe compaction is similar with Level compaction with L0+L1 only. 
Another difficult is about configuration, in big hbase cluster, there are so 
many applications, how to build suitable configuation for each one will be a 
huge challenge.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf, 
 Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-04-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618967#comment-13618967
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

Currently only one compaction per store is allowed. The need to compact several 
stripes in parallel can probably be alleviated by just having less stripes? As 
future improvement it is possible to add. 

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-04-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619380#comment-13619380
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

Actually, judging by logs what can be done is triggering compaction thread if 
store can compact. In 25-stripe case I see gaps between compactions which are 
unnecessary, when the compaction only triggers on flush despite plenty of tiny 
stripe compactions being possible

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-30 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618189#comment-13618189
 ] 

Matt Corgan commented on HBASE-7667:


Sergey - i'm curious how are compactions of the stripes being scheduled/queued? 
 Does a region still make a single region-wide compaction request, and the 
compactor picks a single stripe?  Or can multiple stripes be in the compaction 
queue at once?  

Given that regions could be allowed to grow much larger with stripe compaction 
enabled it would probably be good to allow multiple stripes to compact in 
parallel.  Just a thought for another next step... you've probably considered 
it already.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617951#comment-13617951
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

I did a c1.xlarge test (with default, 3, 10 and 25 stripes, 2 times each). The 
results for different stripe configurations are very consistent across both 
runs.
Compared to m1.large test the positive effect of increasing number of stripes 
on write speed is less.

For this load, sweet spot appears to be around 10-12 stripes based on two 
tests. 3 stripes have large compactions similar to default (well, not as 
large); 25 stripes does too many small compactions, so select-compact loop 
cannot keep up with the number of files produced - on Iteration 2 test 
described in the doc at least some stripes in 25-stripe case always have 6-8 
small files (as they get compacted other stripes get more files). This appears 
to be the limiting factor on increasing the number of stripes. 
I think the main point is that, for count scheme, there's perf parity (writes 
are generally slightly slower, reads slightly faster), despite existing and 
fixable write amplification; and there's reduction of variability, which was 
the goal. I will try to devise a more realistic read workload, but I don't 
think it should change much given above.
For sequential data, with size-based stripe scheme there's reduction in 
compactions, as expected, despite even L0.

Next steps:
1) On existing data I want to correlate read/write perf with compactions. It is 
interesting that stripe scheme has slower writes in general, as Jimmy has noted 
- it touches read path but not anything at all on write path, so it is probably 
I/O related, or stresses some interaction between existing write and compaction 
paths.
2) Run tests for more realistic read workloads (and parallel read/writes), by 
not using LoadTestTool? Optional-ish.
3) Clean up integration test patch in HBASE-8000.
4) Review and commit? :)

5) Get rid of L0?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617956#comment-13617956
 ] 

Ted Yu commented on HBASE-7667:
---

bq. 5) Get rid of L0?
Can we do this frist ?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616654#comment-13616654
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

I will update the doc, although in this case m1.large moderate IO capacity 
works even better for the test, making it easier to simulate IO-constrained 
cluster on such small number of nodes/time frame.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616976#comment-13616976
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

Btw, the 3 next child JIRAs are rady for review. Please feel free to +1 them, I 
will only commit all 3 together and with integration test included.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
 compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614805#comment-13614805
 ] 

Andrew Purtell commented on HBASE-7667:
---

I'd recommend retesting with c1.xlarge instance types, this will get you a lot 
closer to real hardware IMHO. The IO capability of the c1.xlarge is high vs. 
only moderate for m1.large and the c1.xlarge has 8 vcores as opposed to _2 
only_ for the m1.large. The c1.xlarge will have 4 locally attached 
instance-store volumes while the m1.large has only 2 IIRC. Also, I didn't see 
it mentioned in the perf doc but you should use only the locally attached 
instance store volumes as datanode storage volumes to avoid variance introduced 
by EBS.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612653#comment-13612653
 ] 

M. C. Srivas commented on HBASE-7667:
-

@mcorgan and @stack:

The total i/o in terms of i/o bandwidth consumed is the same. But the disk iops 
are much, much worse. And disk iops are at a premium, and bg activity like 
compactions should consume as few as possible.

Let's say we split a region into a 100 sub-regions, such that each sub-region 
is in the few 10's of MB. If the data is written uniformly randomly, each 
sub-region will write out a store at approx the same time. That is, a RS will 
write 100x more files into HDFS (100x more random i/o on the local 
file-system). Next, all sub-regions will do a compaction at almost the same 
time, which is again 100x more read iops to read the old stores for merging.

One can try to stagger the compactions to avoid the sudden burst by 
incorporating, say, a queue of to-be-compacted-subregions. But while the 
sub-regions at the head of the queue will compact in time, the ones at the 
end of the queue will have many more store files to merge, and will use much 
more than their fair-share of iops (not to mention that the 
read-amplification in these sub-regions will be higher too). The iops profile 
will be worse than just 100x.


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612808#comment-13612808
 ] 

stack commented on HBASE-7667:
--

[~srivas]

bq. But the disk iops are much, much worse.

See Sergey's writeup.  We flush same as we always did writing a single file to 
the L0 tier.  It is later at compaction time -- i.e. NOT random i/o -- that 
we'd write a file per sub-region/stripe.  If the write is evenly distributed, 
we'd do the same overall i/o except with stripe compacting it would be done in 
smaller bite sizes ([~sershe] Would the compaction of stripes run in //?  
Hopefully, for the case Srivas describes, we'd progress serially through the 
stripes/sub-regions or at least it would be an option and then later, 
ergonomically, we'd recognize the even-loading case and add compaction 
accordingly)

You have a point that we will be making more files in the fs.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612847#comment-13612847
 ] 

Matt Corgan commented on HBASE-7667:


{quote}a RS will write 100x more files into HDFS (100x more random i/o on the 
local file-system){quote}I think this is a point of confusion.  A typical HBase 
file could be 4MB to 40GB, where those files are a series of 4KB (very small) 
underlying disk blocks.  Ignoring complexities of multiple tasks running 
simultaneously on the regionserver, only the first 4KB block of each file is a 
random write, while the following blocks are sequential writes.  The few extra 
random writes should be lost in the noise of all the other random IO requests 
happening.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612956#comment-13612956
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

bq. The first half of the sentence seems to be incomplete.
bq. I think I understand what you mean - such KVs would be written by previous 
writer.
bq. Missing author, date, and JIRA pointer?
bq. I think you need to say stripe == sub-range of the region key range.  You 
almost do.  Just do it explicitly.
bq. What does this mean andold boundaries  rarely, if  ever,   
moving.?  Give doc an edit?
bq. Say in doc that you mean storefile metadata else it is ambiguous.
bq. Not sure I follow here: This   compaction  is  performed   
whenthe number  of  L0  files   exceeds somethreshold   
and producesthe number  of  files   equivalent  to  
the number  of  stripes,withenforcedexisting
boundaries.
Fixed these.

bq. An interesting comment by LarsH recently was that maybe we should ship w 
/major compactions off; most folks don't delete
Hmm... in general I agree but we'll have to insert really good warnings 
everywhere. Can we detect if they delete? :)

bq. Missing is one a pointer at least to how it currently works (could just 
point at src file I'd say with its description of 'sigma' compactions) and a 
sentence on whats wrong w/ it

bq. Later I suppose we could have a combination of count-based and 
size-based if an edge stripe is N time bigger than any other, add a new 
stripe?
Yeah, it's mentioned in code comment somewhere. 

bq. I was wondering if you could make use of liang xie's bit of code for making 
keys for the block cache where he chooses a byte sequence that falls between 
the last key in the former block and the first in the next block but the key is 
shorter than either. but it doesn't make sense here I believe; your 
boundaries have to be hard actual keys given inserts are always coming in 
so nevermind this suggestion.
For boundary determination it does make sense; can you point at the code? After 
cursory look I cannot find it.

bq. You write the stripe info to the storefile.  I suppose it is up to the 
hosting region whether or not it chooses to respect those boundaries.  It could 
ignore them and just respect the seqnum and we'd have the old-style storefile 
handling, right?  (Oh, I see you allow for this -- good)
Yes.


bq. Thinking on L0 again, as has been discussed, we could have flushes skip L0 
and flush instead to stripes (one flush turns into N files, one per stripe) but 
even if we had this optimization, it looks like we'd still want the L0 option 
if only for bulk loaded files or for files whose metadata makes no sense to the 
current region context. • The aggregate   range   of  files   
going   in  mustbe  contiguous... Not sure I follow.  Hmm... could 
do with  going into a compaction
Yes, that was my thinking too.

bq. If the stripe  boundaries  are changed by  compaction, 
the entire  stripes withold boundaries  mustbe  
replaced ...What would bring this on? And then how would old boundaries get 
redone?  This one is a bit confusing.
Clarified. Basically one cannot have 3 files in (-inf, 3) and 3 in [3, inf), 
then take 3 and 2 respectively, and rewrite them with boundary 4, because then 
there will be a file with [3, inf) remaining that overlaps.



bq. I was going to suggest an optimization for later for the case that an L0 
fits fully inside a stripe, I was thinking you could just 'move' it into its 
respective stripe... but I suppose you can't do that because you need to write 
the metadata to put a file into a stripe...
Yeah. Also wouldn't expect it to be a common case.

bq. Would it help naming files for the stripe they belong too?  Would that 
help?  In other words do NOT write stripe data to the storefiles and just let 
the region in memory figure which stripe a file belongs too.  When we write, we 
write with say a L0 suffix.  When compacting we add S1, S2, etc suffix for 
stripe1, etc.  To figure what the boundaries of an S0 are, it'd be something 
the region knew.  On open of the store files, it could use the start and end 
keys that are currently in the file metadata to figure which stripe they fit in.
bq. Would be a bit looser.  Would allow moving a file between stripes with a 
rename only. The delete dropping section looks right.  I like the major 
compaction along a stripe only option. 
This could be done as future improvement. The implications of change of naming 
scheme for other parts of the systems need to be determined.
Also for all I know it might break snapshots (moving files does). And, code to 
figure ut stripes on the fly would be more complex.

bq. Foremptyranges,empty   files   are 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613004#comment-13613004
 ] 

stack commented on HBASE-7667:
--

 HBASE-7845 optimize hfile index key is the key/boundary determination work 
I was referring to (I don't think it applies here but adding the reference 
since you asked for it)

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613454#comment-13613454
 ] 

Lars Hofhansl commented on HBASE-7667:
--

bq. An interesting comment by LarsH recently was that maybe we should ship w 
/major compactions off; most folks don't delete

Hmm... I don't doubt that I said this, but I'm not sure that I agree :)  Many 
people do delete and just not removing the delete markers would be unexpected.


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf, Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611728#comment-13611728
 ] 

M. C. Srivas commented on HBASE-7667:
-

There is of course one major caveat with this approach. If data insertion is 
uniformly spread (ie, key is uniform random), this proposal performs much worse 
than the existing scheme.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611820#comment-13611820
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

There are two approaches discussed, one similar to having many small regions 
and one for sequential data. Which one do you mean? I am testing the first one 
with uniformly distributed keys now and it's somewhat slower than default case 
on average (on writes mostly) but has no big compaction associated latency 
spikes... I suspect if there was not so much compaction due to L0 the write 
slowness could also be alleviated.
I haven't tested the 2nd one yet (requires a more custom test, next week) but 
it's very specialized, for sequential data, so yes it is not good for common 
case.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611843#comment-13611843
 ] 

stack commented on HBASE-7667:
--

bq. There is of course one major caveat with this approach. If data insertion 
is uniformly spread (ie, key is uniform random), this proposal performs much 
worse than the existing scheme.

[~srivas] Why?  Won't it do same total i/o?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611844#comment-13611844
 ] 

stack commented on HBASE-7667:
--

{code}
On attached doc, it is lovely.

Missing author, date, and JIRA pointer?

An interesting comment by LarsH recently was that maybe we should ship w /major 
compactions off; most folks don't delete

Missing is one a pointer at least to how it currently works (could just point 
at src file I'd say with its description of 'sigma' compactions) and a sentence 
on whats wrong w/ it
or the problems it leads too when left run amok (you say it for major 
compactions but even w/o major compactions enabled, an i/o tsunami can hit and 
wipe us out

What does this mean andold boundaries  rarely, if  ever,   
moving.?  Give doc an edit?

I think you need to say stripe == sub-range of the region key range.  You 
almost do.  Just do it explicitly.

I see your extra justification for l0, the need to be able to bulk load.  It is 
kinda important that we continue to support that.  Good one.

Later I suppose we could have a combination of count-based and size-based 
if an edge stripe is N time bigger than any other, add a new stripe?

I was wondering if you could make use of liang xie's bit of code for making 
keys for the block cache where he chooses a byte sequence that falls between
the last key in the former block and the first in the next block but the key is 
shorter than either. but it doesn't make sense here I believe;
your boundaries have to be hard actual keys given inserts are always coming 
in so nevermind this suggestion.

You write the stripe info to the storefile.  I suppose it is up to the hosting 
region whether or not it chooses to respect those boundaries.  It
could ignore them and just respect the seqnum and we'd have the old-style 
storefile handling, right?  (Oh, I see you allow for this -- good)

Say in doc that you mean storefile metadata else it is ambiguous.

Thinking on L0 again, as has been discussed, we could have flushes skip L0 and 
flush instead to stripes (one flush turns into N files, one per stripe)
but even if we had this optimization, it looks like we'd still want the L0 
option if only for bulk loaded files or for files whose metadata makes
no sense to the current region context.

• The  aggregate   range   of  files   going   in  mustbe  
contiguous... Not sure I follow.  Hmm... could do with  going into a 
compaction

If the stripe  boundaries  are changed by  compaction, 
the entire  stripes withold boundaries  mustbe  
replaced ...What would bring this on?
And then how would old boundaries get redone?  This one is a bit confusing.

Get key before is a PITA

Not sure I follow here: This   compaction  is  performed   when
the number  of  L0  files   
exceeds somethreshold   and producesthe number  of  
files   equivalent  to  the number  
of  stripes,withenforcedexistingboundaries.

I was going to suggest an optimization for later for the case that an L0 fits 
fully inside a stripe, I was thinking you could just 'move' it into
its respective stripe... but I suppose you can't do that because you need to 
write the metadata to put a file into a stripe...

Would it help naming files for the stripe they belong too?  Would that help?  
In other words do NOT write stripe data to the storefiles and just
let the region in memory figure which stripe a file belongs too.  When we 
write, we write with say a L0 suffix.  When compacting we add S1, S2, 
etc suffix for stripe1, etc.  To figure what the boundaries of an S0 are, it'd 
be something the region knew.  On open of the store files, it could
use the start and end keys that are currently in the file metadata to figure 
which stripe they fit in.

Would be a bit looser.  Would allow moving a file between stripes with a rename 
only.

The delete dropping section looks right.  I like the major compaction along a 
stripe only option.

Foremptyranges,empty   files   are created.  Is this 
necessary?  Would be good to avoid doing this.


{code}

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611909#comment-13611909
 ] 

Matt Corgan commented on HBASE-7667:


{quote}If data insertion is uniformly spread (ie, key is uniform random), this 
proposal performs much worse than the existing scheme.{quote}I think the goal 
for uniformly random keys is to have the same amount of total work done but to 
stagger that work.  Instead of doing 1 big 24 GB compaction per day, it could 
do a 1 GB compaction each hour.

The savings/efficiency become more pronounced with less random keys, with the 
biggest savings for sequential keys.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611324#comment-13611324
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

Hmm... I am starting to think that we might want to consider getting rid of L0 
after all. Somehow it escaped me that L0 gives you x2 write amplification right 
there as all data has to be re-striped. Small files actually don't increase the 
number of files for gets/non-overlapping scans because each L0 file still 
counts for each stripe.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609873#comment-13609873
 ] 

Ted Yu commented on HBASE-7667:
---

Nice document, Sergey.

bq. This can obviously be improved since most (or all) stripes, see future 
improvements.
The first half of the sentence seems to be incomplete.

bq. Before starting a new writer, compactor ensures that all the KVs for the 
last row in the previous writer go to the previous writer.
I think I understand what you mean - such KVs would be written by previous 
writer.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578445#comment-13578445
 ] 

Jimmy Xiang commented on HBASE-7667:


Getting rid of L0 means memstore flushing will take longer and hold update/read 
longer?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578513#comment-13578513
 ] 

stack commented on HBASE-7667:
--

bq. Getting rid of L0 means memstore flushing will take longer and hold 
update/read longer?

Yes but on back side, less compactions (no need to compact on region open since 
no half files/references) and for many workloads, less compactions overall 
because only a subset of files will be picked up from L0 tier.

Can do work too to minimize how much we hold flushes.  Could be dumb and on 
flush, inline, examine key spread so can make ruling on where to insert 
boundaries (figuring boundaries would be for first flush only?  Or I suppose 
boundary making would be ongoing over the life of a region... four boundaries 
per region seems like a nice number to work with)... so this would be a 
scan of the 64MB memstore keys or, we could keep a running tally as we 
insert into the memstore so at flush time we knew where the boundaries were... 
could do stuff like //  request to NN to open the files to flush too.

In general, we want to add smarts around flushing (stripes, in-memory 
compacting, etc.) so eventually we are going to have this friction on flush 
(IMO).

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578037#comment-13578037
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

If the user copies the file, few things may happen. If he uses the non-stripe 
compaction, files will just get compacted into fewer files eventually. If all 
files are copied, stripe compactions are used and stripe config is different, 
SFM will load old stripes and eventually re-stripe the data to conform to new 
config, if necessary. That will depend on compactionpolicy implementation. If 
part of the files are copied (whatever sense that makes), either the above will 
still happen, or SFM will give up on metadata, and put all files to L0, 
re-striping them according to new config after some time.

SFM needs metadata from all files, but as far as I see from HStore that is 
already loaded, because other code makes use of metadata without special 
considerations. One thing is that with more files there will be more to open...

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578087#comment-13578087
 ] 

Matt Corgan commented on HBASE-7667:


Sergey - I'm curious what's the reasoning behind flushing memstore to a single 
L0 file rather than splitting the memstore into the stripes during each flush?  
Keep flushes faster, less files, etc?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578090#comment-13578090
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

My reasoning was - too many really tiny files, plus scope creep into memstore.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578099#comment-13578099
 ] 

stack commented on HBASE-7667:
--

Would it simplify the filesystem implementation if you did the split in memory 
(caveat the scope creep up into memstore) so no special L0 tier?  Regards files 
being too small, that is a different issue (e.g. a Toddea -- Todd+idea -- that 
I tripped over recently in an issue here is that rather than flush immediately, 
that we'd instead do a purge and compaction in memory before flushing to ensure 
the content large enough to make a file... but yeah, that'd be something else).

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578105#comment-13578105
 ] 

Matt Corgan commented on HBASE-7667:


Gotcha.  Agree about limiting scope.  If the special L0 tier turns out to be 
more difficult to implement than originally intended for whatever reason, might 
be worth evaluating splitting during flush.  Seems like the same number of 
files might get created anyway when you split the L0 file?  Or do you plan on 
doing some logical striping across the L1 boundary as Nicolas says above 
where the L0 files are never truly split?

Like Stack mentions, longer term I think we'll need to split memstore while in 
use, and those splits should probably have some alignment with these stripe 
boundaries.  For another day...

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578108#comment-13578108
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

L0 actually should be relatively easy to implement, it's the special cases 
about all the possible stripe boundaries that cause all the complexity.
L0 files will hopefully not be split individually, but as soon as we reach some 
number of files, similarly to level db algorithm.
But yeah with insta-stripe solution we could get rid of L0 and also get less 
files for reads. Could be future improvement.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578192#comment-13578192
 ] 

stack commented on HBASE-7667:
--

I'd like to get rid of L0 so can do splitting w/o resorting to file References 
and half hfiles (smile) but yes, could be future as you say Sergey.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576798#comment-13576798
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

The stripe boundaries are supposed to be an internal detail not visible to the 
user, the user only configured the scheme (#of stripes, or size-based stripes 
as described for time series data). 
What kind of stripe configuration do you have in mind?
After talking about splits yesterday I am planning to make it a little more 
flexible for splits, but still internal.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576826#comment-13576826
 ] 

stack commented on HBASE-7667:
--

bq. Currently Sergey puts the boundaries in metadata of store files. 

What is wrong w/ this?  It is the bounds of the keys the file contains?  (Is 
this not there already, the first and last key?)

What are the thoughts on figuring region stripe boundaries.  Will it be done by 
looking at the memstore content just before flush and dividing it into n files? 
 Or will it be done after the first flush on subsequent compactions by looking 
at content of L0?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576829#comment-13576829
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

In the N stripes scheme, boundaries will be determined at first L0 
compaction. I am intending to add parameter to config to make first L0 
compaction wait for more files to get better ones. Then, the stripes can be 
rebalanced but the hope is that it happens infrequently.
In the growing key range scheme (for time series above) stripe boundaries 
don't matter much, and are basically determined based on size. If nothing 
intervenes, by EOW or next week I hope to have preliminary compaction 
policy/compactor patch.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576836#comment-13576836
 ] 

Ted Yu commented on HBASE-7667:
---

bq. It is the bounds of the keys the file contains? (Is this not there already, 
the first and last key?)
Currently HFile provides the following methods:
{code}
byte[] getFirstRowKey();

byte[] getLastRowKey();
{code}
Stripe boundaries should be different from the above.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576845#comment-13576845
 ] 

stack commented on HBASE-7667:
--

[~ted_yu] You raise a 'concern' that is nebulous to shoot down a suggested 
implementation and then when asked why you do not answer the question asked.

[~sershe] Would it be easier looking at memstore than at content of L0?  Would 
have to do appropriate weighting... in that there will be hot spots in the 
keyspace and these should get more stripes.  Rejiggering the stripes joining up 
the infrequently used and allotting more stripes to the hot keyspace area 
sounds right.   Good stuff.

Maybe its worth writing up a doc at this stage.  This issue is full of 
goodness.  A doc could distill it out and make it easier on the digestion and 
easier to think about it.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576881#comment-13576881
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

Discussed with Ted, what was meant is configuring it like region splitting, 
which is not planned at this point.
Doc - probably needed... I will get to it eventually.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577311#comment-13577311
 ] 

Ted Yu commented on HBASE-7667:
---

Here is my understanding of the difference between existing store file metadata 
and the new stripe boundary metadata.
Existing store file metadata (first row key, last row key, etc) is intrinsic to 
the store file. i.e., their meaning doesn't change when the store file gets 
copied to another cluster.
However stripe boundary doesn't seem to carry the same characteristics.
Let's consider table A1 in cluster C1 and table A2 in cluster C2. They have the 
same schema and region boundaries. When store file F1 is copied from C1 to C2, 
user may switch to a different striping strategy. The reason could be that 
clusters C1 and C2 serve different patterns of load.
Embedding stripe boundary in store file may potentially cause confusion in 
cluster C2.

Another factor is that StoreFileManager needs to scan all store files to 
establish / validate stripe boundaries when region opens.

Please correct me if I am wrong.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575883#comment-13575883
 ] 

Nicolas Spiegelberg commented on HBASE-7667:


Some thoughts I had about this:

Overall, I think it's a good idea.  Seems like it's not crazy to add and would 
have multiple benefits.  Logical striping across the L1 boundary is a simple 
solution to both proactively handle splits and reduce compaction times.

Thoughts on this feature
1. Fixed configs : in the same way that we got a lot of stability by limiting 
the regions/server to a fixed number, we might want to similarly limit the 
number of stripes per region to 10 (or X) instead of every Y bytes.  This 
will help us understand the benefit we get from striping and it's easy to 
double the striping and chart the difference.
2. NameNode pressure :  Obviously, a 10x striping factor will cause 10x scaling 
of the FS.  Can we offset this by increasing the HDFS block size, since 
addBlock dominates at scale?  Really, unlike Hadoop, you have all of the HFile 
or none of it.  Missing a portion of the HFile currently invalidates the whole 
file.  You really need 1 HDFS block == 1 HFile.  However, we could probably 
just toy with increasing it by the striping factor right now and seeing if that 
balances things.
3. Open Times : I think this will be an issue, specifically on server start.  
Need to be careful here.
4. Major compaction : you can perform a major compaction (remove deletes) as 
long as you have [i,end) contiguous.  I don't think you'd need to involve L0 
files in an MC at all.  Save the complexity.  Furthermore, part of the reason 
why we created the tiered compaction is to prevent small/new files from 
participating in MC because of cache thrashing, poor minor compactions, and a 
handful of other reasons.

So, some thoughts on related pain points we seem to have that tie into this 
feature:
1. Reduce Cache thrashing : region moves kill us a lot of time because we have 
a cold cache.  There is a worry that more aggressive compactions mean more 
thrashing.  I think it will actual even this out better since right now a MC 
causes a lot of churn.  Just should be thinking about this if perf after the 
feature isn't what we desire.
2. Unnecessary IOPS : outside of this algorithm, we should just completely get 
rid of the requirement to compact after a split.  We have the block cache, so 
given a [start,end) in the file, we can easily tell our mid point for future 
splits.  There's little reason to aggressively churn in this way after 
splitting.
3. Poor locality : for grid topology setups, we should eventually make the 
striping algorithm a little more intelligent about picking our replicas.  If 
all stripes go to the same secondary  tertiary node, then splits have a very 
restricted set of servers to chose for datanode locality.


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575936#comment-13575936
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

bq. It may be a follow on to this jira, but having striper dynamically add 
stripes at the end of the region would let allow all the stripes before the 
last one go cold which is critical for avoiding hugely wasteful compactions 
of non-changing data
Actually, it can be added as part of the main work, HBASE-7679 (file 
management) code includes such capabilities. 
I wonder how, no matter the compactions, does region management work for such 
scenario. Wouldn't all the load always be on last region if you have TS keys?
Or, if you have artificial partitioning but query by TS, wouldn't all queries 
go to all servers?

bq.  To major compact a stripe, all L0 files, if any, can be split into 
stripes, then merge all files belonging to the stripe.
Can you explain more about the delete marker limitation?
Suppose in current compaction selection, I choose a set of files starting at 
the oldest file but not including all files.
Wouldn't that be enough to process delete markers that delete the updates 
within those files? Granted, I might not process all delete markers, but I 
don't have to see all files. E.g. if I only have 3 files with one entry for K 
each, K=V, delete K, K=V2, and I compact the first two, I can remove 
entries for K from them, right?

bq. 1. Fixed configs : in the same way that we got a lot of stability by 
limiting the regions/server to a fixed number, we might want to similarly limit 
the number of stripes per region to 10 (or X) instead of every Y bytes. This 
will help us understand the benefit we get from striping and it's easy to 
double the striping and chart the difference.
That is the original idea.

Thanks for other comments :)

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575951#comment-13575951
 ] 

stack commented on HBASE-7667:
--

In a striped region, if stripes = 2 and the key distribution is basically 
even, a split could be done w/o references, halfhfiles, and rewriting from 
parent to daughter; a split could just rename the parent files into the 
daughter regions.   It could make for split simplification and possibly make 
for some i/o savings.

This is a load of great stuff in this issue.  Best read I've had in a long time.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575995#comment-13575995
 ] 

Matt Corgan commented on HBASE-7667:


{quote}Open Times : I think this will be an issue, specifically on server 
start. Need to be careful here.{quote}Hopefully could be mitigated by making 
regions larger, like doubling region size and setting max 2 stripes/region.  
Theoretically should be able to have the same overall number of files as normal 
regions, or are there other factors at play?

{quote}Wouldn't all the load always be on last region if you have TS keys?  Or, 
if you have artificial partitioning but query by TS, wouldn't all queries go to 
all servers?{quote}An easy strategy for P partitions is to 
* prepend a single byte to each key where prefix=hash(row)%P
* pre-split the table into P regions
* tweak the balancer to evenly spread the tail partitions for each region
* writes get sprayed evenly to all tail partitions
* a single Get query will only hit one region since you know hash(row)%P 
beforehand
* you scan all P partitions using a P-way collating iterator
** so yes, scans go to all servers but presumably they are huge and would hit 
lots of data anyway
** because they are huge, a client that scans the partitions concurrently will 
be faster
* a big multi-Get will spray to the exact servers necessary, possibly all of 
them, but like scans may be faster because done in parallel

I'm not sure what most people are doing with time series data but this seems 
like a good approach to me.  You basically just choose arbitrarily large P.  An 
MD5 prefix is essentially P=2^128 (I wouldn't recommend pre-splitting at that 
granularity).


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576023#comment-13576023
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

bq. a split could just rename the parent files into the daughter regions
I was told this is impossible due to snapshots relying on files not moving 
between regions (or on references during splits?)
We just discussed this here, some improvements for splits should definitely be 
possible.
bq. so yes, scans go to all servers but presumably they are huge and would hit 
lots of data anyway
Yeah, I meant the scans, was assuming scans for TS data mostly. I see.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576044#comment-13576044
 ] 

Jimmy Xiang commented on HBASE-7667:


bq. Wouldn't that be enough to process delete markers that delete the updates 
within those files?

I think so, as long as the files not included are newer (like in L0).

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576296#comment-13576296
 ] 

Ted Yu commented on HBASE-7667:
---

I think we need to find the best way of storing stripe boundaries.

Currently Sergey puts the boundaries in metadata of store files. This is not 
flexible. Once this feature is released, users would request ways to configure 
stripes in different manners.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575449#comment-13575449
 ] 

Ted Yu commented on HBASE-7667:
---

bq. Sub-region is not a good name either though.
Other names we can consider: arena, range, realm, section, sector, zone.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-10 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575578#comment-13575578
 ] 

Jimmy Xiang commented on HBASE-7667:


Stripes don't have overlapping keyrange with other stripes.  So each stripe is 
just like a sub-region.  L0 files are special, could overlap with multiple 
stripes.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575583#comment-13575583
 ] 

Lars Hofhansl commented on HBASE-7667:
--

I see, then a major compaction needs to at least include all current L0 files 
(in any), right?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-10 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575589#comment-13575589
 ] 

Jimmy Xiang commented on HBASE-7667:


That's right.  To major compact a stripe, all L0 files, if any, can be split 
into stripes, then merge all files belonging to the stripe.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-09 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575206#comment-13575206
 ] 

Jimmy Xiang commented on HBASE-7667:


bq. Overall, I think the sub-region approach cuts the rope in the tug-of-war. 
It lets you have a smaller number of regions at the same time as having the 
most efficient compactions.

I agree. I was wondering if it will take longer to open a region if there are 
many hfiles.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575297#comment-13575297
 ] 

stack commented on HBASE-7667:
--

[~mcorgan] Nice write up.  So you are in favor of Sergey's project?

[~jxiang] If I were to guess, it will take longer than the case where we have 
monolithic files that cover the total region namespace as we currently have 
(Because there will be more files).  If rather than strip'ing inside a region, 
we instead had a region per stripe, my guess is that stripe'ing will take less 
time to open since less region machinations going on (less .regioninfos to 
open, less looking in fs for stuff to clean up since last file open, less 
listing of storefiles under families).

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-09 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575309#comment-13575309
 ] 

Matt Corgan commented on HBASE-7667:


{quote}So you are in favor of Sergey's project?{quote}oh yes, if you could not 
tell.  Thinking of an HBASE-7667 tatoo =)

One of the few major things hbase is missing in my opinion is the ability to 
load time-series through the normal api, rather than having to go off and write 
some separate bulk load code.  HBase currently takes a dump when you do that.  
Main culprits are HBASE-5479 and my comment in HBASE-3484 
(https://issues.apache.org/jira/browse/HBASE-3484?focusedCommentId=13410934page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13410934).
  Even during normal operation as opposed to a one-off import of data, the 
inefficiencies are still happening, just at a less obvious pace.

It may be a follow on to this jira, but having striper dynamically add 
stripes at the end of the region would let allow all the stripes before the 
last one go cold which is critical for avoiding hugely wasteful compactions 
of non-changing data.  Ideally, it would be able to allocate small stripes as 
new data comes in (each flush?) and then later go on to merge older stripes to 
reduce hfile count (at major compaction time?).  With this in place on an N 
node cluster, you could partition your data with N or 2N regions using a hash 
prefix and basically let the regions grow infinitely large.  Currently I have 
to limit region size to ~2GB which results in hundreds of regions per node 
which is a bit of a management hassle because it's beyond human readable, and a 
bit wasteful with all the empty memstores among other things.

I do wonder if there's a more accurate name than stripe.  Stripes make me think 
of RAID stripes which is a different concept than sub-regions.  Sub-region is 
not a good name either though.

It would be cool if you could set a column family attribute like 
layout=TIME_SERIES which HBase could use to automatically pick the compaction 
strategy, split-point strategy, balancer strategy, and allow future niceties 
like using stronger compression on old data.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575312#comment-13575312
 ] 

Lars Hofhansl commented on HBASE-7667:
--

Stripes can have overlapping keyrange with other stripes, correct? I.e. if two 
L0 files are compacted with a L0-compation their each L0 is striped but since 
the L0-files could overlap, the stripes could too with other stripes from 
different L0 files.

A nice property in LevelDB is that only L0 files have overlapping keyspaces 
with other files, all level  L0 have no overlapping keys within a level and no 
file at level L overlaps more than 10 files at L+1.

Right now only major compactions can remove delete markers because only by 
looking at all data you can guarantee that you will see each KV that might be 
affected by the delete marker.
It's not clear to me how we get around this, unless we introduce a formal 
notion of levels and know which L+1 files overlap with a file at level L.

Would love to discuss more at the PowWow on the Feb 19th.


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-08 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575012#comment-13575012
 ] 

Jimmy Xiang commented on HBASE-7667:


A stripe is like a sub-region. That's a good idea.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-08 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575024#comment-13575024
 ] 

Matt Corgan commented on HBASE-7667:


Right now there is a tug-of-war between region size and number of regions.  

You want fewer regions for:
* server startup/shutdown
* minimizing RPC calls
* having fewer/bigger memstore flushes
* fewer open files per server

You want more regions for:
* spreading load among servers
* having more efficient compactions

{quote}spreading load among servers{quote}
Since BigTable was designed machines have grown tremendously, with many running 
24 cpus and 48+GB memory.  These machines can serve many regions even if each 
region is 30GB.  So we can achieve good load distribution even with enormous 
regions.

{quote}having more efficient compactions{quote}
But, huge regions are bad for compactions.  30GB compactino is still considered 
expensive.  Data in HBase is generally not perfectly evenly distributed across 
a table, or if it is you are not taking full advantage of HBase's sorted 
architecture.  Huge regions therefore have hot and cold stripes/sub-regions.  
If you compact a 30GB region, you are wasting a ton of time on the cold stripes.

Overall, I think the sub-region approach cuts the rope in the tug-of-war.  It 
lets you have a smaller number of regions at the same time as having the most 
efficient compactions.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent 

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562257#comment-13562257
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

What do you guys think about the general idea?


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562268#comment-13562268
 ] 

Matt Corgan commented on HBASE-7667:


So a stripe is like a sub-region?  In terms of compactions, it sounds like it 
serves the same purpose as splitting a table into regions, so you can compact a 
region that is hot while letting cold regions stay cold.

If this is the case and stripes are allowed to auto-split, it may be very 
beneficial for time series data.  If you had a region approaching 10gb with 
100mb stripes, the last stripe would keep splitting and the first 99 would 
never get touched.  The problem without stripes is that the first 9900mb keeps 
getting re-written even though it never changes.

Am i understanding it correctly that stripes in a region would act similarly to 
regions in a table?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562273#comment-13562273
 ] 

chunhui shen commented on HBASE-7667:
-

Interesting ideas.
With stripe compaction, we could support many files in one region, each file 
belongs to one stripe, and no overlap keys cross stripes except LO, is it right?

I think it is useful for the sequential write scenario.

bq.if we could move files between regions, no references would be needed)
Moving files would break snapshots, references are needed all the same

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562278#comment-13562278
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

Hmm, my thinking would be that number of stripes will be fixed and we would 
rebalance, but never split as such. Yes, the idea for sequential data plays 
very well into this, the code will probably be almost the same. With non-seq 
data, it would just try to achieve effect similar to level.


 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562286#comment-13562286
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

If there are no general objections I will try to start on this tomorrow, in 
preference to level... I am looking at some random HCM issues now so patch may 
be next week provided that HBASE-7516 and HBASE-7603 can go in.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562290#comment-13562290
 ] 

Matt Corgan commented on HBASE-7667:


I've been brainstorming something similar for splitting the memstore into 
stripes, mentioned in 
https://issues.apache.org/jira/browse/HBASE-3484?focusedCommentId=13410934page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13410934

I think it's a good idea now that region sizes have become so large.  It's easy 
to have a few hot stripes in a region if it's 10GB, and not necessarily wrong 
from a primary-key design perspective.  It's often very wasteful to be 
compacting the whole region.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if, before stripes are determined, we initially let L0 grow 
 before determining the stripes, we will get better boundaries.
 Also, unless unbalancing is really large we don't need to rebalance really.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira