[jira] [Commented] (HBASE-7667) Support stripe compaction

2014-01-29 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885673#comment-13885673
]

Sergey Shelukhin commented on HBASE-7667:
-

The docs were committed to the book as part of HBASE-9854; sorry for the late reply.

Support stripe compaction
-

Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction
perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe
compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe
compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf,
Using stripe compactions.pdf, stripe-cdf.pdf

So I was thinking about having many regions as the way to make compactions
more manageable, and writing the level db doc about how level db range
overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy,
Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication
factor.
And I suggest the following idea, let's call it stripe compactions. It's a
mix between level db ideas and having many small regions.
It allows us to have a subset of benefits of many regions (wrt reads and
compactions) without many of the drawbacks (managing and current
memstore/etc. limitation).
It also doesn't break seqNum-based file sorting for any one key.
It works like this.
The region key space is separated into configurable number of fixed-boundary
stripes (determined the first time we stripe the data, see below).
All the data from memstores is written to normal files with all keys present
(not striped), similar to L0 in LevelDb, or current files.
Compaction policy does 3 types of compactions.
First is L0 compaction, which takes all L0 files and breaks them down by
stripe. It may be optimized by adding more small files from different
stripes, but the main logical outcome is that there are no more L0 files and
all data is striped.
Second is just like the current compaction, but compacts one single
stripe. In future, nothing prevents us from applying compaction rules and
compacting part of the stripe (e.g. similar to current policy with ratios
and stuff, tiers, whatever), but for the first cut I'd argue let it major
compact the entire stripe. Or just have the ratio and no more complexity.
Finally, the third addresses the concern of the fixed boundaries causing
stripes to be very unbalanced.
It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the
results out with different boundaries.
There's a tradeoff here - if we always take 2 adjacent stripes, compactions
will be smaller but rebalancing will take a ridiculous amount of I/O.
If we take many stripes we are essentially getting into the
epic-major-compaction problem again. Some heuristics will have to be in place.
In general, if we let L0 grow before determining the stripes, we will get
better boundaries.
Also, unless the imbalance is really large we don't really need to rebalance.
Obviously this scheme (as well as level) is not applicable for all scenarios,
e.g. if timestamp is your key it completely falls apart.
The end result:
- many small compactions that can be spread out in time.
- reads still read from a small number of files (one stripe + L0).
- region splits become marvelously simple (if we could move files between
regions, no references would be needed).
Main advantage over Level (for HBase) is that default store can still open
the files and get correct results - there are no range overlap shenanigans.
It also needs no metadata, although we may record some for convenience.
It also would appear to not cause as much I/O.
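
A minimal sketch of the L0 compaction and read path described above, under stated assumptions: files are modeled as in-memory lists of (key, seq_num, value) tuples, and the names `stripe_index`, `compact_l0` and `files_for_read` are hypothetical, not HBase APIs.

```python
from bisect import bisect_right


def stripe_index(boundaries, key):
    """Return the index of the stripe that owns `key`.

    `boundaries` is the sorted list of stripe start keys (the first stripe
    is open-ended on the left), so N boundaries define N + 1 stripes.
    """
    return bisect_right(boundaries, key)


def compact_l0(l0_files, boundaries):
    """Take all L0 files and break their contents down by stripe.

    Each L0 file is a list of (key, seq_num, value) tuples that may span
    the whole region key space. Returns one sorted output per stripe.
    """
    stripes = [[] for _ in range(len(boundaries) + 1)]
    for l0_file in l0_files:
        for key, seq_num, value in l0_file:
            stripes[stripe_index(boundaries, key)].append((key, seq_num, value))
    for stripe in stripes:
        # Order by key, newest sequence number first within a key.
        stripe.sort(key=lambda kv: (kv[0], -kv[1]))
    return stripes


def files_for_read(key, boundaries, stripe_files, l0_files):
    """A point read consults only L0 plus the files of one stripe."""
    return l0_files + stripe_files[stripe_index(boundaries, key)]
```

Because every version of a key lands in the same stripe, ordering by sequence number within that stripe is preserved for any one key, which is the property the description relies on.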

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (HBASE-7667) Support stripe compaction

2014-01-27 Thread Andrew Purtell (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883479#comment-13883479
]

Andrew Purtell commented on HBASE-7667:
---

Any doc updates available? 0.98.0RC1 is open.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-12-09 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843340#comment-13843340
]

Sergey Shelukhin commented on HBASE-7667:
-

With 98 coming so soon, probably not. [~stack] wdyt?


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-12-09 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843355#comment-13843355
]

stack commented on HBASE-7667:
--

Yeah. Too late for 0.96. We are trying to get back on to a bugs-only in
point releases praxis -- unless there is a citizen revolt. Also need a reason for
folks to upgrade to 0.98!


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-12-08 Thread Otis Gospodnetic (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13842713#comment-13842713
]

Otis Gospodnetic commented on HBASE-7667:
-

Btw. is this going to get into any 0.96.x releases by any chance? Thanks.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-29 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808236#comment-13808236
]

stack commented on HBASE-7667:
--

bq. Yes, unless there's non-uniform key access there will be more, but smaller,
compactions.

As per [~srivas] supposition above I suppose.

bq. You probably don't want to turn it on by default, as it makes sense either
for non-uniform data or for large regions.

So for what cases should we turn it on?

bq. The documents, one of them targeted at users (and all of them out of date),
are attached to this very JIRA

Pardon me. I've reviewed a bunch of this feature -- docs and code -- and am
just having trouble quantifying the benefit this slew of new code brings in.
Sorry if I am being thick.

If I read the attached user doc, will it be clear (though it is out of date?)
Let me try it.

Support stripe compaction
-

Key: HBASE-7667
URL: https://issues.apache.org/jira/browse/HBASE-7667
Project: HBase
Issue Type: New Feature
Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf,
Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf,
Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf,
Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe
compactions.pdf, Using stripe compactions.pdf


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-29 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808248#comment-13808248
]

stack commented on HBASE-7667:
--

If I read the user doc., it says:

This improves read performance in common scenarios and greatly reduces
variability, by avoiding large and/or inefficient compactions

If I read further, there is no clear message on when enabling stripe
compactions makes sense. Doc is missing a section on when NOT to use stripe
compactions.

The doc. has detail but seems like a bunch no longer applies after recent
reworkings (as you say above).

Do you have some stats on how it can improve life under certain workloads and
what those workloads are?

My concern is that a bunch of code will go into hbase and it will sit there
unused. We have enough of that already. I'd like to have some clear messaging
around this feature, both how it can benefit, and also how a user could enable
it and see the effects of its workings.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-29 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808404#comment-13808404
]

Sergey Shelukhin commented on HBASE-7667:
-

[~stack] I understand your concern. I will get to documenting it this or next
week, so it should be able to get at least into 98.
As far as I know, there are some people who wanted to try it out, and also
there's the timestamp compaction JIRA which might be obviated... let's see if as
an experimental feature it can get adoption. It's pretty well isolated now, so it
should be easy to remove later, or move into a separate module out of the way.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-28 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806903#comment-13806903
]

Sergey Shelukhin commented on HBASE-7667:
-

I have run the maven tests and rebased all patches (no changes except a tiny
one in compactor)... I will commit HBASE-7679, HBASE-7680, HBASE-7967 and
HBASE-8000 to trunk this afternoon if there are no objections by then.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-28 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807290#comment-13807290
]

stack commented on HBASE-7667:
--

So, stripe compactions does more i/o unless it is the time series use case? I
cannot turn this on by default? Where do I go to read on benefits of this new
addition? Thanks [~sershe]


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-10-28 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807397#comment-13807397
]

Sergey Shelukhin commented on HBASE-7667:
-

Yes, unless there's non-uniform key access there will be more, but smaller,
compactions.
HBASE-8541 makes for less IO amplification, Enis is reviewing it, it will
probably follow quickly.

You probably don't want to turn it on by default, as it makes sense either for
non-uniform data or for large regions.

The documents, one of them targeted at users (and all of them out of date), are
attached to this very JIRA ;)
The configuration was simplified quite a bit compared to the state of the doc.
Let me file a JIRA to document things in e.g. the book.
For the first release it will be positioned as an experimental feature...


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-21 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690757#comment-13690757
]

Sergey Shelukhin commented on HBASE-7667:
-

Size-based scheme works by splitting stripes when they grow big. This splitting
is good for sequential sharded keys, because the lower part of the split is written
as one file, and doesn't receive new data (or doesn't receive a lot of it
anyway), so it doesn't have to participate in compactions. If you have uniform
data, splitting results in more rewriting and both stripes keep growing after
the split.
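
A toy model of the size-based scheme, as a sketch only: key counts stand in for stripe size in bytes, the split point is assumed to be the median key, and `SizeStripes` is a hypothetical name rather than anything in the patch.

```python
from bisect import bisect_right


class SizeStripes:
    """Toy size-based striping; key counts stand in for bytes."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self.boundaries = []   # sorted start keys of stripes 1..N
        self.stripes = [[]]    # stripes[i] holds the sorted keys of stripe i

    def put(self, key):
        """Route a key to its stripe and split that stripe if it grew too big.

        Returns the index of the stripe that received the key.
        """
        idx = bisect_right(self.boundaries, key)
        stripe = self.stripes[idx]
        stripe.append(key)
        stripe.sort()
        if len(stripe) > self.max_keys:
            mid = len(stripe) // 2
            # The lower half keeps the old start key; the upper half
            # starts at the median key (an assumed split policy).
            self.boundaries.insert(idx, stripe[mid])
            self.stripes[idx:idx + 1] = [stripe[:mid], stripe[mid:]]
        return idx
```

Feeding this sequential keys leaves every stripe except the last untouched after its split, while interleaved keys keep hitting both halves, which matches the distinction drawn in the comment.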


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-21 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690761#comment-13690761
]

Sergey Shelukhin commented on HBASE-7667:
-

So it's easier to configure (just say I want 500MB-1GB-... stripes) but on
net it results in more I/O during initial data population, before the region
reaches a stable size.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-20 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688945#comment-13688945
]

stack commented on HBASE-7667:
--

Rereading the design doc and how-to-use. They are very nice. Can go into the
book.

High-level, and I think you have suggested this yourself elsewhere, it'd be
coolio if user didn't have to choose between size and count -- if it'd just
figure itself based off incoming load.

I've seen case where a compaction produces a zero-length file (all deletes) so
would that mess w/ this invariant: "Compaction must produce at least
one file (see HBASE-6059)", or "...No stripe can ever be
left with 0 files..."

I almost asked a few questions you'd already answered above in my previous read
of the doc (smile).

How would region merge work? We'd just drop all files into L0? Sounds like
we'd have to drop references if we are not to break snapshotting.

You think this true? "Stripe scheme uses larger number of files than
default to ensure all compactions are small, which can affect very wide
scans." Any measure of how much?

Should stripe be on by default? Or have it as experimental for now until we
get more data?

How to use doc is excellent (though too many configs). Will review patch again
next.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-20 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689954#comment-13689954
]

Sergey Shelukhin commented on HBASE-7667:
-

I'm looking at how to merge policies. Unfortunately splitting uses more I/O
than not splitting (who would have thought... :)), resulting in worse perf.
Also, the system cannot really predict future data patterns, no more than
region splitting can do it (at least not without a lot of complexity added), so
a hint flag for how to split would need to be provided.

W.r.t. producing files to contain metadata, that is unfortunately necessary.
These files shouldn't have effect. Stripes with only expired files can be
merged. I've taken a stab at auto-detecting stripes from file metadata, in
general case it's very complex, in simplified realistic case it's just complex.

Merge will drop everything into L0, yes. This could be improved, but has to be
done now anyway due to references, same with split, so no need to do it now.

On-by-default would require smart default settings.

Let me comment tomorrow on HBASE-7680, if I can make a size-count hybrid
quickly I will post final patch without a lot of logic changes there, and
hopefully we can commit 3 initial patches and build on top of that.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-06-20 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690015#comment-13690015
]

stack commented on HBASE-7667:
--

bq. splitting uses more I/O than not splitting

Sorry. You mean stripes uses more i/o because we L0 first then rewrite into
stripes?


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-05-10 Thread Jonathan Hsieh (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13654572#comment-13654572
]

Jonathan Hsieh commented on HBASE-7667:
---

[~sershe], thanks for the graph. Seeing that, I think an avg/std deviation
could be an even simpler way of showing where this compaction approach
demonstrates a win. It looks like the variance of times will be significantly
higher with default and it seems that avg time would be about the same.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-05-07 Thread Elliott Clark (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651556#comment-13651556
]

Elliott Clark commented on HBASE-7667:
--

Thanks for the doc. Reading this tonight.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-05-05 Thread Raymond (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649312#comment-13649312
]

Raymond commented on HBASE-7667:

Great, more regions lead to better load balance and good compaction effect, fewer
regions lead to easier management and fast failover; I think stripe (or
sub-region) is a good trade-off.
And I think stripe compaction is similar to Level compaction with L0+L1 only.
Another difficulty is configuration: in a big HBase cluster there are so
many applications that building a suitable configuration for each one will be a
huge challenge.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-04-01 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618967#comment-13618967
]

Sergey Shelukhin commented on HBASE-7667:
-

Currently only one compaction per store is allowed. The need to compact several
stripes in parallel can probably be alleviated by just having fewer stripes? It
could be added as a future improvement.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-04-01 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619380#comment-13619380
]

Sergey Shelukhin commented on HBASE-7667:
-

Actually, judging by logs what can be done is triggering compaction thread if
store can compact. In 25-stripe case I see gaps between compactions which are
unnecessary, when the compaction only triggers on flush despite plenty of tiny
stripe compactions being possible


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-30 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618189#comment-13618189
]

Matt Corgan commented on HBASE-7667:

Sergey - i'm curious how are compactions of the stripes being scheduled/queued?
Does a region still make a single region-wide compaction request, and the
compactor picks a single stripe? Or can multiple stripes be in the compaction
queue at once?

Given that regions could be allowed to grow much larger with stripe compaction
enabled it would probably be good to allow multiple stripes to compact in
parallel. Just a thought for another next step... you've probably considered
it already.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-29 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617951#comment-13617951
]

Sergey Shelukhin commented on HBASE-7667:
-

I did a c1.xlarge test (with default, 3, 10 and 25 stripes, 2 times each). The
results for different stripe configurations are very consistent across both
runs.
Compared to m1.large test the positive effect of increasing number of stripes
on write speed is less.

For this load, sweet spot appears to be around 10-12 stripes based on two
tests. 3 stripes have large compactions similar to default (well, not as
large); 25 stripes does too many small compactions, so select-compact loop
cannot keep up with the number of files produced - on Iteration 2 test
described in the doc at least some stripes in 25-stripe case always have 6-8
small files (as they get compacted other stripes get more files). This appears
to be the limiting factor on increasing the number of stripes.
I think the main point is that, for count scheme, there's perf parity (writes
are generally slightly slower, reads slightly faster), despite existing and
fixable write amplification; and there's reduction of variability, which was
the goal. I will try to devise a more realistic read workload, but I don't
think it should change much given above.
For sequential data, with size-based stripe scheme there's reduction in
compactions, as expected, despite even L0.

Next steps:
1) On existing data I want to correlate read/write perf with compactions. It is
interesting that stripe scheme has slower writes in general, as Jimmy has noted
- it touches read path but not anything at all on write path, so it is probably
I/O related, or stresses some interaction between existing write and compaction
paths.
2) Run tests for more realistic read workloads (and parallel read/writes), by
not using LoadTestTool? Optional-ish.
3) Clean up integration test patch in HBASE-8000.
4) Review and commit? :)

5) Get rid of L0?


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-29 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617956#comment-13617956
]

Ted Yu commented on HBASE-7667:
---

bq. 5) Get rid of L0?
Can we do this first?


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-28 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616654#comment-13616654
]

Sergey Shelukhin commented on HBASE-7667:
-

I will update the doc, although in this case m1.large moderate IO capacity
works even better for the test, making it easier to simulate IO-constrained
cluster on such small number of nodes/time frame.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-28 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616976#comment-13616976
]

Sergey Shelukhin commented on HBASE-7667:
-

Btw, the 3 next child JIRAs are ready for review. Please feel free to +1 them, I
will only commit all 3 together and with integration test included.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-26 Thread Andrew Purtell (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614805#comment-13614805
]

Andrew Purtell commented on HBASE-7667:
---

I'd recommend retesting with c1.xlarge instance types, this will get you a lot
closer to real hardware IMHO. The IO capability of the c1.xlarge is high vs.
only moderate for m1.large and the c1.xlarge has 8 vcores as opposed to _2
only_ for the m1.large. The c1.xlarge will have 4 locally attached
instance-store volumes while the m1.large has only 2 IIRC. Also, I didn't see
it mentioned in the perf doc but you should use only the locally attached
instance store volumes as datanode storage volumes to avoid variance introduced
by EBS.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread M. C. Srivas (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612653#comment-13612653
]

M. C. Srivas commented on HBASE-7667:
-

@mcorgan and @stack:

The total i/o in terms of i/o bandwidth consumed is the same. But the disk iops
are much, much worse. And disk iops are at a premium, and bg activity like
compactions should consume as few as possible.

Let's say we split a region into a 100 sub-regions, such that each sub-region
is in the few 10's of MB. If the data is written uniformly randomly, each
sub-region will write out a store at approx the same time. That is, a RS will
write 100x more files into HDFS (100x more random i/o on the local
file-system). Next, all sub-regions will do a compaction at almost the same
time, which is again 100x more read iops to read the old stores for merging.

One can try to stagger the compactions to avoid the sudden burst by
incorporating, say, a queue of to-be-compacted-subregions. But while the
sub-regions at the head of the queue will compact in time, the ones at the
end of the queue will have many more store files to merge, and will use much
more than their fair-share of iops (not to mention that the
read-amplification in these sub-regions will be higher too). The iops profile
will be worse than just 100x.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612808#comment-13612808
]

stack commented on HBASE-7667:
--

[~srivas]

bq. But the disk iops are much, much worse.

See Sergey's writeup. We flush same as we always did writing a single file to
the L0 tier. It is later at compaction time -- i.e. NOT random i/o -- that
we'd write a file per sub-region/stripe. If the write is evenly distributed,
we'd do the same overall i/o except with stripe compacting it would be done in
smaller bite sizes ([~sershe] Would the compaction of stripes run in //?
Hopefully, for the case Srivas describes, we'd progress serially through the
stripes/sub-regions or at least it would be an option and then later,
ergonomically, we'd recognize the even-loading case and add compaction
accordingly)

You have a point that we will be making more files in the fs.


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612847#comment-13612847
]

Matt Corgan commented on HBASE-7667:

{quote}a RS will write 100x more files into HDFS (100x more random i/o on the
local file-system){quote}I think this is a point of confusion. A typical HBase
file could be 4MB to 40GB, where those files are a series of 4KB (very small)
underlying disk blocks. Ignoring complexities of multiple tasks running
simultaneously on the regionserver, only the first 4KB block of each file is a
random write, while the following blocks are sequential writes. The few extra
random writes should be lost in the noise of all the other random IO requests
happening.
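
The arithmetic behind this point can be sketched as follows. It is a simplified model that assumes only the first block of each file counts as a random write and that all files are the same size; `random_write_fraction` is a hypothetical helper.

```python
BLOCK_SIZE = 4 * 1024  # 4KB disk block, as in the comment above


def random_write_fraction(total_bytes, num_files, block_size=BLOCK_SIZE):
    """Fraction of block writes that are random when `total_bytes` is
    written out as `num_files` equally sized files.

    Only the first block of each file is treated as a random write;
    every following block is treated as sequential.
    """
    total_blocks = total_bytes // block_size
    return num_files / total_blocks
```

Under this model, writing 4GB as 100 files instead of one raises the random share from one block in about a million to roughly 0.01% of all block writes.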


[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13612956#comment-13612956
]

Sergey Shelukhin commented on HBASE-7667:
-

bq. The first half of the sentence seems to be incomplete.
bq. I think I understand what you mean - such KVs would be written by previous
writer.
bq. Missing author, date, and JIRA pointer?
bq. I think you need to say stripe == sub-range of the region key range. You
almost do. Just do it explicitly.
bq. What does this mean: "and old boundaries rarely, if ever,
moving"? Give doc an edit?
bq. Say in doc that you mean storefile metadata else it is ambiguous.
bq. Not sure I follow here: "This compaction is performed
when the number of L0 files exceeds some threshold
and produces the number of files equivalent to
the number of stripes, with enforced existing
boundaries."
Fixed these.

bq. An interesting comment by LarsH recently was that maybe we should ship w/
major compactions off; most folks don't delete
Hmm... in general I agree but we'll have to insert really good warnings
everywhere. Can we detect if they delete? :)

bq. Missing is one a pointer at least to how it currently works (could just
point at src file I'd say with its description of 'sigma' compactions) and a
sentence on whats wrong w/ it

bq. Later I suppose we could have a combination of count-based and
size-based if an edge stripe is N time bigger than any other, add a new
stripe?
Yeah, it's mentioned in code comment somewhere.

bq. I was wondering if you could make use of liang xie's bit of code for making
keys for the block cache where he chooses a byte sequence that falls between
the last key in the former block and the first in the next block but the key is
shorter than either. but it doesn't make sense here I believe; your
boundaries have to be hard actual keys given inserts are always coming in
so nevermind this suggestion.
For boundary determination it does make sense; can you point at the code? After
cursory look I cannot find it.

bq. You write the stripe info to the storefile. I suppose it is up to the
hosting region whether or not it chooses to respect those boundaries. It could
ignore them and just respect the seqnum and we'd have the old-style storefile
handling, right? (Oh, I see you allow for this -- good)
Yes.

bq. Thinking on L0 again, as has been discussed, we could have flushes skip L0
and flush instead to stripes (one flush turns into N files, one per stripe) but
even if we had this optimization, it looks like we'd still want the L0 option
if only for bulk loaded files or for files whose metadata makes no sense to the
current region context. "The aggregate range of files
going in must be contiguous..." Not sure I follow. Hmm... could
do with going into a compaction
Yes, that was my thinking too.

bq. "If the stripe boundaries are changed by compaction,
the entire stripes with old boundaries must be
replaced..." What would bring this on? And then how would old boundaries get
redone? This one is a bit confusing.
Clarified. Basically one cannot have 3 files in (-inf, 3) and 3 in [3, inf),
then take 3 and 2 respectively, and rewrite them with boundary 4, because then
there will be a file with [3, inf) remaining that overlaps.
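
The example above can be checked mechanically. This is a sketch that models each file as a half-open key range (start, end); `overlaps` and `partial_overlaps` are hypothetical helpers, not part of the patch.

```python
import math

NEG, POS = -math.inf, math.inf


def overlaps(a, b):
    """True if half-open ranges a = [a0, a1) and b = [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def partial_overlaps(files):
    """Return pairs of files whose ranges overlap without being identical.

    Files of the same stripe share one identical range; any other overlap
    breaks the striping invariant.
    """
    bad = []
    for i, a in enumerate(files):
        for b in files[i + 1:]:
            if a != b and overlaps(a, b):
                bad.append((a, b))
    return bad


# Three files in (-inf, 3) and three in [3, inf): a valid layout.
before = [(NEG, 3)] * 3 + [(3, POS)] * 3
# Rewrite 3 + 2 of them with boundary 4, leaving one old [3, inf) file behind.
after_partial = [(NEG, 4), (4, POS), (3, POS)]
# Rewrite all six files instead.
after_full = [(NEG, 4), (4, POS)]
```

Here `partial_overlaps(after_partial)` flags the leftover [3, inf) file against both new stripes, while `before` and `after_full` come back clean.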

bq. I was going to suggest an optimization for later for the case that an L0
fits fully inside a stripe, I was thinking you could just 'move' it into its
respective stripe... but I suppose you can't do that because you need to write
the metadata to put a file into a stripe...
Yeah. Also wouldn't expect it to be a common case.

bq. Would it help naming files for the stripe they belong to? Would that
help? In other words do NOT write stripe data to the storefiles and just let
the region in memory figure which stripe a file belongs to. When we write, we
write with say a L0 suffix. When compacting we add S1, S2, etc suffix for
stripe1, etc. To figure what the boundaries of an S0 are, it'd be something
the region knew. On open of the store files, it could use the start and end
keys that are currently in the file metadata to figure which stripe they fit in.
bq. Would be a bit looser. Would allow moving a file between stripes with a
rename only. The delete dropping section looks right. I like the major
compaction along a stripe only option.
This could be done as a future improvement. The implications of a change of
naming scheme for other parts of the system need to be determined.
Also for all I know it might break snapshots (moving files does). And, code to
figure out stripes on the fly would be more complex.

bq. For empty ranges, empty files are

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613004#comment-13613004
]

stack commented on HBASE-7667:
--

HBASE-7845 optimize hfile index key is the key/boundary determination work
I was referring to (I don't think it applies here but adding the reference
since you asked for it)

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-25 Thread Lars Hofhansl (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613454#comment-13613454
]

Lars Hofhansl commented on HBASE-7667:
--

bq. An interesting comment by LarsH recently was that maybe we should ship
w/ major compactions off; most folks don't delete

Hmm... I don't doubt that I said this, but I'm not sure that I agree :) Many
people do delete and just not removing the delete markers would be unexpected.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread M. C. Srivas (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611728#comment-13611728
]

M. C. Srivas commented on HBASE-7667:
-

There is of course one major caveat with this approach. If data insertion is
uniformly spread (ie, key is uniform random), this proposal performs much worse
than the existing scheme.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611820#comment-13611820
]

Sergey Shelukhin commented on HBASE-7667:
-

There are two approaches discussed, one similar to having many small regions
and one for sequential data. Which one do you mean? I am testing the first one
with uniformly distributed keys now and it's somewhat slower than default case
on average (on writes mostly) but has no big compaction associated latency
spikes... I suspect if there was not so much compaction due to L0 the write
slowness could also be alleviated.
I haven't tested the 2nd one yet (requires a more custom test, next week) but
it's very specialized, for sequential data, so yes it is not good for common
case.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611843#comment-13611843
]

stack commented on HBASE-7667:
--

bq. There is of course one major caveat with this approach. If data insertion
is uniformly spread (ie, key is uniform random), this proposal performs much
worse than the existing scheme.

[~srivas] Why? Won't it do same total i/o?

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611844#comment-13611844
]

stack commented on HBASE-7667:
--

{code}
On attached doc, it is lovely.

Missing author, date, and JIRA pointer?

An interesting comment by LarsH recently was that maybe we should ship w/ major
compactions off; most folks don't delete

Missing is a pointer at least to how it currently works (could just point
at src file I'd say with its description of 'sigma' compactions) and a sentence
on what's wrong w/ it
or the problems it leads to when left to run amok (you say it for major
compactions but even w/o major compactions enabled, an i/o tsunami can hit and
wipe us out)

What does this mean: "and old boundaries rarely, if ever, moving."? Give doc
an edit?

I think you need to say stripe == sub-range of the region key range. You
almost do. Just do it explicitly.

I see your extra justification for l0, the need to be able to bulk load. It is
kinda important that we continue to support that. Good one.

Later I suppose we could have a combination of count-based and size-based
if an edge stripe is N times bigger than any other, add a new stripe?

I was wondering if you could make use of liang xie's bit of code for making
keys for the block cache where he chooses a byte sequence that falls between
the last key in the former block and the first in the next block but the key is
shorter than either. but it doesn't make sense here I believe;
your boundaries have to be hard actual keys given inserts are always coming
in so nevermind this suggestion.

You write the stripe info to the storefile. I suppose it is up to the hosting
region whether or not it chooses to respect those boundaries. It
could ignore them and just respect the seqnum and we'd have the old-style
storefile handling, right? (Oh, I see you allow for this -- good)

Say in doc that you mean storefile metadata else it is ambiguous.

Thinking on L0 again, as has been discussed, we could have flushes skip L0 and
flush instead to stripes (one flush turns into N files, one per stripe)
but even if we had this optimization, it looks like we'd still want the L0
option if only for bulk loaded files or for files whose metadata makes
no sense to the current region context.

"The aggregate range of files going in must be contiguous..." Not sure I
follow. Hmm... could do with going into a compaction

"If the stripe boundaries are changed by compaction, the entire stripes with
old boundaries must be replaced..." What would bring this on?
And then how would old boundaries get redone? This one is a bit confusing.

Get key before is a PITA

Not sure I follow here: "This compaction is performed when the number of L0
files exceeds some threshold and produces the number of files equivalent to the
number of stripes, with enforced existing boundaries."

I was going to suggest an optimization for later for the case that an L0 fits
fully inside a stripe, I was thinking you could just 'move' it into
its respective stripe... but I suppose you can't do that because you need to
write the metadata to put a file into a stripe...

Would it help naming files for the stripe they belong to? Would that help?
In other words do NOT write stripe data to the storefiles and just
let the region in memory figure which stripe a file belongs to. When we
write, we write with say a L0 suffix. When compacting we add S1, S2,
etc suffix for stripe1, etc. To figure what the boundaries of an S0 are, it'd
be something the region knew. On open of the store files, it could
use the start and end keys that are currently in the file metadata to figure
which stripe they fit in.

Would be a bit looser. Would allow moving a file between stripes with a rename
only.

The delete dropping section looks right. I like the major compaction along a
stripe only option.

"For empty ranges, empty files are created." Is this
necessary? Would be good to avoid doing this.

{code}

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-23 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611909#comment-13611909
]

Matt Corgan commented on HBASE-7667:

{quote}If data insertion is uniformly spread (ie, key is uniform random), this
proposal performs much worse than the existing scheme.{quote}
I think the goal for uniformly random keys is to have the same amount of total
work done but to stagger that work. Instead of doing 1 big 24 GB compaction per
day, it could do a 1 GB compaction each hour.

The savings/efficiency become more pronounced with less random keys, with the
biggest savings for sequential keys.
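
The arithmetic above can be sketched as follows. This is a toy model only, and the 24 stripes of 1 GB each are hypothetical figures chosen to match the 24 GB example. The first function shows the same total volume split into staggered runs; the second shows why fewer touched stripes means less data rewritten.

```python
def compaction_schedule(total_gb, runs):
    """Split a day's compaction volume into equally sized runs."""
    return [total_gb / runs] * runs

def rewritten_gb(stripe_sizes_gb, touched):
    """GB rewritten when only stripes that received new data are compacted."""
    return sum(size for i, size in enumerate(stripe_sizes_gb) if i in touched)

one_big = compaction_schedule(24, 1)    # one 24 GB compaction per day
hourly = compaction_schedule(24, 24)    # a 1 GB compaction each hour

stripes = [1.0] * 24                    # hypothetical: 24 stripes of 1 GB each
uniform = rewritten_gb(stripes, touched=set(range(24)))  # every stripe got writes
sequential = rewritten_gb(stripes, touched={23})         # only the last stripe did
```

With uniform keys the total stays the same and only the peak drops; with sequential keys the total itself shrinks because cold stripes are never rewritten.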

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-22 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13611324#comment-13611324
]

Sergey Shelukhin commented on HBASE-7667:
-

Hmm... I am starting to think that we might want to consider getting rid of L0
after all. Somehow it escaped me that L0 gives you x2 write amplification right
there as all data has to be re-striped. Small files actually don't increase the
number of files for gets/non-overlapping scans because each L0 file still
counts for each stripe.
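
A rough sketch of both points, as a simplified model rather than a measurement. It assumes no per-stripe compaction has run yet, and the function names are made up for illustration.

```python
def bytes_written(flushed, use_l0):
    """Bytes written to disk before any per-stripe compaction runs."""
    flush = flushed                        # memstore flush, always paid once
    restripe = flushed if use_l0 else 0    # L0 compaction rewrites everything
    return flush + restripe

def file_counts(flushes, stripes, use_l0):
    """Return (files on disk, files a single-row get may have to check)."""
    on_disk = flushes if use_l0 else flushes * stripes
    per_get = flushes   # one L0 file per flush, or one stripe file per flush
    return on_disk, per_get
```

Flushing straight into stripes produces more files on disk, but a get only looks at one stripe, so it checks the same number of files either way.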

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-03-21 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609873#comment-13609873
]

Ted Yu commented on HBASE-7667:
---

Nice document, Sergey.

bq. This can obviously be improved since most (or all) stripes, see future
improvements.
The first half of the sentence seems to be incomplete.

bq. Before starting a new writer, compactor ensures that all the KVs for the
last row in the previous writer go to the previous writer.
I think I understand what you mean - such KVs would be written by previous
writer.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-14 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578445#comment-13578445
]

Jimmy Xiang commented on HBASE-7667:

Getting rid of L0 means memstore flushing will take longer and hold update/read
longer?

Support stripe compaction
-

Key: HBASE-7667
URL: https://issues.apache.org/jira/browse/HBASE-7667
Project: HBase
Issue Type: New Feature
Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-14 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578513#comment-13578513
]

stack commented on HBASE-7667:
--

bq. Getting rid of L0 means memstore flushing will take longer and hold
update/read longer?

Yes but on back side, less compactions (no need to compact on region open since
no half files/references) and for many workloads, less compactions overall
because only a subset of files will be picked up from L0 tier.

Can do work too to minimize how much we hold flushes. Could be dumb and on
flush, inline, examine key spread so can make ruling on where to insert
boundaries (figuring boundaries would be for first flush only? Or I suppose
boundary making would be ongoing over the life of a region... four boundaries
per region seems like a nice number to work with)... so this would be a
scan of the 64MB memstore keys or, we could keep a running tally as we
insert into the memstore so at flush time we knew where the boundaries were...
could do stuff like a // (parallel) request to the NN to open the files to flush to.

In general, we want to add smarts around flushing (stripes, in-memory
compacting, etc.) so eventually we are going to have this friction on flush
(IMO).
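
One way to picture the "examine key spread on flush" idea is a quantile pick over the sorted memstore keys. This is a sketch under assumptions: `pick_boundaries` is a hypothetical helper, it weights by key count rather than bytes, and it ignores duplicate or very sparse keys.

```python
def pick_boundaries(sorted_keys, num_stripes):
    """Pick num_stripes - 1 split keys so each stripe holds about equal key counts."""
    n = len(sorted_keys)
    return [sorted_keys[(i * n) // num_stripes] for i in range(1, num_stripes)]

# Hypothetical memstore contents: 100 evenly spread row keys.
keys = sorted(f"row{i:03d}" for i in range(100))

# Five stripes give the "four boundaries per region" mentioned above.
boundaries = pick_boundaries(keys, 5)
```

A running tally kept during inserts would replace the full scan, but the selection step would look the same.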

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578037#comment-13578037
]

Sergey Shelukhin commented on HBASE-7667:
-

If the user copies the file, a few things may happen. If he uses the non-stripe
compaction, files will just get compacted into fewer files eventually. If all
files are copied, stripe compactions are used and stripe config is different,
SFM will load old stripes and eventually re-stripe the data to conform to new
config, if necessary. That will depend on the compaction policy implementation.
If part of the files are copied (whatever sense that makes), either the above
will still happen, or SFM will give up on metadata, and put all files to L0,
re-striping them according to new config after some time.

SFM needs metadata from all files, but as far as I see from HStore that is
already loaded, because other code makes use of metadata without special
considerations. One thing is that with more files there will be more to open...
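
The "give up on metadata" fallback could look roughly like this. It is a sketch, not the StoreFileManager code: `load_files`, the `(name, start, end)` triples, and the rule that stripe ranges must tile the whole key space are all assumptions for illustration.

```python
import math

def load_files(files):
    """files: (name, start, end) triples; start/end are None when the file
    carries no stripe metadata. Returns (stripes, l0_names)."""
    l0 = [name for name, start, _ in files if start is None]
    striped = [f for f in files if f[1] is not None]
    ranges = sorted({(start, end) for _, start, end in striped})
    # The recorded stripes are usable only if they cover (-inf, +inf)
    # with no gaps and no overlaps.
    tiles = not ranges or (
        ranges[0][0] == -math.inf
        and ranges[-1][1] == math.inf
        and all(a[1] == b[0] for a, b in zip(ranges, ranges[1:]))
    )
    if not tiles:
        # Give up on metadata: treat every file as L0 and re-stripe later.
        return {}, [name for name, _, _ in files]
    stripes = {r: [n for n, s, e in striped if (s, e) == r] for r in ranges}
    return stripes, l0

full_copy = [("a", -math.inf, 5), ("b", 5, math.inf), ("c", None, None)]
partial_copy = [("a", -math.inf, 5), ("c", None, None)]
```

With `full_copy` the old stripes load as recorded; with `partial_copy` the missing [5, inf) stripe makes the metadata inconsistent, so everything lands in L0.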

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578087#comment-13578087
]

Matt Corgan commented on HBASE-7667:

Sergey - I'm curious what's the reasoning behind flushing memstore to a single
L0 file rather than splitting the memstore into the stripes during each flush?
Keep flushes faster, less files, etc?

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578090#comment-13578090
]

Sergey Shelukhin commented on HBASE-7667:
-

My reasoning was - too many really tiny files, plus scope creep into memstore.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578099#comment-13578099
]

stack commented on HBASE-7667:
--

Would it simplify the filesystem implementation if you did the split in memory
(caveat the scope creep up into memstore) so no special L0 tier? Regards files
being too small, that is a different issue (e.g. a Toddea -- Todd+idea -- that
I tripped over recently in an issue here is that rather than flush immediately,
that we'd instead do a purge and compaction in memory before flushing to ensure
the content large enough to make a file... but yeah, that'd be something else).

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578105#comment-13578105
]

Matt Corgan commented on HBASE-7667:

Gotcha. Agree about limiting scope. If the special L0 tier turns out to be
more difficult to implement than originally intended for whatever reason, might
be worth evaluating splitting during flush. Seems like the same number of
files might get created anyway when you split the L0 file? Or do you plan on
doing some logical striping across the L1 boundary as Nicolas says above
where the L0 files are never truly split?

Like Stack mentions, longer term I think we'll need to split memstore while in
use, and those splits should probably have some alignment with these stripe
boundaries. For another day...

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578108#comment-13578108
]

Sergey Shelukhin commented on HBASE-7667:
-

L0 actually should be relatively easy to implement, it's the special cases
about all the possible stripe boundaries that cause all the complexity.
L0 files will hopefully not be split individually, but as soon as we reach some
number of files, similarly to level db algorithm.
But yeah with insta-stripe solution we could get rid of L0 and also get less
files for reads. Could be future improvement.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-13 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578192#comment-13578192
]

stack commented on HBASE-7667:
--

I'd like to get rid of L0 so can do splitting w/o resorting to file References
and half hfiles (smile) but yes, could be future as you say Sergey.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576798#comment-13576798
]

Sergey Shelukhin commented on HBASE-7667:
-

The stripe boundaries are supposed to be an internal detail not visible to the
user; the user only configures the scheme (# of stripes, or size-based stripes
as described for time series data).
What kind of stripe configuration do you have in mind?
After talking about splits yesterday I am planning to make it a little more
flexible for splits, but still internal.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576826#comment-13576826
]

stack commented on HBASE-7667:
--

bq. Currently Sergey puts the boundaries in metadata of store files.

What is wrong w/ this? It is the bounds of the keys the file contains? (Is
this not there already, the first and last key?)

What are the thoughts on figuring region stripe boundaries? Will it be done by
looking at the memstore content just before flush and dividing it into n files?
Or will it be done after the first flush on subsequent compactions by looking
at content of L0?

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576829#comment-13576829
]

Sergey Shelukhin commented on HBASE-7667:
-

In the N stripes scheme, boundaries will be determined at first L0
compaction. I am intending to add parameter to config to make first L0
compaction wait for more files to get better ones. Then, the stripes can be
rebalanced but the hope is that it happens infrequently.
In the growing key range scheme (for time series above) stripe boundaries
don't matter much, and are basically determined based on size. If nothing
intervenes, by EOW or next week I hope to have preliminary compaction
policy/compactor patch.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576836#comment-13576836
]

Ted Yu commented on HBASE-7667:
---

bq. It is the bounds of the keys the file contains? (Is this not there already,
the first and last key?)
Currently HFile provides the following methods:
{code}
byte[] getFirstRowKey();

byte[] getLastRowKey();
{code}
Stripe boundaries should be different from the above.
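
A small illustration of why the two differ, written in Python with made-up keys (this is not the HFile API). The first and last row keys describe what a file happens to contain, while the stripe boundaries describe what it is allowed to contain.

```python
def key_range(rows):
    """First and last row key actually present in a file."""
    return min(rows), max(rows)

def in_stripe(stripe, row):
    """True if row falls inside the half-open stripe [start, end)."""
    return stripe[0] <= row < stripe[1]

stripe = ("c", "m")          # hypothetical stripe boundaries
rows = ["d", "f", "k"]       # rows that happen to be in one file of that stripe
first, last = key_range(rows)

# A later insert of "cab" belongs to the same stripe even though it sorts
# before the file's first row key.
new_row = "cab"
```

The stripe cannot be recovered from `first` and `last` alone, which is why it would have to be stored separately or known to the region.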

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576845#comment-13576845
]

stack commented on HBASE-7667:
--

[~ted_yu] You raise a 'concern' that is nebulous to shoot down a suggested
implementation and then when asked why you do not answer the question asked.

[~sershe] Would it be easier looking at memstore than at content of L0? Would
have to do appropriate weighting... in that there will be hot spots in the
keyspace and these should get more stripes. Rejiggering the stripes joining up
the infrequently used and allotting more stripes to the hot keyspace area
sounds right. Good stuff.

Maybe it's worth writing up a doc at this stage. This issue is full of
goodness. A doc could distill it out and make it easier on the digestion and
easier to think about it.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576881#comment-13576881
]

Sergey Shelukhin commented on HBASE-7667:
-

Discussed with Ted, what was meant is configuring it like region splitting,
which is not planned at this point.
Doc - probably needed... I will get to it eventually.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-12 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577311#comment-13577311
]

Ted Yu commented on HBASE-7667:
---

Here is my understanding of the difference between existing store file metadata
and the new stripe boundary metadata.
Existing store file metadata (first row key, last row key, etc) is intrinsic to
the store file. i.e., their meaning doesn't change when the store file gets
copied to another cluster.
However stripe boundary doesn't seem to carry the same characteristics.
Let's consider table A1 in cluster C1 and table A2 in cluster C2. They have the
same schema and region boundaries. When store file F1 is copied from C1 to C2,
user may switch to a different striping strategy. The reason could be that
clusters C1 and C2 serve different patterns of load.
Embedding stripe boundary in store file may potentially cause confusion in
cluster C2.

Another factor is that StoreFileManager needs to scan all store files to
establish / validate stripe boundaries when region opens.

Please correct me if I am wrong.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Nicolas Spiegelberg (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575883#comment-13575883
]

Nicolas Spiegelberg commented on HBASE-7667:

Some thoughts I had about this:

Overall, I think it's a good idea. Seems like it's not crazy to add and would
have multiple benefits. Logical striping across the L1 boundary is a simple
solution to both proactively handle splits and reduce compaction times.

Thoughts on this feature
1. Fixed configs : in the same way that we got a lot of stability by limiting
the regions/server to a fixed number, we might want to similarly limit the
number of stripes per region to 10 (or X) instead of every Y bytes. This
will help us understand the benefit we get from striping and it's easy to
double the striping and chart the difference.
2. NameNode pressure : Obviously, a 10x striping factor will cause 10x scaling
of the FS. Can we offset this by increasing the HDFS block size, since
addBlock dominates at scale? Really, unlike Hadoop, you have all of the HFile
or none of it. Missing a portion of the HFile currently invalidates the whole
file. You really need 1 HDFS block == 1 HFile. However, we could probably
just toy with increasing it by the striping factor right now and seeing if that
balances things.
3. Open Times : I think this will be an issue, specifically on server start.
Need to be careful here.
4. Major compaction : you can perform a major compaction (remove deletes) as
long as you have [i,end) contiguous. I don't think you'd need to involve L0
files in an MC at all. Save the complexity. Furthermore, part of the reason
why we created the tiered compaction is to prevent small/new files from
participating in MC because of cache thrashing, poor minor compactions, and a
handful of other reasons.

So, some thoughts on related pain points we seem to have that tie into this
feature:
1. Reduce Cache thrashing : region moves kill us a lot of time because we have
a cold cache. There is a worry that more aggressive compactions mean more
thrashing. I think it will actually even this out better since right now a MC
causes a lot of churn. Just should be thinking about this if perf after the
feature isn't what we desire.
2. Unnecessary IOPS : outside of this algorithm, we should just completely get
rid of the requirement to compact after a split. We have the block cache, so
given a [start,end) in the file, we can easily tell our mid point for future
splits. There's little reason to aggressively churn in this way after
splitting.
3. Poor locality : for grid topology setups, we should eventually make the
striping algorithm a little more intelligent about picking our replicas. If
all stripes go to the same secondary/tertiary node, then splits have a very
restricted set of servers to choose for datanode locality.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575936#comment-13575936
 ] 

Sergey Shelukhin commented on HBASE-7667:
-

bq. It may be a follow on to this jira, but having striper dynamically add
stripes at the end of the region would allow all the stripes before the
last one go cold which is critical for avoiding hugely wasteful compactions
of non-changing data
Actually, it can be added as part of the main work, HBASE-7679 (file 
management) code includes such capabilities. 
I wonder how, no matter the compactions, does region management work for such 
scenario. Wouldn't all the load always be on last region if you have TS keys?
Or, if you have artificial partitioning but query by TS, wouldn't all queries 
go to all servers?

bq.  To major compact a stripe, all L0 files, if any, can be split into 
stripes, then merge all files belonging to the stripe.
Can you explain more about the delete marker limitation?
Suppose in current compaction selection, I choose a set of files starting at 
the oldest file but not including all files.
Wouldn't that be enough to process delete markers that delete the updates 
within those files? Granted, I might not process all delete markers, but I 
don't have to see all files. E.g. if I only have 3 files with one entry for K 
each, K=V, delete K, K=V2, and I compact the first two, I can remove 
entries for K from them, right?
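
The three-file example can be walked through with a toy model. It is a deliberate simplification: each file is a dict, newer files simply win, and timestamps and versions are ignored, so it only illustrates the argument and is not how HBase resolves deletes.

```python
DELETE = object()  # stand-in for a delete marker

def compact(files_oldest_first, includes_oldest):
    """Merge files; drop delete markers only if no older file is left out."""
    merged = {}
    for f in files_oldest_first:      # newer files overwrite older entries
        merged.update(f)
    if includes_oldest:
        # Nothing older exists that these markers could still mask.
        merged = {k: v for k, v in merged.items() if v is not DELETE}
    return merged

def read(files_oldest_first, key):
    """Return the newest visible value for key, or None if absent/deleted."""
    for f in reversed(files_oldest_first):
        if key in f:
            return None if f[key] is DELETE else f[key]
    return None

f1, f2, f3 = {"K": "V"}, {"K": DELETE}, {"K": "V2"}
compacted = compact([f1, f2], includes_oldest=True)
```

Compacting the two oldest files leaves no entry for K in the output, and reads before and after the compaction return the same value.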

bq. 1. Fixed configs : in the same way that we got a lot of stability by 
limiting the regions/server to a fixed number, we might want to similarly limit 
the number of stripes per region to 10 (or X) instead of every Y bytes. This 
will help us understand the benefit we get from striping and it's easy to 
double the striping and chart the difference.
That is the original idea.

Thanks for other comments :)

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with ratios 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if we initially let L0 grow before determining the stripes, we 
 will get better boundaries.
 Also, unless the imbalance is really large, we don't really need to rebalance.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become
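
To make the third compaction type above more concrete, here is a minimal Python sketch of rewriting adjacent stripes with new boundaries. It models each stripe as a sorted list of row keys and balances by key count; the function name `rebalance` and this model are assumptions for illustration only, not the code from the HBASE-7667 patch.

```python
def rebalance(stripes: list[list[str]], first: int, count: int) -> list[list[str]]:
    """Rewrite `count` adjacent stripes, starting at index `first`, so that
    they hold roughly equal numbers of keys. Other stripes are untouched."""
    # Adjacent stripes do not overlap, so their merged keys stay globally ordered.
    merged = sorted(key for stripe in stripes[first:first + count] for key in stripe)
    size = -(-len(merged) // count)  # ceiling division: keys per rewritten stripe
    rewritten = [merged[i * size:(i + 1) * size] for i in range(count)]
    return stripes[:first] + rewritten + stripes[first + count:]


stripes = [["a"], ["b", "c", "d", "e", "f"], ["g"]]
print(rebalance(stripes, 0, 2))
```

Only the selected stripes are rewritten, which is the tradeoff described above: taking fewer stripes per rebalance keeps each compaction small but may need many passes, while taking many stripes approaches a full major compaction.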

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575951#comment-13575951
]

stack commented on HBASE-7667:
--

In a striped region, if stripes = 2 and the key distribution is basically
even, a split could be done w/o references, halfhfiles, and rewriting from
parent to daughter; a split could just rename the parent files into the
daughter regions. It could make for split simplification and possibly make
for some i/o savings.

This is a load of great stuff in this issue. Best read I've had in a long time.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575995#comment-13575995
]

Matt Corgan commented on HBASE-7667:

{quote}Open Times : I think this will be an issue, specifically on server
start. Need to be careful here.{quote}
Hopefully this could be mitigated by making regions larger, like doubling region
size and setting max 2 stripes/region. Theoretically we should be able to have
the same overall number of files as normal regions, or are there other factors
at play?

{quote}Wouldn't all the load always be on last region if you have TS keys? Or,
if you have artificial partitioning but query by TS, wouldn't all queries go to
all servers?{quote}
An easy strategy for P partitions is to
* prepend a single byte to each key where prefix=hash(row)%P
* pre-split the table into P regions
* tweak the balancer to evenly spread the tail partitions for each region
* writes get sprayed evenly to all tail partitions
* a single Get query will only hit one region since you know hash(row)%P
beforehand
* you scan all P partitions using a P-way collating iterator
** so yes, scans go to all servers but presumably they are huge and would hit
lots of data anyway
** because they are huge, a client that scans the partitions concurrently will
be faster
* a big multi-Get will spray to the exact servers necessary, possibly all of
them, but like scans may be faster because done in parallel

I'm not sure what most people are doing with time series data but this seems
like a good approach to me. You basically just choose arbitrarily large P. An
MD5 prefix is essentially P=2^128 (I wouldn't recommend pre-splitting at that
granularity).
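
The partitioning strategy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `prefix_for`, `salted_key` and `collating_scan` are hypothetical, MD5 is just one possible hash, and nothing here uses the HBase client API.

```python
import hashlib
import heapq

P = 4  # number of partitions; a hypothetical value, must be <= 256 for a one-byte prefix


def prefix_for(row: bytes, partitions: int = P) -> int:
    """prefix = hash(row) % P, computed from the first 4 bytes of an MD5 digest."""
    digest = hashlib.md5(row).digest()
    return int.from_bytes(digest[:4], "big") % partitions


def salted_key(row: bytes, partitions: int = P) -> bytes:
    """Prepend the single prefix byte to the original row key."""
    return bytes([prefix_for(row, partitions)]) + row


def collating_scan(partition_scans):
    """P-way merge of per-partition scans, each sorted by salted key.
    Yields the original row keys in globally sorted order."""
    stripped = [(key[1:] for key in scan) for scan in partition_scans]
    return heapq.merge(*stripped)


rows = [b"20130211-%03d" % i for i in range(8)]
stored = sorted(salted_key(r) for r in rows)
scans = [[k for k in stored if k[0] == p] for p in range(P)]
print(list(collating_scan(scans)) == sorted(rows))
```

A single Get only needs `salted_key(row)`, so it touches one partition, while a scan has to merge all P partitions as the list above describes.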

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576023#comment-13576023
]

Sergey Shelukhin commented on HBASE-7667:
-

bq. a split could just rename the parent files into the daughter regions
I was told this is impossible due to snapshots relying on files not moving
between regions (or on references during splits?)
We just discussed this here, some improvements for splits should definitely be
possible.
bq. so yes, scans go to all servers but presumably they are huge and would hit
lots of data anyway
Yeah, I meant the scans, was assuming scans for TS data mostly. I see.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576044#comment-13576044
]

Jimmy Xiang commented on HBASE-7667:

bq. Wouldn't that be enough to process delete markers that delete the updates
within those files?

I think so, as long as the files not included are newer (like in L0).
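
The K=V, delete K, K=V2 scenario under discussion can be sketched as follows. This is a simplified model, not HBase code: the `Cell` type and `compact_oldest` are hypothetical names, versions are ordered by sequence number only, and timestamps (which real delete markers are matched against) are ignored.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    key: str
    seq: int               # sequence number; higher means newer
    value: Optional[str]   # None encodes a delete marker (assumption of this sketch)


def compact_oldest(files: list[list[Cell]], count: int) -> list[Cell]:
    """Compact the `count` oldest files; `files` is ordered oldest first,
    so every file left out of the selection is newer."""
    selected = [cell for f in files[:count] for cell in f]
    newest_delete: dict[str, int] = {}
    for cell in selected:
        if cell.value is None:
            newest_delete[cell.key] = max(newest_delete.get(cell.key, -1), cell.seq)
    # Keep only puts that are newer than every delete marker in the selection.
    # The markers themselves are dropped too: the selection starts at the
    # oldest file, so no older data remains for them to mask.
    return sorted(
        c for c in selected
        if c.value is not None and c.seq > newest_delete.get(c.key, -1)
    )


files = [[Cell("K", 1, "V")], [Cell("K", 2, None)], [Cell("K", 3, "V2")]]
print(compact_oldest(files, 2))  # []
```

Compacting the first two files leaves nothing for K, and the newer K=V2 in the third file is unaffected, which matches the condition that the excluded files must be newer.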

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-11 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576296#comment-13576296
]

Ted Yu commented on HBASE-7667:
---

I think we need to find the best way of storing stripe boundaries.

Currently Sergey puts the boundaries in metadata of store files. This is not
flexible. Once this feature is released, users would request ways to configure
stripes in different manners.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-10 Thread Ted Yu (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575449#comment-13575449
]

Ted Yu commented on HBASE-7667:
---

bq. Sub-region is not a good name either though.
Other names we can consider: arena, range, realm, section, sector, zone.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-10 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575578#comment-13575578
]

Jimmy Xiang commented on HBASE-7667:

Stripes don't have overlapping keyrange with other stripes. So each stripe is
just like a sub-region. L0 files are special, could overlap with multiple
stripes.
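
Because stripes do not overlap, finding the files a point read must consult reduces to a binary search over the stripe boundaries plus all L0 files. A minimal Python sketch, assuming boundaries are stored as the sorted start keys of every stripe after the first (the names `stripe_index` and `files_for_get` are hypothetical):

```python
from bisect import bisect_right


def stripe_index(boundaries: list[bytes], row: bytes) -> int:
    """Return the index of the stripe containing `row`.
    `boundaries[i]` is the inclusive start key of stripe i + 1;
    stripe 0 begins at the region start key."""
    return bisect_right(boundaries, row)


def files_for_get(l0_files, stripe_files, boundaries, row):
    """A Get reads every L0 file (they may span all stripes) plus the
    files of exactly one stripe."""
    return list(l0_files) + list(stripe_files[stripe_index(boundaries, row)])


boundaries = [b"g", b"n"]  # three stripes: [start, g), [g, n), [n, end)
stripe_files = [["s0"], ["s1a", "s1b"], ["s2"]]
print(files_for_get(["l0a"], stripe_files, boundaries, b"h"))  # ['l0a', 's1a', 's1b']
```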

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-10 Thread Lars Hofhansl (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575583#comment-13575583
]

Lars Hofhansl commented on HBASE-7667:
--

I see, then a major compaction needs to at least include all current L0 files
(if any), right?

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-10 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575589#comment-13575589
]

Jimmy Xiang commented on HBASE-7667:

That's right. To major compact a stripe, all L0 files, if any, can be split
into stripes, then merge all files belonging to the stripe.
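
As a rough illustration of that sequence, the Python sketch below splits each L0 file by stripe boundary and then merges the relevant pieces with the stripe's own files. It is a simplified model with hypothetical names (`split_l0`, `major_compact_stripe`): cells are `(key, seq, value)` tuples and only the newest version of each key is kept, which real HBase does not necessarily do.

```python
from bisect import bisect_right


def split_l0(l0_file, boundaries):
    """Split one L0 file into one piece per stripe.
    `boundaries[i]` is the inclusive start key of stripe i + 1."""
    pieces = [[] for _ in range(len(boundaries) + 1)]
    for cell in l0_file:
        pieces[bisect_right(boundaries, cell[0])].append(cell)
    return pieces


def major_compact_stripe(stripe_files, l0_files, boundaries, index):
    """Merge all files of stripe `index` with the matching pieces of every L0 file."""
    cells = [cell for f in stripe_files for cell in f]
    for l0 in l0_files:
        cells.extend(split_l0(l0, boundaries)[index])
    newest = {}
    for key, seq, value in cells:
        # Keep the version with the highest sequence number for each key.
        if key not in newest or seq > newest[key][0]:
            newest[key] = (seq, value)
    return [(key, seq, value) for key, (seq, value) in sorted(newest.items())]


boundaries = ["m"]
stripe0 = [[("a", 1, "x"), ("c", 2, "y")]]
l0 = [[("a", 5, "x2"), ("z", 6, "w")]]
print(major_compact_stripe(stripe0, l0, boundaries, 0))
```

In this model the L0 piece that falls in the other stripe (`"z"`) is simply not read by the compaction of stripe 0; a real implementation would still have to write it out somewhere once the L0 file is retired.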

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-09 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575206#comment-13575206
]

Jimmy Xiang commented on HBASE-7667:

bq. Overall, I think the sub-region approach cuts the rope in the tug-of-war.
It lets you have a smaller number of regions at the same time as having the
most efficient compactions.

I agree. I was wondering if it will take longer to open a region if there are
many hfiles.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-09 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575297#comment-13575297
]

stack commented on HBASE-7667:
--

[~mcorgan] Nice write up. So you are in favor of Sergey's project?

[~jxiang] If I were to guess, it will take longer than the case where we have
monolithic files that cover the total region namespace as we currently have
(because there will be more files). If, rather than striping inside a region,
we instead had a region per stripe, my guess is that striping will take less
time to open since there are fewer region machinations going on (fewer
.regioninfos to open, less looking in fs for stuff to clean up since last file
open, less listing of storefiles under families).

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-09 Thread Matt Corgan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575309#comment-13575309
 ] 

Matt Corgan commented on HBASE-7667:


{quote}So you are in favor of Sergey's project?{quote}
Oh yes, if you could not tell.  Thinking of an HBASE-7667 tattoo =)

One of the few major things hbase is missing in my opinion is the ability to 
load time-series through the normal api, rather than having to go off and write 
some separate bulk load code.  HBase currently takes a dump when you do that.  
Main culprits are HBASE-5479 and my comment in HBASE-3484 
(https://issues.apache.org/jira/browse/HBASE-3484?focusedCommentId=13410934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13410934).
  Even during normal operation as opposed to a one-off import of data, the 
inefficiencies are still happening, just at a less obvious pace.

It may be a follow-on to this jira, but having the striper dynamically add 
stripes at the end of the region would allow all the stripes before the 
last one to go cold, which is critical for avoiding hugely wasteful compactions 
of non-changing data.  Ideally, it would be able to allocate small stripes as 
new data comes in (each flush?) and then later go on to merge older stripes to 
reduce hfile count (at major compaction time?).  With this in place on an N 
node cluster, you could partition your data with N or 2N regions using a hash 
prefix and basically let the regions grow infinitely large.  Currently I have 
to limit region size to ~2GB which results in hundreds of regions per node 
which is a bit of a management hassle because it's beyond human readable, and a 
bit wasteful with all the empty memstores among other things.

I do wonder if there's a more accurate name than stripe.  Stripes make me think 
of RAID stripes which is a different concept than sub-regions.  Sub-region is 
not a good name either though.

It would be cool if you could set a column family attribute like 
layout=TIME_SERIES which HBase could use to automatically pick the compaction 
strategy, split-point strategy, balancer strategy, and allow future niceties 
like using stronger compression on old data.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-09 Thread Lars Hofhansl (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575312#comment-13575312
]

Lars Hofhansl commented on HBASE-7667:
--

Stripes can have overlapping key ranges with other stripes, correct? I.e. if two
L0 files are compacted with an L0 compaction, each L0 file is striped, but since
the L0 files could overlap, the resulting stripes could also overlap with
stripes from different L0 files.

A nice property in LevelDB is that only L0 files have overlapping keyspaces
with other files; all levels above L0 have no overlapping keys within a level,
and no file at level L overlaps more than 10 files at L+1.

Right now only major compactions can remove delete markers because only by
looking at all data you can guarantee that you will see each KV that might be
affected by the delete marker.
It's not clear to me how we get around this, unless we introduce a formal
notion of levels and know which L+1 files overlap with a file at level L.

Would love to discuss more at the PowWow on Feb 19th.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-08 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575012#comment-13575012
]

Jimmy Xiang commented on HBASE-7667:

A stripe is like a sub-region. That's a good idea.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-02-08 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575024#comment-13575024
]

Matt Corgan commented on HBASE-7667:

Right now there is a tug-of-war between region size and number of regions.

You want fewer regions for:
* server startup/shutdown
* minimizing RPC calls
* having fewer/bigger memstore flushes
* fewer open files per server

You want more regions for:
* spreading load among servers
* having more efficient compactions

{quote}spreading load among servers{quote}
Since BigTable was designed, machines have grown tremendously, with many running
24 cpus and 48+GB memory. These machines can serve many regions even if each
region is 30GB. So we can achieve good load distribution even with enormous
regions.

{quote}having more efficient compactions{quote}
But huge regions are bad for compactions. A 30GB compaction is still considered
expensive. Data in HBase is generally not perfectly evenly distributed across
a table, or if it is you are not taking full advantage of HBase's sorted
architecture. Huge regions therefore have hot and cold stripes/sub-regions.
If you compact a 30GB region, you are wasting a ton of time on the cold stripes.

Overall, I think the sub-region approach cuts the rope in the tug-of-war. It
lets you have a smaller number of regions at the same time as having the most
efficient compactions.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562257#comment-13562257
]

Sergey Shelukhin commented on HBASE-7667:
-

What do you guys think about the general idea?

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562268#comment-13562268
]

Matt Corgan commented on HBASE-7667:

So a stripe is like a sub-region? In terms of compactions, it sounds like it
serves the same purpose as splitting a table into regions, so you can compact a
region that is hot while letting cold regions stay cold.

If this is the case and stripes are allowed to auto-split, it may be very
beneficial for time series data. If you had a region approaching 10GB with
100MB stripes, the last stripe would keep splitting and the first 99 would
never get touched. The problem without stripes is that the first 9900MB keeps
getting re-written even though it never changes.
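
The arithmetic behind that example can be written out as a small Python sketch. The helper name `rewrite_cost_mb` is hypothetical, and the model assumes that only the last stripe receives writes and that a compaction rewrites everything it selects.

```python
def rewrite_cost_mb(region_mb: int, stripe_mb: int, hot_stripes: int = 1) -> tuple[int, int]:
    """Return (MB rewritten when compacting the whole region,
    MB rewritten when compacting only the hot stripes)."""
    stripe_count = -(-region_mb // stripe_mb)  # ceiling division
    hot = min(hot_stripes, stripe_count)
    return region_mb, min(region_mb, hot * stripe_mb)


whole, striped = rewrite_cost_mb(10_000, 100)
print(whole - striped)  # cold data left untouched by a stripe compaction: 9900 MB
```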

Am I understanding it correctly that stripes in a region would act similarly to
regions in a table?

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread chunhui shen (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562273#comment-13562273
]

chunhui shen commented on HBASE-7667:
-

Interesting ideas.
With stripe compaction, we could support many files in one region, where each
file belongs to one stripe and no keys overlap across stripes except in L0, is
that right?

I think it is useful for the sequential write scenario.

bq. if we could move files between regions, no references would be needed
Moving files would break snapshots; references are needed all the same.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562278#comment-13562278
]

Sergey Shelukhin commented on HBASE-7667:
-

Hmm, my thinking would be that number of stripes will be fixed and we would
rebalance, but never split as such. Yes, the idea for sequential data plays
very well into this, the code will probably be almost the same. With non-seq
data, it would just try to achieve effect similar to level.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562286#comment-13562286
]

Sergey Shelukhin commented on HBASE-7667:
-

If there are no general objections I will try to start on this tomorrow, in
preference to level... I am looking at some random HCM issues now so patch may
be next week provided that HBASE-7516 and HBASE-7603 can go in.

[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Matt Corgan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562290#comment-13562290
]

Matt Corgan commented on HBASE-7667:

I've been brainstorming something similar for splitting the memstore into
stripes, mentioned in
https://issues.apache.org/jira/browse/HBASE-3484?focusedCommentId=13410934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13410934

I think it's a good idea now that region sizes have become so large. It's easy
to have a few hot stripes in a region if it's 10GB, and not necessarily wrong
from a primary-key design perspective. It's often very wasteful to be
compacting the whole region.
