re: "A suggestion is perhaps to take all those example/explanation and add them to the book for future reference."
Absolutely! I've been watching this thread with great interest. (A few
illustrative config sketches are included below the quoted thread.)

On 1/14/12 4:30 PM, "Mikael Sitruk" <[email protected]> wrote:

> Wow, thank you very much for all those precious explanations, pointers, and
> examples. It's a lot to ingest... I will try them (at least what I can with
> 0.90.4 (yes, I'm upgrading from 0.90.1 to 0.90.4)) and keep you informed.
> BTW, I'm already using compression (GZ), but the current data is randomized,
> so I don't get as much gain as you mentioned (I think I'm around 30% only).
> It seems that BF is one of the major things I need to look at, together with
> compaction.ratio, and I need different settings for my different CFs (one CF
> has a small set of columns and each update changes 50% of them --> ROWCOL;
> the second CF gets a new column with every update --> ROW).
> I'm not keeping more than one version either, and you wrote that this is not
> a point query.
>
> A suggestion is perhaps to take all those examples/explanations and add them
> to the book for future reference.
>
> Regards,
> Mikael.S
>
>
> On Sat, Jan 14, 2012 at 4:06 AM, Nicolas Spiegelberg
> <[email protected]> wrote:
>
>> > I'm sorry, but I don't understand. Of course I have disk and network
>> > saturation, and the flush stops flushing because it is waiting for
>> > compaction to finish. Since a major compaction was triggered, all the
>> > stores (a large number) present on the disks (7 disks per RS) will be
>> > grabbed for major compaction, and the I/O is affected. Network is also
>> > affected since all are major compacting at the same time and replicating
>> > files at the same time (1GB network).
>>
>> When you have an IO problem, there are multiple pieces at play that you
>> can adjust:
>>
>> Write: HLog, Flush, Compaction
>> Read: Point Query, Scan
>>
>> If your writes are far more than your reads, then you should relax one of
>> the write pieces.
>> - HLog: You can't really adjust HLog IO outside of key compression
>> (HBASE-4608).
>> - Flush: You can adjust your compression. None->LZO == 5x compression.
>> LZO->GZ == 2x compression. Both are at the expense of CPU. HBASE-4241
>> minimizes flush IO significantly in the update-heavy use case (discussed
>> in the last email).
>> - Compaction: You can lower the compaction ratio to minimize the amount of
>> rewrites over time. That's why I suggested changing the ratio from 1.2 ->
>> 0.25. This gives a ~50% IO reduction (blog post on this forthcoming @
>> http://www.facebook.com/UsingHBase ).
>>
>> However, you may have a lot more reads than you think. For example, let's
>> say the read:write ratio is 1:10, so significantly write dominated. Without
>> any of the optimizations I listed in the previous email, your real read
>> ratio is multiplied by the StoreFile count (because you naively read all
>> StoreFiles). So let's say, during congestion, you have 20 StoreFiles.
>> 1*20:10 means that you're now 2:1 read dominated. You need features to
>> reduce the number of StoreFiles you scan when the StoreFile count is high.
>>
>> - Point Query: bloom filters (HBASE-1200, HBASE-2794), lazy seek
>> (HBASE-4465), and seek optimizations (HBASE-4433, HBASE-4434, HBASE-4469,
>> HBASE-4532).
>> - Scan: not as many optimizations here. They mostly revolve around proper
>> usage & seek-next optimization when using filters. I don't have JIRA
>> numbers here, but probably a half-dozen small tweaks were added to 0.92.
>>
>> > I don't have an increment workload (the workload either updates columns
>> > in a CF or adds a column to a CF for the same key), so how will those
>> > patches help?
>>
>> Increment & read->update workloads end up picking up roughly the same
>> optimizations. Adding a column to an existing row is no different from
>> adding a new row as far as optimizations are concerned, because there's
>> nothing to de-dupe.
>>
>> > I'm not saying this is a bad thing, it's just an observation from our
>> > test: HBase will slow down the flush when too many StoreFiles are
>> > present, and will add pressure on GC and memory, affecting performance.
>> > The update workload does not send all the row content for a given key,
>> > so only partial data is written. In order to get the whole row, I
>> > presume that reading the newest Store is not enough ("all" stores need
>> > to be read, collecting the most up-to-date fields to rebuild a full
>> > row), or am I missing something?
>>
>> Reading all row columns is the same as doing a scan. You're not doing a
>> point query if you don't specify the exact key (columns) you're looking
>> for. Setting versions to unlimited, then getting all versions of a
>> particular ROW+COL, would also be considered a scan rather than a point
>> query as far as optimizations are concerned.
>>
>> > 1. If I did not set a specific property for bloom filters (BF), does it
>> > mean that I'm not using them (the book only refers to BF with regard to
>> > CFs)?
>>
>> By default, bloom filters are disabled, so you need to enable them to get
>> the optimizations. This is by design. Bloom filters trade off cache
>> space for low-overhead probabilistic queries. The default is 8 bytes per
>> bloom entry (key) & a 1% false positive rate. You can use 'bin/hbase
>> org.apache.hadoop.hbase.io.hfile.HFile' (look at the help, then -f to
>> specify a StoreFile and -m for meta info) to see your StoreFile's average
>> KV size. If size(KV) == 100 bytes, then blooms use 8% of the space in
>> cache, which is better than loading a StoreFile block only to get a miss.
>>
>> Whether to use a ROW or ROWCOL bloom filter depends on your write & read
>> pattern. If you read the entire row at a time, use a ROW bloom. If you
>> point query, ROW or ROWCOL are both options. If you write all columns for
>> a row at the same time, definitely use a ROW bloom. If you have a small
>> column range and you update the columns at different rates/times, then a
>> ROWCOL bloom filter may be more helpful. ROWCOL is really useful if a scan
>> query for a ROW will normally return results, but a point query for a
>> ROWCOL may have a high miss rate. A perfect example is storing unique hash
>> values for a user on disk. You'd use 'user' as the row & the hash as the
>> column. In most instances, the hash won't be a duplicate, so a ROWCOL
>> bloom would be better.
>>
>> > 3. How can we ensure that compaction will not suck up too much I/O if we
>> > cannot control major compaction?
>>
>> TCP congestion control will ensure that a single TCP socket won't consume
>> too much bandwidth, so that part of compactions is handled automatically.
>> The part that you need to handle is the number of simultaneous TCP sockets
>> (currently 1, until multi-threaded compactions) & the aggregate data
>> volume transferred over time. As I said, this is controlled by
>> compaction.ratio. If temporarily high StoreFile counts cause you to
>> bottleneck, the slight latency variance is an annoyance of the current
>> compaction algorithm, but the underlying problem you should be looking at
>> solving is the system's inability to filter out the unnecessary
>> StoreFiles.
>>
>>
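
For the book write-up, a minimal hbase shell sketch of the per-CF settings
discussed above. The table name 't1' and family names 'small' and 'wide' are
made up to match the two CFs Mikael described; the bloom settings only affect
StoreFiles written after the change, and they assume a release that actually
ships the bloom work (HBASE-1200 / HBASE-2794, i.e. 0.92):

    disable 't1'
    # CF with a small, fixed set of columns, ~50% updated per write -> ROWCOL bloom
    alter 't1', {NAME => 'small', BLOOMFILTER => 'ROWCOL', COMPRESSION => 'GZ', VERSIONS => 1}
    # CF that gets a brand-new column on every update -> ROW bloom
    alter 't1', {NAME => 'wide', BLOOMFILTER => 'ROW', COMPRESSION => 'GZ', VERSIONS => 1}
    enable 't1'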
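
To size the blooms against your actual data, the StoreFile inspection Nicolas
mentions looks roughly like this (the path is just a placeholder; pick a real
StoreFile out of your HBase root directory, e.g. via 'hadoop fs -lsr /hbase/t1'):

    bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
    # (no args prints the usage/help text)
    bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /hbase/t1/<region>/small/<storefile>

The -m meta output should show the average key and value lengths; per the
8-bytes-per-entry math above, an average KV around 100 bytes means the bloom
costs roughly 8% extra cache for that data.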
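
The compaction ratio change (1.2 -> 0.25) discussed above is a server-side
setting rather than a shell command. Assuming the property name in your
release matches (check hbase-default.xml), it goes into hbase-site.xml on the
RegionServers and needs a restart to take effect:

    <property>
      <name>hbase.hstore.compaction.ratio</name>
      <!-- default is 1.2; lower means fewer rewrites, but more StoreFiles kept around -->
      <value>0.25</value>
    </property>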
