HBASE-4465 is not needed for correctness. Personally, I'd rather release 0.94 sooner than backport non-trivial patches.
I realize I am guilty of this myself (see HBASE-4838... although that was an important correctness fix)

-- Lars

________________________________
From: Ted Yu <yuzhih...@gmail.com>
To: dev@hbase.apache.org
Cc: Mikael Sitruk <mikael.sit...@gmail.com>
Sent: Thursday, January 12, 2012 2:09 PM
Subject: Re: Major Compaction Concerns

Thanks for the tips, Nicolas.

About lazy seek: if you were referring to HBASE-4465, that was only integrated into TRUNK and 0.89-fb. I was thinking about backporting it to 0.92.

Cheers

On Thu, Jan 12, 2012 at 1:44 PM, Nicolas Spiegelberg <nspiegelb...@fb.com> wrote:

> Mikael,
>
> >The system is an OLTP system with strict latency and throughput
> >requirements; regions are pre-split and throughput is controlled.
> >
> >The system has a heavy-load period lasting a few hours; by heavy load I
> >mean a high proportion of inserts/updates and a small proportion of reads.
>
> I'm not sure about the production status of your system, but it sounds
> like you have a critical need for the dozens of optimization features
> coming out in 0.92, and even some trunk patches. In particular, update
> speed has been drastically improved due to lazy seek. Although you can
> get incremental wins with different compaction features, you will get
> exponential wins from looking at other features right now.
>
> >we fall into memstore flush throttling ("will wait 90000 ms before
> >flushing the memstore"), retaining more logs and triggering more flushes
> >that can't complete... adding pressure on system memory (the memstore is
> >not flushed on time)
>
> Filling up the logs faster than you can flush normally indicates that you
> have disk or network saturation. If you have an increment workload, I
> know there are a number of patches in 0.92 that will drastically reduce
> your flush size (1: read the memstore before going to disk; 2: don't
> flush all versions). You don't have a compaction problem, you have a
> write/read problem.
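(For readers following along: the 90000 ms wait Mikael quotes comes from the store-level flush/compaction throttling knobs in hbase-site.xml. The snippet below is an illustrative sketch only — the values shown are the defaults as I recall them for this era of HBase, not tuning recommendations; verify the property names and defaults against your own 0.90.x/0.92 hbase-default.xml.)

```xml
<!-- hbase-site.xml (sketch; default-ish example values, please verify) -->
<configuration>
  <!-- How long (ms) updates block waiting for compactions to reduce the
       StoreFile count before the flush proceeds anyway; this is the
       source of the "will wait 90000 ms" message. -->
  <property>
    <name>hbase.hstore.blockingWaitTime</name>
    <value>90000</value>
  </property>
  <!-- StoreFile count per store at which flushes start being blocked. -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>7</value>
  </property>
  <!-- Max WAL files retained before memstore flushes are forced to let
       old logs be rolled away; relevant to the "retaining more logs"
       pressure described above. -->
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value>
  </property>
</configuration>
```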
> In 0.92, you can try setting your compaction.ratio down (0.25 is a good
> start) to increase the StoreFile count; this slows reads but saves
> network IO on writes. This setting is very similar to the defaults
> suggested in the BigTable paper. However, this is only going to cut your
> network IO in half. The LevelDB or BigTable algorithm can reduce your
> outlier StoreFile count, but they wouldn't be able to cut this IO volume
> down much either.
>
> >Please remember I'm on 0.90.1, so when a major compaction is running,
> >minor compactions are blocked, and when the memstore for one column
> >family is flushed, the memstores for all other column families are also
> >flushed (no matter whether they are smaller or not). As you already
> >wrote, the best way is to manage compaction, and that is what I tried
> >to do.
>
> Per-storefile compactions & multi-threaded compactions were added in
> 0.92 to address this problem. However, a high StoreFile count is not
> necessarily a bad thing. For an update workload, you only have to read
> the newest StoreFile, and lazy seek optimizes your situation a lot
> (again, 0.92).
>
> >Regarding the compaction pluggability needs:
> >Suppose the data you are inserting in different column families has
> >different patterns. For example, in CF1 (column family #1) you update
> >fields under the same row key, while in CF2 you add new fields each
> >time, or CF2 gets new rows and older rows are never updated. Wouldn't
> >you use different algorithms for compacting these CFs?
>
> There are mostly 3 different workloads that require different
> optimizations (not necessarily compaction-related):
> 1. Read old data. Should properly use bloom filters to filter out
> StoreFiles.
> 2. R+W. Will really benefit from lazy seeks & cache-on-write (0.92) --
> far more than from a compaction algorithm.
> 3. Write mostly. Don't really care about compactions here.
> Just don't want them to be sucking up too much IO.
>
> >Finally, the schema design is guided by the ACID property of a row. We
> >have only 2 CFs; the two CFs hold different volumes of data even though
> >they are updated with approximately the same amount of data (cells
> >updated vs. cells created).
>
> Note that 0.90 only had row-based write atomicity. HBASE-2856 is
> necessary for row-based read atomicity across column families.
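(As a concrete illustration of Nicolas's compaction.ratio suggestion: on 0.92 this would be an hbase-site.xml entry along the following lines. The property name is given as I recall it for that release line — treat it as an assumption and double-check it against your build's hbase-default.xml before relying on it.)

```xml
<!-- hbase-site.xml (sketch): lower the compaction file-selection ratio
     so fewer StoreFiles qualify for minor compaction. This trades a
     higher StoreFile count (slower reads) for less rewrite/network IO
     on the write path, per the 0.25 starting point suggested above. -->
<property>
  <name>hbase.hstore.compaction.ratio</name>
  <value>0.25</value>
</property>
```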