[ https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979161#comment-14979161 ]
Vladimir Rodionov commented on HBASE-14468:
-------------------------------------------

[~enis]
* sanityCheck is OK, will do that
* Auto-disable the major compactions, and set the blocking store files if they are not set? - OK
* Allow splits? Not sure, I will think about this.

{quote}
Can we use HStore.removeUnneededFiles() or storeEngine.getStoreFileManager(), which already implement the is-expired logic, so that there is no duplication there?
{quote}

What duplication? FCP does not expire/purge files; HStore takes care of them.

> Compaction improvements: FIFO compaction policy
> -----------------------------------------------
>
>                 Key: HBASE-14468
>                 URL: https://issues.apache.org/jira/browse/HBASE-14468
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, HBASE-14468-v6.patch
>
>
> h2. FIFO Compaction
>
> h3. Introduction
>
> The FIFO compaction policy selects only those store files in which all cells have expired. The column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store files.
> I see many applications for this policy:
> # Use it for very high-volume raw data with a low TTL that serves as the source of other data (after additional processing). Example: raw time series vs. time-based rollup aggregates and compacted time series. We collect raw time series and store them in a CF with the FIFO compaction policy; periodically we run a task which creates rollup aggregates and compacts the time series, after which the original raw data can be discarded.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD). Say we have a local SSD (1 TB) which we can use as a block cache. There is no need to compact the raw data at all.
> Because we do not do any real compaction, we do not use CPU and I/O (disk and network), and we do not evict hot data from the block cache. The result: improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
>
> h3. To enable FIFO compaction policy
>
> For a table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>     FIFOCompactionPolicy.class.getName());
> {code}
> For a CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>     FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by setting DisabledRegionSplitPolicy explicitly, or by setting ConstantSizeRegionSplitPolicy with a very large max region size). You will also have to increase the store's blocking file count, *hbase.hstore.blockingStoreFiles*, to a very large number (see the combined setup sketch appended below).
>
> h3. Limitations
>
> Do not use FIFO compaction if:
> * the table/CF has MIN_VERSIONS > 0
> * the table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
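Putting the enablement steps quoted above together, here is a minimal setup sketch: FIFO compaction on the table, splits disabled via DisabledRegionSplitPolicy, a raised *hbase.hstore.blockingStoreFiles*, and a non-default TTL on the family. The table name, family name, TTL and blocking-files value are illustrative placeholders, not values taken from the patch.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.DefaultStoreEngine;
import org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy;
import org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy;

public class FifoTableSetup {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("raw_timeseries"));
      // FIFO compaction policy for the whole table
      desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
          FIFOCompactionPolicy.class.getName());
      // Region splits must be disabled
      desc.setRegionSplitPolicyClassName(DisabledRegionSplitPolicy.class.getName());
      // Raise the blocking store files threshold to a very large number
      desc.setConfiguration("hbase.hstore.blockingStoreFiles", "1000");

      // The column family MUST have a non-default TTL (here: 1 day)
      HColumnDescriptor cf = new HColumnDescriptor("raw");
      cf.setTimeToLive(24 * 3600);
      desc.addFamily(cf);

      admin.createTable(desc);
    }
  }
}
{code}

The same compaction-policy key can be set on the HColumnDescriptor instead if only one family should use FIFO compaction, as in the CF snippet quoted above.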
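Relating to the sanityCheck item from the comment above and the limitations list: a rough, purely illustrative sketch of what such a check might look like. The method name and error messages are hypothetical, not taken from the attached patches; it simply rejects descriptors where a FIFO-compacted family still has the default TTL or MIN_VERSIONS > 0.

{code}
// Hypothetical sanity check; illustrative only, not the actual HBASE-14468 patch.
private static void verifyFifoCompactionSettings(HTableDescriptor desc) throws IOException {
  String tablePolicy =
      desc.getConfigurationValue(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY);
  for (HColumnDescriptor cf : desc.getColumnFamilies()) {
    String policy =
        cf.getConfigurationValue(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY);
    if (policy == null) {
      policy = tablePolicy;
    }
    if (!FIFOCompactionPolicy.class.getName().equals(policy)) {
      continue; // family does not use FIFO compaction
    }
    if (cf.getTimeToLive() == HColumnDescriptor.DEFAULT_TTL) {
      throw new IOException(
          "Default TTL is not supported for FIFO compaction: " + cf.getNameAsString());
    }
    if (cf.getMinVersions() > 0) {
      throw new IOException(
          "MIN_VERSIONS > 0 is not supported for FIFO compaction: " + cf.getNameAsString());
    }
  }
}
{code}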