Hi guys,

I’ve seen some discussions about adding compression support to Ceph. Let me share some thoughts and proposals on that topic too.

First of all I’d like to consider several major implementation options separately. IMHO this makes sense since they have different applicability, value and implementation specifics. Besides that, smaller parts are easier to both understand and implement.

* Data-At-Rest Compression. This is about compressing the basic data volume kept by the Ceph backing tier. The main rationale is storage cost reduction. One can find a similar approach in the Erasure Coding Pool implementation - cluster capacity increases (i.e. storage cost decreases) at the expense of additional computation. This is especially effective when combined with a high-performance cache tier.
* Intermediate Data Compression. This case is about applying compression to intermediate data like system journals, caches etc. The intention is to improve the utilization of expensive storage resources (e.g. solid state drives or RAM). At the same time, the idea of applying compression (a feature that undoubtedly introduces additional overhead) to crucial heavy-duty components probably looks contradictory.
* Exchange Data Compression. This one is to be applied to messages transported between client and storage cluster components, as well as to internal cluster traffic. The rationale might be the desire to improve cluster run-time characteristics, e.g. when data bandwidth is limited by network or storage device throughput. The potential drawback is client overburdening - client computation resources might become a bottleneck since they take on most of the compression/decompression tasks.

Obviously it would be great to have support for all the above cases, e.g. object compression takes place at the client and cluster components handle that naturally during the object life-cycle. Unfortunately, significant complexities arise along this way. Most of them are related to partial object access, both reading and writing. It looks like huge development (redesign, refactoring and new code) and testing efforts would be required. It’s also hard to estimate the value of such aggregated support at the moment. Thus the approach I’m suggesting is to drive the progress incrementally and consider the cases separately. For now my proposal is to add Data-At-Rest compression to Erasure Coded pools, as the most well-defined case from both the implementation and value points of view.

How we can do that:

The Ceph cluster architecture suggests a two-tier storage model for production usage. A cache tier built on high-performance, expensive storage devices provides performance. A storage tier with low-cost, less-efficient devices provides cost-effectiveness and capacity. The cache tier is supposed to use ordinary data replication, while the storage tier can use erasure coding (EC) for efficient and reliable data keeping. EC provides lower storage costs with the same reliability compared to the data replication approach, at the expense of additional computation. Thus Ceph already has a trade-off between capacity and computational effort. Data-At-Rest compression is about exactly the same trade-off. Moreover, one can tie EC and Data-At-Rest compression together to achieve even better storage efficiency.
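For reference, here is roughly how such a two-tier setup with an EC backing pool is created with the existing commands (pool names, PG counts and profile parameters below are just illustrative):

  # Define an erasure-code profile and create the backing EC pool
  ceph osd erasure-code-profile set ecprofile k=4 m=2
  ceph osd pool create ecpool 128 128 erasure ecprofile
  # Create a replicated cache pool and put it in front of the EC pool
  ceph osd pool create cachepool 128
  ceph osd tier add ecpool cachepool
  ceph osd tier cache-mode cachepool writeback
  ceph osd tier set-overlay ecpool cachepool

A per-pool compression setting would naturally slot into this model alongside the other pool properties.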
There are two possible ways of adding Data-At-Rest compression:
  *  Use data compression built into a file system beneath Ceph.
  *  Add compression to the Ceph OSD.

At first glance Option 1 looks pretty attractive, but there are some drawbacks to this approach:
* File system lock-in. BTRFS is the only file system supporting transparent compression among the ones recommended for Ceph usage. Moreover, AFAIK it’s still not recommended for production usage, see:
http://ceph.com/docs/master/rados/configuration/filesystem-recommendations/
* Limited flexibility - one can use only the compression methods and policies supported by the FS.
* Data compression depends on volume or mount point properties (and is thus bound to the OSD). Without additional support Ceph lacks the ability to have different compression policies for different pools residing on the same OSD.
* File compression control isn’t standardized among file systems. If (or when) a new compression-equipped file system appears, Ceph might require corresponding changes to handle it properly.
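To illustrate the mount-point binding: with BTRFS, transparent compression is enabled per mount, so every pool whose data lands on that OSD gets the same treatment (the device path and algorithm below are just examples):

  # Compression is a property of the mount, not of a Ceph pool:
  mount -o compress=zlib /dev/sdb1 /var/lib/ceph/osd/ceph-0

There is no per-pool knob at this level; Ceph would need extra machinery on top to differentiate pools.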

Having compression at the OSD eliminates these drawbacks.
As mentioned above, the purpose of Data-At-Rest compression is pretty much the same as that of Erasure Coding. It looks quite easy to add compression support to EC pools. This way one can get even more storage space in exchange for higher CPU load.
Additional Pros for combining compression and erasure coding are:
* Both EC and compression have complexities with partial writes. EC pools don’t support partial writes (data append only), and the solution for that is cache tier insertion. Thus we can transparently reuse the same approach in the compression case.
* Compression becomes a pool property, thus Ceph users get direct control over which pools compression is applied to.
* Original write performance isn’t impacted by compression in the two-tier model - write data goes to the cache uncompressed, so there is no corresponding compression latency. Actual compression happens in the background when the backing storage is filled.
* There is an additional benefit in network bandwidth savings when the primary OSD performs compression, since the resulting object shards distributed for replication are smaller (see the quick arithmetic example below).
* Data-At-Rest compression can also bring an additional performance improvement for HDD-based storage: reducing the amount of data written to slow media can provide a net performance gain even taking the compression overhead into account.
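To make the bandwidth savings concrete, a quick back-of-the-envelope example (the k=4/m=2 profile and the 2:1 compression ratio are purely illustrative assumptions): with k=4, m=2 a 4 MB object is encoded into six 1 MB shards, i.e. 6 MB leaves the primary OSD; if the object is first compressed 2:1 down to 2 MB, each shard shrinks to 512 KB and only 3 MB is sent out.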

Some implementation notes:

* The suggested approach is to perform data compression prior to Erasure Coding. This reduces the amount of data passed to coding and avoids the need for additional means to disable compression of EC-generated chunks.
* Data-At-Rest compression should use a plugin architecture to enable multiple compression backends (see the interface sketch below).
* The compression engine should mark stored objects with tags indicating whether compression took place and which algorithm was used.
* To avoid (or reduce) CPU overload at the backing storage caused by compression/decompression (e.g. during massive reads), we can introduce means to detect such situations and temporarily disable compression for current write requests. Since objects are marked as compressed/uncompressed, this causes almost no issues for future handling.
* Hardware compression support, e.g. Intel QuickAssist, can be an additional helper for this issue.
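Just to make the plugin and tagging ideas concrete, here is a minimal sketch of what a compressor backend interface might look like (all names are hypothetical, not existing Ceph code):

  #include <string>

  // Hypothetical backend interface; each plugin (zlib, snappy,
  // QuickAssist-accelerated, ...) would implement it.
  struct Compressor {
    virtual ~Compressor() {}
    // Name recorded in the per-object tag so reads can pick the
    // right backend for decompression.
    virtual std::string name() const = 0;
    // Return false on failure so the caller can fall back to
    // storing the data uncompressed.
    virtual bool compress(const std::string& in, std::string& out) = 0;
    virtual bool decompress(const std::string& in, std::string& out) = 0;
  };

  // Sketch of the write path: compress before erasure coding and
  // tag the object with the outcome. 'skip' models the temporary
  // disabling under CPU pressure described above.
  void write_object(Compressor* c, const std::string& data, bool skip) {
    std::string out;
    bool done = !skip && c && c->compress(data, out)
                       && out.size() < data.size();
    const std::string& payload = done ? out : data;
    std::string tag = done ? c->name() : "none";  // e.g. an object xattr
    // ... hand 'payload' to the EC plugin for encoding,
    // ... persist 'tag' with the object metadata.
    (void)payload; (void)tag;
  }

Because every object carries the tag, mixing compressed and uncompressed objects in the same pool is harmless.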

Any thoughts?

Thanks,
Igor.