We don't use native Java serialization for anything but the on-disk BitSets in our bloom filters (because those are deserialized once at startup, so the overhead doesn't matter), btw.
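(Illustrative sketch only, not our actual code: roughly what writing a filter's BitSet out and reading it back with plain Java serialization looks like. The class and method names below are made up, but since the load happens once at startup, the serialization cost is paid a single time.)

import java.io.*;
import java.util.BitSet;

public class BitSetPersistence
{
    // Write the filter's bits to disk with plain Java serialization.
    public static void save(BitSet bits, File file) throws IOException
    {
        ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)));
        try
        {
            out.writeObject(bits);
        }
        finally
        {
            out.close();
        }
    }

    // Read the bits back; this only runs once, at startup.
    public static BitSet load(File file) throws IOException, ClassNotFoundException
    {
        ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)));
        try
        {
            return (BitSet) in.readObject();
        }
        finally
        {
            in.close();
        }
    }
}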
We're talking about adding compression after https://issues.apache.org/jira/browse/CASSANDRA-674.

On Sat, Feb 20, 2010 at 3:12 PM, Tatu Saloranta <tsalora...@gmail.com> wrote:
> On Fri, Feb 19, 2010 at 11:44 AM, Weijun Li <weiju...@gmail.com> wrote:
>> I see. How much is the overhead of Java serialization? Does it slow down the
>> system a lot? It seems to be a tradeoff between CPU usage and memory.
>
> This should be relatively easy to measure as a stand-alone thing, or
> maybe even from profiler stack traces.
> If native Java serialization is used, there may be more efficient
> alternatives, depending on the data -- default serialization is highly
> inefficient for small object graphs (like individual objects), but ok
> for larger graphs; this is because much of the class metadata is included,
> so the result is very self-contained.
> Beyond default serialization, there are more efficient general-purpose
> Java serialization frameworks, like Kryo, or fast(est) JSON-based
> serializers (Jackson); see
> http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking
> for some idea of the alternatives.
>
> In fact, one interesting idea would be to further trade some CPU for
> less memory by using fast compression (like LZF). I hope to experiment
> with this idea some time in the future. But the challenge is that this would
> help most with a clustered scheme (compressing more than one distinct
> item), which is much trickier to make work. Compression does ok with
> individual items, but the real boost comes from redundancy between similar
> items.
>
> -+ Tatu +-
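A quick way to see the per-object overhead Tatu describes is to serialize a small object with ObjectOutputStream and compare the stream size against the raw payload. The sketch below is illustrative only -- the SmallItem class is a made-up stand-in, not anything in Cassandra:

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSizeDemo
{
    // A tiny object graph: with default Java serialization the stream carries the
    // class descriptor, field names, and type signatures along with the data, so
    // for a small object the metadata dominates the actual payload.
    static class SmallItem implements Serializable
    {
        String name;
        byte[] value;
        long timestamp;

        SmallItem(String name, byte[] value, long timestamp)
        {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static int serializedSize(Object o) throws Exception
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(o);
        out.close();
        return bytes.size();
    }

    public static void main(String[] args) throws Exception
    {
        SmallItem item = new SmallItem("age", new byte[] { 42 }, System.currentTimeMillis());
        int payload = item.name.length() + item.value.length + 8; // approximate raw data size
        System.out.println("raw payload bytes: " + payload);
        System.out.println("default-serialized bytes: " + serializedSize(item));
    }
}

For an object this small the serialized form comes out roughly an order of magnitude larger than the payload, almost all of it class metadata -- which is the gap that formats like Kryo or Jackson close, and which per-item compression alone can't recover as well as compressing groups of similar items.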