We don't use native Java serialization for anything but the on-disk
BitSets in our bloom filters (because those are deserialized once at
startup, so the overhead doesn't matter), btw.
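
For reference, serializing those BitSets with native Java serialization
amounts to roughly the following. This is only a minimal sketch, not the
actual bloom filter code; the class and method names here are made up for
illustration:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

public class BitSetSerializationSketch {
    // Write the BitSet to disk with native Java serialization.
    static void save(BitSet bits, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(bits);
        }
    }

    // Read it back; this happens once at startup, so the
    // serialization overhead is a one-time cost.
    static BitSet load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (BitSet) in.readObject();
        }
    }
}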

We're talking about adding compression after
https://issues.apache.org/jira/browse/CASSANDRA-674.

On Sat, Feb 20, 2010 at 3:12 PM, Tatu Saloranta <tsalora...@gmail.com> wrote:
> On Fri, Feb 19, 2010 at 11:44 AM, Weijun Li <weiju...@gmail.com> wrote:
>> I see. How much is the overhead of Java serialization? Does it slow down
>> the system a lot? It seems to be a tradeoff between CPU usage and memory.
>
> This should be relatively easy to measure as a stand-alone thing, or
> maybe even from profiler stack traces.
> If native Java serialization is used, there may be more efficient
> alternatives, depending on the data -- default serialization is highly
> inefficient for small object graphs (like individual objects), but OK
> for larger graphs; this is because much of the class metadata is
> included, so the result is very self-contained.
> Beyond default serialization, there are more efficient general-purpose
> Java serialization frameworks, like Kryo, or fast JSON-based
> serializers (Jackson); see
> http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking
> for some idea of the alternatives.
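
To put a rough number on that per-object overhead, a stand-alone
comparison along these lines works; this is just a sketch using a
hypothetical small value object, not anything from the Cassandra codebase:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class SerializationOverheadSketch {
    // A small, hypothetical value object; stands in for an individual column/row.
    static class SmallItem implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        long timestamp;
        byte[] value;
        SmallItem(String name, long timestamp, byte[] value) {
            this.name = name;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static byte[] javaSerialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        SmallItem item = new SmallItem("user:42", System.currentTimeMillis(),
                                       "hello".getBytes(StandardCharsets.UTF_8));
        int rawPayload = item.name.getBytes(StandardCharsets.UTF_8).length
                + 8 + item.value.length;
        int serialized = javaSerialize(item).length;
        // Default serialization embeds class metadata (class name, field
        // names, etc.), so a single small object usually serializes to
        // several times its raw payload size.
        System.out.println("raw payload bytes: " + rawPayload);
        System.out.println("serialized bytes:  " + serialized);
    }
}
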
>
> In fact, one interesting idea would be to further trade some CPU for
> less memory by using fast compression (like LZF). I hope to experiment
> with this idea some time in the future. But the challenge is that this
> would help most with a clustered scheme (compressing more than one
> distinct item), which is much trickier to make work. Compression does
> OK with individual items, but the real boost comes from redundancy
> between similar items.
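
The redundancy point can be illustrated with plain JDK gzip standing in
for LZF (which isn't in the JDK); a rough sketch with made-up item data:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class BatchCompressionSketch {
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(input);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 100 similar items, e.g. rows that share most of their structure.
        String[] items = new String[100];
        for (int i = 0; i < items.length; i++) {
            items[i] = "{\"user\":\"user-" + i
                    + "\",\"status\":\"active\",\"region\":\"us-east\"}";
        }

        // Compressing each item on its own: little redundancy to exploit.
        int individualTotal = 0;
        StringBuilder batch = new StringBuilder();
        for (String item : items) {
            individualTotal += gzip(item.getBytes(StandardCharsets.UTF_8)).length;
            batch.append(item).append('\n');
        }

        // Compressing the whole batch at once: cross-item redundancy
        // compresses away, so the total is much smaller.
        int batchTotal = gzip(batch.toString().getBytes(StandardCharsets.UTF_8)).length;

        System.out.println("sum of per-item compressed sizes: " + individualTotal);
        System.out.println("single batch compressed size:     " + batchTotal);
    }
}
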
>
> -+ Tatu +-
>
