On Fri, Feb 19, 2010 at 11:44 AM, Weijun Li <weiju...@gmail.com> wrote:
> I see. How much is the overhead of java serialization? Does it slow down
> the system a lot? It seems to be a tradeoff between CPU usage and memory.
This should be relatively easy to measure as a stand-alone thing, or maybe even from profiler stack traces.

If native Java serialization is used, there may be more efficient alternatives, depending on the data: default serialization is highly inefficient for small object graphs (like individual objects), but ok for larger graphs. This is because much of the class metadata is included, which makes the result very self-contained. Beyond default serialization there are more efficient general-purpose Java serialization frameworks, like Kryo, or fast(est) JSON-based serializers (Jackson); see [http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking] for some idea of the alternatives.

In fact, one interesting idea would be to further trade some CPU for less memory by using fast compression (like LZF). I hope to experiment with this idea some time in the future. The challenge is that this would help most with a clustered scheme (compressing more than one distinct item together), which is much trickier to make work: compression does ok with individual items, but the real boost comes from redundancy between similar items.

-+ Tatu +-
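To make the metadata-overhead point concrete, here is a quick stand-alone measurement sketch (not from the thread; the `Point` class is a hypothetical example object). It serializes one small object and then an array of 1000, showing that default Java serialization writes class metadata once per class, so the per-item cost drops sharply for larger graphs:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationOverhead {
    // Hypothetical small value object, for illustration only.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Two ints are 8 bytes of payload, but the stream also carries
        // a header plus the full class descriptor for Point.
        byte[] one = serialize(new Point(1, 2));
        System.out.println("single Point: " + one.length + " bytes");

        // In a 1000-element array the descriptor is written only once,
        // so the amortized per-item size is much smaller.
        Point[] many = new Point[1000];
        for (int i = 0; i < many.length; i++) many[i] = new Point(i, i);
        byte[] bulk = serialize(many);
        System.out.println("per Point in array of 1000: "
                + (bulk.length / many.length) + " bytes");
    }
}
```

Exact sizes depend on the JDK version and class/package names, which is why none are hard-coded above.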
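The clustering point can also be demonstrated. LZF is not in the JDK, so this sketch substitutes `java.util.zip.Deflater` as a stand-in codec (the redundancy argument is the same for any LZ-style compressor); the record string is a made-up example:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionRedundancy {
    // Compress with Deflater at its fastest setting, as a stand-in for
    // a fast codec like LZF; returns the compressed size in bytes.
    static int compressedSize(byte[] input) {
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        d.setInput(input);
        d.finish();
        byte[] buf = new byte[input.length + 64];
        int n = d.deflate(buf);
        d.end();
        return n;
    }

    public static void main(String[] args) {
        // One small item has little internal redundancy, so it
        // compresses poorly (possibly not at all).
        byte[] item = "user:12345|status:active|region:us-east"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("single item: " + item.length
                + " -> " + compressedSize(item) + " bytes");

        // Clustering 100 similar items exposes cross-item redundancy,
        // so the compressed size per item drops dramatically.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("user:").append(12300 + i)
              .append("|status:active|region:us-east\n");
        }
        byte[] cluster = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println("100 clustered items: " + cluster.length
                + " -> " + compressedSize(cluster) + " bytes");
    }
}
```

The per-item compressed size in the clustered case ends up far below what compressing any single item can achieve, which is exactly why the clustered scheme is worth the extra complexity.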