Eno Thereska created KAFKA-3973:
-----------------------------------
Summary: Investigate feasibility of caching bytes vs. records
Key: KAFKA-3973
URL: https://issues.apache.org/jira/browse/KAFKA-3973
Project: Kafka
Issue Type: Sub-task
Components: streams
Reporter: Eno Thereska
Assignee: Bill Bejeck
Currently the cache stores and accounts for records, not bytes or objects. This
investigation would measure the performance overhead of storing bytes versus
storing objects. As an outcome we should know whether we should 1) store bytes
or 2) store objects.
If we store objects, the cache still needs to know their size so that it can
tell whether an object fits in the allocated cache space (e.g., if the cache is
100MB and each object is 10MB, we'd have space for 10 such objects). The
investigation needs to figure out how to find the size of an object
efficiently in Java.
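
As a rough illustration of the accounting described above, here is a minimal
sketch of a cache bounded by bytes rather than by entry count. The class name
SizeBoundedCache and the caller-supplied sizeOf function are hypothetical and
are not part of Kafka Streams. How to implement sizeOf cheaply is exactly the
open question; one known option, Instrumentation.getObjectSize, reports only an
implementation-specific shallow size, so it would not cover referenced objects.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical byte-bounded LRU cache; a sketch, not the Kafka Streams cache. */
final class SizeBoundedCache<K, V> {
    private final long maxBytes;
    // Assumption: the caller can estimate the size of a value in bytes.
    private final ToLongFunction<V> sizeOf;
    // accessOrder = true makes iteration go from least to most recently used.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
    private long usedBytes = 0;

    SizeBoundedCache(long maxBytes, ToLongFunction<V> sizeOf) {
        this.maxBytes = maxBytes;
        this.sizeOf = sizeOf;
    }

    /** Returns false if the value alone is larger than the whole cache. */
    boolean put(K key, V value) {
        long size = sizeOf.applyAsLong(value);
        if (size > maxBytes) {
            return false;
        }
        // Replacing an existing entry frees its bytes first.
        V old = entries.remove(key);
        if (old != null) {
            usedBytes -= sizeOf.applyAsLong(old);
        }
        // Evict least recently used entries until the new value fits.
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (usedBytes + size > maxBytes && it.hasNext()) {
            usedBytes -= sizeOf.applyAsLong(it.next().getValue());
            it.remove();
        }
        entries.put(key, value);
        usedBytes += size;
        return true;
    }

    V get(K key) { return entries.get(key); }
    boolean containsKey(K key) { return entries.containsKey(key); }
    int size() { return entries.size(); }
    long usedBytes() { return usedBytes; }
}
```

Note that this sketch counts only value sizes; keys and per-entry map overhead
are ignored, which a real implementation would probably need to account for.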
If we store bytes, then we are serialising an object into bytes before caching
it, i.e., we take a serialisation cost. The investigation needs to measure how
bad this cost can be, especially for the case when all objects fit in cache (and
thus any extra serialisation cost would show).
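
One way to get a first feel for that cost is a crude timing comparison like the
sketch below. It is only an assumption about how such a measurement could
start: the class SerialisationCostSketch is hypothetical, it uses String values
with UTF-8 encoding as a stand-in for a real serialiser, and the map is
unbounded so everything fits in cache. A proper study would more likely use a
benchmark harness such as JMH, since plain System.nanoTime loops are sensitive
to JIT warm-up and garbage collection.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical micro-timing sketch; the numbers it prints are indicative only. */
final class SerialisationCostSketch {

    /** Caches objects as-is. Returns {elapsedNanos, checksum}. */
    static long[] timeObjectCache(String[] values, int rounds) {
        Map<Integer, String> cache = new HashMap<>();
        long checksum = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < values.length; i++) {
                cache.put(i, values[i]);
                checksum += cache.get(i).length();
            }
        }
        return new long[] {System.nanoTime() - start, checksum};
    }

    /** Serialises on put and deserialises on get. Returns {elapsedNanos, checksum}. */
    static long[] timeByteCache(String[] values, int rounds) {
        Map<Integer, byte[]> cache = new HashMap<>();
        long checksum = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < values.length; i++) {
                cache.put(i, values[i].getBytes(StandardCharsets.UTF_8));
                checksum += new String(cache.get(i), StandardCharsets.UTF_8).length();
            }
        }
        return new long[] {System.nanoTime() - start, checksum};
    }

    public static void main(String[] args) {
        String[] values = new String[10_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = "record-" + i;
        }
        // Warm-up pass so the timed pass is less dominated by JIT compilation.
        timeObjectCache(values, 20);
        timeByteCache(values, 20);
        long[] objects = timeObjectCache(values, 100);
        long[] bytes = timeByteCache(values, 100);
        // The checksum is printed so the work cannot be optimised away.
        System.out.println("objects ns: " + objects[0] + " (checksum " + objects[1] + ")");
        System.out.println("bytes   ns: " + bytes[0] + " (checksum " + bytes[1] + ")");
    }
}
```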
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)