We should keep two very important things in mind. First, the binary object
format is, by design, not the storage format. Second, real space consumption
depends heavily on configuration (backups, page size, indexes, compression,
etc.). For this reason, by estimating the sizes of binary objects the user
would estimate nothing useful. This would only confuse users.

Thus I vote for the second solution. As for the cons, I find them
unconvincing. It is not slow: a node starts in a matter of seconds and data
load takes seconds to minutes. Nor does it require a lot of memory: a small
sample of data would be enough.
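
To make the sampling idea concrete, here is a rough sketch in Python of the
extrapolation step only. It is not Ignite code. The function name, the
per-class byte figures and the one-cache-per-class setup are all hypothetical,
and it assumes the space measured on the small cluster scales linearly with
object count, which ignores fixed overhead such as metadata pages.

```python
def estimate_total_bytes(sample_counts, sample_bytes, target_counts):
    """Scale storage measured on a small sample up to the target counts.

    sample_counts: objects of each class loaded into the test cluster
    sample_bytes:  bytes the cluster reported for each class (hypothetical
                   input, read from whatever page/size metrics are available)
    target_counts: objects of each class expected in production
    """
    total = 0.0
    for name, target in target_counts.items():
        # Average real footprint per object, including page and index overhead.
        per_object = sample_bytes[name] / sample_counts[name]
        total += per_object * target
    return total


# Target volumes from the original proposal: 1M of A, 2M of B, 45K of C.
target_counts = {"A": 1_000_000, "B": 2_000_000, "C": 45_000}

# A 1/1000 sample loaded into a small, production-like cluster.
sample_counts = {"A": 1_000, "B": 2_000, "C": 45}

# Made-up measurements, for illustration only.
sample_bytes = {"A": 400_000, "B": 1_000_000, "C": 90_000}

estimate = estimate_total_bytes(sample_counts, sample_bytes, target_counts)
print(f"Estimated footprint: {estimate / 1e9:.2f} GB")
```

Since the measurements already include backups, page size and indexes as
configured, repeating the run with a different configuration shows directly
how much each setting costs.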

On Tue, Sep 12, 2017 at 7:29 PM, Anton Vinogradov <[email protected]> wrote:

> Igniters,
>
> Since we're developing some kind of storage system it's pretty interesting
> how effectively it stores data.
>
> I propose to develop an Estimator that counts how much space is
> needed to keep any data.
>
> For example:
> 1) You have classes A, B and C with known fields and data distribution over
> these fields.
> 2) You know that you have to keep 1M of A, 2M of B and 45K of C.
>
> We can perform estimation using two different approaches:
>
> 1) Estimate how much space is needed to keep data in binary format.
> So, we should
> - Create some instances
> - Marshal them to binary format
> - Count sum(sizes)
> - Multiply
>
> Pros:
> - Fast.
> - No need to start Ignite nodes.
> - Can be used as some kind of benchmarking tool for BinaryMarshaller.
> Once you improve something at BinaryMarshaller you'll see profit at
> BinarySizeEstimator results.
>
> Cons:
> - Estimation result will be different from real cluster memory consumption
> and can be used only as preliminary assessment.
>
> 2) Estimate how much space is needed to keep data in real cluster.
> So, we should
> - Configure and start small cluster. Set page size, cache types and amount,
> backups, nodes count, etc.
> - Create a lot of instances (1/1000, 1/10 or even 1/1 of expected)
> - Count pages size
>
> Pros:
> - Can be used as pre-production tuning tool.
>
> Cons:
> - Slow.
> - Requires starting Ignite nodes and a lot of free memory.
>
>
> I think we need both, but I propose to start with first approach -
> BinarySizeEstimator (https://issues.apache.org/jira/browse/IGNITE-6300)
>
> Thoughts?
>
