We should keep two important things in mind. First, the binary object format is, by design, not the storage format. Second, real space consumption depends heavily on configuration (backups, page size, indexes, compression, etc.). For this reason, by estimating the sizes of binary objects the user would estimate nothing about actual storage. This would only confuse users.
Thus I vote for the second solution. As for the cons, I find them unconvincing. It is not slow: a node starts in a matter of seconds, and data load takes seconds to minutes. Nor does it require a lot of memory: a small sample of data would be enough.

On Tue, Sep 12, 2017 at 7:29 PM, Anton Vinogradov <[email protected]> wrote:

> Igniters,
>
> Since we're developing some kind of storage system it's pretty interesting
> how effectively it stores data.
>
> I propose to develop some Estimator allows to count how much space is
> needed to keep any data.
>
> For example:
> 1) You have classes A,B and C with known fields and data distribution over
> this fields.
> 2) You know that you have to keep 1M of A, 2M of B and 45K of C.
>
> We can perform estimation in two different approaches:
>
> 1) Estimate how much space is needed to keep data in binary format.
> So, we should
> - Create some instances
> - Marshall them to binary format
> - Count sum(sizes)
> - Multiply
>
> Pros:
> - Fast.
> - No need to start Ignite nodes.
> - Can be used as some kind of benchmarking tool for BinaryMarshaller.
> Once you improve something at BinaryMarshaller you'll see profit at
> BinarySizeEstimator results.
>
> Cons:
> - Estimation result will be different from real cluster memory consumption
> and can be used only as preliminary assessment.
>
> 2) Estimate how much space is needed to keep data in real cluster.
> So, we should
> - Configure and start small cluster. Set page size, cache types and amount,
> backups, nodes count, etc.
> - Create a lot of instances (1/1000, 1/10 or even 1/1 of expected)
> - Count pages size
>
> Pros:
> - Can be used as pre-production tuning tool.
>
> Cons:
> - Slow.
> - Required to start Ignite nodes and a lot of free memory.
>
>
> I think we need both, but I propose to start with first approach -
> BinarySizeEstimator (https://issues.apache.org/jira/browse/IGNITE-6300)
>
> Thoughts?
>
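To illustrate the last step of the second approach (load a fraction of the data into a real node, count the pages, scale up), here is a minimal sketch. It assumes consumption grows roughly linearly with entry count, which ignores fixed per-cache overhead. The class and method names, the page size, and all page counts are hypothetical placeholders, not an Ignite API and not measured results.

```java
/** Linear extrapolation from a sample load. Hypothetical helper, not part of Ignite. */
final class SampleExtrapolation {
    /**
     * Scales the space measured for a sample up to the target entry count,
     * rounding up. Assumes linear growth with the number of entries.
     */
    static long estimateBytes(long measuredBytes, long sampleCount, long targetCount) {
        if (measuredBytes < 0 || targetCount < 0)
            throw new IllegalArgumentException("sizes and counts must be non-negative");
        if (sampleCount <= 0)
            throw new IllegalArgumentException("sampleCount must be positive");

        // Integer arithmetic avoids floating-point rounding; multiplyExact fails loudly on overflow.
        long scaled = Math.multiplyExact(measuredBytes, targetCount);
        return Math.addExact(scaled, sampleCount - 1) / sampleCount;
    }

    public static void main(String[] args) {
        long pageSize = 4096; // assumed page size in bytes

        // Hypothetical page counts observed after loading 1/1000 of the expected data
        // (1M of A, 2M of B, 45K of C) into a small cluster.
        long a = estimateBytes(64 * pageSize, 1_000, 1_000_000);
        long b = estimateBytes(160 * pageSize, 2_000, 2_000_000);
        long c = estimateBytes(3 * pageSize, 45, 45_000);

        System.out.println("A: " + a + " bytes");
        System.out.println("B: " + b + " bytes");
        System.out.println("C: " + c + " bytes");
        System.out.println("Total: " + (a + b + c) + " bytes");
    }
}
```

Because the sample is loaded with the real page size, backups and indexes configured, those effects are already included in the measured page counts, which is exactly what summing marshalled object sizes cannot capture.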
