In my opinion, writing nulls or default values on the wire or in memory is plainly wasteful. I agree with Vladimir that the schema should be constant, but internally we should not store the default values at all.
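To make the idea concrete, here is a minimal sketch of the reserved-offset approach discussed below. It is not the actual Ignite binary format: the class name, the `NO_VALUE` marker, and the layout (field count, then values, then a footer of 4-byte offsets) are all assumptions made up for illustration. The footer always has one entry per field, so the schema stays constant; only the values of default (zero) fields are left out of the byte array.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch, not Ignite code: skip default values but keep a fixed footer. */
final class DefaultSkippingSketch {
    /** Assumed reserved offset meaning "field holds its default value". */
    static final int NO_VALUE = Integer.MAX_VALUE;

    /** Layout: [field count][non-default long values][one int offset per field]. */
    static byte[] write(long[] fields) {
        int nonDefault = 0;
        for (long f : fields) {
            if (f != 0L) {
                nonDefault++;
            }
        }

        ByteBuffer buf = ByteBuffer.allocate(4 + nonDefault * 8 + fields.length * 4);
        buf.putInt(fields.length);

        int[] offsets = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == 0L) {
                // Default value: nothing is written, only the reserved marker is recorded.
                offsets[i] = NO_VALUE;
            } else {
                offsets[i] = buf.position();
                buf.putLong(fields[i]);
            }
        }

        // The footer has the same number of entries regardless of the values.
        for (int off : offsets) {
            buf.putInt(off);
        }
        return buf.array();
    }

    /** Reads field idx, returning the default (0) when the footer holds the marker. */
    static long read(byte[] data, int idx) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt(0);
        int footerStart = data.length - count * 4;
        int off = buf.getInt(footerStart + idx * 4);
        return off == NO_VALUE ? 0L : buf.getLong(off);
    }
}
```

With this layout a zero field costs only its footer entry, and a reader can still tell how many fields the object has. A real implementation would also have to cope with the point Igor raises below, that the current format already uses the whole range of offset values.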
It sounds like a relatively simple task to implement. Do we have a ticket for it?

D.

On Mon, Oct 31, 2016 at 1:00 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> Igor,
>
> Good catch. Probably some MAX value could help us here.
>
> On Mon, Oct 31, 2016 at 9:17 PM, Igor Sapego <isap...@gridgain.com> wrote:
>
> > Valentin,
> >
> > -1 was just an example. I've checked - currently we use the whole possible
> > range of offset values.
> > So if we are going to use the suggested approach then we need to reserve
> > some value and adjust the serialization/deserialization algorithms.
> >
> > Best Regards,
> > Igor
> >
> > On Mon, Oct 31, 2016 at 8:46 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> >
> > > Makes sense to me, but not sure about -1 in particular. Is this offset
> > > relative to object start position? What values can it have?
> > >
> > > -Val
> > >
> > > On Mon, Oct 31, 2016 at 10:38 AM, Igor Sapego <isap...@gridgain.com> wrote:
> > >
> > >> Vladimir,
> > >>
> > >> How about some reserved value? I.e. a -1 offset means a default/null
> > >> value should be used?
> > >>
> > >> Best Regards,
> > >> Igor
> > >>
> > >> On Mon, Oct 31, 2016 at 5:05 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>
> > >>> Valya,
> > >>>
> > >>> Do you have any ideas how to implement this? We write field offsets
> > >>> in the footer. If a field is not written, then what should be used
> > >>> for its offset?
> > >>>
> > >>> On Mon, Oct 31, 2016 at 4:56 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> > >>>
> > >>> > Vladimir,
> > >>> >
> > >>> > These are good points, but I'm not suggesting to change the schema.
> > >>> > If one writes five fields, the schema should have five fields in
> > >>> > any case, regardless of values. I only suggest changing the internal
> > >>> > representation of the object and not saving fields with default
> > >>> > values in the byte array, as we don't really need them there.
> > >>> >
> > >>> > -Val
> > >>> >
> > >>> > On Sun, Oct 30, 2016 at 12:24 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>> >
> > >>> >> Valya,
> > >>> >>
> > >>> >> I have several concerns:
> > >>> >> 1) Correctness: hasField() will not work properly. But probably we
> > >>> >> can fix that by adding this info to the schema.
> > >>> >> 2) Performance: we have lots of optimizations which depend on
> > >>> >> either a "stable" object schema or a low number of schemas. We
> > >>> >> will effectively turn them off.
> > >>> >> But what concerns me even more is that we may end up with an
> > >>> >> enormous number of schemas. E.g. consider an object with 10 number
> > >>> >> fields. If all fields could be zero, we may end up with something
> > >>> >> like 2^10 schemas.
> > >>> >>
> > >>> >> Vladimir.
> > >>> >>
> > >>> >> On Oct 29, 2016, at 0:37, "Valentin Kulichenko" <valentin.kuliche...@gmail.com> wrote:
> > >>> >>
> > >>> >>> Vova,
> > >>> >>>
> > >>> >>> Why do we need to write zeros and nulls in the first place?
> > >>> >>> What's the value of having them in the byte array?
> > >>> >>>
> > >>> >>> -Val
> > >>> >>>
> > >>> >>> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>> >>>
> > >>> >>>> Valya,
> > >>> >>>>
> > >>> >>>> Currently a null value is written as one byte, while a zero value
> > >>> >>>> of long type is written as 9 bytes. I want to improve that and
> > >>> >>>> write zeros as one byte as well.
> > >>> >>>>
> > >>> >>>> As for var-length encoding, I am strongly against it. It saves IO
> > >>> >>>> and memory at the cost of CPU. If we encode numbers in this way
> > >>> >>>> we will slow down SQL (which is already not very fast, to be
> > >>> >>>> honest). Because instead of a single memory read, we will have to
> > >>> >>>> perform multiple reads and then apply some mechanics to restore
> > >>> >>>> the original value. We already have such a problem with Strings -
> > >>> >>>> Java stores them as UTF-16, but we encode them as UTF-8. As a
> > >>> >>>> result every read of a string field in SQL results in decoding
> > >>> >>>> overhead.
> > >>> >>>>
> > >>> >>>> Vladimir.
> > >>> >>>>
> > >>> >>>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> > >>> >>>>
> > >>> >>>>> Cross-posting this to dev list.
> > >>> >>>>>
> > >>> >>>>> Vladimir,
> > >>> >>>>>
> > >>> >>>>> To be honest, I don't see much difference between null values
> > >>> >>>>> for objects and zero values for primitives. From the
> > >>> >>>>> BinaryObject semantics standpoint, both are default values for
> > >>> >>>>> the corresponding types. These values will be returned from the
> > >>> >>>>> BinaryObject.field() method regardless of whether we actually
> > >>> >>>>> save them in the byte array or not. Having said that, why don't
> > >>> >>>>> we just skip them during write?
> > >>> >>>>>
> > >>> >>>>> Your optimization will still be useful though, because there
> > >>> >>>>> are often a lot of ints and longs that are not zeros, but still
> > >>> >>>>> small and can fit in 1-2 bytes. We already added such compaction
> > >>> >>>>> in direct message marshaling and it reduced overall traffic by
> > >>> >>>>> around 30%.
> > >>> >>>>>
> > >>> >>>>> -Val
> > >>> >>>>>
> > >>> >>>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>> >>>>>
> > >>> >>>>>> Hi,
> > >>> >>>>>>
> > >>> >>>>>> I am not very concerned with null fields overhead, because
> > >>> >>>>>> usually it won't be significant. However, there is a problem
> > >>> >>>>>> with zeros. A user object might have lots of int/long zeros,
> > >>> >>>>>> this is not uncommon. And each zero will consume 4-8
> > >>> >>>>>> additional bytes. We probably will implement a special
> > >>> >>>>>> optimization which will write such fields in a special compact
> > >>> >>>>>> format.
> > >>> >>>>>>
> > >>> >>>>>> Vladimir.
> > >>> >>>>>>
> > >>> >>>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko <valentin.kuliche...@gmail.com> wrote:
> > >>> >>>>>>
> > >>> >>>>>>> Hi,
> > >>> >>>>>>>
> > >>> >>>>>>> Yes, null values consume memory. I believe this can be
> > >>> >>>>>>> optimized, but I haven't seen issues with this so far. Unless
> > >>> >>>>>>> you have hundreds of fields most of which are nulls (very
> > >>> >>>>>>> rare case), the overhead is minimal.
> > >>> >>>>>>>
> > >>> >>>>>>> -Val
> > >>> >>>>>>>
> > >>> >>>>>>> --
> > >>> >>>>>>> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
> > >>> >>>>>>> Sent from the Apache Ignite Users mailing list archive at Nabble.com.