I like the idea, especially because it would apply across the board.
So you propose to build the binary object and apply dictionary-based 
compression on top.

I could quickly generate a bunch of binary objects from the tests and apply 
Java deflate compression with a dictionary based on the BinaryUtils elements, 
then compare the result with the null compaction and the varint.
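
A minimal sketch of what such a comparison could look like, using the JDK's java.util.zip.Deflater with a preset dictionary. The sample bytes are the 60-byte object (without null compaction) from the dump quoted below. The dictionary is purely hypothetical: a sibling object of the same type with a different hash code and different string payloads. How a real dictionary would be derived from BinaryUtils is not settled here, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DictionaryDeflateSketch {
    /** The 60-byte object without null compaction, copied from the dump in this thread. */
    static final int[] SAMPLE = {
        0x67, 0x01, 0x2b, 0x00, 0x78, 0x66, 0xbe, 0x44, 0xa4, 0x43, 0x0e, 0xf5,
        0x3c, 0x00, 0x00, 0x00, 0x3d, 0xa8, 0x15, 0xe4, 0x35, 0x00, 0x00, 0x00,
        0x03, 0x01, 0x00, 0x00, 0x00, 0x03, 0x01, 0x00, 0x00, 0x00,
        0x09, 0x03, 0x00, 0x00, 0x00, 0x61, 0x62, 0x63, 0x65, 0x65, 0x65,
        0x09, 0x03, 0x00, 0x00, 0x00, 0x61, 0x62, 0x63,
        0x18, 0x1d, 0x22, 0x2a, 0x2b, 0x2c, 0x2d
    };

    static byte[] sample() {
        byte[] bytes = new byte[SAMPLE.length];
        for (int i = 0; i < SAMPLE.length; i++)
            bytes[i] = (byte) SAMPLE[i];
        return bytes;
    }

    /** Hypothetical dictionary: same layout, but hash code and string characters differ. */
    static byte[] hypotheticalDictionary(byte[] obj) {
        byte[] dict = obj.clone();
        for (int i : new int[] {8, 9, 10, 11, 39, 40, 41, 50, 51, 52})
            dict[i] ^= 0x5a;
        return dict;
    }

    /** Deflates the input; a non-null dict primes the compressor before any input is fed. */
    static byte[] deflate(byte[] input, byte[] dict) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dict != null)
            deflater.setDictionary(dict);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates into a buffer of known length, supplying the dictionary when zlib asks for it. */
    static byte[] inflate(byte[] compressed, byte[] dict, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0) {
                    if (inflater.needsDictionary())
                        inflater.setDictionary(dict);
                    else if (inflater.finished() || inflater.needsInput())
                        break;
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] obj = sample();
        byte[] dict = hypotheticalDictionary(obj);
        byte[] plain = deflate(obj, null);
        byte[] primed = deflate(obj, dict);
        System.out.println("raw=" + obj.length + " deflate=" + plain.length
            + " deflate+dictionary=" + primed.length);
        System.out.println("round trip ok: "
            + Arrays.equals(obj, inflate(primed, dict, obj.length)));
    }
}
```

For an object this small the zlib header and checksum are a noticeable share of the output, so the numbers would only mean something when measured over many real objects from the tests.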


-----Original Message-----
From: Ilya Kasnacheev <ilya.kasnach...@gmail.com> 
Sent: Monday, May 25, 2020 12:05 PM
To: dev <dev@ignite.apache.org>
Subject: Re: IGNITE-6499 Compact NULL fields

Hello!

My take is the following: if conserving memory is needed at all, then we had 
better invest in compression (such as dictionary-based row compression) rather 
than implementing varint, compact nulls, etc.

Dictionary-based compression can easily tackle varints and null patterns while 
also compressing strings, repeated values, and even things we would never 
think of on our own.

It also keeps our own code simple, raises no compatibility issues (people do 
indeed store binary objects in 3rd-party storage), and has a low incidence of 
bugs.

Regards,
--
Ilya Kasnacheev


On Mon, May 25, 2020 at 12:51, Hostettler, Steve <
steve.hostett...@wolterskluwer.com>:

> I went for a simpler approach (only a null mask), and yes, the gain 
> is high for smaller objects but low otherwise. I gain between 5% and 
> 20% on my objects. But to me it is the stepping stone to easily 
> implementing other optimisations like varint and schemaless without 
> using raw. I am trying to fix the latest unit tests to give you a 
> better idea. If it is not worth it then let's not do it, but it is 
> worth a try, I think.
>
>
> -----Original Message-----
> From: Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> Sent: Monday, May 25, 2020 11:48 AM
> To: dev <dev@ignite.apache.org>
> Subject: Re: IGNITE-6499 Compact NULL fields
>
> Hello!
>
> I can't help but wonder how large a benefit it will give. I have 
> checked the ticket description: the proposed scheme looks elaborate 
> and the benefit for non-extreme binary objects rather tiny.
>
> WDYT?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Mon, May 18, 2020 at 22:54, steve.hostett...@gmail.com <
> steve.hostett...@gmail.com>:
>
> > Hello igniters,
> >
> > while I would like to help on Calcite, because the H2 optimiser (or 
> > the lack thereof) is really killing us, I think that it would be 
> > wiser to start by contributing to something easier.
> >
> > Therefore I will tackle another problem that we have, which is 
> > memory consumption. I stumbled upon this IEP about optimising the 
> > binary marshaller:
> >
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-2%3A+Binary+object+format+improvements
> >
> > The low-hanging fruit seemed to be the null compaction, so I decided 
> > to start with it, though I am sure there is some hidden complexity.
> >
> > Here are a couple of questions:
> > - Can I assign myself IGNITE-6499 and attach a patch?
> > - Who can I contact to help with the review? On the following page 
> > https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
> > there is no one assigned for marshalling.
> >
> > On the details:
> > The compression is disabled by default as it is not compatible with 
> > objects previously marshalled.
> >
> > My approach was to go a bit beyond the JIRA. Not only do I remove 
> > the indexes to null fields in the footer, I also remove the 0x65 
> > null markers in the objects. I did not remove them from the 
> > collections and arrays because they are using absolute positioning.
> >
> > I gain between 5% and 20% depending on my test cases. Obviously, 
> > the smaller the object and the higher the number of nulls, the 
> > higher the compression rate.
> >
> > Based on that I can quite easily add varint compression, which is
> > IGNITE-6418 and should significantly increase the compression rate 
> > for objects with a lot of integers and longs that hold only small 
> > numbers.
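
To illustrate why small numbers benefit: a fixed-width int field carries a 4-byte payload (for example 0x01 0x00 0x00 0x00 for the value 1 in the dump further down), while a variable-length encoding can use a single byte. The sketch below uses the common ZigZag plus base-128 scheme as an assumption; the encoding actually intended for IGNITE-6418 is not described in this message, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

class VarintSketch {
    /** Encodes an int as ZigZag + base-128 varint: 7 payload bits per byte, high bit = "more bytes follow". */
    static byte[] encode(int value) {
        // ZigZag maps small negative numbers to small unsigned ones: 0->0, -1->1, 1->2, -2->3, ...
        int zz = (value << 1) ^ (value >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((zz & ~0x7F) != 0) {
            out.write((zz & 0x7F) | 0x80);
            zz >>>= 7;
        }
        out.write(zz);
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static int decode(byte[] bytes) {
        int zz = 0;
        int shift = 0;
        for (byte b : bytes) {
            zz |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                break;
            shift += 7;
        }
        // Undo ZigZag.
        return (zz >>> 1) ^ -(zz & 1);
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 1, -1, 300, Integer.MAX_VALUE, Integer.MIN_VALUE})
            System.out.println(v + " -> " + encode(v).length + " byte(s), decoded " + decode(encode(v)));
    }
}
```

The trade-off is that values near the int range limits need 5 bytes instead of 4, so the gain depends on how the real data is distributed.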
> >
> > The next step is to add a JMH micro-benchmark to check the impact 
> > on performance.
> >
> >
> > Example on a simple object w/ null compaction
> >
> > Length=55 FooterPosition=50
> > 0x67 // ValueType
> > 0x01 // FormatVersion
> > 0x2b 0x00 //Flags userType=true hasSchema=true offset=1 compactFooter=true
> > 0x78 0x66 0xbe 0x44 //TypeId
> > 0xf9 0xcd 0x07 0x57 //Hashcode
> > 0x37 0x00 0x00 0x00 //Length
> > 0x3d 0xa8 0x15 0xe4 //SchemaId
> > 0x32 0x00 0x00 0x00 //Footer position = 50
> > 0x03 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00 0x09 0x03 0x00 0x00 0x00
> > 0x61 0x62 0x63 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> > Footer length=5
> > 0x18 0x1d 0x22 0x2a 0x47
> >
> > and w/o null compaction
> > Length=60 FooterPosition=53
> > 0x67 // ValueType
> > 0x01 // FormatVersion
> > 0x2b 0x00 //Flags userType=true hasSchema=true offset=1 compactFooter=true
> > 0x78 0x66 0xbe 0x44 //TypeId
> > 0xa4 0x43 0x0e 0xf5 //Hashcode
> > 0x3c 0x00 0x00 0x00 //Length
> > 0x3d 0xa8 0x15 0xe4 //SchemaId
> > 0x35 0x00 0x00 0x00 //Footer position = 53
> > 0x03 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00 0x09 0x03 0x00 0x00 0x00
> > 0x61 0x62 0x63 0x65 0x65 0x65 0x09 0x03 0x00 0x00 0x00 0x61 0x62 0x63
> > Footer length=7
> > 0x18 0x1d 0x22 0x2a 0x2b 0x2c 0x2d
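
One way to read the two dumps: the object has seven fields (int, int, string, three nulls, string). In the compacted form the footer keeps the four offsets of the non-null fields, and the trailing 0x47 looks like a bitmask with one bit per field, set when the field is non-null. That reading is inferred from the bytes and is not stated in the message. The sketch below checks it, together with the 60 to 55 byte difference, assuming 1-byte footer offsets and one mask byte per eight fields. All names are hypothetical.

```java
class NullMaskSketch {
    /** Builds a mask with bit i set when field i is non-null (inferred convention, see lead-in). */
    static int nonNullMask(boolean[] isNull) {
        int mask = 0;
        for (int i = 0; i < isNull.length; i++)
            if (!isNull[i])
                mask |= 1 << i;
        return mask;
    }

    /**
     * Bytes saved by compaction under the stated assumptions: each null field drops
     * its 0x65 marker in the body and its 1-byte offset in the footer, and the mask
     * costs one byte per eight fields.
     */
    static int bytesSaved(boolean[] isNull) {
        int nulls = 0;
        for (boolean n : isNull)
            if (n)
                nulls++;
        int maskBytes = (isNull.length + 7) / 8;
        return 2 * nulls - maskBytes;
    }

    public static void main(String[] args) {
        // Field layout of the example: int, int, string, null, null, null, string.
        boolean[] isNull = {false, false, false, true, true, true, false};
        System.out.printf("mask=0x%02x saved=%d bytes%n", nonNullMask(isNull), bytesSaved(isNull));
    }
}
```

Under these assumptions the example gives a mask of 0x47 and a saving of 5 bytes, which matches the dumps (60 versus 55 bytes, roughly 8%).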
> >