I don't think O(N) reasoning makes much sense here, since N is always small; let's not fool ourselves. To my mind, the cost of the cache access itself (with all the busy locks...), hashCode/equals, and the like has a much bigger impact here. Do we still have a pluggable marshaller? Can my approach be implemented separately?
Sergi

2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:

> Currently an index-enabled serialized object form has the following layout
> (simplified):
>
> [object fields][field1Offset,field1Length,
> field2Offset,field2Length,...,fieldNOffset,fieldNLength]
>
> where the field order is determined upon the first object serialization
> and stored in a metadata cache which is available on all nodes. Thus, the
> field lookup is performed as follows:
>
> fieldName -> fieldIndex (metadata lookup, O(1)), fieldIndex -> fieldOffset
> in footer (O(1)), fieldOffset -> fieldValue (O(1)).
>
> BTW, I am finalizing the branch with marshaller changes and will send this
> for a preliminary review soon.
>
> 2015-07-16 0:55 GMT-07:00 Atri Sharma <atri.j...@gmail.com>:
>
> > Keep in mind that JSONB's performance comes from the fact that it uses
> > server encoding, is binary represented and can have GIN indexes on top
> > of it. Not sure if Ignite's marshalling approach keeps those features
> > as well.
> >
> > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin
> > <sergi.vlady...@gmail.com> wrote:
> >
> > > HSTORE and JSONB appear to have a similar format in PostgreSQL
> > > (because they were developed by the same people). They noted that
> > > they switched away from key length sorting because they sometimes
> > > need lexicographical order, but this is irrelevant for us.
> > >
> > > Sergi
> > >
> > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <atri.j...@gmail.com>:
> > >
> > > > Are you referring to JSONB here?
> > > >
> > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin
> > > > <sergi.vlady...@gmail.com> wrote:
> > > >
> > > > > Guys, especially Alexey G.
> > > > >
> > > > > I've attended a PostgreSQL conference and there was a talk about
> > > > > an unstructured data format.
> > > > > They had an interesting idea of a serialized layout close enough
> > > > > to ours. I'm not sure how much this differs from our approach and
> > > > > whether we can use some ideas from it, but anyway it looks really
> > > > > promising to me and I want to share.
> > > > >
> > > > > The structure basically is the following:
> > > > >
> > > > > [key headers] [keys] [values]
> > > > >
> > > > > Key headers are [key offset, key length], so they are of a fixed
> > > > > length.
> > > > >
> > > > > The cool idea here is that the keys, and respectively the key
> > > > > headers, are sorted by (key length, key), so that you can do a
> > > > > lookup by first quickly picking the keys of the needed length
> > > > > without looking at the keys at all, and then picking the exact
> > > > > key. Both searches can be done with a fast scan if there is a
> > > > > small number of keys and with binary search for a larger number
> > > > > of keys.
> > > > >
> > > > > Alexey G., could you please compare this to our new marshalling
> > > > > approach you are about to merge?
> > > > > BTW, it would be nice if you described it in detail here.
> > > > >
> > > > > Sergi
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Atri
> > > > *l'apprenant*
> >
> > --
> > Regards,
> >
> > Atri
> > *l'apprenant*
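
For reference, the lookup over the [key headers] [keys] [values] layout described in the quoted message can be sketched roughly as below. This is only an illustration in Python, not PostgreSQL's or Ignite's actual format. The names `pack` and `lookup`, the leading count/size prefix, and the value offset and length stored in each header are all assumptions added to make the example self-contained; the thread itself only says that a key header is [key offset, key length].

```python
import struct

# Hypothetical layout: [count, keys_size] [key headers] [keys] [values]
PREFIX = struct.Struct("<II")    # number of keys, total size of the key area
HEADER = struct.Struct("<IIII")  # key offset, key length, value offset, value length


def pack(fields):
    """Serialize a {bytes: bytes} mapping with headers sorted by (key length, key)."""
    items = sorted(fields.items(), key=lambda kv: (len(kv[0]), kv[0]))
    headers, keys, values = bytearray(), bytearray(), bytearray()
    for key, value in items:
        # Offsets are relative to the start of the key area / value area.
        headers += HEADER.pack(len(keys), len(key), len(values), len(value))
        keys += key
        values += value
    return PREFIX.pack(len(items), len(keys)) + bytes(headers) + bytes(keys) + bytes(values)


def lookup(buf, key):
    """Binary search over the fixed-size headers; returns the value or None."""
    count, keys_size = PREFIX.unpack_from(buf, 0)
    headers_at = PREFIX.size
    keys_at = headers_at + count * HEADER.size
    values_at = keys_at + keys_size
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        k_off, k_len, v_off, v_len = HEADER.unpack_from(buf, headers_at + mid * HEADER.size)
        if k_len != len(key):
            # Lengths differ: decide from the header alone, key bytes are not read.
            less = k_len < len(key)
        else:
            stored = buf[keys_at + k_off:keys_at + k_off + k_len]
            if stored == key:
                return buf[values_at + v_off:values_at + v_off + v_len]
            less = stored < key
        if less:
            lo = mid + 1
        else:
            hi = mid
    return None


buf = pack({b"id": b"42", b"name": b"Ann", b"age": b"7", b"city": b"Oslo"})
print(lookup(buf, b"name"), lookup(buf, b"zip"))
```

Key bytes are only compared once a header with a matching length is found, which is the point of sorting by length first. With a handful of fields a plain linear scan over the headers would do the same job, and constant factors such as locking and hashing are likely to matter more than the search strategy.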