Re: Unstructured object format.

Dmitriy Setrakyan Wed, 22 Jul 2015 00:48:02 -0700

On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[email protected]> wrote:


> So does that mean that local hashmap is not controlled with all the heavy
> locks that are present around the cache?
>

Yes Atri, you are right. The data is stored in local hash map to avoid
touching a distributed cache whenever serializing objects.


> On 22 Jul 2015 07:31, "Alexey Goncharuk" <[email protected]>
> wrote:
>
> > Metadata cache access is backed by a local hash map, so the real cost is
> a
> > String object hashcode which is cached in the String object and a hashmap
> > lookup by an integer key.
> >
> > On the other hand, the marshaller is still pluggable and after the ticket
> > is completed, it should be fairly easy to implement this approach and
> > compare performance.
> >
> > --Alexey
> >
> > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[email protected]>:
> >
> > > I think O(N) reasoning does not make a real sense here since N is
> always
> > > small, lets not fool ourselves.
> > > To my mind operation cost of cache access (with all busy locks...),
> > > hashCode/equals and stuff like that has much bigger impact here.
> > > Do we still have a pluggable marshaller? Can my approach be implemented
> > > separately?
> > >
> > > Sergi
> > >
> > >
> > >
> > >
> > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <[email protected]
> >:
> > >
> > > > Currently an index-enabled serialized object form has the following
> > > layout
> > > > (simplified):
> > > >
> > > > [object fields][field1Offset,field1Length,
> > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
> > > >
> > > > where fields order is determined upon the first object serialization
> > and
> > > > stored in metadata cache which is available on all nodes. Thus, the
> > field
> > > > lookup is performed as follows:
> > > >
> > > > fieldName -> fieldIndex (metadata lookup, O(1)),
> > fieldIndex->fieldOffset
> > > in
> > > > footer (O(1)), fieldOffset->fieldValue (O(1)).
> > > >
> > > > BTW, I am finalizing the branch with marshaller changes and will send
> > > this
> > > > for a preliminary review soon.
> > > >
> > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[email protected]>:
> > > >
> > > > > Keep in mind that JSONB's performance comes from the fact that it
> > uses
> > > > > server encoding, is binary represented and can have GIN indexes on
> > top
> > > of
> > > > > it. Not sure if Ignite's marshalling approach keeps those features
> as
> > > > well.
> > > > >
> > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
> > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > HSTORE and JSONB appeared to have similar format in Postgresql
> > > (because
> > > > > > they was developed by the same people). They noticed that they
> > > switched
> > > > > off
> > > > > > of using key length sorting because they sometimes need
> > > lexicographical
> > > > > > order but this is irrelevant for us.
> > > > > >
> > > > > > Sergi
> > > > > >
> > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[email protected]>:
> > > > > >
> > > > > > > Are you referring to JSONB here?
> > > > > > >
> > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
> > > > > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Guys, specially Alexey G.
> > > > > > > >
> > > > > > > > I've attended PostgreSQL conference and there was a talk
> about
> > > > > > > unstructured
> > > > > > > > data format.
> > > > > > > > They had an interesting idea of serialized layout close
> enough
> > to
> > > > > ours,
> > > > > > > I'm
> > > > > > > > not sure how much this is different from our approach and if
> we
> > > can
> > > > > use
> > > > > > > > some ideas from it but anywaus it looks really promising to
> me
> > > and
> > > > I
> > > > > > want
> > > > > > > > to share.
> > > > > > > >
> > > > > > > > The structure basically is the following:
> > > > > > > >
> > > > > > > > [key headers] [keys] [values]
> > > > > > > >
> > > > > > > > Key headers are [key offset, key length] so they are of a
> fixed
> > > > > length.
> > > > > > > >
> > > > > > > > The cool idea here is that keys and respectively the key
> > headers
> > > > > sorted
> > > > > > > by
> > > > > > > > (key length, key) so that you can do a lookup first by fast
> > > picking
> > > > > key
> > > > > > > of
> > > > > > > > the needed length without looking at keys at all and then
> pick
> > an
> > > > > exact
> > > > > > > > key. Both searches can be done with fast scan if there are
> > small
> > > > > number
> > > > > > > of
> > > > > > > > keys and binary search for a larger number of keys.
> > > > > > > >
> > > > > > > > Alexey G., could you please compare this to our new
> marshalling
> > > > > > approach
> > > > > > > > you are about to merge?
> > > > > > > > BTW, it would be nice if you will describe it in details
> here.
> > > > > > > >
> > > > > > > > Sergi
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > Atri
> > > > > > > *l'apprenant*
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > Atri
> > > > > *l'apprenant*
> > > > >
> > > >
> > >
> >
>

Re: Unstructured object format.

Reply via email to