Re: Unstructured object format.

Atri Sharma Wed, 22 Jul 2015 03:14:46 -0700

+1 as well.

I wonder if it makes sense to further enhance the header fields to have
more metadata readily available (so that we can support more JSONB like
operators in a much efficient manner).


On Wed, Jul 22, 2015 at 3:37 PM, Atri Sharma <[email protected]> wrote:

> +1 from me too
>
> On Wed, Jul 22, 2015 at 3:34 PM, Sergi Vladykin <[email protected]>
> wrote:
>
>> Ok, looks good to me.
>>
>> Sergi
>>
>> 2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
>>
>> > On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <[email protected]>
>> wrote:
>> >
>> > > So does that mean that local hashmap is not controlled with all the
>> heavy
>> > > locks that are present around the cache?
>> > >
>> >
>> > Yes Atri, you are right. The data is stored in local hash map to avoid
>> > touching a distributed cache whenever serializing objects.
>> >
>> >
>> > > On 22 Jul 2015 07:31, "Alexey Goncharuk" <[email protected]>
>> > > wrote:
>> > >
>> > > > Metadata cache access is backed by a local hash map, so the real
>> cost
>> > is
>> > > a
>> > > > String object hashcode which is cached in the String object and a
>> > hashmap
>> > > > lookup by an integer key.
>> > > >
>> > > > On the other hand, the marshaller is still pluggable and after the
>> > ticket
>> > > > is completed, it should be fairly easy to implement this approach
>> and
>> > > > compare performance.
>> > > >
>> > > > --Alexey
>> > > >
>> > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <[email protected]
>> >:
>> > > >
>> > > > > I think O(N) reasoning does not make a real sense here since N is
>> > > always
>> > > > > small, lets not fool ourselves.
>> > > > > To my mind operation cost of cache access (with all busy
>> locks...),
>> > > > > hashCode/equals and stuff like that has much bigger impact here.
>> > > > > Do we still have a pluggable marshaller? Can my approach be
>> > implemented
>> > > > > separately?
>> > > > >
>> > > > > Sergi
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <
>> > [email protected]
>> > > >:
>> > > > >
>> > > > > > Currently an index-enabled serialized object form has the
>> following
>> > > > > layout
>> > > > > > (simplified):
>> > > > > >
>> > > > > > [object fields][field1Offset,field1Length,
>> > > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
>> > > > > >
>> > > > > > where fields order is determined upon the first object
>> > serialization
>> > > > and
>> > > > > > stored in metadata cache which is available on all nodes. Thus,
>> the
>> > > > field
>> > > > > > lookup is performed as follows:
>> > > > > >
>> > > > > > fieldName -> fieldIndex (metadata lookup, O(1)),
>> > > > fieldIndex->fieldOffset
>> > > > > in
>> > > > > > footer (O(1)), fieldOffset->fieldValue (O(1)).
>> > > > > >
>> > > > > > BTW, I am finalizing the branch with marshaller changes and will
>> > send
>> > > > > this
>> > > > > > for a preliminary review soon.
>> > > > > >
>> > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <[email protected]>:
>> > > > > >
>> > > > > > > Keep in mind that JSONB's performance comes from the fact
>> that it
>> > > > uses
>> > > > > > > server encoding, is binary represented and can have GIN
>> indexes
>> > on
>> > > > top
>> > > > > of
>> > > > > > > it. Not sure if Ignite's marshalling approach keeps those
>> > features
>> > > as
>> > > > > > well.
>> > > > > > >
>> > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <
>> > > > > > [email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > HSTORE and JSONB appeared to have similar format in
>> Postgresql
>> > > > > (because
>> > > > > > > > they was developed by the same people). They noticed that
>> they
>> > > > > switched
>> > > > > > > off
>> > > > > > > > of using key length sorting because they sometimes need
>> > > > > lexicographical
>> > > > > > > > order but this is irrelevant for us.
>> > > > > > > >
>> > > > > > > > Sergi
>> > > > > > > >
>> > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <[email protected]
>> >:
>> > > > > > > >
>> > > > > > > > > Are you referring to JSONB here?
>> > > > > > > > >
>> > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <
>> > > > > > > > [email protected]>
>>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Guys, specially Alexey G.
>> > > > > > > > > >
>> > > > > > > > > > I've attended PostgreSQL conference and there was a talk
>> > > about
>> > > > > > > > > unstructured
>> > > > > > > > > > data format.
>> > > > > > > > > > They had an interesting idea of serialized layout close
>> > > enough
>> > > > to
>> > > > > > > ours,
>> > > > > > > > > I'm
>> > > > > > > > > > not sure how much this is different from our approach
>> and
>> > if
>> > > we
>> > > > > can
>> > > > > > > use
>> > > > > > > > > > some ideas from it but anywaus it looks really
>> promising to
>> > > me
>> > > > > and
>> > > > > > I
>> > > > > > > > want
>> > > > > > > > > > to share.
>> > > > > > > > > >
>> > > > > > > > > > The structure basically is the following:
>> > > > > > > > > >
>> > > > > > > > > > [key headers] [keys] [values]
>> > > > > > > > > >
>> > > > > > > > > > Key headers are [key offset, key length] so they are of
>> a
>> > > fixed
>> > > > > > > length.
>> > > > > > > > > >
>> > > > > > > > > > The cool idea here is that keys and respectively the key
>> > > > headers
>> > > > > > > sorted
>> > > > > > > > > by
>> > > > > > > > > > (key length, key) so that you can do a lookup first by
>> fast
>> > > > > picking
>> > > > > > > key
>> > > > > > > > > of
>> > > > > > > > > > the needed length without looking at keys at all and
>> then
>> > > pick
>> > > > an
>> > > > > > > exact
>> > > > > > > > > > key. Both searches can be done with fast scan if there
>> are
>> > > > small
>> > > > > > > number
>> > > > > > > > > of
>> > > > > > > > > > keys and binary search for a larger number of keys.
>> > > > > > > > > >
>> > > > > > > > > > Alexey G., could you please compare this to our new
>> > > marshalling
>> > > > > > > > approach
>> > > > > > > > > > you are about to merge?
>> > > > > > > > > > BTW, it would be nice if you will describe it in details
>> > > here.
>> > > > > > > > > >
>> > > > > > > > > > Sergi
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Regards,
>> > > > > > > > >
>> > > > > > > > > Atri
>> > > > > > > > > *l'apprenant*
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Regards,
>> > > > > > >
>> > > > > > > Atri
>> > > > > > > *l'apprenant*
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>



-- 
Regards,

Atri
*l'apprenant*

Re: Unstructured object format.

Reply via email to