Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov Wed, 21 Nov 2018 04:14:43 -0800

Denis,

In theory, data conversion could be avoided in certain cases. E.g. consider
the case of loading data through a streamer. We know the cache, and we know
its metadata and row format. So instead of doing "user object" -> "binary
object" -> "row", we can do "user object" -> "row".
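
A minimal sketch of that shortcut, assuming a hypothetical RowEncoder
interface and a made-up row layout (neither exists in Ignite; the names
Person, RowEncoder and PersonRowEncoder are for illustration only). The user
object is written straight into row bytes, with no intermediate binary
object:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical user class; not part of Ignite.
final class Person {
    final int id;
    final String name;

    Person(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical one-step encoder: "user object" -> "row".
interface RowEncoder<T> {
    byte[] toRow(T obj);
}

// Assumed row layout: [int id][int nameLength][UTF-8 name bytes].
final class PersonRowEncoder implements RowEncoder<Person> {
    @Override
    public byte[] toRow(Person p) {
        byte[] name = p.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length);
        buf.putInt(p.id).putInt(name.length).put(name);
        return buf.array();
    }

    // The id sits at a fixed offset, so it is read without decoding the row.
    static int readId(byte[] row) {
        return ByteBuffer.wrap(row).getInt(0);
    }

    static String readName(byte[] row) {
        int len = ByteBuffer.wrap(row).getInt(4);
        return new String(row, 8, len, StandardCharsets.UTF_8);
    }
}

final class DirectRowDemo {
    public static void main(String[] args) {
        byte[] row = new PersonRowEncoder().toRow(new Person(42, "Denis"));
        System.out.println(PersonRowEncoder.readId(row) + " " + PersonRowEncoder.readName(row));
    }
}
```

Whether a streamer could actually plug in such an encoder depends on the
cache exposing its row format up front, which is exactly the assumption
made above.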


On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov <[email protected]>
wrote:

> Vladimir,
>
> Thank you for the clarification. I didn't see this distinction at first.
>
> I meant using customizable formats for all serialization, not only for
> storage.
> The idea behind my proposal is to avoid data conversion when loading data
> into Ignite.
> It will complicate the usage of thin clients though, so I'm not sure that it
> will make users happier.
>
> But anyway, the same approach may be used for storage only.
>
> Denis
>
> Wed, 21 Nov 2018 at 12:57, Vladimir Ozerov <[email protected]>:
>
> > Denis,
> >
> > Could you please clarify: are you talking about storage, e.g. how objects
> > are stored in Ignite, or about serialization as a whole? I'd like to better
> > understand whether the use case you described is relevant to my idea of
> > splitting binary objects from the underlying storage format.
> > My vision was that we can use the current BinaryObject protocol (with
> > whatever optimizations are needed) as a common format for communication
> > between nodes and a common serialization protocol. This is very handy
> > because all participants (Java, C++, .NET, all sorts of thin clients) are
> > able to work with it. So if I have a "Person" class in Java, I can read it
> > on any other platform without any additional configuration. But when it
> > comes to *storage*, we may introduce a pluggable row format interface which
> > will apply any necessary transformations. So if someone wants to store
> > objects in Avro/Protobuf and is ready to configure and implement it
> > (generate classes, implement field extraction logic, etc.), they just
> > implement that interface. The key is that this implementation will only be
> > needed in Java, not in the dozen platforms we support.
> >
> > But when it comes to how to store object in a cache
> >
> > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <[email protected]>
> > wrote:
> >
> > > People often ask about the possibility of storing their data in the same
> > > format that they use in their applications.
> > > If you use Avro everywhere in your application, then why not store data in
> > > the same format in Ignite?
> > > So, how about making an interface that lists all the operations we need,
> > > and using this interface everywhere without relying on any specific
> > > implementation?
> > > *BinaryObject* looks like a suitable interface, but the only
> > > implementation that you can get from Ignite is *BinaryObjectImpl*.
> > > I think we should make Ignite extensible and provide the capability to
> > > specify your own data format by implementing the corresponding interfaces.
> > > So, if you like JSONB or Protobuf or whatever else, you could enable a
> > > module for the corresponding format and use it for storing the data.
> > >
> > > Denis
> > >
> > > Wed, 21 Nov 2018 at 10:10, Alexey Zinoviev <[email protected]>:
> > >
> > > > I like @Vyacheslav Daradur's approach.
> > > >
> > > > Maybe somebody could have a look at UnsafeRow in Spark:
> > > >
> > > > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > >
> > > > UnsafeRow is a concrete InternalRow that represents a mutable internal
> > > > raw-memory (and hence unsafe) binary row format.
> > > >
> > > > P.S. If somebody is interested in this approach, I could share more
> > > > information.
> > > >
> > > > Tue, 20 Nov 2018 at 11:33, Sergi Vladykin <[email protected]>:
> > > >
> > > > > I really like the Protobuf format. It is probably not what we need for
> > > > > O(1) field access, but for compact data representation we can derive
> > > > > a lot from it.
> > > > >
> > > > > Also, IMO, restricting field type changes is an absolutely sane idea.
> > > > > The correct way to evolve a schema in the common case is to add new
> > > > > fields and gradually deprecate the old ones. If you can skip
> > > > > default/null fields in the binary format, this approach will not
> > > > > introduce any noticeable performance/size overhead.
> > > > >
> > > > > Sergi
> > > > >
> > > > > Tue, 20 Nov 2018 at 11:12, Vyacheslav Daradur <[email protected]>:
> > > > >
> > > > > > I think one possible way to reduce overhead and TCO is the SQL
> > > > > > schema approach.
> > > > > >
> > > > > > It assumes that metadata will be stored separately from serialized
> > > > > > data to reduce size.
> > > > > > In this case, most of the advantages of Binary Objects, like O(1)
> > > > > > access and access without deserialization, may still be achieved.
> > > > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <[email protected]>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Alexey,
> > > > > > >
> > > > > > > Binary Objects only.
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Are we discussing only Core features here, or the roadmap for all
> > > > > > > > components?
> > > > > > > >
> > > > > > > > Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov <[email protected]>:
> > > > > > > >
> > > > > > > > > Igniters,
> > > > > > > > >
> > > > > > > > > It is very likely that Apache Ignite 3.0 will be released next
> > > > > > > > > year, so we need to start thinking about major product
> > > > > > > > > improvements. I'd like to start with binary objects.
> > > > > > > > >
> > > > > > > > > Currently they are one of the main limiting factors for the
> > > > > > > > > product. They are fat: 30+ bytes of overhead on average, which
> > > > > > > > > contributes to the high TCO of Apache Ignite compared to other
> > > > > > > > > vendors. They are slow: not suitable for SQL at all.
> > > > > > > > >
> > > > > > > > > I would like to ask all of you who have worked with binary
> > > > > > > > > objects to share your feedback and ideas, so that we understand
> > > > > > > > > what they should look like in AI 3.0. This is a brainstorm:
> > > > > > > > > let's accumulate ideas first and minimize criticism. Then we
> > > > > > > > > will work on the ideas in separate topics.
> > > > > > > > >
> > > > > > > > > 1) Historical background
> > > > > > > > >
> > > > > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we
> > > > > > > > > started working on .NET and CPP clients. During design we had
> > > > > > > > > several ideas in mind:
> > > > > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > > > > - interoperability between Java, .NET and CPP.
> > > > > > > > >
> > > > > > > > > Since then a number of other concepts have been mixed into the
> > > > > > > > > cocktail:
> > > > > > > > > - Affinity key fields
> > > > > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > > > > - Binary Object as storage format
> > > > > > > > >
> > > > > > > > > 2) My proposals
> > > > > > > > >
> > > > > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > > > Binary Objects are terrible candidates for storage: too fat,
> > > > > > > > > too slow. Efficient storage typically has <10 bytes of overhead
> > > > > > > > > per row (no metadata, no length, no hash code, etc.), allows
> > > > > > > > > super-fast field access, supports different string formats
> > > > > > > > > (ASCII, UTF-8, etc.), supports different temporal types (date,
> > > > > > > > > time, timestamp, timestamp with timezone, etc.), and stores
> > > > > > > > > these types as efficiently as possible.
> > > > > > > > >
> > > > > > > > > What we need is to introduce an interface which will convert a
> > > > > > > > > pair of key-value objects into a row. This row will be used to
> > > > > > > > > store data and to get fields from it. If you care about memory
> > > > > > > > > consumption and need SQL and a strict schema, use one format.
> > > > > > > > > If you need flexibility and prefer key-value access, use
> > > > > > > > > another format which will store binary objects unchanged
> > > > > > > > > (current behavior).
> > > > > > > > >
> > > > > > > > > interface DataRowFormat {
> > > > > > > > >     // key and value are primitives or binary objects
> > > > > > > > >     DataRow create(Object key, Object value);
> > > > > > > > >     DataRowMetadata metadata();
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > 2.2) Remove affinity field from metadata
> > > > > > > > > Affinity rules are governed by cache, not type. We should remove
> > > > > > > > > "affinityFieldName" from metadata.
> > > > > > > > >
> > > > > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > > > I do not know why we did that in the first place. This
> > > > > > > > > restriction prevents type evolution and confuses users.
> > > > > > > > >
> > > > > > > > > 2.4) Use bitmaps for "null" and default values and for
> > > > > > > > > fixed-length fields, put fixed-length fields before
> > > > > > > > > variable-length.
> > > > > > > > > Motivation: to save space.
> > > > > > > > >
> > > > > > > > > What else? Please share your ideas.
> > > > > > > > >
> > > > > > > > > Vladimir.
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav D.
> > > > > >
> > > > >
> > > >
> > >
> >
>
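
Proposal 2.4 in the original message quoted above (a bitmap for nulls,
fixed-length fields before variable-length ones) could look roughly like the
sketch below. Everything in it is an assumption made for illustration: the
BitmapRow class, the three-field schema and the byte layout are hypothetical
and are not Ignite's actual storage format. Default-value bitmaps are left
out to keep it short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical row with a fixed schema: (Integer a, Long b, String s).
// Assumed layout: [1-byte null bitmap][a: 4 bytes][b: 8 bytes][s: UTF-8 bytes].
// A field whose bit is set in the bitmap is null and takes no space at all.
final class BitmapRow {
    static final int A = 0, B = 1, S = 2;

    static byte[] encode(Integer a, Long b, String s) {
        int nulls = 0;
        if (a == null) nulls |= 1 << A;
        if (b == null) nulls |= 1 << B;
        if (s == null) nulls |= 1 << S;

        byte[] str = s == null ? new byte[0] : s.getBytes(StandardCharsets.UTF_8);
        int size = 1 + (a == null ? 0 : 4) + (b == null ? 0 : 8) + str.length;

        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put((byte) nulls);
        // Fixed-length fields go first, so their offsets depend only on the bitmap.
        if (a != null) buf.putInt(a);
        if (b != null) buf.putLong(b);
        // The single variable-length field runs to the end of the row,
        // so no length prefix is stored for it.
        buf.put(str);
        return buf.array();
    }

    static boolean isNull(byte[] row, int field) {
        return (row[0] & (1 << field)) != 0;
    }

    static Long readB(byte[] row) {
        if (isNull(row, B)) return null;
        int off = 1 + (isNull(row, A) ? 0 : 4);
        return ByteBuffer.wrap(row).getLong(off);
    }

    static String readS(byte[] row) {
        if (isNull(row, S)) return null;
        int off = 1 + (isNull(row, A) ? 0 : 4) + (isNull(row, B) ? 0 : 8);
        return new String(row, off, row.length - off, StandardCharsets.UTF_8);
    }
}
```

With this layout a row whose only non-null field is the int costs 5 bytes,
and field offsets are computed from the bitmap alone, without reading any
per-row metadata.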
