Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov Wed, 21 Nov 2018 01:57:23 -0800

Denis,

Could you please clarify: are you talking about storage, i.e. how objects
are stored in Ignite, or about serialization as a whole? I'd like to better
understand whether the use case you described is relevant to my idea of
separating binary objects from the underlying storage format.
My vision was that we could use the current BinaryObject protocol (with
whatever optimizations are needed) as a common format for communication
between nodes and a common serialization protocol. This is very handy
because all participants (Java, C++, .NET, all sorts of thin clients) are
able to work with it. So if I have a "Person" class in Java, I can read it
on any other platform without any additional configuration. But when it
comes to *storage*, we may introduce a pluggable row format interface which
will apply any necessary transformations. So if someone wants to store
objects in Avro/Protobuf and is ready to configure and implement it
(generate classes, implement field extraction logic, etc.), they just
implement that interface. The key is that this implementation will only be
needed in Java, not in each of the dozen platforms we support.

On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <[email protected]>
wrote:

> People often ask about the possibility of storing their data in the same
> format that they use in their applications.
> If you use Avro everywhere in your application, then why not store data in
> the same format in Ignite?
> So, how about making an interface that would list all the operations we
> need, and using this interface everywhere without relying on any specific
> implementation?
> *BinaryObject* looks like a suitable interface, but the only
> implementation that you can get from Ignite is *BinaryObjectImpl*.
> I think we should make Ignite extensible and provide the capability to
> specify your own data format by implementing the corresponding interfaces.
> So, if you like JSONB or Protobuf or whatever else, you could enable a
> module for the corresponding format and use it for storing the data.
>
> Denis
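A minimal sketch of Denis's suggestion: the engine codes against a small format-neutral interface, and each data format plugs in its own implementation. `FormattedObject`, `DataFormat`, `KeyValueTextFormat` and `FieldIndexer` are hypothetical names, not Ignite APIs, and the toy `k=v;k=v` text encoding merely stands in for a real format such as Avro, Protobuf or JSONB.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: the operations the engine needs from any stored object.
interface FormattedObject {
    String typeName();
    boolean hasField(String name);
    Object field(String name); // null if the field is absent
    byte[] toBytes();          // serialized form used for storage
}

// Hypothetical: a pluggable format that can interpret its own bytes.
interface DataFormat {
    String name();
    FormattedObject parse(String typeName, byte[] bytes);
}

// Toy "k1=v1;k2=v2" text format standing in for JSONB/Avro/Protobuf.
final class KeyValueTextFormat implements DataFormat {
    public String name() { return "kv-text"; }

    public FormattedObject parse(String typeName, byte[] bytes) {
        Map<String, Object> fields = new LinkedHashMap<>();
        String text = new String(bytes, StandardCharsets.UTF_8);
        if (!text.isEmpty()) {
            for (String pair : text.split(";")) {
                int eq = pair.indexOf('=');
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new FormattedObject() {
            public String typeName() { return typeName; }
            public boolean hasField(String n) { return fields.containsKey(n); }
            public Object field(String n) { return fields.get(n); }
            public byte[] toBytes() { return bytes.clone(); }
        };
    }
}

// Engine-side code: depends only on the interface, never on a concrete format.
final class FieldIndexer {
    static Object indexKey(FormattedObject obj, String indexedField) {
        return obj.hasField(indexedField) ? obj.field(indexedField) : null;
    }
}

public class FormatDemo {
    public static void main(String[] args) {
        DataFormat format = new KeyValueTextFormat();
        byte[] stored = "name=Alice;city=Paris".getBytes(StandardCharsets.UTF_8);
        FormattedObject person = format.parse("Person", stored);
        System.out.println(FieldIndexer.indexKey(person, "city")); // Paris
        System.out.println(FieldIndexer.indexKey(person, "age"));  // null
    }
}
```

Because `FieldIndexer` sees only the interface, supporting another format would, under these assumptions, mean adding one more `DataFormat` implementation rather than changing engine code.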
>
> Wed, 21 Nov 2018 at 10:10, Alexey Zinoviev <[email protected]>:
>
> > I like @Vyacheslav Daradur's approach.
> >
> > Maybe somebody could have a look at UnsafeRow in Spark:
> >
> > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > UnsafeRow is a concrete InternalRow that represents a mutable internal
> > raw-memory (and hence unsafe) binary row format.
> >
> > P.S. If somebody is interested in this approach, I could share more
> > information.
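For reference, the UnsafeRow javadoc describes a three-part layout: a null bit set aligned to 8-byte words, one 8-byte slot per field, and a variable-length region whose relative offset and length are packed together into the field's slot. The sketch below is a simplified, heap-based take on that idea using a `ByteBuffer` instead of raw memory; `MiniUnsafeRow` is a hypothetical name and the class does no bounds checking beyond what `ByteBuffer` itself does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Layout: [null bit set, 8-byte aligned][one 8-byte slot per field][variable-length region]
final class MiniUnsafeRow {
    private final int bitSetBytes;
    private final ByteBuffer buf;
    private int cursor; // next free byte in the variable-length region

    MiniUnsafeRow(int numFields, int varLenCapacity) {
        bitSetBytes = ((numFields + 63) / 64) * 8;
        buf = ByteBuffer.allocate(bitSetBytes + numFields * 8 + varLenCapacity);
        cursor = bitSetBytes + numFields * 8;
    }

    private int slot(int i) { return bitSetBytes + i * 8; } // O(1) field position
    private int word(int i) { return (i / 64) * 8; }         // null-bit word for field i

    void setNull(int i) { buf.putLong(word(i), buf.getLong(word(i)) | (1L << (i % 64))); }
    private void setNotNull(int i) { buf.putLong(word(i), buf.getLong(word(i)) & ~(1L << (i % 64))); }
    boolean isNull(int i) { return (buf.getLong(word(i)) & (1L << (i % 64))) != 0; }

    // Fixed-length value: stored directly in the field's slot.
    void setLong(int i, long v) { setNotNull(i); buf.putLong(slot(i), v); }
    long getLong(int i) { return buf.getLong(slot(i)); }

    // Variable-length value: bytes go to the tail; the slot keeps (offset << 32) | length.
    void setString(int i, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        for (int k = 0; k < bytes.length; k++) buf.put(cursor + k, bytes[k]);
        setNotNull(i);
        buf.putLong(slot(i), ((long) cursor << 32) | bytes.length);
        cursor += (bytes.length + 7) & ~7; // keep the next value 8-byte aligned
    }

    String getString(int i) {
        long offsetAndSize = buf.getLong(slot(i));
        int offset = (int) (offsetAndSize >>> 32);
        int size = (int) offsetAndSize;
        byte[] bytes = new byte[size];
        for (int k = 0; k < size; k++) bytes[k] = buf.get(offset + k);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

public class UnsafeRowDemo {
    public static void main(String[] args) {
        MiniUnsafeRow row = new MiniUnsafeRow(3, 64);
        row.setLong(0, 42L);
        row.setString(1, "Ignite");
        row.setNull(2);
        System.out.println(row.getLong(0));   // 42
        System.out.println(row.getString(1)); // Ignite
        System.out.println(row.isNull(2));    // true
    }
}
```

The fixed 8-byte slot per field is what makes access O(1); the price is that small fields and nulls still occupy a full word.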
> >
> > Tue, 20 Nov 2018 at 11:33, Sergi Vladykin <[email protected]>:
> >
> > > I really like the Protobuf format. It is probably not what we need for
> > > O(1) field access, but for compact data representation we can borrow a
> > > lot from it.
> > >
> > > Also, IMO, restricting field type changes is an absolutely sane idea.
> > > In the common case, the correct way to evolve a schema is to add new
> > > fields and gradually deprecate the old ones. If you can skip default/null
> > > fields in the binary format, this approach will not introduce any
> > > noticeable performance/size overhead.
> > >
> > > Sergi
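The evolution pattern Sergi describes (add fields, never change their types, skip null/default values) can be sketched with a Protobuf-inspired tagged encoding. This is not the Protobuf wire format: `TaggedCodec` is hypothetical, it uses fixed 4-byte ids and 8-byte `long` values where Protobuf uses varint-encoded tags, and reading requires a scan, which is why such a format does not give O(1) field access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Each non-default field is written as (fieldId, value); null/0 fields cost zero bytes.
final class TaggedCodec {
    static byte[] write(Map<Integer, Long> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (Map.Entry<Integer, Long> e : fields.entrySet()) {
                Long v = e.getValue();
                if (v == null || v == 0L) continue; // null/default: skipped entirely
                out.writeInt(e.getKey());
                out.writeLong(v);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // A reader knows only the field ids of its own schema version.
    static Map<Integer, Long> read(byte[] data, Set<Integer> knownIds) {
        Map<Integer, Long> result = new HashMap<>();
        for (int id : knownIds) result.put(id, 0L); // missing field => default value
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                int id = in.readInt();
                long value = in.readLong();
                if (knownIds.contains(id)) result.put(id, value); // unknown (newer) ids are skipped
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return result;
    }
}

public class SchemaEvolutionDemo {
    public static void main(String[] args) {
        // Schema v1 knows fields {1, 2}; v2 adds field 3.
        Map<Integer, Long> v2Record = new HashMap<>();
        v2Record.put(1, 7L);
        v2Record.put(2, 0L);  // default value, not written
        v2Record.put(3, 99L); // field added in v2
        byte[] data = TaggedCodec.write(v2Record);
        System.out.println(data.length); // 24: two fields x (4-byte id + 8-byte value)
        Map<Integer, Long> v1View = TaggedCodec.read(data, Set.of(1, 2));
        System.out.println(v1View.get(1) + " " + v1View.get(2)); // 7 0
    }
}
```

Old readers ignore the new field and new readers see defaults for fields missing in old data, so neither side needs a type change to evolve.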
> > >
> > > Tue, 20 Nov 2018 at 11:12, Vyacheslav Daradur <[email protected]>:
> > >
> > > > I think one possible way to reduce overhead and TCO is the SQL schema
> > > > approach.
> > > >
> > > > It assumes that metadata will be stored separately from the serialized
> > > > data to reduce size.
> > > > In this case, most of the advantages of Binary Objects, like O(1) access
> > > > and access without deserialization, may still be achieved.
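Vyacheslav's idea can be sketched with a per-type schema stored once in a registry, while each row carries only a schema id and the values. `Schema`, `SchemaRegistry` and `Rows` are hypothetical names, and the sketch assumes every field is a fixed-width `long` so that offsets can be precomputed; variable-length fields would need an extra offset table.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Metadata kept once per type, outside of the rows: field name -> byte offset.
final class Schema {
    final int id;
    final Map<String, Integer> offsets = new HashMap<>();
    final int rowSize;

    Schema(int id, List<String> longFields) {
        this.id = id;
        int off = Integer.BYTES; // every row starts with its schema id
        for (String f : longFields) { offsets.put(f, off); off += Long.BYTES; }
        this.rowSize = off;
    }
}

final class SchemaRegistry {
    private final Map<Integer, Schema> schemas = new HashMap<>();
    void register(Schema s) { schemas.put(s.id, s); }
    Schema get(int id) { return schemas.get(id); }
}

final class Rows {
    // The row holds no field names or per-field offsets, only values.
    static byte[] write(Schema s, Map<String, Long> values) {
        ByteBuffer buf = ByteBuffer.allocate(s.rowSize).putInt(0, s.id);
        for (Map.Entry<String, Integer> e : s.offsets.entrySet())
            buf.putLong(e.getValue(), values.getOrDefault(e.getKey(), 0L));
        return buf.array();
    }

    // O(1): one registry lookup plus one precomputed offset, no deserialization.
    static long readLong(SchemaRegistry reg, byte[] row, String field) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        Schema s = reg.get(buf.getInt(0));
        return buf.getLong(s.offsets.get(field));
    }
}

public class SeparateMetadataDemo {
    public static void main(String[] args) {
        SchemaRegistry registry = new SchemaRegistry();
        Schema person = new Schema(1, List.of("id", "age", "salary"));
        registry.register(person);
        byte[] row = Rows.write(person, Map.of("id", 10L, "age", 35L, "salary", 5000L));
        System.out.println(row.length);                             // 28: 4-byte schema id + 3 x 8 bytes
        System.out.println(Rows.readLong(registry, row, "salary")); // 5000
    }
}
```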
> > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <[email protected]>
> > > > wrote:
> > > > >
> > > > > Hi Alexey,
> > > > >
> > > > > Binary Objects only.
> > > > >
> > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Are we discussing only core features here, or the roadmap for all
> > > > > > components?
> > > > > >
> > > > > > Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov <[email protected]>:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > > It is very likely that Apache Ignite 3.0 will be released next
> > > > > > > year. So we need to start thinking about major product
> > > > > > > improvements. I'd like to start with binary objects.
> > > > > > >
> > > > > > > Currently they are one of the main limiting factors for the
> > > > > > > product. They are fat: 30+ bytes of overhead on average, which
> > > > > > > means a high TCO for Apache Ignite compared to other vendors. They
> > > > > > > are slow: not suitable for SQL at all.
> > > > > > >
> > > > > > > I would like to ask all of you who worked with binary objects to
> > > > > > > share your feedback and ideas, so that we understand what they
> > > > > > > should look like in AI 3.0. This is a brainstorm: let's accumulate
> > > > > > > ideas first and minimize criticism. Then we will work on the ideas
> > > > > > > in separate topics.
> > > > > > >
> > > > > > > 1) Historical background
> > > > > > >
> > > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we
> > > > > > > started working on .NET and CPP clients. During the design we had
> > > > > > > several ideas in mind:
> > > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > > - interoperability between Java, .NET and CPP.
> > > > > > >
> > > > > > > Since then a number of other concepts were mixed into the cocktail:
> > > > > > > - Affinity key fields
> > > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > > - Binary Object as storage format
> > > > > > >
> > > > > > > 2) My proposals
> > > > > > >
> > > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > Binary Objects are terrible candidates for storage: too fat, too
> > > > > > > slow. Efficient storage typically has <10 bytes of overhead per row
> > > > > > > (no metadata, no length, no hash code, etc.), allows super-fast
> > > > > > > field access, supports different string formats (ASCII, UTF-8,
> > > > > > > etc.), supports different temporal types (date, time, timestamp,
> > > > > > > timestamp with timezone, etc.), and stores these types as
> > > > > > > efficiently as possible.
> > > > > > >
> > > > > > > What we need is to introduce an interface which will convert a
> > > > > > > pair of key-value objects into a row. This row will be used to store
> > > > > > > data and to get fields from it. If you care about memory
> > > > > > > consumption and need SQL and a strict schema, use one format. If you
> > > > > > > need flexibility and prefer key-value access, use another format
> > > > > > > which will store binary objects unchanged (current behavior).
> > > > > > >
> > > > > > > interface DataRowFormat {
> > > > > > >     // key and value are primitives or binary objects
> > > > > > >     DataRow create(Object key, Object value);
> > > > > > >     DataRowMetadata metadata();
> > > > > > > }
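The proposal leaves `DataRow` and `DataRowMetadata` undefined. The sketch below gives them minimal hypothetical shapes and shows two interchangeable formats behind the proposed `DataRowFormat`: `PassThroughFormat` keeps the value untouched (the current key-value behavior), while `FlatSchemaFormat` copies a fixed list of columns into a positional array (the strict-schema/SQL case). Both class names are illustrative, and a `Map<String, Object>` stands in for a binary object.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical supporting types for the proposed interface.
interface DataRow {
    Object key();
    Object field(String name);
}

interface DataRowMetadata {
    List<String> columns(); // empty when the format has no strict schema
}

interface DataRowFormat {
    DataRow create(Object key, Object value); // key/value: primitives or binary objects
    DataRowMetadata metadata();
}

// Current behavior: store the value unchanged and resolve fields on demand.
final class PassThroughFormat implements DataRowFormat {
    public DataRow create(Object key, Object value) {
        return new DataRow() {
            public Object key() { return key; }
            public Object field(String name) {
                return value instanceof Map ? ((Map<?, ?>) value).get(name) : null;
            }
        };
    }
    public DataRowMetadata metadata() { return () -> List.of(); }
}

// Strict schema: declared columns go into a positional array; other fields are dropped.
final class FlatSchemaFormat implements DataRowFormat {
    private final List<String> columns;
    private final Map<String, Integer> positions = new HashMap<>();

    FlatSchemaFormat(List<String> columns) {
        this.columns = List.copyOf(columns);
        for (int i = 0; i < this.columns.size(); i++) positions.put(this.columns.get(i), i);
    }

    public DataRow create(Object key, Object value) {
        Map<?, ?> fields = (Map<?, ?>) value;
        Object[] slots = new Object[columns.size()];
        for (int i = 0; i < slots.length; i++) slots[i] = fields.get(columns.get(i));
        return new DataRow() {
            public Object key() { return key; }
            public Object field(String name) {
                Integer i = positions.get(name);
                return i == null ? null : slots[i];
            }
        };
    }
    public DataRowMetadata metadata() { return () -> columns; }
}

public class DataRowFormatDemo {
    public static void main(String[] args) {
        Map<String, Object> person = Map.of("name", "Alice", "age", 30, "nickname", "Al");
        DataRowFormat strict = new FlatSchemaFormat(List.of("name", "age"));
        DataRowFormat kv = new PassThroughFormat();
        DataRow a = strict.create(1L, person);
        DataRow b = kv.create(1L, person);
        System.out.println(a.field("age") + " " + a.field("nickname")); // 30 null
        System.out.println(b.field("nickname"));                        // Al
        System.out.println(strict.metadata().columns());                // [name, age]
    }
}
```

Which format a cache uses would be a configuration choice; callers only ever see `DataRow`.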
> > > > > > >
> > > > > > > 2.2) Remove affinity field from metadata
> > > > > > > Affinity rules are governed by the cache, not the type. We should
> > > > > > > remove the affinity field name from metadata.
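Proposal 2.2 can be illustrated by moving the affinity field into cache configuration, so the same type can be collocated by different fields in different caches. `CacheConfig` is hypothetical, a `Map` stands in for the value object, and the partition is computed with a plain modulo hash, not Ignite's actual affinity function.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical: affinity is declared per cache, so type metadata carries no affinity field.
final class CacheConfig {
    final String name;
    final String affinityField;
    final int partitions;

    CacheConfig(String name, String affinityField, int partitions) {
        this.name = name;
        this.affinityField = affinityField;
        this.partitions = partitions;
    }

    // Simplified modulo hashing over the cache's configured affinity field.
    int partition(Map<String, Object> value) {
        return Math.floorMod(Objects.hashCode(value.get(affinityField)), partitions);
    }
}

public class CacheAffinityDemo {
    public static void main(String[] args) {
        Map<String, Object> order = Map.of("orderId", 1001, "customerId", 7);
        // The same "Order" type, collocated differently in two caches.
        CacheConfig byCustomer = new CacheConfig("ordersByCustomer", "customerId", 1024);
        CacheConfig byOrder = new CacheConfig("ordersById", "orderId", 1024);
        System.out.println(byCustomer.partition(order)); // 7
        System.out.println(byOrder.partition(order));    // 1001
    }
}
```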
> > > > > > >
> > > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > I do not know why we did that in the first place. This restriction
> > > > > > > prevents type evolution and confuses users.
> > > > > > >
> > > > > > > 2.4) Use bitmaps for "null" and default values and for
> > > > > > > fixed-length fields, and put fixed-length fields before
> > > > > > > variable-length ones.
> > > > > > > Motivation: to save space.
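Proposal 2.4 can be sketched as a row with two bitmaps (null and default) followed by only the present fixed-length values, then an end-offset table and bytes for the present variable-length values. `CompactRow` is hypothetical and assumes at most 31 fields, `long` fixed-length fields (default 0), UTF-8 string variable-length fields (default ""), and that the reader gets the field counts from separately stored metadata. Locating a present field takes a popcount (`Integer.bitCount`) over the preceding bits, so access stays O(1).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// [int nullBits][int defaultBits][present longs][int end offsets of present strings][string bytes]
// Fields 0..fixedCount-1 are Long; the remaining fields are String.
final class CompactRow {
    static byte[] write(int fixedCount, Object[] values) {
        int nulls = 0, defaults = 0;
        for (int i = 0; i < values.length; i++) {
            Object v = values[i];
            if (v == null) nulls |= 1 << i;
            else if (i < fixedCount ? (Long) v == 0L : ((String) v).isEmpty()) defaults |= 1 << i;
        }
        int present = ((1 << values.length) - 1) & ~(nulls | defaults);
        ByteArrayOutputStream strings = new ByteArrayOutputStream();
        int[] ends = new int[Integer.bitCount(present >>> fixedCount)];
        int k = 0;
        for (int i = fixedCount; i < values.length; i++) {
            if ((present & (1 << i)) == 0) continue; // null/default strings take no space
            byte[] b = ((String) values[i]).getBytes(StandardCharsets.UTF_8);
            strings.write(b, 0, b.length);
            ends[k++] = strings.size();
        }
        int fixedPresent = Integer.bitCount(present & ((1 << fixedCount) - 1));
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 * fixedPresent + 4 * ends.length + strings.size());
        buf.putInt(nulls).putInt(defaults);
        for (int i = 0; i < fixedCount; i++)
            if ((present & (1 << i)) != 0) buf.putLong((Long) values[i]); // fixed-length first
        for (int end : ends) buf.putInt(end);
        buf.put(strings.toByteArray());
        return buf.array();
    }

    static Object read(byte[] row, int fixedCount, int fieldCount, int i) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        int nulls = buf.getInt(0), defaults = buf.getInt(4);
        if ((nulls & (1 << i)) != 0) return null;
        if ((defaults & (1 << i)) != 0) return i < fixedCount ? (Object) 0L : "";
        int present = ((1 << fieldCount) - 1) & ~(nulls | defaults);
        int fixedMask = (1 << fixedCount) - 1;
        int before = present & ((1 << i) - 1); // present fields preceding field i
        if (i < fixedCount) return buf.getLong(8 + 8 * Integer.bitCount(before));
        int table = 8 + 8 * Integer.bitCount(present & fixedMask);
        int data = table + 4 * Integer.bitCount(present & ~fixedMask);
        int rank = Integer.bitCount(before & ~fixedMask); // index among present strings
        int start = rank == 0 ? 0 : buf.getInt(table + 4 * (rank - 1));
        int end = buf.getInt(table + 4 * rank);
        return new String(row, data + start, end - start, StandardCharsets.UTF_8);
    }
}

public class CompactRowDemo {
    public static void main(String[] args) {
        // Schema: id, balance (long); name, comment, tag (string).
        Object[] values = {42L, 0L, "Bob", null, "VIP"};
        byte[] row = CompactRow.write(2, values);
        System.out.println(row.length); // 30: 8 bitmap bytes + 1 long + 2 offsets + 6 string bytes
        for (int i = 0; i < values.length; i++)
            System.out.println(CompactRow.read(row, 2, values.length, i));
    }
}
```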
> > > > > > >
> > > > > > > What else? Please share your ideas.
> > > > > > >
> > > > > > > Vladimir.
> > > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
> > > >
> > >
> >
>