Re: [IMPORTANT] Future of Binary Objects

Vladimir Ozerov Fri, 23 Nov 2018 03:28:33 -0800

OK, let's agree that we would like to make the schema change rules less
restrictive. How much less restrictive is a separate topic. The use case that
annoys me the most is DROP/ADD COLUMN commands.


On Thu, Nov 22, 2018 at 12:25 PM Sergi Vladykin <[email protected]> wrote:

> If we are developing a product for users, we are already guessing what is
> right and what is wrong for them. So let's avoid these sophistic statements.
>
> In the end, it is always our responsibility to provide a balanced set of
> trade-offs between usability, performance and safety.
>
> Let me repeat: I'm not against type conversions as such, but I'm strongly
> against binary-incompatible ones. If we always store List.of(1) as 1 and make
> the two binary interchangeable, I'm OK with that.
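One way to read "binary interchangeable" is sketched below in Java, assuming a hypothetical unpacked encoding in which every list element is written as its own (field id, value) entry with no list header. Under that assumption a one-element list and a scalar produce identical bytes, a scalar reader takes the last entry and a list reader collects all entries, which loosely mirrors how the linked Protobuf guide treats singular vs. repeated fields. TaggedFieldCodec and its methods are invented for illustration; this is not Ignite or Protobuf API, and Protobuf's packed encoding of numeric lists would break the equivalence.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical "unpacked" encoding: every element is its own (fieldId, value) entry. */
final class TaggedFieldCodec {
    private static final int ENTRY_SIZE = 1 + Integer.BYTES; // 1-byte field id + 4-byte value

    /** A scalar int field is written as a single entry. */
    static byte[] encodeScalar(byte fieldId, int value) {
        return ByteBuffer.allocate(ENTRY_SIZE).put(fieldId).putInt(value).array();
    }

    /** A list field repeats the entry once per element; no list header or length is written. */
    static byte[] encodeList(byte fieldId, List<Integer> values) {
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_SIZE * values.size());
        for (int v : values) buf.put(fieldId).putInt(v);
        return buf.array();
    }

    /** A scalar reader keeps the last entry (0 if none), so it also accepts list-encoded data. */
    static int decodeAsScalar(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int last = 0;
        while (buf.remaining() >= ENTRY_SIZE) {
            buf.get();              // field id; this sketch assumes a single-field payload
            last = buf.getInt();
        }
        return last;
    }

    /** A list reader collects every entry, so it also accepts scalar-encoded data. */
    static List<Integer> decodeAsList(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        List<Integer> out = new ArrayList<>();
        while (buf.remaining() >= ENTRY_SIZE) {
            buf.get();
            out.add(buf.getInt());
        }
        return out;
    }
}
```

With this layout, `encodeScalar((byte) 3, 1)` and `encodeList((byte) 3, List.of(1))` return the same five bytes, so switching the field between the two shapes needs no data migration.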
>
> Still, as a reference for good practice, I'd suggest looking at what
> Protobuf allows and what it does not:
> https://developers.google.com/protocol-buffers/docs/proto3#updating
>
> Sergi
>
> On Thu, Nov 22, 2018 at 11:04 Vladimir Ozerov <[email protected]> wrote:
>
> > Sergi,
> >
> > I think we should not guess for users what is right or wrong for them. It
> > is up to the user to decide what is valid. For example, consider a user who
> > operates on a list of Integers and, to optimize memory consumption, decides
> > to save either a List<Integer> or a plain Integer in the same field, the
> > latter when only a single element exists. Another example is a data lake
> > or data-cleansing application, which may receive the same field in
> > different forms, e.g. age as an Integer or as a String. Does it work for
> > the user or not? We do not know. Will they need to migrate the whole data
> > set? We do not know either.
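To make the two scenarios concrete, here is a minimal Java sketch of what such flexibility implies for application code. It assumes a schemaless record modeled as a `Map<String, Object>` (a stand-in for a binary object, not Ignite API); the field names "ids" and "age" and the helper names are hypothetical. The writer stores either a bare Integer or a list; the reader normalizes on every access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helpers for fields whose runtime type varies between records. */
final class PolymorphicField {
    /** Writer: store a bare Integer when the list has exactly one element, to save memory. */
    static void putIds(Map<String, Object> record, List<Integer> ids) {
        record.put("ids", ids.size() == 1 ? ids.get(0) : new ArrayList<>(ids));
    }

    /** Reader: every access checks the runtime type and normalizes to a list. */
    @SuppressWarnings("unchecked")
    static List<Integer> getIds(Map<String, Object> record) {
        Object v = record.get("ids");
        if (v == null) return List.of();
        if (v instanceof Integer) return List.of((Integer) v);
        if (v instanceof List) return (List<Integer>) v;
        throw new IllegalStateException("Unexpected type for 'ids': " + v.getClass().getName());
    }

    /** Data-cleansing case: "age" may arrive as an Integer or as a String. */
    static int getAge(Map<String, Object> record) {
        Object v = record.get("age");
        if (v instanceof Integer) return (Integer) v;
        if (v instanceof String) return Integer.parseInt(((String) v).trim());
        throw new IllegalStateException("Unexpected type for 'age': " + v);
    }
}
```

Whether the `instanceof` branches on the read path are an acceptable cost is exactly the judgment call debated in this thread; the sketch only shows where that cost lands.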
> >
> > The only place in the product where we care is SQL. But even there,
> > instead of adding checks on the binary level, we should validate data on
> > the cache level. In fact, Ignite already works this way: for example,
> > nullability checks are performed on the cache level rather than the binary
> > level. All we need is to move all checks from the binary level to the
> > cache level.
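A minimal Java sketch of cache-level validation, assuming a hypothetical per-table column schema that a cache consults before accepting a value, while the binary layer accepts any field type. ColumnDef, TableSchema and validate are invented for this example and are not Ignite API; a `Map` again stands in for a binary object.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical SQL column definition, enforced by the cache rather than the binary format. */
final class ColumnDef {
    final String name;
    final Class<?> type;
    final boolean notNull;

    ColumnDef(String name, Class<?> type, boolean notNull) {
        this.name = name;
        this.type = type;
        this.notNull = notNull;
    }
}

/** Hypothetical table schema attached to one cache. */
final class TableSchema {
    private final List<ColumnDef> columns;

    TableSchema(List<ColumnDef> columns) {
        this.columns = List.copyOf(columns);
    }

    /** Returns an error message, or null if the value satisfies this table's constraints. */
    String validate(Map<String, Object> value) {
        for (ColumnDef col : columns) {
            Object v = value.get(col.name);
            if (v == null) {
                if (col.notNull) return "NOT NULL violated: " + col.name;
                continue;
            }
            if (!col.type.isInstance(v)) {
                return "Type mismatch for " + col.name + ": expected "
                    + col.type.getSimpleName() + ", got " + v.getClass().getSimpleName();
            }
        }
        return null;
    }
}
```

Because the rules live with the table, the same object can be accepted by a permissive key-value cache and rejected by a strict SQL table, without the binary format itself knowing about either.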
> >
> >
> > On Thu, Nov 22, 2018 at 9:41 AM Sergi Vladykin <[email protected]> wrote:
> >
> > > It may be OK to widen compatible field types (e.g. from Int to Long).
> > >
> > > In Protobuf, for example, this is allowed precisely because there is no
> > > difference between Int and Long in the binary format: both are varint
> > > encoded, and Longs simply occupy up to 9 bytes, while Ints occupy up to 5.
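The varint argument can be checked with a short Java sketch of the base-128 varint scheme Protobuf uses for int32/int64 fields: 7 payload bits per byte, with the high bit set while more bytes follow. The class name is hypothetical. The point is that a value written as an int and read back as a long has exactly the same bytes.

```java
import java.io.ByteArrayOutputStream;

/** Base-128 varint encoding, as used by Protobuf for int32/int64 fields. */
final class VarintDemo {
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit the low 7 bits at a time; set the high bit while more bytes follow.
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break; // last byte of this varint
            shift += 7;
        }
        return result;
    }
}
```

For example, both `encode(300)` and `encode(300L)` yield the two bytes `0xAC 0x02`; `Integer.MAX_VALUE` takes 5 bytes and `Long.MAX_VALUE` takes 9, matching the figures above for non-negative values. A negative value is sign-extended to 64 bits, so `-1` takes 10 bytes, which is also how Protobuf treats negative int32/int64 (its sint32/sint64 types use ZigZag encoding to avoid this).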
> > >
> > > But for every other case, where the binary representation is
> > > type-dependent, I would be against it. Such a change will either require
> > > migrating the whole data set to a new model (which is always risky, since
> > > you may need to roll back to the previous version of your code) or require
> > > type checks/conversions on every field access, which is a complication
> > > that is hard to reason about and a possible performance penalty.
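By contrast, here is a sketch of the type-dependent case, assuming a hypothetical fixed-width layout with a one-byte type tag (4 payload bytes for INT, 8 for LONG). After a field is changed from int to long, old and new rows have different bytes, so a reader for the new schema must branch on the tag on every access. TypedField, TAG_INT and TAG_LONG are invented for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical type-tagged, fixed-width field encoding. */
final class TypedField {
    static final byte TAG_INT = 1;
    static final byte TAG_LONG = 2;

    static byte[] writeInt(int v) {
        return ByteBuffer.allocate(1 + Integer.BYTES).put(TAG_INT).putInt(v).array();
    }

    static byte[] writeLong(long v) {
        return ByteBuffer.allocate(1 + Long.BYTES).put(TAG_LONG).putLong(v).array();
    }

    /** Reader for the new "long" schema; it must still accept rows written under the old "int" one. */
    static long readAsLong(byte[] field) {
        ByteBuffer buf = ByteBuffer.wrap(field);
        byte tag = buf.get();
        switch (tag) {
            case TAG_INT:  return buf.getInt();  // old row: widened on every access
            case TAG_LONG: return buf.getLong(); // new row
            default: throw new IllegalArgumentException("Unknown type tag: " + tag);
        }
    }
}
```

Old rows keep their 4-byte payload until rewritten, so the only way to drop this branch is to migrate the whole data set: the two options named above.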
> > >
> > > Sergi
> > >
> > >
> > >
> > > On Thu, Nov 22, 2018 at 09:23 Vladimir Ozerov <[email protected]> wrote:
> > >
> > > > Denis,
> > > >
> > > > Several examples:
> > > > 1) DEFAULT values - in SQL you may avoid storing default value in the
> > > table
> > > > and store it in metadata instead. Not applicable for BinaryObject
> > because
> > > > the same binary object may be saved to two SQL tables with different
> > > > defaults
> > > > 2) DATE and other temporal types - in SQL you want to store it in
> > special
> > > > format to be able to extract date parts quickly (typically - 11
> bytes).
> > > But
> > > > in Java and some other languages the best format is plain long. this
> is
> > > why
> > > > we use it BinaryObject
> > > > 3) String charset - in SQL you may choose different charsets for
> > > different
> > > > tables. E.g. UTF-8 for one, ASCII for another. In BinaryObject we
> store
> > > > everything in UTF-8, and this is fine for most cases, well ... except
> > of
> > > > SQL :-)
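A Java sketch contrasting the two temporal representations from point 2, assuming a hypothetical packed date layout (year as 2 bytes, then month and day as 1 byte each) against a plain epoch-millisecond long as used on the Java side. With the packed form, the year is a read at a fixed offset; with the long, it needs a full calendar conversion (done here with `java.time`, UTC assumed). The layout and class name are illustrative only.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Contrasts epoch-millis storage with a hypothetical packed date layout. */
final class DateLayouts {
    /** Packs a date as [year: 2 bytes][month: 1 byte][day: 1 byte]. */
    static byte[] pack(LocalDate d) {
        return ByteBuffer.allocate(4)
            .putShort((short) d.getYear())
            .put((byte) d.getMonthValue())
            .put((byte) d.getDayOfMonth())
            .array();
    }

    /** Year from the packed layout: one read at a fixed offset. */
    static int yearFromPacked(byte[] packed) {
        return ByteBuffer.wrap(packed).getShort(0);
    }

    /** Year from epoch millis: requires a calendar conversion (UTC assumed). */
    static int yearFromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).getYear();
    }
}
```

The same trade-off applies to the other two points: a SQL table can keep defaults and a per-table charset in its own metadata, whereas a self-contained binary object has to carry enough to be read anywhere.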
> > > >
> > > > The key point here is that you cannot define a format that is good for
> > > > both SQL and the native API. They are very different. This is why I
> > > > propose to define an additional interface on the cache level that
> > > > determines how values are stored, which will be very different from
> > > > binary objects.
> > > >
> > > > Vladimir.
> > > >
> > > > On Thu, Nov 22, 2018 at 3:32 AM Denis Magda <[email protected]> wrote:
> > > >
> > > > > Vladimir,
> > > > >
> > > > > Could you educate me a little: why is the current format bad for SQL,
> > > > > and why is another one more suitable?
> > > > >
> > > > > Also, if we introduce the new format, why would we keep the binary one?
> > > > > Is the new format just the next version of the binary one?
> > > > >
> > > > > > 2.3) Remove restrictions on changing field type
> > > > > > I do not know why we did that in the first place. This restriction
> > > > > > prevents type evolution and confuses users.
> > > > >
> > > > >
> > > > > That is a hot requirement shared by those who use Ignite SQL in production.
> > > > > +1.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov <[email protected]> wrote:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > It is very likely that Apache Ignite 3.0 will be released next year.
> > > > > > So we need to start thinking about major product improvements. I'd
> > > > > > like to start with binary objects.
> > > > > >
> > > > > > Currently they are one of the main limiting factors for the product.
> > > > > > They are fat: 30+ bytes of overhead on average, which makes the TCO
> > > > > > of Apache Ignite high compared to other vendors. They are slow: not
> > > > > > suitable for SQL at all.
> > > > > >
> > > > > > I would like to ask all of you who have worked with binary objects to
> > > > > > share your feedback and ideas, so that we understand what they should
> > > > > > look like in AI 3.0. This is a brainstorm: let's accumulate ideas first
> > > > > > and minimize criticism. Then we will work on the ideas in separate
> > > > > > topics.
> > > > > >
> > > > > > 1) Historical background
> > > > > >
> > > > > > BO were implemented around 2014 (Apache Ignite 1.5), when we started
> > > > > > working on the .NET and CPP clients. During the design we had several
> > > > > > ideas in mind:
> > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > - interoperability between Java, .NET and CPP.
> > > > > >
> > > > > > Since then, a number of other concepts were mixed into the cocktail:
> > > > > > - Affinity key fields
> > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > - Binary Object as storage format
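The first design goal, reading a field in O(1) without deserializing the object, can be illustrated with a minimal Java sketch of a self-describing record whose header holds one offset per field. The layout below is an assumption chosen for clarity and is not Ignite's actual binary format; OffsetTableRecord is a hypothetical name, and all fields are Strings to keep it short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical layout: [count][offset_0 .. offset_{n-1}][len | UTF-8 bytes] per field. */
final class OffsetTableRecord {
    static byte[] write(List<String> fields) {
        int n = fields.size();
        byte[][] encoded = new byte[n][];
        int payload = 0;
        for (int i = 0; i < n; i++) {
            encoded[i] = fields.get(i).getBytes(StandardCharsets.UTF_8);
            payload += Integer.BYTES + encoded[i].length;
        }
        int headerSize = Integer.BYTES * (1 + n);
        ByteBuffer buf = ByteBuffer.allocate(headerSize + payload);
        buf.putInt(n);
        int offset = headerSize;
        for (int i = 0; i < n; i++) {      // offset table: where each field starts
            buf.putInt(offset);
            offset += Integer.BYTES + encoded[i].length;
        }
        for (byte[] e : encoded) {         // payload: length-prefixed UTF-8
            buf.putInt(e.length).put(e);
        }
        return buf.array();
    }

    /** Reads field i with fixed-position lookups; no other field is decoded. */
    static String read(byte[] record, int i) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int n = buf.getInt(0);
        if (i < 0 || i >= n) throw new IndexOutOfBoundsException("field " + i);
        int off = buf.getInt(Integer.BYTES * (1 + i));
        int len = buf.getInt(off);
        return new String(record, off + Integer.BYTES, len, StandardCharsets.UTF_8);
    }
}
```

In this sketch the offset table and length prefixes cost 4 bytes per field per record plus a 4-byte count, which gives a feel for how self-describing layouts accumulate per-object overhead.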
> > > > > >
> > > > > > 2) My proposals
> > > > > >
> > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > Binary Objects are terrible candidates for storage: too fat, too slow.
> > > > > > Efficient storage typically has <10 bytes of overhead per row (no
> > > > > > metadata, no length, no hash code, etc.), allows super-fast field
> > > > > > access, supports different string formats (ASCII, UTF-8, etc.),
> > > > > > supports different temporal types (date, time, timestamp, timestamp
> > > > > > with timezone, etc.), and stores these types as efficiently as
> > > > > > possible.
> > > > > >
> > > > > > What we need is an interface that converts a key-value pair of
> > > > > > objects into a row. This row will be used to store data and to get
> > > > > > fields from it. If you care about memory consumption and need SQL and
> > > > > > a strict schema, use one format. If you need flexibility and prefer
> > > > > > key-value access, use another format, which stores binary objects
> > > > > > unchanged (current behavior).
> > > > > >
> > > > > > interface DataRowFormat {
> > > > > >     // Converts a key-value pair (primitives or binary objects) into a storage row.
> > > > > >     DataRow create(Object key, Object value);
> > > > > >
> > > > > >     DataRowMetadata metadata();
> > > > > > }
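A minimal Java sketch of how two formats could sit behind the proposed interface. The proposal does not define DataRow or DataRowMetadata, so their shapes below are assumptions, and a `Map` stands in for a binary object. The hypothetical SchemaRowFormat copies declared columns into a flat array for SQL-style access; the hypothetical PassThroughRowFormat keeps the value unchanged, as today.

```java
import java.util.List;
import java.util.Map;

/** Assumed shape of a stored row. */
interface DataRow {
    Object key();
    Object value();             // original value, if the format retains it
    Object field(String name);  // column access
}

/** Assumed shape of format metadata. */
interface DataRowMetadata {
    List<String> fieldNames();  // empty if the format has no fixed schema
}

interface DataRowFormat {
    DataRow create(Object key, Object value);
    DataRowMetadata metadata();
}

/** Strict schema: declared columns are copied into a flat array; other fields are dropped. */
final class SchemaRowFormat implements DataRowFormat {
    private final List<String> columns;

    SchemaRowFormat(List<String> columns) { this.columns = List.copyOf(columns); }

    @Override public DataRow create(Object key, Object value) {
        Map<?, ?> src = (Map<?, ?>) value;  // stand-in for a binary object
        Object[] cols = new Object[columns.size()];
        for (int i = 0; i < cols.length; i++) cols[i] = src.get(columns.get(i));
        return new DataRow() {
            @Override public Object key() { return key; }
            @Override public Object value() { return null; }  // original object not retained
            @Override public Object field(String name) {
                int idx = columns.indexOf(name);
                return idx < 0 ? null : cols[idx];
            }
        };
    }

    @Override public DataRowMetadata metadata() { return () -> columns; }
}

/** Flexible key-value access: the value is stored as-is (current behavior). */
final class PassThroughRowFormat implements DataRowFormat {
    @Override public DataRow create(Object key, Object value) {
        return new DataRow() {
            @Override public Object key() { return key; }
            @Override public Object value() { return value; }
            @Override public Object field(String name) {
                return value instanceof Map ? ((Map<?, ?>) value).get(name) : null;
            }
        };
    }

    @Override public DataRowMetadata metadata() { return () -> List.of(); }
}
```

A cache would be configured with one format or the other, so the choice between compact SQL rows and unchanged objects becomes per-cache configuration rather than a property of the binary format.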
> > > > > >
> > > > > > 2.2) Remove affinity field from metadata
> > > > > > Affinity rules are governed by the cache, not the type. We should
> > > > > > remove "affinityFieldName" from metadata.
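A sketch of the idea that affinity belongs to the cache: the same record type (a `Map` stand-in) is placed in two caches whose hypothetical configs name different affinity fields, so one object maps to different partitions depending on the cache. CacheAffinityConfig and partition() are invented for illustration, and the simple hash-modulo mapping is a stand-in, not Ignite's rendezvous affinity function.

```java
import java.util.Map;

/** Hypothetical per-cache affinity setting: the cache, not the type, names the affinity field. */
final class CacheAffinityConfig {
    final String cacheName;
    final String affinityField;
    final int partitions;

    CacheAffinityConfig(String cacheName, String affinityField, int partitions) {
        this.cacheName = cacheName;
        this.affinityField = affinityField;
        this.partitions = partitions;
    }

    /** Maps a value to a partition via this cache's affinity field (simple hash, for illustration). */
    int partition(Map<String, Object> value) {
        Object affKey = value.get(affinityField);
        if (affKey == null) throw new IllegalArgumentException("Missing affinity field: " + affinityField);
        return Math.floorMod(affKey.hashCode(), partitions);
    }
}
```

Nothing in the record's type description mentions an affinity field here; colocating employees by organization in one cache and by department in another is purely a matter of cache configuration.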
> > > > > >
> > > > > > 2.3) Remove restrictions on changing field type
> > > > > > I do not know why we did that in the first place. This restriction
> > > > > > prevents type evolution and confuses users.
> > > > > >
> > > > > > 2.4) Use bitmaps for "null" and default values, and put fixed-length
> > > > > > fields before variable-length ones.
> > > > > > Motivation: to save space.
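A minimal Java sketch of proposal 2.4, assuming a hypothetical fixed schema (id INT, balance LONG, name VARCHAR). A one-byte null bitmap comes first, the fixed-length columns follow at offsets known from the schema, and the single variable-length column runs to the end of the row, so its length is implied and no length prefix is needed. CompactRow and the layout are assumptions, not an agreed format; default-value bits are left out for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical row: [null bitmap: 1][id: 4][balance: 8][name: UTF-8 to end of row]. */
final class CompactRow {
    static final int ID_BIT = 0, BALANCE_BIT = 1, NAME_BIT = 2;
    static final int ID_OFF = 1, BALANCE_OFF = 5, NAME_OFF = 13; // fixed offsets from the schema

    static byte[] write(Integer id, Long balance, String name) {
        byte[] nameBytes = name == null ? new byte[0] : name.getBytes(StandardCharsets.UTF_8);
        byte bitmap = 0;
        if (id == null) bitmap |= 1 << ID_BIT;
        if (balance == null) bitmap |= 1 << BALANCE_BIT;
        if (name == null) bitmap |= 1 << NAME_BIT;
        ByteBuffer buf = ByteBuffer.allocate(NAME_OFF + nameBytes.length);
        buf.put(bitmap)
           .putInt(id == null ? 0 : id)             // null fixed-length slots stay reserved
           .putLong(balance == null ? 0L : balance)
           .put(nameBytes);                         // var-length tail, no length prefix
        return buf.array();
    }

    static boolean isNull(byte[] row, int bit) {
        return (row[0] & (1 << bit)) != 0;
    }

    static Integer id(byte[] row) {
        return isNull(row, ID_BIT) ? null : ByteBuffer.wrap(row).getInt(ID_OFF);
    }

    static Long balance(byte[] row) {
        return isNull(row, BALANCE_BIT) ? null : ByteBuffer.wrap(row).getLong(BALANCE_OFF);
    }

    static String name(byte[] row) {
        return isNull(row, NAME_BIT) ? null
            : new String(row, NAME_OFF, row.length - NAME_OFF, StandardCharsets.UTF_8);
    }
}
```

Here the per-row overhead is a single bitmap byte. Keeping slots for null fixed-length columns makes every fixed offset computable from the schema alone; skipping those slots would save more space but require counting bitmap bits to locate a column.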
> > > > > >
> > > > > > What else? Please share your ideas.
> > > > > >
> > > > > > Vladimir.
> > > > > >
> > > > >
> > > >
> > >
> >
>
