OK, let's agree that we would like to make schema change rules less restrictive. How much less restrictive is a separate topic. The use case that annoys me the most is the DROP/ADD COLUMN commands.
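
To make the Protobuf argument quoted below concrete, here is a minimal sketch of base-128 varint encoding, under which the same non-negative value produces the same bytes whether it was declared as an int or as a long. The class and method names (VarintSketch, encode) are made up for illustration; this is neither Protobuf's nor Ignite's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only: not Protobuf's or Ignite's real encoder.
final class VarintSketch {
    /** Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last. */
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // last byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int asInt = 300;
        long asLong = 300L;
        // The declared width does not matter, only the value does.
        System.out.println(Arrays.equals(encode(asInt), encode(asLong)));
        // Largest non-negative values of each width.
        System.out.println(encode(Integer.MAX_VALUE).length);
        System.out.println(encode(Long.MAX_VALUE).length);
    }
}
```

This is why widening Int to Long is free in such a format, while any change between types with different binary layouts needs either a data migration or a conversion on each field access.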

On Thu, Nov 22, 2018 at 12:25 PM Sergi Vladykin <[email protected]> wrote:

> If we are developing a product for users, we are already guessing what is
> right and what is wrong for them. So let's avoid these sophistic statements.
>
> In the end it is always our responsibility to provide a balanced set of
> trade-offs between usability, performance and safety.
>
> Let me repeat, I'm not against any possible type conversions, but I'm
> strongly against binary incompatible ones.
> If we always store List.of(1) as 1 and make them binary interchangeable,
> I'm OK with that.
>
> And still, for good practices I'd suggest looking at what Protobuf allows
> and what it does not:
> https://developers.google.com/protocol-buffers/docs/proto3#updating
>
> Sergi
>
> Thu, Nov 22, 2018 at 11:04, Vladimir Ozerov <[email protected]>:
>
> > Sergi,
> >
> > I think we should not guess for users what is right or wrong for them.
> > It is up to the user to decide what is valid. For example, consider a
> > user who operates on a list of Integers, and to optimize memory
> > consumption he decides to save in the same field either a
> > List<Integer>, or a plain Integer in case only a single element exists.
> > Another example - a kind of data lake or data cleansing application,
> > which may receive the same field in different forms. E.g. age in the
> > form of Integer or String. Does it work for the user or not? We do not
> > know. Will he need to migrate the whole data set? We do not know either.
> >
> > The only place in the product where we care is SQL. But in this case,
> > instead of adding checks on the binary level, we should validate data
> > on the cache level. In fact, Ignite already works this way. E.g.
> > nullability checks are performed on the cache level rather than the
> > binary level. All we need is to move all checks to the cache level from
> > the binary level.
> >
> >
> > On Thu, Nov 22, 2018 at 9:41 AM Sergi Vladykin <[email protected]> wrote:
> >
> > > It may be OK to extend compatible field types (like from Int to Long).
> > >
> > > In Protobuf, for example, this is allowed just because there is no
> > > difference between Int and Long in binary format: they all are equally
> > > varlen encoded and Longs just will occupy up to 9 bytes, while Ints up
> > > to 5.
> > >
> > > But for every other case, where binary representation is type
> > > dependent, I would be against. This will either require migrating the
> > > whole dataset to a new model (which is always risky, since you may
> > > need to roll back to the previous version of your code) or it will
> > > require type checks/conversions for each field access, which is a
> > > complication that is hard to reason about and a possible performance
> > > penalty.
> > >
> > > Sergi
> > >
> > > Thu, Nov 22, 2018 at 09:23, Vladimir Ozerov <[email protected]>:
> > >
> > > > Denis,
> > > >
> > > > Several examples:
> > > > 1) DEFAULT values - in SQL you may avoid storing the default value
> > > > in the table and store it in metadata instead. Not applicable for
> > > > BinaryObject because the same binary object may be saved to two SQL
> > > > tables with different defaults.
> > > > 2) DATE and other temporal types - in SQL you want to store them in
> > > > a special format to be able to extract date parts quickly
> > > > (typically - 11 bytes). But in Java and some other languages the
> > > > best format is a plain long. This is why we use it in BinaryObject.
> > > > 3) String charset - in SQL you may choose different charsets for
> > > > different tables. E.g. UTF-8 for one, ASCII for another. In
> > > > BinaryObject we store everything in UTF-8, and this is fine for
> > > > most cases, well ... except for SQL :-)
> > > >
> > > > The key thing here is that you cannot define a format which will be
> > > > good for both SQL and the native API. They are very different. This
> > > > is why I propose to define an additional interface on the cache
> > > > level defining how to store values, which will be very different
> > > > from binary objects.
> > > >
> > > > Vladimir.
> > > >
> > > > On Thu, Nov 22, 2018 at 3:32 AM Denis Magda <[email protected]> wrote:
> > > >
> > > > > Vladimir,
> > > > >
> > > > > Could you educate me a little bit, why the current format is bad
> > > > > for SQL and why another one is more suitable?
> > > > >
> > > > > Also, if we introduce the new format then why would we keep the
> > > > > binary one? Is the new format just a next version of the binary
> > > > > one?
> > > > >
> > > > > 2.3) Remove restrictions on changing field type
> > > > > > I do not know why we did that in the first place. This
> > > > > > restriction prevents type evolution and confuses users.
> > > > >
> > > > > That is a hot requirement shared by those who use Ignite SQL in
> > > > > production. +1.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov <[email protected]> wrote:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > It is very likely that Apache Ignite 3.0 will be released next
> > > > > > year. So we need to start thinking about major product
> > > > > > improvements. I'd like to start with binary objects.
> > > > > >
> > > > > > Currently they are one of the main limiting factors for the
> > > > > > product. They are fat - 30+ bytes overhead on average, high TCO
> > > > > > of Apache Ignite compared to other vendors. They are slow - not
> > > > > > suitable for SQL at all.
> > > > > >
> > > > > > I would like to ask all of you who worked with binary objects
> > > > > > to share your feedback and ideas, so that we understand how
> > > > > > they should look in AI 3.0. This is a brainstorm - let's
> > > > > > accumulate ideas first and minimize criticism. Then we will
> > > > > > work on ideas in separate topics.
> > > > > >
> > > > > > 1) Historical background
> > > > > >
> > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we
> > > > > > started working on .NET and CPP clients. During design we had
> > > > > > several ideas in mind:
> > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > - interoperability between Java, .NET and CPP.
> > > > > >
> > > > > > Since then a number of other concepts were mixed into the
> > > > > > cocktail:
> > > > > > - Affinity key fields
> > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > - Binary Object as storage format
> > > > > >
> > > > > > 2) My proposals
> > > > > >
> > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > Binary Objects are terrible candidates for storage. Too fat,
> > > > > > too slow. Efficient storage typically has <10 bytes overhead
> > > > > > per row (no metadata, no length, no hash code, etc), allows
> > > > > > super-fast field access, supports different string formats
> > > > > > (ASCII, UTF-8, etc), supports different temporal types (date,
> > > > > > time, timestamp, timestamp with timezone, etc), and stores
> > > > > > these types as efficiently as possible.
> > > > > >
> > > > > > What we need is to introduce an interface which will convert a
> > > > > > pair of key-value objects into a row. This row will be used to
> > > > > > store data and to get fields from it. Care about memory
> > > > > > consumption, need SQL and strict schema - use one format. Need
> > > > > > flexibility and prefer key-value access - use another format
> > > > > > which will store binary objects unchanged (current behavior).
> > > > > >
> > > > > > interface DataRowFormat {
> > > > > >     // key and value are primitives or binary objects
> > > > > >     DataRow create(Object key, Object value);
> > > > > >     DataRowMetadata metadata();
> > > > > > }
> > > > > >
> > > > > > 2.2) Remove affinity field from metadata
> > > > > > Affinity rules are governed by cache, not type. We should
> > > > > > remove "affintiyFieldName" from metadata.
> > > > > >
> > > > > > 2.3) Remove restrictions on changing field type
> > > > > > I do not know why we did that in the first place. This
> > > > > > restriction prevents type evolution and confuses users.
> > > > > >
> > > > > > 2.4) Use bitmaps for "null" and default values and for
> > > > > > fixed-length fields, put fixed-length fields before
> > > > > > variable-length.
> > > > > > Motivation: to save space.
> > > > > >
> > > > > > What else? Please share your ideas.
> > > > > >
> > > > > > Vladimir.
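
For reference, proposal 2.4 in the original message could look roughly like the sketch below: a one-byte null bitmap, then the fixed-length fields that are present, then the variable-length field. This is only an assumption about one possible layout; the three-field row (age, id, name) and the names RowLayoutSketch, pack, isNull and age are hypothetical and are not Ignite APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical row layout: [null bitmap (1 byte)][fixed-length fields][variable-length field].
final class RowLayoutSketch {
    static final int AGE = 0, ID = 1, NAME = 2; // bit positions in the null bitmap

    static byte[] pack(Integer age, Long id, String name) {
        byte[] nameBytes = name == null ? new byte[0] : name.getBytes(StandardCharsets.UTF_8);

        int nulls = 0;
        if (age == null) nulls |= 1 << AGE;
        if (id == null) nulls |= 1 << ID;
        if (name == null) nulls |= 1 << NAME;

        // Null fields occupy no bytes at all; only their bit in the bitmap is set.
        int size = 1 + (age == null ? 0 : Integer.BYTES) + (id == null ? 0 : Long.BYTES) + nameBytes.length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put((byte) nulls);
        if (age != null) buf.putInt(age);   // fixed-length fields come first
        if (id != null) buf.putLong(id);
        buf.put(nameBytes);                 // the single variable-length field runs to the end of the row
        return buf.array();
    }

    static boolean isNull(byte[] row, int field) {
        return (row[0] & (1 << field)) != 0;
    }

    /** age is the first fixed-length field, so when present it always sits at offset 1. */
    static Integer age(byte[] row) {
        return isNull(row, AGE) ? null : ByteBuffer.wrap(row).getInt(1);
    }

    public static void main(String[] args) {
        byte[] row = pack(30, null, "Bob");
        System.out.println(row.length);       // 1 (bitmap) + 4 (age) + 3 (name) = 8
        System.out.println(isNull(row, ID));  // true
        System.out.println(age(row));         // 30
    }
}
```

With a single variable-length field no length prefix is needed; a row with several variable-length fields would need offsets or lengths for all but the last one.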
