Denis,

In theory, data conversion could be avoided in certain cases. For example, consider loading data through a data streamer. We know the cache, and we know its metadata and row format. So instead of doing "user object" -> "binary object" -> "row", we can do "user object" -> "row".

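To make the two paths concrete, here is a minimal Java sketch. It is not Ignite code: Person, PersonRowFormat and the byte layout are made up for illustration, and a plain field map stands in for the binary object. It only shows that both paths can produce the same row, while the direct one skips the intermediate representation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical row writer for a known schema (assumption, not an Ignite API).
// Assumed row layout: [int id][int nameLength][name bytes in UTF-8].
final class PersonRowFormat {
    // Two-step path: "user object" -> generic field map -> "row".
    // The map is only a stand-in for a binary object.
    static byte[] viaIntermediate(Person p) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", p.id);
        fields.put("name", p.name);
        return encode((Integer) fields.get("id"), (String) fields.get("name"));
    }

    // Direct path: "user object" -> "row", possible because the schema is known up front.
    static byte[] direct(Person p) {
        return encode(p.id, p.name);
    }

    private static byte[] encode(int id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + 4 + nameBytes.length)
            .putInt(id)
            .putInt(nameBytes.length)
            .put(nameBytes)
            .array();
    }

    public static void main(String[] args) {
        Person p = new Person(7, "Ann");
        // Both paths are expected to yield identical bytes.
        System.out.println(Arrays.equals(viaIntermediate(p), direct(p)));
    }
}

// Hypothetical user class.
final class Person {
    final int id;
    final String name;

    Person(int id, String name) {
        this.id = id;
        this.name = name;
    }
}
```

In a real streamer the saving would come from not building and then re-parsing the intermediate object for every entry.
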
On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov <[email protected]> wrote:

> Vladimir,
>
> Thank you for the clarification. I didn't see this distinction at first.
>
> I meant using customizable formats for all serialization, not only for
> storage.
> The idea behind my proposal is to avoid data conversion when loading data
> into Ignite.
> It will complicate the usage of thin clients though, so I'm not sure that
> it will make users happier.
>
> But anyway, the same approach may be used for storage only.
>
> Denis
>
> Wed, Nov 21, 2018 at 12:57, Vladimir Ozerov <[email protected]>:
>
> > Denis,
> >
> > Could you please clarify - are you talking about storage, i.e. how
> > objects are stored in Ignite, or about serialization as a whole? I'd
> > like to better understand whether the use case you described is
> > relevant to my idea of splitting binary objects from the underlying
> > storage format.
> >
> > My vision was that we can use the current BinaryObject protocol (with
> > whatever optimizations are needed) as a common format for
> > communication between nodes and a common serialization protocol. This
> > is very handy because all participants (Java, C++, .NET, all sorts of
> > thin clients) are able to work with it. So if I have a "Person" class
> > in Java, I can read it on any other platform without any additional
> > configuration. But when it comes to *storage*, we may introduce a
> > pluggable row format interface which will apply any necessary
> > transformations. So if someone wants to store objects in
> > Avro/Protobuf, and is ready to configure and implement it (generate
> > classes, implement field extraction logic, etc.) - then just implement
> > that interface. The key is that this implementation will only be
> > needed in Java, not in the dozen platforms we support.
> >
> > But when it comes to how to store object in a cache
> >
> > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <[email protected]>
> > wrote:
> >
> > > People often ask about the possibility of storing their data in the
> > > same format that they use in their applications.
> > > If you use Avro everywhere in your application, then why not store
> > > data in the same format in Ignite?
> > > So, how about making an interface that would list all the
> > > operations we need, and using this interface everywhere without
> > > relying on any specific implementation?
> > > *BinaryObject* looks like a suitable interface, but the only
> > > implementation that you can get from Ignite is *BinaryObjectImpl*.
> > > I think we should make Ignite extensible and provide the capability
> > > to specify your own data format by implementing the corresponding
> > > interfaces.
> > > So, if you like JSONB or Protobuf or whatever else, you could enable
> > > a module for the corresponding format and use it for storing the
> > > data.
> > >
> > > Denis
> > >
> > > Wed, Nov 21, 2018 at 10:10, Alexey Zinoviev <[email protected]>:
> > >
> > > > I like @Vyacheslav Daradur's approach.
> > > >
> > > > Maybe somebody could have a look at UnsafeRow in Spark:
> > > >
> > > > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > >
> > > > UnsafeRow is a concrete InternalRow that represents a mutable
> > > > internal raw-memory (and hence unsafe) binary row format.
> > > >
> > > > P.S. If somebody is interested in this approach, I could share
> > > > more information.
> > > >
> > > > Tue, Nov 20, 2018 at 11:33, Sergi Vladykin <[email protected]>:
> > > >
> > > > > I really like the Protobuf format. It is probably not what we
> > > > > need for O(1) field access, but for compact data representation
> > > > > we can derive a lot from it.
> > > > >
> > > > > Also, IMO, restricting field type changes is an absolutely sane
> > > > > idea. The correct way to evolve a schema in the common case is
> > > > > to add new fields and gradually deprecate the old ones. If you
> > > > > can skip default/null fields in the binary format, this approach
> > > > > will not introduce any noticeable performance/size overhead.
> > > > >
> > > > > Sergi
> > > > >
> > > > > Tue, Nov 20, 2018 at 11:12, Vyacheslav Daradur <[email protected]>:
> > > > >
> > > > > > I think one possible way to reduce overhead and TCO is the SQL
> > > > > > Schema approach.
> > > > > >
> > > > > > That assumes that metadata will be stored separately from the
> > > > > > serialized data to reduce size.
> > > > > > In this case, most of the advantages of Binary Objects, like
> > > > > > O(1) access and access without deserialization, may still be
> > > > > > achieved.
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Alexey,
> > > > > > >
> > > > > > > Binary Objects only.
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Are we discussing Core features only here, or the roadmap
> > > > > > > > for all components?
> > > > > > > >
> > > > > > > > Tue, Nov 20, 2018 at 10:05, Vladimir Ozerov
> > > > > > > > <[email protected]>:
> > > > > > > >
> > > > > > > > > Igniters,
> > > > > > > > >
> > > > > > > > > It is very likely that Apache Ignite 3.0 will be
> > > > > > > > > released next year. So we need to start thinking about
> > > > > > > > > major product improvements. I'd like to start with
> > > > > > > > > binary objects.
> > > > > > > > >
> > > > > > > > > Currently they are one of the main limiting factors for
> > > > > > > > > the product.
> > > > > > > > > They are fat - 30+ bytes overhead on average, which
> > > > > > > > > means a high TCO of Apache Ignite compared to other
> > > > > > > > > vendors. They are slow - not suitable for SQL at all.
> > > > > > > > >
> > > > > > > > > I would like to ask all of you who have worked with
> > > > > > > > > binary objects to share your feedback and ideas, so that
> > > > > > > > > we understand what they should look like in AI 3.0. This
> > > > > > > > > is a brainstorm - let's accumulate ideas first and
> > > > > > > > > minimize criticism. Then we will work on the ideas in
> > > > > > > > > separate topics.
> > > > > > > > >
> > > > > > > > > 1) Historical background
> > > > > > > > >
> > > > > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when
> > > > > > > > > we started working on .NET and CPP clients. During
> > > > > > > > > design we had several ideas in mind:
> > > > > > > > > - ability to read object fields in O(1) without
> > > > > > > > >   deserialization
> > > > > > > > > - interoperability between Java, .NET and CPP.
> > > > > > > > >
> > > > > > > > > Since then a number of other concepts have been mixed
> > > > > > > > > into the cocktail:
> > > > > > > > > - Affinity key fields
> > > > > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > > > > - Binary Object as storage format
> > > > > > > > >
> > > > > > > > > 2) My proposals
> > > > > > > > >
> > > > > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > > > Binary Objects are terrible candidates for storage. Too
> > > > > > > > > fat, too slow.
> > > > > > > > > Efficient storage typically has <10 bytes overhead per
> > > > > > > > > row (no metadata, no length, no hash code, etc), allows
> > > > > > > > > super-fast field access, supports different string
> > > > > > > > > formats (ASCII, UTF-8, etc), supports different temporal
> > > > > > > > > types (date, time, timestamp, timestamp with timezone,
> > > > > > > > > etc), and stores these types as efficiently as possible.
> > > > > > > > >
> > > > > > > > > What we need is to introduce an interface which will
> > > > > > > > > convert a pair of key-value objects into a row. This row
> > > > > > > > > will be used to store data and to get fields from it. If
> > > > > > > > > you care about memory consumption and need SQL and a
> > > > > > > > > strict schema - use one format. If you need flexibility
> > > > > > > > > and prefer key-value access - use another format which
> > > > > > > > > will store binary objects unchanged (current behavior).
> > > > > > > > >
> > > > > > > > > interface DataRowFormat {
> > > > > > > > >     // key and value are primitives or binary objects
> > > > > > > > >     DataRow create(Object key, Object value);
> > > > > > > > >     DataRowMetadata metadata();
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > 2.2) Remove affinity field from metadata
> > > > > > > > > Affinity rules are governed by the cache, not the type.
> > > > > > > > > We should remove "affinityFieldName" from metadata.
> > > > > > > > >
> > > > > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > > > I do not know why we did that in the first place. This
> > > > > > > > > restriction prevents type evolution and confuses users.
> > > > > > > > >
> > > > > > > > > 2.4) Use bitmaps for "null" and default values and for
> > > > > > > > > fixed-length fields, put fixed-length fields before
> > > > > > > > > variable-length.
> > > > > > > > > Motivation: to save space.
> > > > > > > > >
> > > > > > > > > What else? Please share your ideas.
> > > > > > > > >
> > > > > > > > > Vladimir.
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav D.
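
For reference, proposal 2.4 in the quoted message (a null bitmap, with fixed-length fields placed before variable-length ones) could look roughly like the Java sketch below. The class name, the two-field schema and the byte layout are assumptions made up for illustration; nothing here is an existing Ignite format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical row encoder for a schema with a nullable int "age" and a nullable string "name".
// Assumed layout: [1-byte null bitmap][age, 4 bytes, if present][name length + UTF-8 bytes, if present].
final class NullBitmapRow {
    static byte[] encode(Integer age, String name) {
        int bitmap = 0;
        if (age == null) bitmap |= 1;        // bit 0 set: "age" is null and takes no space
        if (name == null) bitmap |= 1 << 1;  // bit 1 set: "name" is null and takes no space

        byte[] nameBytes = name == null ? new byte[0] : name.getBytes(StandardCharsets.UTF_8);
        int size = 1 + (age == null ? 0 : 4) + (name == null ? 0 : 4 + nameBytes.length);

        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put((byte) bitmap);
        if (age != null) buf.putInt(age);                               // fixed-length field first
        if (name != null) buf.putInt(nameBytes.length).put(nameBytes);  // variable-length field last
        return buf.array();
    }

    // Checks the bitmap only; no field data is decoded.
    static boolean isNull(byte[] row, int fieldIndex) {
        return (row[0] & (1 << fieldIndex)) != 0;
    }

    public static void main(String[] args) {
        System.out.println(encode(null, null).length);
        System.out.println(encode(30, "Ann").length);
    }
}
```

With this layout a row whose fields are all null costs a single byte, which is the space saving the proposal is after.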
