Re: Ignite 3.0 IgniteTables API Improvement Suggestion

Pavel Tupitsyn Mon, 26 Jul 2021 12:17:12 -0700

Igniters,

I've made the changes to the Tuple interface, please have a look:


https://github.com/apache/ignite-3/pull/245
https://issues.apache.org/jira/browse/IGNITE-14342

On Thu, Jul 1, 2021 at 12:25 PM Pavel Tupitsyn <ptupit...@apache.org> wrote:

> Val, agreed.
> Let's add length(), value(index), and Iterable to the Tuple interface.
>
> I've updated the ticket:
> https://issues.apache.org/jira/browse/IGNITE-14342
>
> On Wed, Jun 30, 2021 at 11:17 PM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
>> Pavel,
>>
>> Thanks for your response, makes sense.
>>
>> Regarding the values() method: I would instead add the required methods to
>> the Tuple itself. E.g., it can implement Iterable, and additionally have
>> the value(int index) method, plus anything else that we might need.
>> I don't like returning a collection, because in Java it's a much wider
>> API.
>> We will end up returning an implementation that throws
>> UnsupportedOperationException for most of the methods. I know this
>> approach
>> is not uncommon in the Java world, but this doesn't make it good.
>> In .NET and other languages, we can use other abstractions, of course.
>> Every platform has its own specifics and practices, so APIs don't have to
>> be identical.
>>
>> -Val
>>
>> On Wed, Jun 30, 2021 at 7:44 AM Pavel Tupitsyn <ptupit...@apache.org>
>> wrote:
>>
>> > Hi Andrey,
>> >
>> > > This will force us to bother about interfaces/contracts and
>> compatibility
>> > > aspects in the future
>> >
>> > Schemas and versions are part of thin client wire protocol.
>> > This protocol is a public API - we'll have to care about compatibility
>> > anyway.
>> >
>> > Schema evolution is an important feature that users should understand.
>> > I see no reason to hide it from the API.
>> >
>> >
>> > > We already have a ticket [1]...
>> > > Will 'idx -> column' mapping only be enough for your purposes?
>> >
>> > The ticket sounds reasonable, but there are no API details.
>> > At the very least we need public access to column names, types, and
>> > indexes.
>> > I propose something like this:
>> >
>> > interface Tuple {
>> >     TupleSchema getSchema();
>> > }
>> >
>> > class TupleSchema {
>> >     int getVersion();
>> >     List<TupleSchemaColumn> getColumns();
>> > }
>> >
>> > class TupleSchemaColumn {
>> >     int index;
>> >     String name;
>> >     String typeName;
>> >     bool isKey;
>> > }
>> >
>> > On Wed, Jun 30, 2021 at 11:05 AM Andrey Mashenkov <
>> > andrey.mashen...@gmail.com> wrote:
>> >
>> > > Hi Pavel,
>> > >
>> > > 2. Schema and its version are internal things and shouldn't be
>> exposed,
>> > > otherwise, eventually, it will lead to the user will manage schemas
>> > > manually on his side for some purposes.
>> > > This will force us to bother about interfaces/contracts and
>> compatibility
>> > > aspects in the future with uncertain benefits for a user.
>> > >
>> > > As SchemaDescriptor is an internal API and the user expects a public
>> API.
>> > > I don't think we want to convert the descriptor back into public API
>> > terms
>> > > and it may be impossible for some data.
>> > >
>> > > We already have a ticket [1] to support accessing a column by an index
>> > and
>> > > exposing some metadata.
>> > > Will 'idx -> column' mapping only be enough for your purposes?
>> > >
>> > > > int Tuple.columnIndex(String)
>> > >
>> > >
>> > >
>> > >  [1] https://issues.apache.org/jira/browse/IGNITE-14342
>> > >
>> > > On Wed, Jun 30, 2021 at 9:34 AM Pavel Tupitsyn <ptupit...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi Val,
>> > > >
>> > > > Thanks for the comments, please see below:
>> > > >
>> > > >
>> > > > > Users will identify tables using names, they will never use IDs
>> > > >
>> > > > Ok, let's keep it this way.
>> > > >
>> > > >
>> > > > > Sounds like the Tuple should implement Iterable.
>> > > >
>> > > > I don't think Iterable is enough.
>> > > > We should have a way to get values by column index: Tuple.value(int
>> > > index),
>> > > > where index is according to column order in schema.
>> > > >
>> > > > Combined with Tuple.schema(), it will provide an efficient way to
>> > consume
>> > > > data -
>> > > > users will be able to read columns in any order, skip some of them,
>> > etc.
>> > > >
>> > > > This is a common pattern in APIs like ODBC or System.Data [1],
>> > > > which we'll need to implement on top of our thin clients later.
>> > > >
>> > > >
>> > > > > However, whether a user might
>> > > > > need to get a table schema for a particular version, I'm not
>> sure. Do
>> > > you
>> > > > > have a use case in mind for this? If not, I would keep this
>> internal
>> > > >
>> > > > Ok, we can keep the Table.schema(ver) method internal, as long as
>> > > > Table.schemas() is public and includes schema versions.
>> > > >
>> > > >
>> > > > > We already have async counterparts for all the methods where this
>> is
>> > > > applicable
>> > > >
>> > > > IgniteTables.createTable(), dropTable(), tables(), table() do not
>> have
>> > > > async counterparts,
>> > > > while internally they perform blocking wait on some futures.
>> > > >
>> > > >
>> > > > [1]
>> > > >
>> > > >
>> > >
>> >
>> https://docs.microsoft.com/en-us/dotnet/api/system.data.idatareader?view=net-5.0
>> > > >
>> > > > On Wed, Jun 30, 2021 at 4:51 AM Valentin Kulichenko <
>> > > > valentin.kuliche...@gmail.com> wrote:
>> > > >
>> > > > > Hi Pavel,
>> > > > >
>> > > > > Please see my comments below.
>> > > > >
>> > > > > -Val
>> > > > >
>> > > > > On Tue, Jun 29, 2021 at 2:23 PM Pavel Tupitsyn <
>> ptupit...@apache.org
>> > >
>> > > > > wrote:
>> > > > >
>> > > > > > Igniters,
>> > > > > >
>> > > > > > While working on "IEP-76 Thin Client Protocol for Ignite 3.0"
>> [1]
>> > (to
>> > > > be
>> > > > > > discussed separately), the following suggestions for the Table
>> API
>> > > came
>> > > > > up:
>> > > > > >
>> > > > > > 1. Expose table IDs: sending table name with every operation
>> (e.g.
>> > > GET)
>> > > > > is
>> > > > > > inefficient, string serialization is expensive by itself and
>> names
>> > > can
>> > > > be
>> > > > > > long.
>> > > > > >     - Table.id()
>> > > > > >     - IgniteTables.table(UUID)
>> > > > > >     - IgniteTables.dropTable(UUID)
>> > > > > >
>> > > > >
>> > > > > I don't think this should be a part of the public API. Users will
>> > > > identify
>> > > > > tables using names, they will never use IDs. As an internal
>> > > optimization
>> > > > > though - sure, we can have that. Sounds similar to the cache ID in
>> > 2.x.
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > 2. Expose tuple schemas: to reduce payload size when sending
>> tuples
>> > > to
>> > > > > the
>> > > > > > client, we'll write only the schema version and column values,
>> then
>> > > the
>> > > > > > client can retrieve and cache schemas (ordered set of columns
>> per
>> > > > > version).
>> > > > > >     - Tuple.schema()
>> > > > > >     - Table.schemas()
>> > > > > >     - Table.schema(ver)
>> > > > > >
>> > > > >
>> > > > > Exposing the schema of a tuple makes sense. However, whether a
>> user
>> > > might
>> > > > > need to get a table schema for a particular version, I'm not
>> sure. Do
>> > > you
>> > > > > have a use case in mind for this? If not, I would keep this
>> internal
>> > as
>> > > > > well.
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > 3. Expose tuple values as a collection: to serialize tuples
>> > > efficiently
>> > > > > > (see p2) we need an API to get all values at once. Right now the
>> > only
>> > > > API
>> > > > > > is to get values by column name, which involves a HashMap
>> lookup on
>> > > > every
>> > > > > > call.
>> > > > > >     - Tuple.values()
>> > > > > >
>> > > > >
>> > > > > Sounds like the Tuple should implement Iterable.
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > 4. Simplify createTable API: use POJO-based configuration.
>> > > > > >     Creating a Consumer when some properties are optional seems
>> to
>> > be
>> > > > > > non-trivial.
>> > > > > >
>> > > > >
>> > > > > Yes, it's currently convoluted for sure. To my knowledge, there
>> are
>> > > plans
>> > > > > to improve this, but I will let other folks chime in with more
>> > details.
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > 5. Add async methods to IgniteTables API (all methods are async
>> > > inside
>> > > > > > anyway)
>> > > > > >
>> > > > >
>> > > > > We already have async counterparts for all the methods where this
>> is
>> > > > > applicable. E.g., for TableView#get, there is TableView#getAsync.
>> Is
>> > > > there
>> > > > > anything else that you propose to add?
>> > > > >
>> > > > >
>> > > > > >
>> > > > > >
>> > > > > > Thoughts, objections?
>> > > > > >
>> > > > > >
>> > > > > > [1]
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-76+Thin+Client+Protocol+for+Ignite+3.0
>> > > > > >
>> > > > > > [2]
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-54%3A+Schema-first+Approach
>> > > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Best regards,
>> > > Andrey V. Mashenkov
>> > >
>> >
>>
>

Re: Ignite 3.0 IgniteTables API Improvement Suggestion

Reply via email to