Re: Ignite 3.0 IgniteTables API Improvement Suggestion

Valentin Kulichenko Wed, 30 Jun 2021 13:17:41 -0700

Pavel,

Thanks for your response, makes sense.


Regarding the values() method: I would instead add the required methods to
the Tuple itself. E.g., it can implement Iterable, and additionally have
the value(int index) method, plus anything else that we might need.
I don't like returning a collection, because in Java it's a much wider API.
We will end up returning an implementation that throws
UnsupportedOperationException for most of the methods. I know this approach
is not uncommon in the Java world, but this doesn't make it good.
In .NET and other languages, we can use other abstractions, of course.
Every platform has its own specifics and practices, so APIs don't have to
be identical.

-Val

On Wed, Jun 30, 2021 at 7:44 AM Pavel Tupitsyn <[email protected]> wrote:

> Hi Andrey,
>
> > This will force us to bother about interfaces/contracts and compatibility
> > aspects in the future
>
> Schemas and versions are part of thin client wire protocol.
> This protocol is a public API - we'll have to care about compatibility
> anyway.
>
> Schema evolution is an important feature that users should understand.
> I see no reason to hide it from the API.
>
>
> > We already have a ticket [1]...
> > Will 'idx -> column' mapping only be enough for your purposes?
>
> The ticket sounds reasonable, but there are no API details.
> At the very least we need public access to column names, types, and
> indexes.
> I propose something like this:
>
> interface Tuple {
>     TupleSchema getSchema();
> }
>
> class TupleSchema {
>     int getVersion();
>     List<TupleSchemaColumn> getColumns();
> }
>
> class TupleSchemaColumn {
>     int index;
>     String name;
>     String typeName;
>     bool isKey;
> }
>
> On Wed, Jun 30, 2021 at 11:05 AM Andrey Mashenkov <
> [email protected]> wrote:
>
> > Hi Pavel,
> >
> > 2. Schema and its version are internal things and shouldn't be exposed,
> > otherwise, eventually, it will lead to the user will manage schemas
> > manually on his side for some purposes.
> > This will force us to bother about interfaces/contracts and compatibility
> > aspects in the future with uncertain benefits for a user.
> >
> > As SchemaDescriptor is an internal API and the user expects a public API.
> > I don't think we want to convert the descriptor back into public API
> terms
> > and it may be impossible for some data.
> >
> > We already have a ticket [1] to support accessing a column by an index
> and
> > exposing some metadata.
> > Will 'idx -> column' mapping only be enough for your purposes?
> >
> > > int Tuple.columnIndex(String)
> >
> >
> >
> >  [1] https://issues.apache.org/jira/browse/IGNITE-14342
> >
> > On Wed, Jun 30, 2021 at 9:34 AM Pavel Tupitsyn <[email protected]>
> > wrote:
> >
> > > Hi Val,
> > >
> > > Thanks for the comments, please see below:
> > >
> > >
> > > > Users will identify tables using names, they will never use IDs
> > >
> > > Ok, let's keep it this way.
> > >
> > >
> > > > Sounds like the Tuple should implement Iterable.
> > >
> > > I don't think Iterable is enough.
> > > We should have a way to get values by column index: Tuple.value(int
> > index),
> > > where index is according to column order in schema.
> > >
> > > Combined with Tuple.schema(), it will provide an efficient way to
> consume
> > > data -
> > > users will be able to read columns in any order, skip some of them,
> etc.
> > >
> > > This is a common pattern in APIs like ODBC or System.Data [1],
> > > which we'll need to implement on top of our thin clients later.
> > >
> > >
> > > > However, whether a user might
> > > > need to get a table schema for a particular version, I'm not sure. Do
> > you
> > > > have a use case in mind for this? If not, I would keep this internal
> > >
> > > Ok, we can keep the Table.schema(ver) method internal, as long as
> > > Table.schemas() is public and includes schema versions.
> > >
> > >
> > > > We already have async counterparts for all the methods where this is
> > > applicable
> > >
> > > IgniteTables.createTable(), dropTable(), tables(), table() do not have
> > > async counterparts,
> > > while internally they perform blocking wait on some futures.
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://docs.microsoft.com/en-us/dotnet/api/system.data.idatareader?view=net-5.0
> > >
> > > On Wed, Jun 30, 2021 at 4:51 AM Valentin Kulichenko <
> > > [email protected]> wrote:
> > >
> > > > Hi Pavel,
> > > >
> > > > Please see my comments below.
> > > >
> > > > -Val
> > > >
> > > > On Tue, Jun 29, 2021 at 2:23 PM Pavel Tupitsyn <[email protected]
> >
> > > > wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > While working on "IEP-76 Thin Client Protocol for Ignite 3.0" [1]
> (to
> > > be
> > > > > discussed separately), the following suggestions for the Table API
> > came
> > > > up:
> > > > >
> > > > > 1. Expose table IDs: sending table name with every operation (e.g.
> > GET)
> > > > is
> > > > > inefficient, string serialization is expensive by itself and names
> > can
> > > be
> > > > > long.
> > > > >     - Table.id()
> > > > >     - IgniteTables.table(UUID)
> > > > >     - IgniteTables.dropTable(UUID)
> > > > >
> > > >
> > > > I don't think this should be a part of the public API. Users will
> > > identify
> > > > tables using names, they will never use IDs. As an internal
> > optimization
> > > > though - sure, we can have that. Sounds similar to the cache ID in
> 2.x.
> > > >
> > > >
> > > > >
> > > > > 2. Expose tuple schemas: to reduce payload size when sending tuples
> > to
> > > > the
> > > > > client, we'll write only the schema version and column values, then
> > the
> > > > > client can retrieve and cache schemas (ordered set of columns per
> > > > version).
> > > > >     - Tuple.schema()
> > > > >     - Table.schemas()
> > > > >     - Table.schema(ver)
> > > > >
> > > >
> > > > Exposing the schema of a tuple makes sense. However, whether a user
> > might
> > > > need to get a table schema for a particular version, I'm not sure. Do
> > you
> > > > have a use case in mind for this? If not, I would keep this internal
> as
> > > > well.
> > > >
> > > >
> > > > >
> > > > > 3. Expose tuple values as a collection: to serialize tuples
> > efficiently
> > > > > (see p2) we need an API to get all values at once. Right now the
> only
> > > API
> > > > > is to get values by column name, which involves a HashMap lookup on
> > > every
> > > > > call.
> > > > >     - Tuple.values()
> > > > >
> > > >
> > > > Sounds like the Tuple should implement Iterable.
> > > >
> > > >
> > > > >
> > > > > 4. Simplify createTable API: use POJO-based configuration.
> > > > >     Creating a Consumer when some properties are optional seems to
> be
> > > > > non-trivial.
> > > > >
> > > >
> > > > Yes, it's currently convoluted for sure. To my knowledge, there are
> > plans
> > > > to improve this, but I will let other folks chime in with more
> details.
> > > >
> > > >
> > > > >
> > > > > 5. Add async methods to IgniteTables API (all methods are async
> > inside
> > > > > anyway)
> > > > >
> > > >
> > > > We already have async counterparts for all the methods where this is
> > > > applicable. E.g., for TableView#get, there is TableView#getAsync. Is
> > > there
> > > > anything else that you propose to add?
> > > >
> > > >
> > > > >
> > > > >
> > > > > Thoughts, objections?
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-76+Thin+Client+Protocol+for+Ignite+3.0
> > > > >
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-54%3A+Schema-first+Approach
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>

Re: Ignite 3.0 IgniteTables API Improvement Suggestion

Reply via email to