Re: Apache Ignite 3 Alpha 2 webinar follow up questions

Andrey Mashenkov Thu, 22 Jul 2021 15:11:27 -0700

Hi Courtney,

Thanks for your feedback.


I've gone through the questions but don't yet have the whole picture of your
use case.
Could you please clarify how exactly you use Ignite and what the
integration points are,
and maybe share some experience with using Ignite SPIs?

We'll keep this information in mind while developing Ignite,
because it may help us make a better product.

In the meantime, I'll try to answer the questions.

>   1. Schema change - does that include the ability to change the types of
>   fields/columns?
Yes, we plan to support transparent conversion to a wider type on the fly
(e.g. 'int' to 'long').
This is a major point of our Live-schema concept.
There is no need to convert data on all the nodes synchronously,
as traditional SQL databases do (those that support it at all).
Instead, we are going to support multiple schema versions and convert data
on demand, on a per-row basis, to the latest version,
then write the row back.

More complex conversions like 'String' -> 'int' are out of scope for now
because they require executing user code on the critical path.
The limitation here is that the column MUST NOT be indexed, because an index
over data of different types is impossible.
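
To illustrate the on-demand conversion described above, here is a minimal
Python sketch. It is not Ignite 3 code: the names (SCHEMAS, WIDENING,
INDEXED_COLUMNS, Row, read) are hypothetical, and the set of allowed widening
conversions is an assumption made only for the example. Python integers have
arbitrary precision, so widening needs no value conversion here; in a real
binary row format the stored bytes would be re-encoded.

```python
from dataclasses import dataclass

# Hypothetical model: each schema version maps column name -> type name.
SCHEMAS = {
    1: {"id": "int", "amount": "int"},
    2: {"id": "int", "amount": "long"},  # 'amount' widened from int to long
}
LATEST = max(SCHEMAS)

# Assumption: only lossless widening is allowed; 'String' -> 'int' is not.
WIDENING = {("short", "int"), ("int", "long"), ("float", "double")}

# Assumption: columns listed here are indexed and must keep their type.
INDEXED_COLUMNS = set()


def can_change_type(column, old, new):
    """Accept a type change only if it widens and the column is not indexed."""
    return (old, new) in WIDENING and column not in INDEXED_COLUMNS


@dataclass
class Row:
    version: int   # schema version the row was written with
    values: dict


def upgrade(row):
    """Return the row expressed in the latest schema version."""
    if row.version == LATEST:
        return row
    return Row(LATEST, dict(row.values))


def read(storage, key):
    """Read a row; upgrade and write it back if it uses an old schema."""
    row = storage[key]
    if row.version != LATEST:
        row = upgrade(row)
        storage[key] = row  # write-back: the conversion happens once per row
    return row


storage = {
    1: Row(1, {"id": 1, "amount": 10}),      # written before the change
    2: Row(2, {"id": 2, "amount": 2 ** 40}),  # written after the change
}
read(storage, 1)  # touching the old row upgrades it lazily
print(storage[1].version, storage[2].version)
```

Rows that are never read stay in their old version, which is why no
cluster-wide synchronous conversion is needed in this model.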


>  2. Will the new guaranteed consistency between APIs also mean SQL will
>  gain transaction support?
Yes, we plan to have Transactional SQL.
DDL will be non-transactional though, and I wonder whether anyone really
supports transactional DDL.

Ignite 3 will operate on rows underneath, but the classic Table API and the
Key-value API will both be available to the user
at the same time, with all consistency guarantees.


>  3. Has there been any decision about how much of Calcite will be exposed
>   to the client? When using thick clients, it'll be hugely beneficial to be
>   able to work with Calcite APIs directly to provide custom rules and
>   optimizations to better suit organization needs
As of now, we have no plans to expose any Calcite API to users.
AFAIK, we have our own custom Calcite convention, custom rules that are aware
of the distributed environment,
and additional AST nodes. The rules MUST correctly propagate internal
information about data distribution,
so I'm not sure we want to give low-level access to them.

> We Index into Solr and use the Solr indices
Ignite 1 and 2 have poor support for TEXT queries, and it is not
configurable at all.
Also, the underlying Lucene indices are NOT persistent, and fixing that
would require too much effort.
The GeoSpatial index has the same issues, so we decided to drop both, along
with the Indexing SPI altogether.

However, there is activity on the dev list on the IndexQuery topic.
Folks there are going to add IndexQuery (a scan query over a sorted index
that can use simple conditions) to Ignite 2.
We plan to have the same functionality in Ignite 3, and maybe it will be
possible to add full-text search support there.
Would that work for you? What do you think?
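
To make "a scan query over a sorted index with simple conditions" concrete,
here is a small Python sketch. It is only an illustration of the idea and
does not show the IndexQuery API: SortedIndex, put and range_scan are
hypothetical names, and the index is a plain in-memory sorted list.

```python
import bisect


class SortedIndex:
    """Toy sorted index: (indexed_value, primary_key) pairs kept in order."""

    def __init__(self):
        self._entries = []

    def put(self, value, key):
        # insort keeps the list sorted by (value, key).
        bisect.insort(self._entries, (value, key))

    def range_scan(self, lower=None, upper=None):
        """Yield primary keys with lower <= value < upper, in index order."""
        if lower is None:
            start = 0
        else:
            # (lower,) sorts before every (lower, key), so this finds the
            # first entry whose value is >= lower.
            start = bisect.bisect_left(self._entries, (lower,))
        for value, key in self._entries[start:]:
            if upper is not None and value >= upper:
                break  # the index is sorted, so nothing further can match
            yield key


index = SortedIndex()
for name, age in [("alice", 30), ("bob", 25), ("carol", 41), ("dave", 35)]:
    index.put(age, name)

# Simple condition: 30 <= age < 40, answered without touching other entries.
matches = list(index.range_scan(lower=30, upper=40))
print(matches)
```

The point of the sketch is that a range condition turns into a seek plus a
bounded scan over the sorted entries, so no SQL parsing or planning is
involved.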


>    4. Will the unified storage model enable different versions of Ignite to
>   be in the cluster when persistence is enabled so that rolling restarts can
>   be done?
I'm not sure rolling upgrade (RU) will be available, because too many
compatibility issues would have to be resolved
to make RU possible under load without downtime.

Maybe it makes sense to provide some grid mode (maintenance mode) for RU
purposes that blocks all user load
but allows the grid to be upgraded, e.g. for the pure in-memory case.

Persistence compatibility should be preserved, as it is in Ignite 2.


>    5. Will it be possible to provide a custom cache store still and will
>   these changes enable custom cache stores to be queryable from SQL?
I'm not sure I fully understand this.
1. Usually, SQL is about indices. Ignite can't perform a query over
unindexed data.

2. A full scan over a cache that contains only part of the data, plus a scan
of the CacheStore, followed by merging the results, is a pain.
Most likely, running the query over the CacheStore directly will be simpler
and even more performant.
A shared CacheStore (the same for all nodes) will definitely kill
performance in that case.
So a preliminary loadCache() call looks like a good compromise.

3. Splitting a query into two parts, one to run on Ignite and one to run on
the CacheStore, looks possible with Calcite,
but I think it is impractical because, in general, neither the CacheStore
nor the database structure is aware of the data partitioning.

4. Transactions can't be supported with direct CacheStore access,
because even if the underlying database supports 2-phase commit, which is
rare, the recovery protocol looks hard.
It just looks like this feature isn't worth it.


>   6. This question wasn't mine but I was going to ask it as well: What
>   will happen to the Indexing API since H2 is being removed?
As I wrote above, Indexing SPI will be dropped, but IndexQuery will be
added.

>  1. As I mentioned above, we Index into Solr, in earlier versions of
>      our product we used the indexing SPI to index into Lucene on the Ignite
>      nodes but this presented so many challenges we ultimately abandoned it
>      and replaced it with the current Solr solution.
AFAIK, someone has developed and sells a plugin for Ignite-2 with persistent
Lucene and Geo indices.
I don't know the capabilities and limitations of their solution
because it is closed source.
You can easily google it.

I've seen a few enthusiastic people who want to improve TEXT queries,
but unfortunately things haven't moved very far. For now, they are in
the middle of fixing the merging of TEXT query results.
So far so good.

I think it is a good chance to master the skill of developing a distributed
system for whoever
takes the lead on the full-text search feature and adds native
FullText index support to Ignite-3.


>   7. What impact does RAFT now have on conflict resolution?
RAFT is a state machine replication protocol. It guarantees that all the
nodes will see the updates in the same order.
So it seems no conflicts are possible. Recovery from split-brain is
impossible in the general case.
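
As a rough illustration of why an agreed order leaves nothing to resolve,
here is a Python sketch of the state-machine side only. It does not
implement RAFT itself (no leader election, quorum or log repair); it
assumes the log has already been agreed on, and Replica and catch_up are
hypothetical names.

```python
class Replica:
    """Toy deterministic state machine: a key-value map driven by a log."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # index of the next log entry to apply

    def catch_up(self, log):
        """Apply every not-yet-applied entry, strictly in log order."""
        for op, key, value in log[self.applied:]:
            if op == "put":
                self.state[key] = value
            elif op == "remove":
                self.state.pop(key, None)
            self.applied += 1


# The replicated log: the single order every node agrees on.
log = [("put", "k", 1), ("put", "k", 2), ("remove", "x", None), ("put", "x", 7)]

a, b = Replica(), Replica()
a.catch_up(log)      # one replica applies everything at once
b.catch_up(log[:2])  # another one lags behind ...
b.catch_up(log)      # ... and catches up later
print(a.state == b.state)

# Applying the same updates in a different order gives a different state,
# which is exactly what the agreed order rules out.
c = Replica()
c.catch_up(list(reversed(log)))
print(c.state == a.state)
```

Replicas can lag, but they never diverge as long as they apply the same
entries in the same order.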

However, I think we will have an analog of the conflict resolver in Ignite-3,
as it is very useful in some cases,
e.g. datacenter replication, incremental data load from a 3rd-party source,
or recovery from a 3rd-party source.


> 8. CacheGroups.
AFAIK, CacheGroup as such will be eliminated. Actually, we'll keep the
mechanism, but it will be configured in a different way,
which makes configuring Ignite a bit simpler.
Sorry, for now I have no answer to your performance concerns; this part of
Ignite-3 has slipped off my radar.


Let's see whether someone can clarify what we can expect in Ignite-3.
Guys, can someone chime in and shed more light on questions 3, 4, 7 and 8?


On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson <courtney.robin...@hypi.io>
wrote:

> Hey everyone,
> I attended the Alpha 2 update yesterday and was quite pleased to see the
> progress on things so far. So first, congratulations to everyone on the
> work being put in and thank you to Val and Kseniya for running yesterday's
> event.
>
> I asked a few questions after the webinar which Val had some answers to but
> suggested posting here as some of them are not things that have been
> thought about yet or no plans exist around it at this point.
>
> I'll put all of them here and if necessary we can break into different
> threads after.
>
>    1. Schema change - does that include the ability to change the types of
>    fields/columns?
>       1. Val's answer was yes with some limitations but those are not well
>       defined yet. He did mention that something like some kind of
>       transformer could be provided for doing the conversion and I would
>       second this, even for common types like int to long being able to do
>       a custom conversion will be immensely valuable.
>    2. Will the new guaranteed consistency between APIs also mean SQL will
>    gain transaction support?
>       1. I believe the answer here was yes but perhaps someone else may
>       want to weigh in to confirm
>    3. Has there been any decision about how much of Calcite will be exposed
>    to the client? When using thick clients, it'll be hugely beneficial to
>    be able to work with Calcite APIs directly to provide custom rules and
>    optimisations to better suit organisation needs
>       1. We currently use Calcite ourselves and have a lot of custom rules
>       and optimisations and have slowly pushed more of our queries to
>       Calcite that we then push down to Ignite.
>       2. We Index into Solr and use the Solr indices and others to
>       fulfill over all queries with Ignite just being one of the possible
>       storage targets Calcite pushes down to. If we could get to the
>       calcite API from an Ignite thick client, it would enable us to
>       remove a layer of abstraction and complexity and make Ignite our
>       primary that we then link with Solr and others to fulfill queries.
>    4. Will the unified storage model enable different versions of Ignite to
>    be in the cluster when persistence is enabled so that rolling restarts
>    can be done?
>       1. We have to do a strange dance to perform Ignite upgrades without
>       downtime because pods/nodes will fail to start on version mismatch
>       and if we get that dance wrong, we will corrupt a node's data. It
>       will make admin/upgrades far less brittle and error prone if this
>       was possible.
>    5. Will it be possible to provide a custom cache store still and will
>    these changes enable custom cache stores to be queryable from SQL?
>       1. Our Ignite usage is wide and complex because we use KV, SQL and
>       other APIs. The inconsistency of what can and can't be used from one
>       API to another is a real challenge and has forced us over time to
>       stick to one API and write alternative solutions outside of Ignite.
>       It will drastically simplify things if any CacheStore (or some new
>       equivalent) could be plugged in and be made accessible to SQL (and
>       in fact all other APIs) without having to load all the data from the
>       underlying CacheStore first into memory
>    6. This question wasn't mine but I was going to ask it as well: What
>    will happen to the Indexing API since H2 is being removed?
>       1. As I mentioned above, we Index into Solr, in earlier versions of
>       our product we used the indexing SPI to index into Lucene on the
>       Ignite nodes but this presented so many challenges we ultimately
>       abandoned it and replaced it with the current Solr solution.
>       2. Lucene indexing was ideal because it meant we didn't have to
>       re-invent Solr or Elasticsearch's sharding capabilities, that was
>       almost automatic with Ignite only giving you the data that was meant
>       for the current node.
>       3. The Lucene API enabled more flexibility and removed a network
>       round trip from our queries.
>       4. Given Calcite's ability to support custom SQL functions, I'd love
>       to have the ability to define custom functions that Lucene was
>       answering
>    7. What impact does RAFT now have on conflict resolution, off the top of
>    my head there are two cases
>       1. On startup after a split brain Ignite currently takes an "exercise
>       for the reader" approach and dumps a log along the lines of
>
> >    1. BaselineTopology of joining node is not compatible with
> >       BaselineTopology in the cluster.
> >    1. Branching history of cluster BlT doesn't contain branching point
> >       hash of joining node BlT. Consider cleaning persistent storage of
> >       the node and adding it to the cluster again.
> >
>       1. This leaves you with no choice except to take one half and
>       manually copy, write data back over to the other half then destroy
>       the bad one.
>       2. The second case is conflicts on keys, I believe
>       CacheVersionConflictResolver and manager are used by
>       GridCacheMapEntry which just says if use old value do this otherwise
>       use newVal. Ideally this will be exposed in the new API so that one
>       can override this behaviour. The last writer wins approach isn't
>       always ideal and the semantics of the domain can mean that what is
>       considered "correct" in a conflict is not so for a different domain.
>    8. This is last on the list but is actually the most important for us
>    right now as it is an impending and growing risk. We allow customers to
>    create their own tables on demand. We're already using the same cache
>    group etc for data structures to be re-used but now that we're getting
>    to thousands of tables/caches our startup times are sometimes
>    unpredictably long - at present it seems to depend on the state of the
>    cache/table before the restart but we're into the order of 5 - 7 mins
>    and steadily increasing with the growth of tables. Are there any
>    provisions in Ignite 3 for ensuring startup time isn't proportional to
>    the number of tables/caches available?
>
>
> Those are the key things I can think of at the moment. Val and others I'd
> love to open a conversation around these.
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>
> <https://hypi.io>
> https://hypi.io
>


-- 
Best regards,
Andrey V. Mashenkov
