Re: Service grid redesign

Denis Mekhanikov Mon, 02 Apr 2018 04:56:28 -0700

Vyacheslav,

The service deployment design based on a replicated utility cache has proven to
be unstable and deadlock-prone.
You can find a list of JIRA issues connected to it in my previous letter.


The intention behind it is similar to the binary metadata redesign that was
done in IGNITE-4157
<https://issues.apache.org/jira/browse/IGNITE-4157>.
This change in the service deployment procedure will eliminate the need for
another internal replicated cache
and make service deployment more reliable on an unstable topology.

Denis

On Tue, 27 Mar 2018 at 23:21, Vyacheslav Daradur <[email protected]> wrote:

> Hi, Denis Mekhanikov!
>
> As far as I know, Ignite services are based on IgniteCache, so we have
> all its features. We can use listeners or continuous queries for
> deployment synchronization.
>
> Why do you want to use the discovery layer for that?
>
> One more thing: we can use the baseline approach for services. That means
> *IgniteServices.deploy()* returns a ready-to-work service once it is
> deployed on the baseline nodes, and the service is deployed to other nodes
> on demand, for example when the load on the deployed service becomes high.
>
> About versioning, maybe it makes sense to extend the public API with
> IgniteServices.service(name, *version*)?
>
> On the first deployment, we can compute the service's hashcode (just as an
> example) and store it. When a new deployment request arrives for a service
> with an existing name, we compute the new service's hashcode and compare
> the two. If the hashcodes differ, we deploy the new service as a service
> with a different version.
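A minimal Java sketch of the hash-comparison idea, assuming the service is available in serialized form as a byte array. ServiceVersionRegistry and its methods are hypothetical names, not Ignite API. A new version is registered only when the hash differs from the hash of the latest deployed version with the same name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical registry illustrating hash-based service versioning (not an Ignite API).
 * Assumption: the service is hashed from its serialized bytes.
 */
class ServiceVersionRegistry {
    // Service name -> hashes of its deployed versions, oldest first.
    private final Map<String, List<Integer>> versions = new HashMap<>();

    /** Returns the 1-based version the service ends up with after this request. */
    synchronized int deploy(String name, byte[] serializedService) {
        int hash = Arrays.hashCode(serializedService);
        List<Integer> hashes = versions.computeIfAbsent(name, k -> new ArrayList<>());
        if (!hashes.isEmpty() && hashes.get(hashes.size() - 1).intValue() == hash)
            return hashes.size(); // same hash as the latest version: nothing new to deploy
        hashes.add(hash); // first deployment or changed hash: register a new version
        return hashes.size();
    }

    /** Latest version number for the name, or 0 if it was never deployed. */
    synchronized int latestVersion(String name) {
        List<Integer> hashes = versions.get(name);
        return hashes == null ? 0 : hashes.size();
    }
}
```

Equal hashes do not prove equal content, so a real implementation would probably prefer a cryptographic digest (for example SHA-256 via java.security.MessageDigest) over Arrays.hashCode.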
>
>
> On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <[email protected]> wrote:
> > Denis,
> >
> > Thanks for the extensive analysis. There is vast room for optimization
> > on the service grid side.
> >
> > Yakov, Sam, Alex G.,
> >
> > How do you like the idea of using the discovery protocol for the service
> > grid system message exchange? Any pitfalls?
> >
> >
> > --
> > Denis
> >
> >
> > On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov <[email protected]>
> > wrote:
> >
> >> Igniters,
> >>
> >> I'd like to start a discussion on the Ignite service grid redesign.
> >> We have a number of problems in our current architecture that have to be
> >> addressed.
> >>
> >> Here are the most severe ones:
> >>
> >> One of them is the lack of a guarantee that a service is successfully
> >> deployed and ready for work by the time the *IgniteServices.deploy*()*
> >> methods return.
> >> Furthermore, if an exception is thrown from the *Service.init()* method,
> >> then the deploying side is not able to receive it, or even to understand
> >> that the service is in an unusable state.
> >> So, you may end up in a situation where you deployed a service without
> >> receiving any errors, then called a service method, and hung indefinitely
> >> on this invocation.
> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
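A minimal single-JVM sketch of the missing guarantee: the deploying side blocks until the hosting side has finished init(), and an init() failure or a timeout is rethrown instead of being lost. SyncDeployer and ServiceInit are hypothetical names. In a cluster, the future would be completed by a message from the hosting node rather than by a local task:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: a deployment call that returns only after init() completes. */
class SyncDeployer {
    /** Hypothetical stand-in for a service's initialisation hook. */
    interface ServiceInit {
        void init() throws Exception;
    }

    /**
     * Runs init() asynchronously, as a hosting node would, and waits for it.
     * An init() failure or a timeout surfaces to the caller as IllegalStateException.
     */
    static void deploy(String name, ServiceInit svc, long timeoutMs) {
        CompletableFuture<Void> initResult = CompletableFuture.runAsync(() -> {
            try {
                svc.init();
            } catch (Exception e) {
                // Carry the checked exception out of the Runnable.
                throw new RuntimeException("init() failed for service " + name, e);
            }
        });
        try {
            initResult.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Deployment of " + name + " failed", e.getCause());
        } catch (TimeoutException e) {
            // The init() task keeps running here; a real protocol would also cancel it.
            throw new IllegalStateException("Deployment of " + name + " timed out", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while deploying " + name, e);
        }
    }
}
```

The timeout also addresses the "hangs without saying anything" concern: the caller gets an error instead of blocking forever.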
> >>
> >> Another problem is that service deployment may lock up on an unstable
> >> topology.
> >> This issue is caused by missing updates in continuous query listeners on
> >> the internal cache.
> >> It is hard to reproduce, but it does happen. We shouldn't allow the
> >> possibility that deployment methods hang without reporting anything.
> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
> >>
> >> I think we should change the deployment procedure to make it more
> >> reliable.
> >> Moving from operating over the internal replicated service cache to sending
> >> custom discovery events seems to be a good idea.
> >> Service deployment may trigger a discovery event that will make the chosen
> >> nodes deploy the service, and the same event will notify other nodes about
> >> the deployed service instances.
> >> It will eliminate the need for distributed transactions on the internal
> >> replicated system cache and make the service deployment protocol more
> >> transparent.
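A minimal single-JVM model of deployment driven by one custom discovery message. It assumes that discovery delivers the same messages to every node in the same order. DeployRequest, Node and broadcast are hypothetical names. Chosen nodes only schedule the service, because init() must not run in the discovery thread, and every node records where the instances live:

```java
import java.util.*;

/** Hypothetical model of deployment driven by a single custom discovery message. */
class DiscoveryDeploySketch {
    /** Hypothetical message: the service name and the nodes chosen to host it. */
    static final class DeployRequest {
        final String serviceName;
        final Set<String> assignedNodes;

        DeployRequest(String serviceName, Set<String> assignedNodes) {
            this.serviceName = serviceName;
            this.assignedNodes = new HashSet<>(assignedNodes);
        }
    }

    static final class Node {
        final String id;
        // Services this node must start; init() would run outside the discovery thread.
        final Set<String> scheduledLocally = new HashSet<>();
        // Service name -> hosting nodes, as seen by this node.
        final Map<String, Set<String>> serviceTopology = new HashMap<>();

        Node(String id) {
            this.id = id;
        }

        /** Handles the message; every node sees the same messages in the same order. */
        void onDiscoveryMessage(DeployRequest req) {
            if (req.assignedNodes.contains(id))
                scheduledLocally.add(req.serviceName); // chosen node: schedule deployment
            serviceTopology.put(req.serviceName, new HashSet<>(req.assignedNodes)); // all nodes: learn locations
        }
    }

    /** Simulates discovery delivering one message to every node. */
    static void broadcast(List<Node> nodes, DeployRequest req) {
        for (Node n : nodes)
            n.onDiscoveryMessage(req);
    }
}
```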
> >>
> >> There are a few points that should be taken into account, though.
> >>
> >> First of all, we can't wait for services to be deployed and initialised in
> >> the discovery thread.
> >> So, we need to make notification about the service deployment result
> >> asynchronous, presumably over the communication protocol.
> >> I can think of a procedure similar to the current exchange protocol, where
> >> service deployment is initiated by a discovery message, followed by
> >> asynchronous notifications from the hosting servers over communication.
> >> Finally, one more discovery message will notify all nodes about the
> >> service deployment result and the location of the deployed service
> >> instances. The coordinator will be responsible for collecting the
> >> deployment results in this scheme.
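A minimal sketch of the coordinator's bookkeeping in this scheme. It assumes that each hosting node reports either success or an error string over communication. DeploymentResultCollector is a hypothetical name. Once onResult() returns true, the coordinator has everything it needs for the final discovery message:

```java
import java.util.*;

/**
 * Hypothetical coordinator-side collector for one service deployment.
 * Hosting nodes report over communication; once all have answered, the coordinator
 * would publish succeeded() and errors() in the final discovery message.
 */
class DeploymentResultCollector {
    private final Set<String> awaiting;                          // nodes that have not reported yet
    private final Set<String> succeeded = new HashSet<>();
    private final Map<String, String> errors = new HashMap<>();  // node -> init() error text

    DeploymentResultCollector(Collection<String> hostingNodes) {
        this.awaiting = new HashSet<>(hostingNodes);
    }

    /** Records one node's report (null error means success); true once all nodes reported. */
    synchronized boolean onResult(String nodeId, String errorOrNull) {
        if (awaiting.remove(nodeId)) {
            if (errorOrNull == null)
                succeeded.add(nodeId);
            else
                errors.put(nodeId, errorOrNull);
        } // otherwise: duplicate or unexpected report, ignored
        return awaiting.isEmpty();
    }

    synchronized Set<String> succeeded() {
        return Collections.unmodifiableSet(new HashSet<>(succeeded));
    }

    synchronized Map<String, String> errors() {
        return Collections.unmodifiableMap(new HashMap<>(errors));
    }
}
```

A hosting node that leaves before reporting would keep this collector waiting forever, so it would also have to be removed from the awaited set or counted as failed. That is the second failover case listed in the letter.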
> >>
> >> Another problem is failover when some nodes fail during deployment or
> >> afterwards.
> >> The following cases should be handled:
> >>
> >>    1. coordinator failure during deployment;
> >>    2. failure of the nodes that were chosen to host the service, during
> >>    deployment;
> >>    3. failure of the nodes that host deployed services, after the
> >>    deployment.
> >>
> >> The first case may be resolved either by continuing the deployment with a
> >> new coordinator or by cancelling it.
> >> The second case will require another node to be chosen and notified. Maybe
> >> another discovery message will be needed.
> >> The third case will require redeployment, so the coordinator should track
> >> topology changes and redeploy failed services.
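A minimal sketch of how the coordinator could react to the second and third cases. It assumes the coordinator keeps a service-to-hosts map and picks replacements deterministically by node ID. ServiceFailover and this selection rule are assumptions, not a proposed Ignite API. The returned map lists the redeployments that would be announced, for example in another discovery message:

```java
import java.util.*;

/** Hypothetical coordinator-side failover bookkeeping (not an Ignite API). */
class ServiceFailover {
    // Service name -> nodes currently hosting (or assigned to host) it.
    private final Map<String, Set<String>> assignments = new HashMap<>();

    void assign(String service, String... nodes) {
        assignments.computeIfAbsent(service, k -> new TreeSet<>()).addAll(Arrays.asList(nodes));
    }

    Set<String> hostsOf(String service) {
        return Collections.unmodifiableSet(assignments.getOrDefault(service, Collections.emptySet()));
    }

    /**
     * Called when a node leaves the topology. Each service instance on the failed node
     * moves to the first surviving node (by ID) that does not host it yet.
     * Returns service -> replacement node; services with no candidate stay under-deployed.
     */
    Map<String, String> onNodeFailed(String failedNode, Collection<String> aliveNodes) {
        Map<String, String> redeploy = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : assignments.entrySet()) {
            Set<String> hosts = e.getValue();
            if (!hosts.remove(failedNode))
                continue; // this service was not on the failed node
            for (String candidate : new TreeSet<>(aliveNodes)) {
                if (!candidate.equals(failedNode) && !hosts.contains(candidate)) {
                    hosts.add(candidate);
                    redeploy.put(e.getKey(), candidate);
                    break;
                }
            }
        }
        return redeploy;
    }
}
```

Sorting the candidates keeps the choice deterministic. A real implementation would more likely respect the service's node filter and per-node limits and balance load across nodes.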
> >>
> >> Another good improvement would be service versioning. This matter was
> >> already discussed in another thread:
> >> http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
> >> Let's resume this discussion and state the final decision here.
> >> This feature is closely connected to peer class loading, which currently
> >> does not work for services.
> >> So, service versioning should be implemented along with peer class loading.
> >> JIRA ticket for versioning:
> >> https://issues.apache.org/jira/browse/IGNITE-6069
> >> Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
> >>
> >> Please share your thoughts. Constructive criticism is highly appreciated.
> >>
> >> Denis
> >>
>
>
>
> --
> Best Regards, Vyacheslav D.
>
