Re: Service grid redesign

Vyacheslav Daradur Tue, 27 Mar 2018 13:22:08 -0700

Hi, Denis Mekhanikov!

As far as I know, Ignite services are built on top of IgniteCache, so
all of its features are available to us. We can use listeners or
continuous queries for deployment synchronization.


Why do you want to use the discovery layer for that?

One more thing: we can use a baseline approach for services. That is,
*IgniteServices.deploy()* returns a ready-to-work service once it is
deployed on the baseline nodes, and the service is deployed to other
nodes on demand, for example when the load on the deployed service is
high.
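
To illustrate the baseline idea, here is a rough sketch in plain Java.
Everything in it is made up for the example and is not Ignite API: the
class name BaselineDeployment, the string node IDs, the "spare" node
list and the numeric load threshold are all my assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the baseline idea; nothing here is Ignite API.
final class BaselineDeployment {
    // Nodes that currently host the service.
    private final Set<String> hosts = new LinkedHashSet<>();
    // Non-baseline nodes that may receive the service on demand.
    private final List<String> spareNodes;
    // Assumed load level above which one more node is added.
    private final double loadThreshold;
    private int nextSpare;

    BaselineDeployment(List<String> baselineNodes, List<String> spareNodes,
                       double loadThreshold) {
        // The service counts as deployed once the baseline nodes host it.
        this.hosts.addAll(baselineNodes);
        this.spareNodes = new ArrayList<>(spareNodes);
        this.loadThreshold = loadThreshold;
    }

    // Reports the observed load; returns true if a spare node was added.
    synchronized boolean reportLoad(double load) {
        if (load <= loadThreshold || nextSpare >= spareNodes.size())
            return false;
        hosts.add(spareNodes.get(nextSpare++));
        return true;
    }

    // Snapshot of the nodes hosting the service.
    synchronized Set<String> hosts() {
        return Collections.unmodifiableSet(new LinkedHashSet<>(hosts));
    }
}
```

With baseline nodes "b1" and "b2", one spare node "n3" and a threshold
of 0.8, reportLoad(0.5) changes nothing, while reportLoad(0.9) adds
"n3" to the hosts.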

As for versioning, maybe it makes sense to extend the public API:
IgniteServices.service(name, *version*)?

On the first deployment, we can compute the service's hashcode (just
as an example) and store it. When a new deployment request arrives for
a service with an existing name, we compute the new service's hashcode
and compare the two: if they differ, we deploy the new service as a
different version of that service.
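
A minimal sketch of that comparison in plain Java, just to make the
idea concrete. The class ServiceVersionRegistry and its methods are
hypothetical and not part of Ignite. I also assume that the fingerprint
is passed in as an int, and that a fingerprint already seen for a name
maps back to its existing version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry; not part of the Ignite API.
final class ServiceVersionRegistry {
    // Service name -> fingerprints of known versions, in deployment order.
    private final Map<String, List<Integer>> fingerprints = new HashMap<>();

    // Registers a deployment request and returns the 1-based version number.
    synchronized int deploy(String name, int fingerprint) {
        List<Integer> versions =
            fingerprints.computeIfAbsent(name, k -> new ArrayList<>());

        // Same fingerprint as an earlier deployment: reuse that version.
        int idx = versions.indexOf(fingerprint);
        if (idx >= 0)
            return idx + 1;

        // Different fingerprint: treat the request as a new version.
        versions.add(fingerprint);
        return versions.size();
    }

    // Number of distinct versions known for the given service name.
    synchronized int versionCount(String name) {
        List<Integer> versions = fingerprints.get(name);
        return versions == null ? 0 : versions.size();
    }
}
```

Deploying "myService" twice with the same fingerprint yields version 1
both times; a different fingerprint yields version 2. A plain hashcode
can collide, so a real implementation would probably need a stronger
fingerprint.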


On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <dma...@apache.org> wrote:
> Denis,
>
> Thanks for the extensive analysis. There is a vast room for optimizations
> on the service grid side.
>
> Yakov, Sam, Alex G.,
>
> How do you like the idea of the usage of discovery protocol for the service
> grid system messages exchange? Any pitfalls?
>
>
> --
> Denis
>
>
> On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov <dmekhani...@gmail.com>
> wrote:
>
>> Igniters,
>>
>> I'd like to start a discussion on Ignite service grid redesign.
>> We have a number of problems in our current architecture that have to be
>> addressed.
>>
>> Here are the most severe ones:
>>
>> One of them is the lack of a guarantee that a service is successfully
>> deployed and ready for work by the time the *IgniteServices.deploy*()*
>> methods return.
>> Furthermore, if an exception is thrown from the *Service.init()* method,
>> the deploying side is not able to receive it, or even to understand that
>> the service is in an unusable state.
>> So you may end up in a situation where you deployed a service without
>> receiving any errors, then called one of the service's methods, and hung
>> indefinitely on this invocation.
>> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
>>
>> Another problem is locking during service deployment on an unstable
>> topology.
>> This issue is caused by missing updates in continuous query listeners on
>> the internal cache.
>> It is hard to reproduce, but it happens sometimes. We shouldn't allow
>> deployment methods to hang without reporting anything.
>> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
>>
>> I think we should change the deployment procedure to make it more
>> reliable.
>> Moving from operating over the internal replicated service cache to sending
>> custom discovery events seems to be a good idea.
>> Service deployment may trigger a discovery event that will make the chosen
>> nodes deploy the service, and the same event will notify other nodes about
>> the deployed service instances.
>> It will eliminate the need for distributed transactions on the internal
>> replicated system cache, and make the service deployment protocol more
>> transparent.
>>
>> There are a few points that should be taken into account, though.
>>
>> First of all, we can't wait for services to be deployed and initialised in
>> the discovery thread.
>> So we need to make the notification about the service deployment result
>> asynchronous, presumably over the communication protocol.
>> I can think of a procedure similar to the current exchange protocol, where
>> service deployment is initiated with an initial discovery message,
>> followed by asynchronous notifications from the hosting servers over
>> communication. And finally, one more discovery message will notify all
>> nodes about the service deployment result and the location of the deployed
>> service instances. The coordinator will be responsible for collecting the
>> deployment results in this scheme.
>>
>> Another problem is failover when some nodes fail during deployment or
>> during further work.
>> The following cases should be handled:
>>
>>    1. coordinator failure during deployment;
>>    2. failure of nodes that were chosen to host the service, during
>>    deployment;
>>    3. failure of nodes that contain deployed services, after the
>>    deployment.
>>
>> The first case may be resolved by either continuation of deployment with a
>> new coordinator, or by cancelling it.
>> The second case will require another node to be chosen and notified. Maybe
>> another discovery message will be needed.
>> The third case will require redeployment, so the coordinator should track
>> topology changes and redeploy failed services.
>>
>> Another good improvement would be service versioning. This matter was
>> already discussed in another thread:
>> http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
>> Let's resume this discussion and state the final decision here.
>> This feature is closely connected to peer class loading, which is not
>> working for services currently.
>> So, service versioning should be implemented along with peer class loading.
>> JIRA ticket for versioning:
>> https://issues.apache.org/jira/browse/IGNITE-6069
>> Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
>>
>> Please share your thoughts. Constructive criticism is highly appreciated.
>>
>> Denis
>>



-- 
Best Regards, Vyacheslav D.
