Re: Service grid redesign

Valentin Kulichenko Wed, 04 Apr 2018 16:34:21 -0700

I don't think peer class loading is even possible for services. I believe
we should reuse DeploymentSpi [1] for versioning.


[1] https://apacheignite.readme.io/docs/deployment-spi

-Val

On Wed, Apr 4, 2018 at 12:52 PM, Denis Magda <[email protected]> wrote:

> Sorry, that was me who renamed the IEP to "Oil Change in Service Grid". I was
> writing this email after the renaming. I like that title more because it's
> fun and highlights what we intend to do: cleaning up our service grid
> engine and powering it up with new "liquid" (a new communication and
> deployment approach not available before).
>
> Denis
>
>
> > This message contains serialized service instance and its configuration.
> > It is delivered to the coordinator node first, that calculates the service
> > deployment assignments and adds this information to the message.
>
>
> I would consider using a NodeFilter first to decide where a service can
> potentially be deployed. Otherwise, we would require service classes to be on
> every node (every node might become a coordinator), which is not a desirable
> requirement.
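
A minimal sketch of the filter-first idea in plain Java, assuming a node can be
described by an id and a role attribute. The Node class, the "service-host"
role, and eligibleNodes() are hypothetical names used only for illustration;
they are not Ignite API. The point is only that the filter narrows the
candidate set before any assignment is calculated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class NodeFilterSketch {
    /** Hypothetical stand-in for a cluster node: an id plus a role attribute. */
    static final class Node {
        final String id;
        final String role;

        Node(String id, String role) {
            this.id = id;
            this.role = role;
        }
    }

    /** Applies the filter first, so only matching nodes become assignment candidates. */
    static List<String> eligibleNodes(List<Node> topology, Predicate<Node> filter) {
        List<String> ids = new ArrayList<>();
        for (Node n : topology) {
            if (filter.test(n))
                ids.add(n.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Node> topology = Arrays.asList(
            new Node("n1", "service-host"),
            new Node("n2", "data"),
            new Node("n3", "service-host"));

        // Only nodes tagged as service hosts are considered for deployment.
        System.out.println(eligibleNodes(topology, n -> "service-host".equals(n.role)));
    }
}
```
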
>
>
> As for peer class loading, I would back Dmitriy up here. Let's at
> least not focus on this task for now. We should design service
> versioning the right way first and support it.
>
> --
> Denis
>
>
>
> On Wed, Apr 4, 2018 at 12:20 PM, Dmitriy Setrakyan <[email protected]>
> wrote:
>
> > Here is the correct link:
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
> >
> > I have looked at the tickets there, and I believe that we should not
> > support peer-deployment for services. It is very hard and I do not think we
> > should even try.
> >
> > I am proposing closing this ticket as Won't Fix -
> > https://issues.apache.org/jira/browse/IGNITE-975
> >
> > D.
> >
> > On Wed, Apr 4, 2018 at 5:39 AM, Denis Mekhanikov <[email protected]>
> > wrote:
> >
> > > Vyacheslav,
> > >
> > > I've just posted my first draft of the IEP:
> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Service+grid+improvements
> > > It's not finished yet, but you can get the idea from it.
> > > If you have any thoughts, please let me know and I'll add them
> > > to the IEP.
> > >
> > > Denis
> > >
> > > On Wed, Apr 4, 2018 at 13:09, Vyacheslav Daradur <[email protected]> wrote:
> > >
> > > > Denis, thanks for the link.
> > > >
> > > > I looked through the task and I think I understand the point of your
> > > > redesign now.
> > > >
> > > > Do you have a clear plan or IEP for the whole redesign?
> > > >
> > > > I'm interested in this component and I'd like to take part in the
> > > > development.
> > > >
> > > >
> > > >
> > > > On Mon, Apr 2, 2018 at 2:55 PM, Denis Mekhanikov <[email protected]>
> > > > wrote:
> > > > > Vyacheslav,
> > > > >
> > > > > The service deployment design based on a replicated utility cache has
> > > > > proven to be unstable and deadlock-prone.
> > > > > You can find a list of JIRA issues connected to it in my previous
> > > > > letter.
> > > > >
> > > > > The intention behind it is similar to the binary metadata redesign
> > > > > that happened in the following ticket: IGNITE-4157
> > > > > <https://issues.apache.org/jira/browse/IGNITE-4157>
> > > > > This change in the service deployment procedure will eliminate the need
> > > > > for another internal replicated cache
> > > > > and make service deployment more reliable on an unstable topology.
> > > > >
> > > > > Denis
> > > > >
> > > > > On Tue, Mar 27, 2018 at 23:21, Vyacheslav Daradur <[email protected]> wrote:
> > > > >
> > > > >> Hi, Denis Mekhanikov!
> > > > >>
> > > > > >> As far as I know, Ignite services are based on IgniteCache and we
> > > > > >> have all its features. We can use listeners or continuous queries for
> > > > > >> deployment synchronization.
> > > > >>
> > > > > >> Why do you want to use the discovery layer for that?
> > > > >>
> > > > > >> One more thing: we can use the baseline approach for services, which
> > > > > >> means *IgniteService.deploy()* returns a ready-to-work service after
> > > > > >> deployment on baseline nodes and deploys to other nodes on demand,
> > > > > >> for example when the load on the deployed service is high.
> > > > >>
> > > > > >> About versioning, maybe it makes sense to extend the public API:
> > > > > >> IgniteServices.service(name, *version*)?
> > > > >>
> > > > > >> At first deployment, we can compute the service's hashcode (just as
> > > > > >> an example) and store it. After a new deployment request for a
> > > > > >> service with an existing name, we compute the new service's hashcode
> > > > > >> and compare the two. If the hashcodes differ, we deploy the new
> > > > > >> service as a service with a different version.
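
The hashcode comparison described above could look roughly like the sketch
below. ServiceVersionRegistry and its deploy() method are hypothetical names,
not Ignite API, and the service is assumed to arrive as a serialized byte
array. A 32-bit hash can collide, so a real implementation would probably
need a stronger digest; the sketch only shows the compare-and-bump logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ServiceVersionRegistry {
    /** Service name -> content hashes in deployment order; list index + 1 is the version. */
    private final Map<String, List<Integer>> hashesByName = new HashMap<>();

    /**
     * Returns the version number for this payload. A payload whose hash was
     * already seen under the same name keeps its existing version; an unseen
     * hash is registered as the next version.
     */
    int deploy(String name, byte[] serializedService) {
        int hash = Arrays.hashCode(serializedService);
        List<Integer> hashes = hashesByName.computeIfAbsent(name, k -> new ArrayList<>());

        int idx = hashes.indexOf(hash);
        if (idx >= 0)
            return idx + 1;

        hashes.add(hash);
        return hashes.size();
    }

    public static void main(String[] args) {
        ServiceVersionRegistry registry = new ServiceVersionRegistry();

        System.out.println(registry.deploy("svc", new byte[] {1, 2})); // first deployment
        System.out.println(registry.deploy("svc", new byte[] {1, 2})); // same content, same version
        System.out.println(registry.deploy("svc", new byte[] {3}));    // changed content, new version
    }
}
```
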
> > > > >>
> > > > >>
> > > > > >> On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <[email protected]>
> > > > > >> wrote:
> > > > >> > Denis,
> > > > >> >
> > > > > >> > Thanks for the extensive analysis. There is vast room for
> > > > > >> > optimization on the service grid side.
> > > > >> >
> > > > >> > Yakov, Sam, Alex G.,
> > > > >> >
> > > > > >> > How do you like the idea of using the discovery protocol for
> > > > > >> > exchanging service grid system messages? Any pitfalls?
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Denis
> > > > >> >
> > > > >> >
> > > > > >> > On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov <[email protected]>
> > > > > >> > wrote:
> > > > >> >
> > > > >> >> Igniters,
> > > > >> >>
> > > > > >> >> I'd like to start a discussion on the Ignite service grid redesign.
> > > > > >> >> We have a number of problems in our current architecture that have
> > > > > >> >> to be addressed.
> > > > >> >>
> > > > >> >> Here are the most severe ones:
> > > > >> >>
> > > > > >> >> One of them is the lack of a guarantee that a service is
> > > > > >> >> successfully deployed and ready for work by the time the
> > > > > >> >> *IgniteService.deploy*()* methods return.
> > > > > >> >> Furthermore, if an exception is thrown from the *Service.init()*
> > > > > >> >> method, the deploying side is not able to receive it, or even
> > > > > >> >> understand that the service is in an unusable state.
> > > > > >> >> So you may end up in a situation where you deployed a service
> > > > > >> >> without receiving any errors, then called one of the service's
> > > > > >> >> methods, and hung indefinitely on this invocation.
> > > > > >> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
> > > > >> >>
> > > > > >> >> Another problem is locking during service deployment on an unstable
> > > > > >> >> topology.
> > > > > >> >> This issue is caused by missing updates in continuous query
> > > > > >> >> listeners on the internal cache.
> > > > > >> >> It is hard to reproduce, but it happens sometimes. We shouldn't
> > > > > >> >> allow deployment methods to hang without reporting anything.
> > > > > >> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
> > > > >> >>
> > > > > >> >> I think we should change the deployment procedure to make it more
> > > > > >> >> reliable.
> > > > > >> >> Moving from operating over the internal replicated service cache to
> > > > > >> >> sending custom discovery events seems to be a good idea.
> > > > > >> >> Service deployment may trigger a discovery event that will make the
> > > > > >> >> chosen nodes deploy the service, and the same event will notify
> > > > > >> >> other nodes about the deployed service instances.
> > > > > >> >> It will eliminate the need for distributed transactions on the
> > > > > >> >> internal replicated system cache and make the service deployment
> > > > > >> >> protocol more transparent.
> > > > >> >>
> > > > > >> >> There are a few points that should be taken into account, though.
> > > > >> >>
> > > > > >> >> First of all, we can't wait for services to be deployed and
> > > > > >> >> initialised in the discovery thread.
> > > > > >> >> So we need to make the notification about the service deployment
> > > > > >> >> result asynchronous, presumably over the communication protocol.
> > > > > >> >> I can think of a procedure similar to the current exchange protocol,
> > > > > >> >> where service deployment is initiated with an initial discovery
> > > > > >> >> message, followed by asynchronous notifications from the hosting
> > > > > >> >> servers over communication. Finally, one more discovery message will
> > > > > >> >> notify all nodes about the service deployment result and the
> > > > > >> >> location of the deployed service instances. The coordinator will be
> > > > > >> >> responsible for collecting the deployment results in this scheme.
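
The coordinator's part of this three-step procedure can be sketched as a small
state holder. This is only an illustration under assumptions: DeploymentRound,
onResult() and the string node ids are hypothetical names, and the actual
discovery and communication messages are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DeploymentRound {
    private final Set<String> pending;
    private final List<String> deployedOn = new ArrayList<>();
    private final Map<String, String> errors = new TreeMap<>();

    /** Step 1: the initial discovery message names the nodes chosen to host the service. */
    DeploymentRound(Collection<String> assignedNodes) {
        pending = new TreeSet<>(assignedNodes);
    }

    /**
     * Step 2: a hosting node reports its result asynchronously (error is null on success).
     * Reports from unknown nodes and duplicate reports are ignored.
     * Returns true once every assigned node has reported.
     */
    boolean onResult(String nodeId, String error) {
        if (pending.remove(nodeId)) {
            if (error == null)
                deployedOn.add(nodeId);
            else
                errors.put(nodeId, error);
        }
        return pending.isEmpty();
    }

    /** Step 3 payload: where the service runs, for the final discovery message. */
    List<String> deployedOn() {
        return deployedOn;
    }

    /** Step 3 payload: init failures, so the deploying side can see them. */
    Map<String, String> errors() {
        return errors;
    }

    public static void main(String[] args) {
        DeploymentRound round = new DeploymentRound(Arrays.asList("n1", "n2"));

        round.onResult("n1", null);
        boolean complete = round.onResult("n2", "init() failed");

        // Once complete, the coordinator would broadcast the collected result.
        System.out.println(complete + " " + round.deployedOn() + " " + round.errors());
    }
}
```

Keeping the collected errors next to the successful locations is what lets the
final message carry an init() failure back to the caller of deploy.
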
> > > > >> >>
> > > > > >> >> Another problem is failover in the case when some nodes fail during
> > > > > >> >> deployment or further work.
> > > > > >> >> The following cases should be handled:
> > > > > >> >>
> > > > > >> >>    1. coordinator failure during deployment;
> > > > > >> >>    2. failure of nodes that were chosen to host the service, during
> > > > > >> >>    deployment;
> > > > > >> >>    3. failure of nodes that contain deployed services, after the
> > > > > >> >>    deployment.
> > > > >> >>
> > > > > >> >> The first case may be resolved either by continuing the deployment
> > > > > >> >> with a new coordinator or by cancelling it.
> > > > > >> >> The second case will require another node to be chosen and notified.
> > > > > >> >> Maybe another discovery message will be needed.
> > > > > >> >> The third case will require redeployment, so the coordinator should
> > > > > >> >> track topology changes and redeploy failed services.
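
For the second and third cases, the reassignment step might be sketched as
follows. ServiceFailover.reassign() is a hypothetical helper, and picking the
least-loaded surviving node is just one possible policy assumed here for
illustration; the thread does not prescribe one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ServiceFailover {
    /**
     * Takes service -> node assignments and the set of nodes still alive.
     * Services on alive nodes stay put; services on failed nodes are moved
     * to the alive node that currently hosts the fewest services.
     */
    static Map<String, String> reassign(Map<String, String> assignments, Set<String> aliveNodes) {
        if (aliveNodes.isEmpty())
            throw new IllegalStateException("No alive nodes to host services");

        Map<String, Integer> load = new TreeMap<>();
        for (String node : aliveNodes)
            load.put(node, 0);

        Map<String, String> result = new TreeMap<>();

        // Keep every service whose host survived.
        for (Map.Entry<String, String> e : assignments.entrySet()) {
            if (aliveNodes.contains(e.getValue())) {
                result.put(e.getKey(), e.getValue());
                load.merge(e.getValue(), 1, Integer::sum);
            }
        }

        // Move orphaned services, in name order, to the least-loaded node.
        for (String service : new TreeMap<>(assignments).keySet()) {
            if (result.containsKey(service))
                continue;

            String target = null;
            for (Map.Entry<String, Integer> l : load.entrySet()) {
                if (target == null || l.getValue() < load.get(target))
                    target = l.getKey();
            }

            result.put(service, target);
            load.merge(target, 1, Integer::sum);
        }

        return result;
    }

    public static void main(String[] args) {
        Map<String, String> assignments = new HashMap<>();
        assignments.put("s1", "n1");
        assignments.put("s2", "n2");
        assignments.put("s3", "n2");

        // Node n2 has left the topology; n3 has joined.
        Set<String> alive = new HashSet<>(Arrays.asList("n1", "n3"));
        System.out.println(reassign(assignments, alive));
    }
}
```
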
> > > > >> >>
> > > > > >> >> Another good improvement would be service versioning. This matter
> > > > > >> >> was already discussed in another thread:
> > > > > >> >> http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
> > > > > >> >> Let's resume this discussion and state the final decision here.
> > > > > >> >> This feature is closely connected to peer class loading, which is
> > > > > >> >> not working for services currently.
> > > > > >> >> So service versioning should be implemented along with peer class
> > > > > >> >> loading.
> > > > > >> >> JIRA ticket for versioning:
> > > > > >> >> https://issues.apache.org/jira/browse/IGNITE-6069
> > > > > >> >> Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
> > > > >> >>
> > > > > >> >> Please share your thoughts. Constructive criticism is highly
> > > > > >> >> appreciated.
> > > > >> >>
> > > > >> >> Denis
> > > > >> >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best Regards, Vyacheslav D.
> > > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
> > > >
> > >
> >
>
