Hi Patrick,

Patrick Wright wrote:
> Hi
>
> We ran into a situation last week where, after a firewall update on a
> server hosting an LUS, the LUS was not available for about 45 minutes.
> On that server setup, we have two LUS running (LUS-1 and LUS-2), both
> configured with the same group and no other attributes--essentially, a
> simple redundant setup. All services are registered with both LUS.
>
> On all our clients, we use a ServiceDiscoveryManager and perform
> lookups via a LookupCache. What we observed was that, on one
> particular client, the client continued to attempt to access a service
> instance which we believe was registered with LUS-1, the one which was
> no longer reachable; we base this on extra logging output we have in a
> DiscoveryListener attached to the SDM, which was throwing an exception
> (ConnectionException, timeout) when trying to call
> serviceRegistrar.getLocator().toString(). This exception was being
> thrown throughout the 45 minutes until the firewall issue was fixed.
> Along with those log entries, the client was also reporting that a
> given service was not available, although we know it was available on
> LUS-2. The exception is not the issue--the issue is that this
> particular client was not failing over to the same service instances
> registered with LUS-2.
>
> On our other (many) Jini clients, we did not see the same behavior. In
> the logs we've looked at, the client did report the same exception,
> but just once, on trying to call
> serviceRegistrar.getLocator().toString(); after that, however, it
> appeared to continue using the service instances registered with LUS-2
> without problems.

I am a bit confused by your description of the problem. Are you saying that some clients didn't find, in their LookupCache, a particular service that was registered with LUS-2 while LUS-1 was unreachable?

If the SDM in your client was able to discover LUS-2, it shouldn't have had any problem seeing your service even after LUS-1 became unreachable, assuming that LUS-1 being unreachable was the only problem.
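To make the expected behaviour concrete, here is a deliberately simplified model of a client-side cache that remembers which lookup services each service was seen in, and forgets a service only once no remaining lookup service lists it. The class and method names are hypothetical; this is not the actual net.jini.lookup implementation, just a sketch of the semantics described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model (NOT the real net.jini.lookup code):
 * a service stays visible while at least one known lookup service
 * still lists it.
 */
public class RedundantCacheModel {
    // serviceId -> names of the lookup services the service was seen in
    private final Map<String, Set<String>> registrations = new HashMap<>();

    /** A lookup service reported the service (e.g. via a lookup or a remote event). */
    public void serviceSeen(String serviceId, String lus) {
        registrations.computeIfAbsent(serviceId, k -> new HashSet<>()).add(lus);
    }

    /** A lookup service was discarded, e.g. because it became unreachable. */
    public void lusDiscarded(String lus) {
        registrations.values().forEach(set -> set.remove(lus));
        // Forget only the services that no remaining lookup service knows about.
        registrations.values().removeIf(Set::isEmpty);
    }

    public boolean isVisible(String serviceId) {
        return registrations.containsKey(serviceId);
    }

    public static void main(String[] args) {
        RedundantCacheModel cache = new RedundantCacheModel();
        cache.serviceSeen("svc-A", "LUS-1");
        cache.serviceSeen("svc-A", "LUS-2");
        cache.serviceSeen("svc-B", "LUS-1");   // hypothetical: only registered with LUS-1

        cache.lusDiscarded("LUS-1");            // LUS-1 becomes unreachable

        System.out.println("svc-A visible: " + cache.isVisible("svc-A")); // true
        System.out.println("svc-B visible: " + cache.isVisible("svc-B")); // false
    }
}
```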

I don't know whether your lookup services are discovered via multicast and/or unicast; from your reference to the 'same group', I assume only multicast discovery is used. In that case, are you sure that LUS-2 was actually discovered by your client's SDM? A good way to find out is to configure logging for the logger documented in http://java.sun.com/products/jini/2.1/doc/api/net/jini/discovery/LookupDiscovery.html and set its level to FINEST.
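A minimal sketch of doing that programmatically with java.util.logging, assuming the logger name net.jini.discovery.LookupDiscovery from that javadoc page; the helper class and method names are hypothetical, and it would need to run before the client creates its discovery manager and SDM:

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DiscoveryLogging {
    // Logger name documented for net.jini.discovery.LookupDiscovery (Jini 2.1).
    static final String DISCOVERY_LOGGER = "net.jini.discovery.LookupDiscovery";

    // Keep a strong reference: java.util.logging only holds loggers weakly,
    // so a configured logger could otherwise be garbage collected.
    private static Logger discoveryLogger;

    /** Hypothetical helper: route all discovery log records to the console. */
    public static Logger enableFinestDiscoveryLogging() {
        discoveryLogger = Logger.getLogger(DISCOVERY_LOGGER);
        discoveryLogger.setLevel(Level.FINEST);

        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);           // ConsoleHandler defaults to INFO
        discoveryLogger.addHandler(handler);
        discoveryLogger.setUseParentHandlers(false); // avoid duplicate INFO+ output via root
        return discoveryLogger;
    }

    public static void main(String[] args) {
        Logger log = enableFinestDiscoveryLogging();
        System.out.println(log.getName() + " -> " + log.getLevel());
    }
}
```

The same effect can be had without code changes by setting `net.jini.discovery.LookupDiscovery.level = FINEST` and `java.util.logging.ConsoleHandler.level = FINEST` in a logging properties file passed via `-Djava.util.logging.config.file`.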

> What is unclear to us, from the documentation, is how a
> ServiceRegistrar outage is handled by the SDM and the LookupCache.
> What we imagine is that the event lease between the client and the
> registrar must fail to renew, and at that point the SDM should note
> the problem and remove the registrar from the cache.
>
> In particular, what we're not sure of is whether we have to add
> some special handling for this case ourselves (e.g. creating a new
> cache, calling discard on the service instances) or whether this
> should be handled automagically and there is some configuration
> problem on our end.
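If client-side handling does turn out to be needed, one option is to treat a RemoteException from a proxy as a signal to drop that proxy and try another candidate. The sketch below assumes a hypothetical Service interface and a plain list of candidate proxies; in the Jini 2.1 API the corresponding step would presumably be LookupCache.discard(proxy) followed by a fresh LookupCache.lookup(filter), but that wiring is an assumption and is not shown here.

```java
import java.rmi.RemoteException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class FailoverClient {
    /** Hypothetical remote service interface, for illustration only. */
    public interface Service {
        String call() throws RemoteException;
    }

    // Candidate proxies for the same logical service (hypothetical source).
    private final Deque<Service> candidates;

    public FailoverClient(List<Service> proxies) {
        this.candidates = new ArrayDeque<>(proxies);
    }

    /**
     * Try candidates in order. A proxy that throws RemoteException is
     * dropped (the analogue of discarding it from a cache) so it is not
     * chosen again; the next candidate is tried instead.
     */
    public String invoke() {
        RemoteException last = null;
        while (!candidates.isEmpty()) {
            Service s = candidates.peekFirst();
            try {
                return s.call();
            } catch (RemoteException e) {
                last = e;
                candidates.pollFirst(); // discard the failing proxy
            }
        }
        throw new IllegalStateException("no reachable service instance", last);
    }

    public static void main(String[] args) {
        Service unreachable = () -> { throw new RemoteException("connection timed out"); };
        Service healthy = () -> "ok";
        FailoverClient client = new FailoverClient(List.of(unreachable, healthy));
        System.out.println(client.invoke()); // ok
        System.out.println(client.invoke()); // ok (failing proxy was already dropped)
    }
}
```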

The spec of ServiceDiscoveryListener (http://java.sun.com/products/jini/2.1/doc/api/net/jini/lookup/ServiceDiscoveryListener.html) covers these cases in some detail.

I've used multiple lookup services for redundancy, and the failure of one shouldn't result in a registered service becoming 'invisible' as long as the others are still reachable.

Regards,
--
Mark
