Re: Problem with SDM/LookupCache when LUS unavailable

Patrick Wright Tue, 18 Aug 2009 05:33:02 -0700

Hi Mark

> From the above it is not completely clear whether LUS-1 and LUS-2 were
> running on the same server, bound to the same IP number or that only LUS-1
> was affected by the firewall update and that LUS-2 is on a different server
> or bound to another IP number and not affected by the firewall update.


They were running on two different servers.

>
> Also what went exactly wrong with the firewall update, was nothing reachable
> or was it that just certain services were blocked.

From my understanding of the situation, an admin loaded changes to
iptables config and the box where LUS-1 was running was thereafter
completely unreachable until a hard reboot.


> What was exactly misconfigured with the firewall update. What if the event
> registration fails because certain ports being blocked, while multicast and
> unicast discovery is allowed through the firewall.

I don't know the details, but know enough to say that the box was
unreachable over the network until it was rebooted with the prior
iptables config.


>
> At INFO level for net.jini.lookup.ServiceDiscoveryManager a failure of lease
> creation or renewal should be visible in the logs.

OK, I will make sure we have this enabled in the future.
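For reference, a minimal sketch of how that could be switched on programmatically, assuming the default java.util.logging setup is in use. The logger name is the one Mark gave above; the class and method names (SdmLogging, enableInfo) are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; only the logger name comes from the thread above.
class SdmLogging {
    // Hold a strong reference so the configured logger is not garbage collected.
    private static final Logger SDM_LOGGER =
            Logger.getLogger("net.jini.lookup.ServiceDiscoveryManager");

    /** Sets the SDM logger to INFO and returns it. */
    static Logger enableInfo() {
        SDM_LOGGER.setLevel(Level.INFO);
        return SDM_LOGGER;
    }
}
```

The equivalent line in a logging.properties file would be "net.jini.lookup.ServiceDiscoveryManager.level = INFO". Either way, the attached handlers must also pass INFO records for the messages to show up.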


>
> In case the SDM (by means of an implementation of DiscoveryManagement)
> encounters a definite failure of a lookup service it will discard that
> lookup service, but that lookup service will be eligible for (re)discovery,
> meaning that when the SDM receives another multicast message that indicates
> the lookup service is available on the network it will try to register with
> that lookup service. That will fail in your case and it will be discarded.

OK, thanks for the clarification. I just found the section of
http://java.sun.com/products/jini/2.1/doc/specs/html/servicediscutil-spec.html
(under "The DiscoveryManagement Interface") which describes this.
However, it's not clear to me how a lookup helper class (our clients
are configured to use LookupDiscoveryManager) "determines" whether a lookup
service is no longer available. In the Discovery Utilities Spec
(http://java.sun.com/products/jini/2.1/doc/specs/html/discoveryutil-spec.html),
I find:

"Currently, there exist utilities such as the LookupDiscovery and
LookupDiscoveryManager helper utilities that will, on behalf of a
discovering entity, automatically discard a lookup service upon
determining that the lookup service has become unreachable or
uninteresting. Although most entities will typically employ such a
utility to help with both its discovery as well as its discard duties,
it is important to note that if the entity itself determines that the
lookup service is unavailable, it is the responsibility of the entity
to invoke the discard method. This scenario usually happens when the
entity attempts to interact with a lookup service, but encounters an
exceptional condition (for example, a communication failure). When the
entity actively discards a lookup service, the discarded lookup
service becomes eligible to be re-discovered. Allowing unreachable
lookup services to remain in the managed set can result in repeated
and unnecessary attempts to interact with lookup services with which
the entity can no longer communicate. Thus, the mechanism provided by
this method is intended to provide a way to remove such "stale" lookup
service references from the managed set."

However, I don't find any more detail on the topic. Thus it is unclear
if we need to call discard(registrar) when we believe the registrar is
no longer available. At least in this one case, it appears that the
registrar may not have been discarded.
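To make the question concrete, here is a rough sketch of the pattern the spec text seems to describe: the entity hits a communication failure while talking to a registrar and then calls discard itself. Registrar and Discarder are hypothetical stand-ins so the sketch compiles without the Jini jars; in real code they would be ServiceRegistrar and a DiscoveryManagement implementation such as LookupDiscoveryManager, and lookup would take a ServiceTemplate rather than a String. It does not answer whether the SDM already does this internally.

```java
import java.rmi.RemoteException;

// Hypothetical stand-in for net.jini.core.lookup.ServiceRegistrar (simplified).
interface Registrar {
    Object lookup(String serviceName) throws RemoteException;
}

// Hypothetical stand-in for the discard method of
// net.jini.discovery.DiscoveryManagement.
interface Discarder {
    void discard(Registrar proxy);
}

class DiscardOnFailure {
    /**
     * Attempts a lookup. On a communication failure the registrar is
     * discarded, which makes it eligible for rediscovery, and null is returned.
     */
    static Object lookupOrDiscard(Registrar registrar, Discarder manager, String name) {
        try {
            return registrar.lookup(name);
        } catch (RemoteException e) {
            // The entity itself determined the registrar is unreachable,
            // so it removes the stale reference from the managed set.
            manager.discard(registrar);
            return null;
        }
    }
}
```

A healthy registrar is left alone; only a RemoteException during the call triggers the discard.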


Thanks a lot for helping out with this, Mark. I'm going to rework the
logging and then see if I can reproduce this, or at least have better
logging enabled if it reappears. There may be some confusion on our end.


Regards
Patrick
