Hi,

I got no answer on the ports@ list, so I hope someone here has an idea.
I am having trouble getting multicast communication to work in the
heartbeat (http://www.linux-ha.org) port. When I configure it for multicast
and start up the cluster node, I see the following in /var/log/messages:

Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:53 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:53 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down

and tcpdump sees this:
Oct 25 09:05:08.038580 00:0e:7b:fc:c0:a0 01:00:5e:00:00:01 0800 42: 10.0.0.5 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]
Oct 25 09:05:12.063762 00:0e:7b:fc:c0:a0 01:00:5e:00:00:01 0800 42: 10.0.0.5 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]

That is all the multicast traffic I see. The above is from an i386 machine;
the same happens on another i386 machine. One has an rl0 card, the other an
fxp0 card.

Then I started a second node on a sparc64 machine, where tcpdump shows this:
# tcpdump -n -i hme0 multicast
tcpdump: listening on hme0, link-type EN10MB
10:40:09.218991 10.0.0.24 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]
Bus error

Despite these few outgoing multicast packets, the cluster nodes do not see
each other.

I found this part of the heartbeat code where the error message comes from:

mcast_write(struct hb_media* hbm, void *pkt, int len)
{
        struct mcast_private *  mcp;
        int                     rc;

        MCASTASSERT(hbm);
        mcp = (struct mcast_private *) hbm->pd;

        if ((rc=sendto(mcp->wsocket, pkt, len, 0
        ,       (struct sockaddr *)&mcp->addr
        ,       sizeof(struct sockaddr))) != len) {
                PILCallLog(LOG, PIL_CRIT, "Unable to send mcast packet [%d]: %s"
                ,       rc, strerror(errno));
                return(HA_FAIL);
        }
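
To rule out heartbeat itself, a small standalone sender might help narrow
this down. The following is only a sketch under assumptions: the function
names mcast_dest() and mcast_send_once() are made up for illustration, and
the port passed in is a placeholder that should match the one configured
for heartbeat. It pins the outgoing interface with IP_MULTICAST_IF and sets
the TTL to 1 before calling sendto(), so a failure can be attributed to one
specific call.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill in a multicast destination address.
 * Returns 0 on success, -1 if group is not a valid IPv4 multicast address. */
int
mcast_dest(struct sockaddr_in *dst, const char *group, unsigned short port)
{
        assert(dst != NULL && group != NULL);
        memset(dst, 0, sizeof(*dst));
        dst->sin_family = AF_INET;
        dst->sin_port = htons(port);
        if (inet_pton(AF_INET, group, &dst->sin_addr) != 1)
                return -1;
        /* IN_MULTICAST expects the address in host byte order. */
        return IN_MULTICAST(ntohl(dst->sin_addr.s_addr)) ? 0 : -1;
}

/* Hypothetical helper: send one datagram to group:port through the
 * interface that owns the local address ifaddr.
 * Returns 0 on success, -1 on failure; the failing call is named on stderr. */
int
mcast_send_once(const char *group, unsigned short port, const char *ifaddr)
{
        struct sockaddr_in dst;
        struct in_addr     ifa;
        unsigned char      ttl = 1;     /* stay on the local segment */
        const char         msg[] = "mcast-test";
        int                s, rc = -1;

        if (mcast_dest(&dst, group, port) != 0 ||
            inet_pton(AF_INET, ifaddr, &ifa) != 1) {
                fprintf(stderr, "bad group or interface address\n");
                return -1;
        }
        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
                perror("socket");
                return -1;
        }
        if (setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF, &ifa, sizeof(ifa)) < 0)
                perror("setsockopt IP_MULTICAST_IF");
        else if (setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) < 0)
                perror("setsockopt IP_MULTICAST_TTL");
        else if (sendto(s, msg, sizeof(msg), 0, (struct sockaddr *)&dst,
            sizeof(dst)) != (ssize_t)sizeof(msg))
                perror("sendto");
        else
                rc = 0;
        close(s);
        return rc;
}
```

Calling mcast_send_once("239.0.0.1", port, "10.0.0.5") from a small main()
on the i386 node, with port set to the configured heartbeat port, would show
whether sendto() fails with "Host is down" outside of heartbeat as well.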


Does anybody have an idea what the problem could be? The same code compiled
on Linux works well. Maybe someone else porting a multicast-based
application has run into similar problems?

Any ideas are greatly appreciated.
Sebastian
