On Sun, Aug 24, 2014 at 04:51:09PM +0300, Matan Barak wrote: > Do you refer to flowinfo? > If so, it isn't initialized, but should be. > 0 should be good enough, so I'll memset the whole struct.
K > >What does this comment refer to? MACs should never be parsed out of > >IPv6 addresses. > > GID0 should always be valid, even if no IPv4 and IPv6 are configured. > IPv6 link-local and multicast addresses should be parsed from the > IPv6 address. Might want to just clarify this is dealing with a GID, not a IPv6 address - they are not interchangable terms. I'm not sure how you are getting a GID from a neigh though? > >>+ timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC); > >>+ if (-1 == timer_fd) { > >>+ print_err("Couldn't create timer\n"); > >>+ return timer_fd; > >>+ } > > > >The use of timerfd will impact the minimum OS version, have you > >checked this is OK? Does RHEL5 still work? > It was added in linux v2.6.25. I think that an API that's more than > 6.5 years old is valid. RHEL5 is using 2.6.18 as their base kernel. You should at least consult with the OFED people to determine if this is a problem for them. > > >>+ while (1) { > >>+ FD_ZERO(&fdset); > >>+ FD_SET(fd, &fdset); > >>+ FD_SET(timer_fd, &fdset); > >>+ > >>+ /* wait for an incoming message on the netlink socket */ > >>+ ret = select(nfds, &fdset, NULL, NULL, NULL); > > > >poll is a better choice here, it would also be fairly simple to remove > >timerfd by using the timeout arg. > > The purpose is to have a timeout on the whole process and not per loop. Yes, I know - it is not too hard to use poll to do that, you need to call clock_gettime(CLOCK_MONOTONIC) to compute the relative timeout. > >>+ if (sendto(sock_fd, buff, sizeof(buff), > >>+ 0, &addr_dst.sktaddr.s, > >>+ addr_dst.len) < 0) > >>+ print_err("Failed to send " > >>+ "packet while waiting" > >>+ " for > >>events\n"); > > > >If the earlier sendto needs to be protected by a loop looking at > >EADDRNOTAVAIL then so does this one. > > > > If the sendto failed, we would just retry it in the next loop. Except the error handing flow is a bit different depending on EADDRNOTAVAIL. > >I'm also not sure all this looping and complexity is needed... Why > >override the kernel timers for ND? > > > >I would think you'd check the ND table, if there is no good entry then > >do the send, and then watch the ND table for a FAILED/REACHABLE > >result. The kernel timers will always transition through INCOMPLETE to > >one of those terminal states. > > > >No need for more timers and retry and things, the kernel already retried. > > Watching just for the ND event is problematic. > If the user sent a packet just after we looked at the ND and before > we waited for an event, we would probably just fail without finding > any neighbor. Hmm, I don't think there is a race like that. You start listening for ND updates, send your packet, and wait for a transition into FAILED or REACHABLE. Whenever you send a packet a FAILED entry will transition back to INCOMPLETE. If someone else already has sent a packet then you start in the INCOMPLETE state, and are still waiting to transition to FAILED or REACHABLE. > Regarding the retries, because we use UD and switches could > sometimes take a few seconds to learn the network, I chose to do a > few reties before giving up. The kernel already has retires and timers for ND. > >>+ > >>+ last_status = __sync_fetch_and_or( > >>+ &neigh_handler->neigh_status, > >>+ GET_NEIGH_STATUS_IN_PROCESS); > > > >Is there a thread running around in here someplace? > > > >Callina raw gcc intrinsic like this should really be avoided, > >especially since this isn't a performance path. Use pthreads. > > > > The current implementation is synchronous. > For the usage of the neigh.c entry function, those guards could be deleted. K Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html