Holger Parplies wrote:
>> But lots of other people including myself run rsync without errors so it
>> has to be something unique to your situation.
>
> well, no. You don't rule out bugs by "it works for me", not even by "it
> works for everyone I know". I'm sure you know that.
Anything is possible I suppose, but if I know something works for
everyone else I move it to the bottom of the list of things to test.
> We don't know much about the "lots of other people", do we? We know there
> have been no further *reports* of it on this list, but I don't remember
> hundreds of people reporting success with rsync on RHEL4 either. You might
> know about other lists, I don't.
I know enough about mailing lists to expect a ton of matches on a Google
search for the 'no route to host' problem, but I don't see much relating
to a local LAN or RHEL there. And I'm sure I'd have seen mentions on
the CentOS or Fedora lists if it affected those very similar kernels.
>> Maybe cables from a different vendor would help.
>
> I doubt it, because other applications are doing well. It doesn't seem to be
> hardware related to me. I suspect the kernel on the host side (backup client)
> or its configuration. Of course, it may be hardware specific in that
> different hardware does not trigger whatever is happening (and that could
> include the switch, maybe, perhaps), but the cables? It's not the hardware
> where I would start looking, especially after Tim *has* tested quite a lot
> of different setups.
TCP retries can cover a lot of errors. A bad crimp on a patch cable or
an extra half-inch untwisted on the wall punch-downs can cause exactly
this sort of thing.
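
One way to see whether retries are hiding link errors is to compare the
kernel's TCP counters before and after a transfer. The following is only a
rough sketch: it assumes a Linux client whose /proc/net/snmp has the usual
pair of "Tcp:" lines (a header line and a value line) with OutSegs and
RetransSegs fields, and the function name is made up for illustration.

```python
def tcp_retransmit_ratio(snmp_text):
    """Return RetransSegs / OutSegs parsed from /proc/net/snmp-style text.

    Assumption: the text contains two lines starting with "Tcp:", the
    first listing field names and the second listing their values.
    """
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a Tcp: header line and a Tcp: value line")
    header, values = tcp_lines[0], tcp_lines[1]
    # Pair each field name with its value, skipping the leading "Tcp:" token.
    stats = dict(zip(header[1:], values[1:]))
    out_segs = int(stats["OutSegs"])
    retrans = int(stats["RetransSegs"])
    return retrans / out_segs if out_segs else 0.0
```

Feed it the contents of /proc/net/snmp before and after an rsync run; a
ratio that climbs noticeably during the run would point at the wire or
the switch rather than at rsync itself.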
> It could be stupid things like arp poisoning, a misbehaving machine on the
> local network or whatever. Remains the question what communication
> characteristics rsync has and SMB doesn't (hmm, SMB is UDP, isn't it?) that
> make the problem appear.
ARP poisoning is possible - maybe just someone plugging/unplugging a
machine with the same IP somewhere else. A badly configured NAT gateway
on the network doing proxy ARP in the wrong direction could do it. If
there are multiple interconnected switches involved, it could even be a
loop with a spanning-tree problem.
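
A duplicate IP or a misbehaving proxy ARP tends to show up as one IP
address being answered for by more than one MAC address. As a minimal
sketch, assuming captured lines look roughly like the output of
"tcpdump -n arp" (something like "ARP, Reply 10.0.0.5 is-at
00:11:22:33:44:55, length 46"; the exact format varies between tcpdump
versions), this reports such conflicts. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of an ARP reply line in tcpdump text output, e.g.
#   "12:00:01 ARP, Reply 10.0.0.5 is-at 00:11:22:33:44:55, length 46"
ARP_REPLY = re.compile(r"[Rr]eply (\S+) is-at ([0-9A-Fa-f:]+)")


def conflicting_ips(lines):
    """Map each IP claimed by more than one MAC to its sorted MAC list."""
    seen = defaultdict(set)
    for line in lines:
        match = ARP_REPLY.search(line)
        if match:
            ip, mac = match.groups()
            seen[ip].add(mac.lower())
    # Keep only the IPs that were answered for by several MAC addresses.
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

An empty result does not prove the network is clean, since it only
reflects the replies seen during the capture.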
> Tim sent me his /etc/sysctl.conf off-list, and I find it harmless (that
> referred to "kernel configuration" before I added the previous paragraph). As I
> understand him, he's about to try out different kernels (2.4.x ?), now that
> he has a test setup available. Swapping kernels is *not* something I'd happily
> do without further thought on a production server either, and I'm sure you
> agree.
If the machine can be taken down for testing, I'd boot a Knoppix or
Ubuntu CD instead of installing something different.
> May I summarize a few points I believe we all agree on?
>
> 1.) It's a client side problem, i.e. the backed up client seems to be the
> cause, not the BackupPC server machine.
I'm guessing a network problem, with the NIC driver being the only
likely software connection.
> 2.) It is thus not a BackupPC problem. On the client only stock RHEL4
> software is in use (on the test setup anyway).
> 3.) It is still on-topic in that it happens using BackupPC and only then.
> Other users of BackupPC may run into similar problems and be glad to
> find a solution in the archives once we find one.
> 4.) It's an obscure and unnerving problem. There are many things to try out,
> nothing obvious springing to mind, and each of us has different thoughts
> on what to try in which order :).
>
> My bet stays the kernel. Craig has a point with the isolated network. Either
> one might fix it, without leading to a definitive diagnosis. Running on an
> isolated network as a workaround is not an option :-), but it's the easier
> thing to try out, and *reproducing* the problem on an isolated network would
> rule out quite a lot of causes.
I think the kernel is the least likely thing to be involved. My test
procedure would be to build a 'known working' pair of machines, perhaps
as simple as booting 2 boxes with Knoppix or Ubuntu, connected by a
crossover cable and talking to each other by IP address. Once you get a
set that can rsync without errors (which can't be that hard - it works
for everyone else), start introducing the cable/switch/destination where
you've seen the problem, one piece at a time. Intel 100M NICs would
probably be a safe bet for ruling out obscure driver issues. If the
switches are managed, I'd see what they report for interface errors and
check that you don't have a duplex mismatch on the connections. A tcpdump
of broadcasts to see the ARP traffic might show something interesting.
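
The one-piece-at-a-time procedure above can be written down as a small
loop. This is only an illustration of the idea, not a tool: the component
names and the transfer_ok callback (which would stand for "run the test
rsync with these pieces in the path and see if it finishes cleanly") are
hypothetical.

```python
def first_failing_component(suspects, transfer_ok):
    """Add suspects to a known-good baseline one at a time.

    suspects:    ordered list of component names to introduce.
    transfer_ok: callable taking the list of components currently in the
                 path and returning True if the test transfer ran cleanly.
    Returns the first component whose introduction makes the test fail,
    or None if every configuration passes.
    """
    in_path = []
    # The bare baseline (e.g. two live-CD boxes on a crossover cable)
    # has to work before anything else can be learned.
    if not transfer_ok(in_path):
        raise RuntimeError("baseline itself fails; fix that first")
    for part in suspects:
        in_path.append(part)
        if not transfer_ok(in_path):
            return part
    return None
```

For example, first_failing_component(["patch cable", "switch", "RHEL4
client"], run_test) would stop at whichever piece first breaks the
transfer, with run_test being whatever wrapper you put around the rsync
run.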
--
Les Mikesell
[EMAIL PROTECTED]
_______________________________________________
BackupPC-users mailing list
https://lists.sourceforge.net/lists/listinfo/backuppc-users
http://backuppc.sourceforge.net/