squid hello write test failed

2008-04-23 Thread Tobias Ernst

Dear All

This is an amd64 box with FreeBSD 6.3. So far it is only acting as a 
firewall (with PF). Yesterday I installed squid via ports with a pretty 
vanilla configuration. I.e. no neighbour caches, just to be used as a 
standalone cache for users from the inside net. No interception caching 
(yet). Squid was not yet put under heavy load - in fact I am so far the 
only person using it.


Everything worked fine yesterday. However, squid died after
squid -k rotate was executed by cron overnight. Here is what it logged
after the (successful) log rotation:


2008/04/23 04:20:00| storeDirWriteCleanLogs: Starting...
2008/04/23 04:20:00|   Finished.  Wrote 1706 entries.
2008/04/23 04:20:00|   Took 0.0 seconds (1714572.9 entries/sec).
2008/04/23 04:20:00| aioSync: flushing pending I/O operations
2008/04/23 04:20:00| aioSync: done
2008/04/23 04:20:00| logfileRotate: /usr/local/squid/logs/access.log
2008/04/23 04:20:00| sendto FD 12: (1) Operation not permitted
2008/04/23 04:20:00| ipcCreate: CHILD: hello write test failed

Squid was still running and accepting connections on port 3128, but
requests were no longer being served.


I then killed squid (actually I needed kill -9 to bring it down) and 
made sure no more squid processes were running. But now, every time I try 
to start squid - manually, or via rc.d - I get the same messages as 
above. The FD number varies, but everything else stays the same.
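
A note on the failing check: the log shows a sendto() call returning
"Operation not permitted" right before the "hello write test" message,
which suggests that a datagram between squid and one of its child
processes is being refused. The Python sketch below is one way to check,
independently of squid, whether a UDP datagram can travel between two
sockets on the loopback address at all. It rests on assumptions that this
thread does not confirm: that the IPC channel in question is a UDP socket
on the loopback interface, and that 127.0.0.1 is the address involved. The
function name loopback_udp_hello is made up for this example.

```python
import socket


def loopback_udp_hello(host="127.0.0.1", timeout=2.0):
    """Send b"hello" between two UDP sockets bound to `host`.

    Returns (True, "") on success, or (False, reason) on failure.
    Assumption: this only mimics a generic loopback "hello" exchange;
    it is not squid's actual IPC code.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind((host, 0))          # port 0: let the OS pick a free port
        rx.settimeout(timeout)
        tx.bind((host, 0))
        tx.sendto(b"hello", rx.getsockname())
        data, _ = rx.recvfrom(16)
        if data == b"hello":
            return (True, "")
        return (False, "unexpected payload")
    except OSError as exc:          # includes EPERM and receive timeouts
        return (False, f"{type(exc).__name__}: {exc}")
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    ok, reason = loopback_udp_hello()
    print("loopback UDP hello:", "ok" if ok else f"FAILED ({reason})")
```

If this also fails with "Operation not permitted" while squid is stopped,
the cause is probably outside squid (for example a packet filter rule that
affects lo0). If it succeeds, the problem lies elsewhere.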


There were no other changes made on the machine in between that I am 
aware of.


What is going on here?

Regards
Tobias

FWIW, here is my config:

cache_log /usr/local/squid/logs/cache.log
cache_access_log /usr/local/squid/logs/access.log
cache_store_log none
connect_timeout 2 minutes
log_fqdn on
cache_effective_user squid
http_port 3128

acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443
acl Safe_ports port 80  # http
acl Safe_ports port 21  # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70  # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT

http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access deny to_localhost

acl inside_net src xxx.xxx.xxx.0/24

http_access allow inside_net
http_access allow localhost
http_access deny all

cache_mgr [EMAIL PROTECTED]

maximum_object_size 32 MB

cache_replacement_policy heap LFUDA
cache_dir aufs /usr/local/squid/cache 32768 32 256

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Booting to root on gmirror with disk failure, is it even possible?

2007-09-05 Thread Tobias Ernst
Modulok schrieb:

> Before I invest significantly more time into my current gmirror
> issues, I have but two simple questions for anyone out there:
>
> 1. Has anyone used gmirror for the root partition and been able to
> successfully boot with one failed (or un-plugged) disk? It's the
> latter part of the question that is the real issue for me. I'm just
> looking for a confirmed "it's possible".

Yes, it is possible. IBM xSeries 346, FreeBSD 6.2-RELEASE, amd64. U360
hard drives. More specs are available from IBM. Using gmirror because we
only have an Adaptec HostRAID (aka FakeRAID) controller and not a
real ServerRaid, i.e. our SCSI controller basically has no useful RAID
capabilities built in.

My test case is to unplug any one disk while the system is running.
(Don't do this with your system unless your hardware is specified for
hot plugging!) FreeBSD detects a bus reset, marks the gmirror as
degraded and continues operating normally. I can also reboot the
degraded gmirror without any problems.

The more conservative test case is to power down the system, unplug any
one disk, and restart the system. No problems with that either.

In fact, the absolutely robust behaviour of gmirror was one of my key
arguments for switching from Linux to FreeBSD :-).

Of course there are a zillion ways to fail your hard disk, and there
could be cases where one hard disk might start behaving erratically, and
gmirror might not be able to detect all such cases and might try to
continue using the failed disk. This could theoretically lead to some
nasty data integrity issues in the worst case. But this is true for any
RAID, even when implemented in hardware IMO.

Regards
Tobias

-- 
Universität Stuttgart|Fakultät für Architektur und Stadtplanung|casinoIT
70174 Stuttgart Geschwister-Scholl-Straße 24D
T +49 (0)711 121-4228 F +49 (0)711 121-4276
E [EMAIL PROTECTED]  I http://www.casino.uni-stuttgart.de


Re: strange arp problem with bge nics

2007-09-01 Thread Tobias Ernst
Nikos Vassiliadis schrieb:

>> I don't think this is an auto negotiation issue. How can a Windows
>> machine that is connected to the same switch as my two FreeBSD machines
>> and does not even talk to them explicitly influence the autonegotiation
>> of the FreeBSD NIC?
>
> I didn't say that a Windows machine can influence adversely a FreeBSD
> machine.

In my case, on the contrary, a Windows machine does positively influence
the FreeBSD machines. Look:

- One switch connected to bge3 on both FreeBSD machines, no
  other connections.

- The machines cannot ping each other.

- Hook up a Windows machine that basically does nothing at all to
  another port of the switch.

- As a result, the machines can now ping each other.

- Disconnect the Windows machine.

- The machines continue working normally.

> (Symptom is that the NIC reports the link as up (PCS synched) but
> no traffic can be exchanged.)
> This message is from revision 1.71 of the bge driver. In short I
> would really try what's recommended there.

Well, that bug in revision 1.71 was discussed somewhere in the 4.x
branch and a patch was submitted to current at the time. So I would
guess that it is already included in 6.2!?

> hm, what happens if you disable ARP?
> ifconfig intX -arp
> and use static ARP?

Point taken, this does not fix it.

On the other hand, forcing the link speed likewise does not fix the
problem, so I don't think it is an auto negotiation problem, either.

In the meantime, I have found out that the affected interfaces show
similar problems on Linux (Debian Etch). I'm starting to get the
impression that this is a hardware issue.

There was a bug reported for the BIOS of this xSeries 346 that leads to
PCI configuration errors resulting in SCO Unixware not picking up
network connections. I flashed the updated BIOS, but to no avail.

Thanks & Regards
Tobias



strange arp problem with bge nics

2007-08-31 Thread Tobias Ernst
Dear all,

I've got two xSeries 346 servers here with a total of 6 Broadcom gigabit
NICs each. I'm going to build a firewall with them, but right now I'm
in an early testing stage. The OS is FreeBSD 6.2-RELEASE for amd64.

Each of the machines is currently configured to have an IP from our
internal LAN on bge0. I use that link to ssh into the machines for
testing purposes. (This is a temporary solution, of course). Both
machines have their bge0 connected to our primary switch, where dozens
of other computers are connected as well. Networking works normally here.

Each machine also has got an IP address from a different network on the
respective bge5 interface. The bge5 interfaces are connected to a switch
having no other connections, i.e. this is a two machine network for
testing purposes.

My problem is this: I can ping machine #2 from machine #1 when using the IP
addresses configured on the bge0 NICs. I cannot ping the other machine
when using the IP addresses configured on the bge5 NICs, as the ARP entries
remain incomplete. I can then set bge5 to promiscuous mode on one
machine, and after about 10 seconds the ping starts working.


Here's what ifconfig and netstat -nr say right after booting:

Machine #1:

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        inet XX.XX.159.253 netmask 0xfe00 broadcast XX.XX.159.255
        ether 00:14:5e:ac:71:c9

bge5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        inet XX.XX.248.158 netmask 0xff00 broadcast XX.XX.248.255
        ether 00:10:18:11:72:40

Destination     Gateway             Flags  Refs  Use  Netif
default         141.58.159.254      UGS       0    0  bge0
127.0.0.1       127.0.0.1           UH        0    0  lo0
XX.XX.158/23    link#1              UC        0    0  bge0
XX.XX.158.1     00:17:f2:93:01:30   UHLW      1    3  bge0
XX.XX.159.254   00:04:76:19:03:de   UHLW      2    0  bge0
XX.XX.248/24    link#6              UC        0    0  bge5

Machine #2:

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        inet XX.XX.159.252 netmask 0xfe00 broadcast XX.XX.159.255
        ether 00:14:5e:b4:2e:82

bge5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        inet XX.XX.248.254 netmask 0xff00 broadcast XX.XX.248.255
        ether 00:10:18:11:6f:45

Destination     Gateway             Flags  Refs  Use  Netif
default         XX.XX.159.254       UGS       0    0  bge0
127.0.0.1       127.0.0.1           UH        0    0  lo0
XX.XX.158/23    link#1              UC        0    0  bge0
XX.XX.158.1     00:17:f2:93:01:30   UHLW      1   14  bge0
XX.XX.159.254   00:04:76:19:03:de   UHLW      2    0  bge0
XX.XX.248/24    link#6              UC        0    0  bge5

Now, if I ping XX.XX.248.254 from machine #1, I get Sendto: Host is
down. The ARP table looks like this:

x.de (XX.XX.248.254) at (incomplete) on bge5 [ethernet]

This goes on indefinitely. I can then do ifconfig bge5 promisc on EITHER
of the two machines (it does not matter whether I do it on machine #1 or
on machine #2), and about 10 seconds later the ARP table on machine #1
gets completed. From then on, the network connection works
normally, even if I do ifconfig bge5 -promisc after that. I can even
delete the ARP table entries on both machines, but they will be
reinstated as soon as I issue the next ping. I need to reboot to trigger
the strange behaviour again.
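
The stuck state described above can be spotted mechanically by looking for
"(incomplete)" in the ARP table. Below is a small Python sketch that does
this for text in the format of the ARP line quoted above. The function
name, the regular expression and the second sample line (built from the
bge0 gateway entry in the routing table) are illustrative assumptions, not
output captured from the machines.

```python
import re

# Matches lines shaped like: "<host> (<ip>) at <mac-or-(incomplete)> on <ifname> ..."
ARP_LINE = re.compile(
    r"^(?P<host>\S+) \((?P<ip>[^)]+)\) at (?P<mac>\S+) on (?P<ifname>\S+)"
)


def incomplete_arp_entries(arp_output):
    """Return (ip, interface) pairs whose ARP entry is still unresolved."""
    pending = []
    for line in arp_output.splitlines():
        match = ARP_LINE.match(line.strip())
        if match and match.group("mac") == "(incomplete)":
            pending.append((match.group("ip"), match.group("ifname")))
    return pending


# First line is taken from the post; the second is a made-up resolved entry.
sample = """\
x.de (XX.XX.248.254) at (incomplete) on bge5 [ethernet]
? (XX.XX.159.254) at 00:04:76:19:03:de on bge0 [ethernet]
"""

print(incomplete_arp_entries(sample))
```

Run against the real ARP table right after boot and again after toggling
promiscuous mode, this would show whether the bge5 entry has resolved.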

I have already tried to use a different switch and have also tried using
a crosslink cable. Both show the same behaviour.

This is a vanilla install of 6.2-RELEASE. No firewalling of any sort is
enabled yet. The only thing I did is add option BRIDGE to the kernel
config on machine #1 and build a custom kernel (i.e. my kernel config on
machine #1 only differs from GENERIC in that one line. Machine #2 still
has the binary kernel from CD.)

Am I overlooking something or is this a bug? What should I do next? I am
not going to run the machines in the particular configuration described
above, but I am now worried that there might be a bug in the bge
driver and that I should not put these machines in production at all, at
least not with FreeBSD.

Regards
Tobias



Re: strange arp problem with bge nics

2007-08-31 Thread Tobias Ernst
Hi,

I have further news on this problem. It really seems to be a
driver/hardware issue.

As I said, the two servers have 6 NICs each. These are:

bge0, bge1: BCM5750, integrated on the motherboard
bge2, bge3: BCM5704, PCIX card
bge4, bge5: BCM5704, PCIX card

I have now greatly simplified the test case: Only connect any two
interfaces with the same number with a crosslink cable or an otherwise
unused switch. Assign two IP addresses from within the same subnet.
E.g., make bge0 on machine #1 10.0.0.1 and bge0 on machine #2 10.0.0.2.
Don't connect anything else.
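
As a sanity check for the addressing in this test case, the following
Python sketch verifies that two addresses fall into the same subnet. The
/24 prefix length is an assumption; the description above only gives the
two addresses. The function name same_subnet is made up for this example.

```python
import ipaddress


def same_subnet(addr_a, addr_b, prefix_len):
    """Return True if both addresses lie in the same network of the given prefix length."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


# Addresses from the test case above; the /24 is assumed.
print(same_subnet("10.0.0.1", "10.0.0.2", 24))
```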

I can instantly ping the other machine after booting up when using bge0,
bge1 or bge2 on both machines.

I cannot initially ping the other machine when using bge3, bge4 or bge5.
In this case, I first have to put one of the interfaces into
promiscuous mode, wait for the ping to come through, then disable
promiscuous mode.

Incidentally, the working interfaces all sit on IRQ3, while the other
three sit on IRQ7, IRQ11 and IRQ5, respectively.

Where do I take this from here? I need at least four interfaces working
for the configuration I need to implement. I could do away with the
other two, but four is the minimum I need.

Incidentally, another option to wake up the ping, apart from setting
and unsetting promiscuous mode, is to connect any Windows machine to the
same switch. As soon as a Windows machine is present on the switch, the
ping between the two FreeBSD machines works right from the start.

This looks like a minor issue at first glance, because everything seems
to be normal once the ping is set going, and I could just write a script
that enables promiscuous mode on startup for a certain amount of time,
and there will always be Windows boxes on the network anyway. However, I
am now wary that there might be other hidden bugs or hardware problems,
and I have no use for those in a production machine ...
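
The startup workaround mentioned above could look roughly like the
following Python sketch. It only wraps the two ifconfig invocations already
described in this thread (promisc and -promisc). The function name, the 15
second hold time and the injectable run/sleep parameters are assumptions
made for illustration, and running it for real would need root privileges.

```python
import subprocess
import time


def toggle_promisc(ifname, hold_seconds=15.0,
                   run=subprocess.check_call, sleep=time.sleep):
    """Enable promiscuous mode on `ifname`, wait, then disable it again.

    `run` receives the command as an argument list; pass a different
    callable to do a dry run without touching the interface.
    """
    run(["ifconfig", ifname, "promisc"])
    try:
        sleep(hold_seconds)
    finally:
        # Always switch promiscuous mode off again, even if interrupted.
        run(["ifconfig", ifname, "-promisc"])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    toggle_promisc("bge5", hold_seconds=0,
                   run=lambda cmd: print(" ".join(cmd)))
```

The hold time is a guess based on the roughly 10 seconds it took for the
ARP entry to complete in the earlier test, plus some margin.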

Best regards
Tobias



Re: strange arp problem with bge nics

2007-08-31 Thread Tobias Ernst
Nikos Vassiliadis schrieb:
> On Friday 31 August 2007 22:30, I correctly wrote:
>> Did you try without forcing a link speed (check ifconfig -m)
> s/without //
>
> anything useful in dmesg?

No, nothing at all in dmesg.

I don't think this is an auto negotiation issue. How can a Windows
machine that is connected to the same switch as my two FreeBSD machines
and does not even talk to them explicitly influence the autonegotiation
of the FreeBSD NIC? If the NIC were not properly negotiated, it would
not even see the broadcasts of the Windows machine, I would think.

It must be something with ARP and TCP/IP in connection with that
particular driver, I suppose.

The cards properly negotiate whatever the particular switch (tried
several, 100 and 1000) supports and I also tried setting various fixed
rates and duplex settings when using a cross link cable. This does not
change anything.

The interface is live and running, it just does not properly perform ARP
up to the point when I either put the interface in promiscuous mode for
a while or send some Windows broadcasts.

Regards
Tobias
