Re: 9.2-PRE: switch off that stupid Nakatomi Socrates

2013-09-30 Thread Phil Regnauld
Teske, Devin (Devin.Teske) writes:
  
  Nice, but how does it handle a Makefile that contains a love target?
 
 Right, bmake gives us all the chance to implement love in our own
 unuque way ;D
  ^^

Is that halfway between eunuch and unique love ?

Yes, it's off topic at this point, but heck, I think it's important
to underline that humor doesn't have an absolute metric, and as long
as it doesn't interfere with the functionality or the integrity of
the OS, it has its place :)


Re: 9.2-PRE: switch off that stupid Nakatomi Socrates

2013-09-28 Thread Phil Regnauld
Teske, Devin (Devin.Teske) writes:
 
 If you work seriously on serious issues long enough... you'll become
 burned-out. Let me just come right out and say it...
 
 I coded it.

And thanks, you got me chuckling - nice to see some humor once in a while.

To the offended poster: read the last line of tunefs(8) - there are
probably many more places where you could spend serious time looking for
deviations from corporate correctness.

 And 8 years of always serious coding on always serious projects has made
 me a dull boy. This little mini-project gave me something to work on that
 lifted my spirits.

Been using it for 20, plan on using it for 20 more. Keep up the good work :)

 Come on...
 
 Let us have some fun every now and then.


http://unix.stackexchange.com/questions/89296/what-does-the-windows-flag-in-the-linux-logo-of-kernel-3-11-mean

 Because when we do have fun... we often find ways of turning that
 functionality into something great (like the ability to use this for a
 custom boot screen in a fork or distro).

Or for kiosk setups where the machine displays something informative
on boot.

Phil


Re: Musings on ZFS Backup strategies

2013-03-02 Thread Phil Regnauld
Karl Denninger (karl) writes:
 
 I think I'm going to play with this and see what I think of it.  One
 thing that is very attractive about this design is to have the receiving
 side be a mirror; then, to rotate to the vault copy: run a scrub (to
 ensure that both members are consistent at a checksum level), break the
 mirror and put one drive in the vault, replacing it with the drive coming
 FROM the vault, then do a zpool replace and allow it to resilver onto the
 other drive. You now have the two in a consistent state again locally if
 the pool pukes, and one in the vault in the event of a fire or other
 entire-facility-is-toast event.

That's one solution.
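
(In zpool terms, that rotation might look roughly like this - pool and
device names are invented, and since the pool is a mirror I'd use
attach/detach rather than zpool replace:)

# zpool scrub backup                 (verify both mirror members)
# zpool detach backup da1            (da1 goes to the vault)
# zpool attach backup da0 da2        (da2 comes back from the vault and
                                      resilvers from da0)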

 The only risk that makes me uncomfortable doing this is that the pool is
 always active when the system is running.  With UFS backup disks it's
 not -- except when actually being written to, they're unmounted, and this
 materially decreases the risk of an insane adapter scribbling on the
 drives, since there is no I/O at all going to them unless mounted.
 While the backup pool would be nominally idle, it is probably
 more exposed to a potential scribble than the UFS-mounted packs would be.

Could zpool export in between syncs on the target, assuming that's not
your root pool :)
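
(A minimal sketch of that, with made-up pool, dataset and snapshot names:)

# zpool import backup
# zfs send -i tank/data@prev tank/data@now | zfs receive backup/data
# zpool export backup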

Cheers,
Phil


Re: Why Are You NOT Using FreeBSD ?

2012-06-01 Thread Phil Regnauld
Daniel Kalchev (daniel) writes:
 
 It will sure be interesting to learn what people avoid using FreeBSD for.

* full virtualization

I am using VirtualBox in production with HAST + ZVOLs, but we need
something like DRBD's dual-master mode to be able to do a teleport of the
instance like Ganeti does (http://code.google.com/p/ganeti/) with Linux.

Getting Xen dom0 and/or KVM would be a major boost as a virtualization
platform, in particular with ZFS.

* Gluster

For very large FSes, nothing beats it, especially now that 3.3 has been
released.

Mind you, I've been using FreeBSD for about 19 years, so I'm not about
to change, but the two items above would go a long way to help FreeBSD
grow in the data center space again (what the kids call the cloud :)

Cheers,
Phil


Re: Why Are You NOT Using FreeBSD ?

2012-06-01 Thread Phil Regnauld
David Magda (dmagda) writes:
 On Jun 1, 2012, at 09:12, Phil Regnauld wrote:
 
  * Gluster
  
  For very large FSes, nothing beats it, especially now that 3.3 has been
  released.
 
 Isilon built their OneFS on top of FreeBSD, does that count? :)
 
 Panasas too IIRC.

Good pointers, thanks. It's still an appliance, but good to know that
FreeBSD is out there :)

Phil


Re: kvm virtio performance

2012-05-01 Thread Phil Regnauld
Bane Ivosev (bane.ivosev) writes:
 hi, has anyone tested freebsd as a guest on kvm with virtio drivers? any experience?

http://forums.freebsd.org/archive/index.php/t-28916.html

Cheers,
Phil


Re: Issue with hast replication

2012-03-17 Thread Phil Regnauld
Mikolaj Golub (to.my.trociny) writes:
 
 I just tried to reproduce this and failed. For me a new resource was added
 without problems on reload.
 
 Mar 17 20:04:24 kopusha hastd[52678]: Reloading configuration...
 Mar 17 20:04:24 kopusha hastd[52678]: Keep listening on address 0.0.0.0:7771.
 Mar 17 20:04:24 kopusha hastd[52678]: Resource rtest added.
 Mar 17 20:04:24 kopusha hastd[52678]: Configuration reloaded successfully.
 
 You sent SIGHUP to the master process on both hosts, didn't you?

Nope :-| Duh.
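
(Noted for next time - on each host, assuming the default pidfile
location, something like:)

# kill -HUP $(cat /var/run/hastd.pid)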

 Could you please provide more details if you still fail to add new resources
 on the fly (configuration, log messages).

I'll look. Right now, I need to try and reproduce the original
hast-over-zvol problem.

Thanks,
Phil


Re: Issue with hast replication

2012-03-13 Thread Phil Regnauld
Mikolaj Golub (to.my.trociny) writes:
 
 Ok. So it is send(2). I suppose the network driver could generate the
 error. Did you say which network adapter you have?

Not yet.

bce0: HP NC382i DP Multifunction Gigabit Server Adapter (C0) mem 0xf400-0xf5ff irq 16 at device 0.0 on pci2
bce0: ASIC (0x57092003); Rev (C0); Bus (PCIe x2, 2.5Gbps); B/C (4.6.4); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 1.0.3)

  PR No obvious errors there either, but again what should I look out for ?
 
 I would look at the sysctl -a dev.nic statistics and try to find if there
 is a correlation between ENOMEM failures and growing error counters.

0 errors:

dev.bce.0.l2fhdr_error_count: 0
dev.bce.0.stat_emac_tx_stat_dot3statsinternalmactransmiterrors: 0
dev.bce.0.stat_Dot3StatsCarrierSenseErrors: 0
dev.bce.0.stat_Dot3StatsFCSErrors: 0
dev.bce.0.stat_Dot3StatsAlignmentErrors: 0
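
(To catch any movement during a sync, I can watch these in a loop -
interval and grep pattern are arbitrary:)

# while :; do date; sysctl dev.bce.0 | egrep -i 'err|fail|drop'; sleep 30; done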

 Looking at buffer usage from 'netstat -nax' output run during synchronization
 (on both hosts) could provide useful info on where the bottleneck is. top -HS
 output might be useful too.

Good point.

I'll have to attempt to recreate the problem, as the volume has replicated
without errors. Typical.

Cheers,
Phil


Re: Issue with hast replication

2012-03-13 Thread Phil Regnauld
Mikolaj Golub (to.my.trociny) writes:
 
 
 What about failed counters like mbuf_alloc_failed_count,
 dma_map_addr_rx_failed_count, dma_map_addr_tx_failed_count?

dev.bce.0.l2fhdr_error_count: 0
dev.bce.0.mbuf_alloc_failed_count: 0
dev.bce.0.mbuf_frag_count: 0
dev.bce.0.dma_map_addr_rx_failed_count: 0
dev.bce.0.dma_map_addr_tx_failed_count: 0
dev.bce.0.unexpected_attention_count: 0


Re: Issue with hast replication

2012-03-12 Thread Phil Regnauld
Phil Regnauld (regnauld) writes:
 
 7) ktrace on the destination dd:
 
 fstat(0,{ mode=p- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
 lseek(0,0x0,SEEK_CUR)  ERR#29 'Illegal seek'

[...]

 Illegal seek, eh ? Any clues ?
 
 The boxes are identical (HP DL380 G6), though the RAM config is different.
 
 Summary:
 
 - ssh works fine
 - h1 zvol to h2 zvol over ssh fails
 - h1 zvol to h2 /tmp/x over ssh is fine
 - h2 /dev/zero locally to h2 zvol is fine
 - h2 /tmp/x locally to h2 zvol fails at first, but works afterwards...

A few more data points: dd from a local zvol to a local zvol on either
machine works fine.

Using nc instead of ssh, this time it's the sender nc dying:
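
(The nc test was along these lines - the port number is arbitrary:)

h2# nc -l 3333 | dd bs=131072 of=/dev/zvol/zfs/junk
h1# dd if=/dev/zvol/zfs/hvol bs=131072 | nc h2 3333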

ktrace on the sender:

47704 nc   CALL  write(0x3,0x7fff5450,0x800)
47704 nc   RET   write -1 errno 32 Broken pipe
47704 nc   PSIG  SIGPIPE SIG_DFL code=0x10006

truss on the sender:

poll({3/POLLIN 0/POLLIN},2,-1)   = 2 (0x2)
read(3,0x7fff5450,2048)  ERR#54 'Connection reset by peer'
close(3) = 0 (0x0)


On tcpdump, I do see the receiver send a FIN when using nc.
When using ssh, the sender is sending the FIN.

Anything else I can look for ?



Re: Issue with hast replication

2012-03-12 Thread Phil Regnauld
Mikolaj Golub (to.my.trociny) writes:
 
 It looks like in the case of hastd it was send(2) that returned ENOMEM, but
 it would be good to check. Could you please start synchronization again,
 ktrace the primary worker process when ENOMEM errors are observed, and show
 the output here?

Ok, took a little while, as running ktrace on the hastd does slow it down
significantly, and the error normally occurs at 30-90 sec intervals.

   0x0f90 b2f3 3ad5 e657 7f0f 3e50 698f 5deb 12af  |..:..W..>Pi.]...|
   0x0fa0 740d c343 6e80 75f3 e1a7 bfdf a4c1 f6a6  |t..Cn.u.........|
   0x0fb0 ea85 655d e423 bd5e 42f7 7e9a 05d2 363a  |..e].#.^B.~...6:|
   0x0fc0 025e a7b5 0956 417c f31c a6eb 2cd9 d073  |.^...VA|....,..s|
   0x0fd0 2589 e8c0 d76a 889f 8345 eeaf f2a0 c2d6  |%....j...E......|
   0x0fe0 b89e aaef fee2 6593 e515 7271 88aa cf66  |......e...rq...f|
   0x0ff0 d272 411a 7289 d6c9 6643 bdbe 3c8c 8ae8  |.rA.r...fC..<...|
 50959 hastd RET   sendto 32768/0x8000
 50959 hastd CALL  sendto(0x6,0x8024bf000,0x8000,0x2<MSG_NOSIGNAL>,0,0)
 50959 hastd RET   sendto -1 errno 12 Cannot allocate memory
 50959 hastd CALL  clock_gettime(0xd,0x7f3f86f0)
 50959 hastd RET   clock_gettime 0
 50959 hastd CALL  getpid
 50959 hastd RET   getpid 50959/0xc70f
 50959 hastd CALL  sendto(0x3,0x7f3f8780,0x84,0,0,0)
 50959 hastd GIO   fd 3 wrote 132 bytes
   "<27>Mar 12 23:42:43 hastd[50959]: [hvol] (primary) Unable to sen\
d request (Cannot allocate memory): WRITE(8626634752, 131072)."
 50959 hastd RET   sendto 132/0x84
 50959 hastd CALL  close(0x7)
 50959 hastd RET   close 0

 If it is send(2) that fails, then monitoring netstat and network driver
 statistics might be helpful. Something like
 
 netstat -nax
 netstat -naT
 netstat -m
 netstat -nid

I could run this in a loop, but that would be a lot of data, and might
not be appropriate to paste here.

I didn't see any obvious errors, but I'm not sure what I'm looking for.
netstat -m didn't show anything close to running out of buffers or
clusters...

 sysctl -a dev.nic

 And may be
 
 vmstat -m
 vmstat -z

No obvious errors there either, but again what should I look out for ?

In the meantime, I've also experimented with a few different scenarios, and
I'm quite puzzled.

For instance, I configured one of the other gigabit cards on each host to
provide a dedicated replication network. The main difference is that up
until now this has been running using tagged vlans. To be on the safe side,
I decided to use an untagged interface (the second gigabit adapter in each
machine).

Here's what I observed, and it is very odd:

- doing a dd ... | ssh dd fails in the same fashion as before

- I created a second zvol + hast resource of just 1 GB (set up roughly as
  sketched at the end of this message), and it replicated without any
  problems, peaking at 75 MB/sec (!) - maybe 1 GB is too small ?

  (side note: hastd doesn't pick up configuration changes even with SIGHUP,
   which makes it hard to provision new resources on the fly) 

- I restarted replication on the 100 G hast resource, and it's currently
  replicating without any problems over the second ethernet, but it's
  dragging along at 9-10 MB/sec, peaking at 29 MB/sec occasionally.

  Earlier, I was observing peaks at 65-70 MB/sec in between failures...

So I don't really know what to conclude :-| 
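
(For reference, the 1 GB test resource was created roughly like this -
the names are made up, and the hast.conf block mirrors the hvol one:)

h1# zfs create -V 1G zfs/testvol             (likewise on h2)
h1# hastctl create testvol                   (likewise on h2, after adding the
                                              resource to hast.conf and
                                              restarting hastd on both hosts)
h1# hastctl role primary testvol
h2# hastctl role secondary testvol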


Issue with hast replication

2012-03-11 Thread Phil Regnauld
Hi,

I've got a fairly simple setup: two hosts running 9.0-R (will upgrade to stable
if told to, but want to check here first), ZFS and HAST. HAST is configured to
run on top of zvols configured on each host, as illustrated:

        FS                          FS
   +--------+                  +--------+
   |  hvol  |  <--- hastd ---> |  hvol  |
   +--------+                  +--------+
   |  zvol  |                  |  zvol  |
   +--------+                  +--------+
   |  zfs   |                  |  zfs   |
   +--------+                  +--------+
       h1                          h2

Connection is gigabit to the same switch. No issues with large TCP
transfers such as SCP/FTP.

Config is vanilla:

# zfs create -V 10G zfs/hvol

hast.conf:

resource hvol {
        on h1 {
                local /dev/zvol/zfs/hvol
                remote tcp4://192.168.1.200
        }
        on h2 {
                local /dev/zvol/zfs/hvol
                remote tcp4://192.168.1.100
        }
}
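
(For reference, the standard init sequence for such a resource:)

# hastctl create hvol            (on both hosts)
# hastctl role primary hvol      (on h1)
# hastctl role secondary hvol    (on h2)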


h1 is behaving fine as primary, either with h2 turned off or in init -
but as soon as I set the role to secondary for h2, the receiver
repeatedly crashes and restarts - see the traces below.

I've seen

http://lists.freebsd.org/pipermail/freebsd-current/2011-May/024871.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2012-01/msg00510.html

... but in the first case the fix has been in 9 since last year, and the second
refers to async replication - I'm using the default (fullsync).

hastctl status on the primary shows the dirty size diminishing slowly,
but obviously this isn't optimal (and causes freezes on I/O to the primary
hvol, causing all kinds of issues with the consumers of the hvol).

Any idea ? Am I doing something wrong ?


Primary:

Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from 
tcp4://192.168.1.200.
Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write 
synchronization data: Cannot allocate memory.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot 
allocate memory): WRITE(31642091520, 131072).
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Disconnected from 
tcp4://192.168.1.200.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to write 
synchronization data: Cannot allocate memory.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot 
allocate memory): WRITE(31649693696, 131072).
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Disconnected from 
tcp4://192.168.1.200.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to write 
synchronization data: Cannot allocate memory.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot 
allocate memory): WRITE(31691243520, 131072).
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Disconnected from 
tcp4://192.168.1.200.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to write 
synchronization data: Cannot allocate memory.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot 
allocate memory): WRITE(31783256064, 131072).
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Disconnected from 
tcp4://192.168.1.200.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to write 
synchronization data: Cannot allocate memory.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot 
allocate memory): WRITE(31782731776, 131072).
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Disconnected from 
tcp4://192.168.1.200.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to write 
synchronization data: Cannot allocate memory.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot 
allocate memory): WRITE(31803441152, 131072).
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Disconnected from 
tcp4://192.168.1.200.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to write 
synchronization data: Cannot allocate memory.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot 
allocate memory): WRITE(31881953280, 131072).
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Disconnected from 
tcp4://192.168.1.200.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to write 
synchronization data: Cannot allocate memory.


Secondary:

Mar 11 01:01:30 h2 hastd[2506]: [hvol] (secondary) Worker process exited 
ungracefully (pid=2874, exitcode=75).
Mar 11 01:01:38 h2 hastd[2875]: [hvol] (secondary) Unable to receive request 
header: Socket is not connected.
Mar 11 01:01:44 h2 hastd[2506]: [hvol] (secondary) Worker process exited 
ungracefully (pid=2875, exitcode=75).
Mar 11 01:01:45 h2 hastd[2876]: [hvol] (secondary) Unable to receive request 
header: Socket is not connected.
Mar 11 01:01:50 h2 hastd[2506]: [hvol] (secondary) Worker process exited 
ungracefully (pid=2876, exitcode=75).
Mar 11 01:01:56 h2 hastd[2877]: [hvol] (secondary) Unable to receive request 
header: Socket is not connected.
Mar 11 01:02:01 h2 hastd[2506]: [hvol] (secondary) Worker process exited 
ungracefully (pid=2877, exitcode=75).
[...]

Re: Issue with hast replication

2012-03-11 Thread Phil Regnauld
Mikolaj Golub (trociny) writes:
 
 
  PR Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
  PR Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
  PR Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).
 
 31642091520 looks like a rather large offset for a 10G volume...

Sorry, that should have been 100G - I typed from memory instead of
copy-pasting.

 Just to be more confident that this is a HAST issue could you please try the
 following experiment?
 
 1) Stop hastd on h2.
 
 2) On h1 run something like below:
 
   dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/hvol
 
 (copy hvol from h1 to h2 without hastd to see if it will succeed).
 
 Note: you will need to recreate HAST provider on secondary after this.

Ok this is interesting.

(For debugging purposes I've renamed the target zvol to junk; you'll see
why below.)

1) As you suggested:

h1# dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/dev/zvol/zfs/junk
dd: /dev/zvol/zfs/junk: Invalid argument
0+6 records in
0+5 records out
131072 bytes transferred in 0.002344 secs (55920640 bytes/sec)

To be certain which dd was complaining, I renamed the target zvol.

2) Tried repeatedly, sometimes the number of bytes is a bit different:

0+7 records in
0+6 records out
147456 bytes transferred in 0.002448 secs (60233277 bytes/sec)

And yes, hastd is stopped on h2.

3) I tried dd'ing zero to the zvol locally on h2:

h2# dd if=/dev/zero of=/dev/zvol/zfs/junk bs=131072
^C1817+0 records in
1816+0 records out
238026752 bytes transferred in 1.582006 secs (150458820 bytes/sec)

That works, until I ^C it.

4) I tried redirecting the output of the dd | ssh to a file on the h2 side:

h1# dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 dd bs=131072 of=/tmp/x
^C653+0 records in
652+0 records out
85458944 bytes transferred in 2.408074 secs (35488506 bytes/sec)

That works too, until I ^C it.

5) Things get even weirder - if I then go over to h2 and dd the /tmp/x
test file over to the zvol:

h2# dd if=x bs=131072 of=/dev/zvol/zfs/junk 
dd: /dev/zvol/zfs/junk: Invalid argument
652+1 records in
652+0 records out
85458944 bytes transferred in 0.444571 secs (192227879 bytes/sec)

Note that the file /tmp/x is 86917120 bytes long.

6) I try to copy more data into /tmp/x - it's now 291946496 (~280 MB)

h2# dd if=x bs=131072 of=/dev/zvol/zfs/junk
2227+1 records in
2227+1 records out
291946496 bytes transferred in 3.564129 secs (81912441 bytes/sec)

No more invalid argument...

7) ktrace on the destination dd:

[...]
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0
  5807 dd   RET   read 17992/0x4648
  5807 dd   CALL  write(0x3,0x800c09000,0x4648)
  5807 dd   RET   write -1 errno 22 Invalid argument
  5807 dd   CALL  write(0x2,0x7fffd300,0x4)
  5807 dd   GIO   fd 2 wrote 4 bytes
 dd: 
  5807 dd   RET   write 4
  5807 dd   CALL  write(0x2,0x7fffd3e0,0x12)
  5807 dd   GIO   fd 2 wrote 18 bytes
   /dev/zvol/zfs/junk

truss is a bit more informative:

fstat(0,{ mode=p- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
lseek(0,0x0,SEEK_CUR)  ERR#29 'Illegal seek'

Illegal seek, eh ? Any clues ?

The boxes are identical (HP DL380 G6), though the RAM config is different.

Summary:

- ssh works fine
- h1 zvol to h2 zvol over ssh fails
- h1 zvol to h2 /tmp/x over ssh is fine
- h2 /dev/zero locally to h2 zvol is fine
- h2 /tmp/x locally to h2 zvol fails at first, but works afterwards...
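
(One theory worth testing: the ktrace above shows a short read - 17992
bytes - followed by a write of that same odd size, and the zvol's
character device may reject writes that aren't a multiple of its block
size. Letting the receiving dd reblock - obs= instead of bs=, plus
conv=osync to pad the final block - might behave differently. A sketch:)

h1# dd if=/dev/zvol/zfs/hvol bs=131072 | ssh h2 "dd obs=131072 conv=osync of=/dev/zvol/zfs/junk"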




Re: gmirror(8) and graid3(8) changes.

2006-03-07 Thread Phil Regnauld
On Mon, Mar 06, 2006 at 11:28:44PM +0100, Pawel Jakub Dawidek wrote:
 Hi.
 
 Here you can find patches with changes to gmirror(8) and graid3(8):
 
   http://people.freebsd.org/~pjd/patches/gmirror.7.patch
   http://people.freebsd.org/~pjd/patches/graid3.patch

Hi Pawel,

I've been experiencing lockups with gmirror, ATA/SATA on both
i386 and amd64, under severe I/O (very heavily loaded Postgres DB). 
This has been on several different machines (remotely located, with
no possibility of breaking into the debugger).

Do you think these patches are worth testing in my case ?

Phil



Re: gmirror(8) and graid3(8) changes.

2006-03-07 Thread Phil Regnauld
On Tue, Mar 07, 2006 at 04:16:36PM +0100, Pawel Jakub Dawidek wrote:
 + 
 +I've been experiencing lockups with gmirror, ATA/SATA on both
 +i386 and amd64, under severe I/O (very heavily loaded Postgres DB). 
 +This has been on several different machines (remotely located, with
 +no possibility of breaking into the debugger).
 + 
 +Do you think these patches are worth testing in my case ?
 
 Are you sure it was gmirror's fault?

It doesn't happen if I use either underlying raw disk without
gmirror.

 It will be quite hard for gmirror to hang the machine so badly that we
 are not able to enter ddb...

No, I meant, I couldn't access the box's DDB remotely as it was
in a hosting center without access to a physical console, and I had
to have someone restart it.

 Anyway, I'd prefer not to test those patches in a production environment
 yet.

It is a test machine, so I don't mind, was just curious if this could
be related.
