date:20120217

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

2012-02-17 Thread David Xu


On 2012/2/18 9:30, Julian Elischer wrote:




mine is too, yet it still has problems..
CPU: Intel(R) Xeon(R) CPU   E5420  @ 2.50GHz (2500.14-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Family = 6  Model = 17  Stepping = 6
  Features=0xbfebfbff
  Features2=0xce3bd
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant, performance statistics
real memory  = 8589934592 (8192 MB)
avail memory = 8214368256 (7833 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard


Attached is a small patch. I don't know whether it works for you; it is the
only thing I can find at the moment.

Index: src/lib/libthr/thread/thr_umtx.c
===================================================================
--- src/lib/libthr/thread/thr_umtx.c    (revision 231637)
+++ src/lib/libthr/thread/thr_umtx.c    (working copy)
@@ -205,7 +205,7 @@
        if (abstime != NULL) {
                clock_gettime(clockid, &ts);
                TIMESPEC_SUB(&ts2, abstime, &ts);
-               if (ts2.tv_sec < 0 || ts2.tv_nsec <= 0)
+               if (ts2.tv_sec < 0 || (ts2.tv_sec == 0 && ts2.tv_nsec <= 0))
                        return (ETIMEDOUT);
                tsp = &ts2;
        } else {
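
For anyone following along, the effect of the one-line change can be sketched outside of libthr. This assumes TIMESPEC_SUB leaves tv_nsec normalized to 0..999999999, so a remaining time of exactly N whole seconds has tv_nsec == 0, and the old test then reports ETIMEDOUT even though N seconds are left. The function names expired_old and expired_new are made up, and shell arithmetic only stands in for the two C conditions from the diff:

```shell
#!/bin/sh
# Sketch of the remaining-time test before and after the patch.
# Arguments: tv_sec and tv_nsec of (abstime - now), assumed to be normalized
# so that 0 <= tv_nsec < 1000000000. Exit status 0 means "timed out".

expired_old() {
  [ "$1" -lt 0 ] || [ "$2" -le 0 ]
}

expired_new() {
  [ "$1" -lt 0 ] || { [ "$1" -eq 0 ] && [ "$2" -le 0 ]; }
}

# Compare both tests for a few remaining times, including "exactly 10 s".
for t in "10 0" "0 0" "0 1" "-1 999999999"; do
  set -- $t
  old=wait
  new=wait
  expired_old "$1" "$2" && old=ETIMEDOUT
  expired_new "$1" "$2" && new=ETIMEDOUT
  echo "remaining=${1}s+${2}ns old=${old} new=${new}"
done
```

This is only the arithmetic of the two conditions. Whether it accounts for all of the early ETIMEDOUT returns seen with fio is not settled in this thread.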
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: kerberized NFS

2012-02-17 Thread Rick Macklem

Giulio Ferro wrote:
> Thanks everybody again for your help with setting up a working
> kerberized nfsv4 system.
> 
> I was able to user-mount a nfsv4 share with krb5 security, and I was
> trying to do the same as root.
> 
> Unfortunately the patch I found here:
> http://people.freebsd.org/~rmacklem/rpcsec_gss.patch
> 
> fails to apply cleanly on a 9 stable system.
> 
There is now a patch called:
  http://people.freebsd.org/~rmacklem/rpcsec_gss-9.patch
that should apply to a FreeBSD9 or later kernel.

For the kernel to build after applying the patch, you will
need a kernel config with
options KGSSAPI
in it, since the patch adds a function that can't be called
via one of the XXX_call() functions using the function pointers.

Also, review the section of the wiki where it discusses setting
  vfs.rpcsec.keytab_enctype
because the host based initiator keytab entry won't work unless
it is set correctly.

Good luck with it, rick

> Is there a more recent patch available or some better way to
> automatically
> mount the share at boot time?
> 
> Thanks again.

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

2012-02-17 Thread Julian Elischer


On Friday 17 February 2012 06:28 am, David Xu wrote:

On 2012/2/17 16:06, Julian Elischer wrote:

On 2/16/12 11:41 PM, Julian Elischer wrote:

adding jkim as he seems to be the last person working with TSC.

On 2/16/12 6:42 PM, David Xu wrote:

On 2012/2/17 10:19, Julian Elischer wrote:

On 2/16/12 5:56 PM, David Xu wrote:

On 2012/2/17 8:42, Julian Elischer wrote:

Adding David Xu for his thoughts since he rewrote the code
in question in revision 213098

On 2/16/12 2:57 PM, Julian Elischer wrote:

On 2/16/12 1:06 PM, Julian Elischer wrote:

On 2/16/12 9:34 AM, Andriy Gapon wrote:

on 15/02/2012 23:41 Julian Elischer said the following:

The program fio (an IO test in ports) uses pthreads

the following code (from fio-2.0.3, but it's in earlier
code too) has suddenly started misbehaving.

  clock_gettime(CLOCK_REALTIME, &t);
  t.tv_sec += seconds + 10;

  pthread_mutex_lock(&mutex->lock);

  while (!mutex->value && !ret) {
          mutex->waiters++;
          ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
          mutex->waiters--;
  }

  if (!ret) {
          mutex->value--;
          pthread_mutex_unlock(&mutex->lock);
  }


It turns out that 'ret' sometimes comes back instantly
(on my machine) with a
value of 60 (ETIMEDOUT)
despite the fact that we set the timeout 10 seconds into
the future.

Has anyone else seen anything like this?
(and yes the condition variable attribute have been set
to use the REALTIME clock).

But why?

Just a hypothesis that maybe there is some issue with
time keeping on that system.
How would that code work out for you with MONOTONIC?

Jens Axboe, (CC'd) tried both CLOCK_REALTIME and
CLOCK_MONOTONIC, and they both had the same problem..
i.e. random early returns with ETIMEDOUT.

I think we will try to move our machine forward to a newer
-stable to see if that resolves it.

Kan upgraded the machine today to today's 9.x branch tip
and the problem still occurs.
8.x does not have this problem.

I have not got a 9-RELEASE machine to test on.. so I can
not tell if this came in with the burst of stuff
that came in after the 9.x branch was unfrozen after the
release of 9.0.

I am trying to reproduce the problem,  do you have complete
sample code to test ?

I'm still looking for the exact set,
but on my machine (4 cpus) the program from ports sysutils/fio
exhibits the problem when used with
kern.timecounter.hardware=TSC-low and with the following
config file:

pu05 # cat config.fio

[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=6
randrepeat=0

[file1]
filename=/dev/ada0

pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes,
forcing use of threads. Use the 'thread' option to get rid of
this warning. file1: (g=0): rw=randread, bs=4K-4K/4K-4K,
ioengine=psync, iodepth=16 ...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync,
iodepth=16 fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#


The reason 5 jobs failed to start is because the parent timed
out on them immediately.
It didn't time out on 10 of them apparently.


if I set the timer to ACPI-fast it works as expected..

Maybe the following code can check whether TSC-low works, by letting
the thread run on each CPU.

gettimeofday(&prev, NULL);
int cpu = 0;
for (;;) {
  cpuset_t set;
  cpu = ++cpu % 4;
  CPU_ZERO(&set);
  CPU_SET(cpu,&set);
  pthread_setaffinity_np(pthread_self(), sizeof(set),&set);
  gettimeofday(&cur, NULL);
  if ( timercmp(&prev,&cur,>=)) {
 abort();
}
}

pu05# sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast ->  TSC-low
pu05# ./test
^C
pu05# cat test.c

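/*
 * The header names were stripped from the archived message; the list below
 * is a reconstruction of what this code needs on FreeBSD.
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/time.h>

#include <pthread.h>
#include <pthread_np.h>
#include <stdlib.h>

int
main(void)
{
    int cpu = 0;
    struct timeval prev, cur;

    gettimeofday(&prev, NULL);
    for (;;) {
        cpuset_t set;

        /* Hop to the next of the 4 CPUs on every iteration. */
        cpu = (cpu + 1) % 4;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        gettimeofday(&cur, NULL);
        /* Abort if the clock ever appears to step backwards. */
        if (timercmp(&prev, &cur, >))
            abort();
        prev = cur;
    }
}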

pu05# ./test

minutes pass...

^C
pu05#

so it looks as if the TSC is working ok..
I'm just going to check that the program is actually moving
CPU... yes it is moving around but I can't tell at what speed.
(according to top).

so we are still left with a question of "where is the problem?"

kernel TSC driver?
generic gettimeofday() code?
pthreads cond code?
the application?

I am running the fio test on my notebook which is using TSC-low,
it is on 9.0-RC3, I can not reproduce the problem for
minutes, then I interrupt it with ctrl-c:

http://people.freebsd.org/~davidxu/tsc_pthread/dmes

Re: ZFS + nullfs + Linuxulator = panic?

2012-02-17 Thread Konstantin Belousov

On Thu, Feb 16, 2012 at 12:07:46PM -0500, Paul Mather wrote:
> On Feb 16, 2012, at 10:49 AM, Konstantin Belousov wrote:
> 
> > On Thu, Feb 16, 2012 at 10:09:27AM -0500, Paul Mather wrote:
> >> On Feb 14, 2012, at 7:47 PM, Konstantin Belousov wrote:
> >> 
> >>> On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
>  I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, 
>  last built 2012-02-08).  It will panic during the daily periodic scripts 
>  that run at 3am.  Here is the most recent panic message:
>  
>  Fatal trap 9: general protection fault while in kernel mode
>  cpuid = 0; apic id = 00
>  instruction pointer = 0x20:0x8069d266
>  stack pointer   = 0x28:0xff8094b90390
>  frame pointer   = 0x28:0xff8094b903a0
>  code segment= base 0x0, limit 0xf, type 0x1b
>    = DPL 0, pres 1, long 1, def32 0, gran 1
>  processor eflags= resume, IOPL = 0
>  current process = 72566 (ps)
>  trap number = 9
>  panic: general protection fault
>  cpuid = 0
>  KDB: stack backtrace:
>  #0 0x8062cf8e at kdb_backtrace+0x5e
>  #1 0x805facd3 at panic+0x183
>  #2 0x808e6c20 at trap_fatal+0x290
>  #3 0x808e715a at trap+0x10a
>  #4 0x808cec64 at calltrap+0x8
>  #5 0x805ee034 at fill_kinfo_thread+0x54
>  #6 0x805eee76 at fill_kinfo_proc+0x586
>  #7 0x805f22b8 at sysctl_out_proc+0x48
>  #8 0x805f26c8 at sysctl_kern_proc+0x278
>  #9 0x8060473f at sysctl_root+0x14f
>  #10 0x80604a2a at userland_sysctl+0x14a
>  #11 0x80604f1a at __sysctl+0xaa
>  #12 0x808e62d4 at amd64_syscall+0x1f4
>  #13 0x808cef5c at Xfast_syscall+0xfc
> >>> 
> >>> Please look up the line number for the fill_kinfo_thread+0x54.
> >> 
> >> 
> >> Is there a way for me to do this from the above information? As
> >> I said in the original message, I failed to get a crash dump after
> >> reboot (because, it turns out, I hadn't set up my gmirror swap device
> >> properly). Alas, with the latest panic, it appears to have hung[1]
> >> during the "Dumping" phase, so it looks like I won't get a saved crash
> >> dump this time, either. :-(
> > 
> > Load the kernel.debug into kgdb, and from there do
> > "list *fill_kinfo_thread+0x54".
> 
> 
> gromit# kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> (kgdb) list *fill_kinfo_thread+0x54
> 0x805ee034 is in fill_kinfo_thread 
> (/usr/src/sys/kern/kern_proc.c:854).
> 849 thread_lock(td);
> 850 if (td->td_wmesg != NULL)
> 851 strlcpy(kp->ki_wmesg, td->td_wmesg, 
> sizeof(kp->ki_wmesg));
> 852 else
> 853 bzero(kp->ki_wmesg, sizeof(kp->ki_wmesg));
> 854 strlcpy(kp->ki_ocomm, td->td_name, sizeof(kp->ki_ocomm));
> 855 if (TD_ON_LOCK(td)) {
> 856 kp->ki_kiflag |= KI_LOCKBLOCK;
> 857 strlcpy(kp->ki_lockname, td->td_lockname,
> 858 sizeof(kp->ki_lockname));
> (kgdb) 

This is indeed strange. It can only occur if the td pointer is damaged.

Please try to get a core dump and at least print the contents of *td in this case.



Re: kerberized NFS

2012-02-17 Thread Rick Macklem

Giulio Ferro wrote:
> Thanks everybody again for your help with setting up a working
> kerberized nfsv4 system.
> 
> I was able to user-mount a nfsv4 share with krb5 security, and I was
> trying to do the same as root.
> 
> Unfortunately the patch I found here:
> http://people.freebsd.org/~rmacklem/rpcsec_gss.patch
> 
> fails to apply cleanly on a 9 stable system.
> 
I'll try and generate an updated patch. I guess some commit has
changed the code enough that "patch" gets confused and it's a little
big to do the patch manually. (I'm pretty sure any changes made to
the sys/rpc/rpcsec_gss code haven't broken the patch, but I have no
way of doing Kerberos testing these days.)

> Is there a more recent patch available or some better way to
> automatically
> mount the share at boot time?
> 
> Thanks again.

Re: devd based AUTOMOUNTER

2012-02-17 Thread vermaden

Latest version with additional checks for NTFS and FAT32, to be precise,
for NTFS filesystem with label "FAT" and for FAT filesystem with label "NTFS" ;)

#! /bin/sh

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
LOG="/var/log/automount.log"
STATE="/var/run/automount.state"
DATEFMT="%Y-%m-%d %H:%M:%S"

__create_mount_point() { # /* 1=DEV */
  MNT="/mnt/$( basename ${1} )"
  mkdir -p ${MNT}
}

__state_lock() {
  while [ -f ${STATE}.lock ]; do sleep 0.5; done
  :> ${STATE}.lock
}

__state_unlock() {
  rm ${STATE}.lock
}

__state_add() { # /* 1=DEV 2=PROVIDER 3=MNT */
  __state_lock
  grep -E "${3}" ${STATE} 1> /dev/null 2> /dev/null && {
    __log "${1}:duplicated '${STATE}'"
    __state_unlock # /* release the lock on the early-return path too */
    return 1
  }
  echo "${1} ${2} ${3}" >> ${STATE}
  __state_unlock
}

__state_remove() { # /* 1=MNT 2=STATE 3=LINE */
  BSMNT=$( echo ${1} | sed 's/\//\\\//g' )
  sed -i '' "/${BSMNT}\$/d" ${2}
}

__log() { # /* @=MESSAGE */
  echo $( date +"${DATEFMT}" ) ${@} >> ${LOG}
}

case ${2} in
  (attach)
for I in /dev/${1}*
do
  case $( file -L -s ${I} | sed -E 's/label:\ \".*\"//g' ) in
(*NTFS*)
  dd < ${I} count=1 2> /dev/null \
| strings \
| head -1 \
| grep -q "NTFS" && {
  __create_mount_point ${I}
  ntfs-3g ${I} ${MNT} # /* sysutils/fusefs-ntfs */
  __log "${I}:mount (ntfs)"
  }
  ;;
(*FAT*)
  dd < ${I} count=1 2> /dev/null \
| strings \
| grep -q "FAT32" && {
  __create_mount_point ${I}
  fsck_msdosfs -y ${I}
  mount_msdosfs -o large -l -L pl_PL.ISO8859-2 -D cp852 ${I} ${MNT}
  __log "${I}:mount (fat)"
  }
  ;;
(*ext2*)
  __create_mount_point ${I}
  fsck.ext2 -y ${I}
  mount -t ext2fs ${I} ${MNT}
  __log "${I}:mount (ext2)"
  ;;
(*ext3*)
  __create_mount_point ${I}
  fsck.ext3 -y ${I}
  mount -t ext2fs ${I} ${MNT}
  __log "${I}:mount (ext3)"
  ;;
(*ext4*)
  __create_mount_point ${I}
  fsck.ext4 -y ${I}
  ext4fuse ${I} ${MNT} # /* sysutils/fusefs-ext4fuse */
  __log "${I}:mount (ext4)"
  ;;
(*Unix\ Fast\ File*)
  __create_mount_point ${I}
  fsck_ufs -y ${I}
  mount ${I} ${MNT}
  __log "${I}:mount (ufs)"
  ;;
(*)
  case $( dd < ${I} count=1 2> /dev/null | strings | head -1 ) in
(EXFAT)
  __create_mount_point ${I}
  mount.exfat ${I} ${MNT} # /* sysutils/fusefs-exfat */
  __log "${I}:mount (exfat)"
  ;;
(*) continue ;;
  esac
  ;;
  esac
  __state_add ${I} $( mount | grep -m 1 " ${MNT} " | awk '{printf $1}' ) \
${MNT} || continue
done
;;

  (detach)
MOUNT=$( mount )
__state_lock
grep ${1} ${STATE} \
  | while read DEV PROVIDER MNT
do
  TARGET=$( echo "${MOUNT}" | grep -E "^${PROVIDER} " | awk '{print $3}' )
  [ -z ${TARGET} ] && {
__state_remove ${MNT} ${STATE} ${LINE}
continue
  }
  umount -f ${TARGET} &
  unset TARGET
  __state_remove ${MNT} ${STATE} ${LINE}
  __log "${DEV}:umount"
done
__state_unlock
__log "/dev/${1}:detach"
;;

esac
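
The label check in this version (dropping the quoted label before matching on NTFS or FAT) can be tried in isolation. A minimal sketch, assuming the description comes from `file -s`; the helper names strip_label and guess_fs and the sample description strings are made up for illustration and are not real `file` output:

```shell
#!/bin/sh
# Classify a `file -s` style description after dropping the quoted volume
# label, so that a FAT volume labelled "NTFS" is not taken for NTFS.
# The sample descriptions below are invented for this example.

strip_label() {
  printf '%s\n' "$1" | sed -E 's/label: *"[^"]*"//g'
}

guess_fs() {
  case $( strip_label "$1" ) in
    (*NTFS*) echo ntfs ;;
    (*FAT*)  echo fat ;;
    (*)      echo unknown ;;
  esac
}

guess_fs 'x86 boot sector, FAT (32 bit), label: "NTFS"'
guess_fs 'x86 boot sector, OEM-ID "NTFS    ", label: "FAT"'
```

Unlike the greedy `.*` in the script, `[^"]*` stops at the first closing quote, so text after the label is still available for matching.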



Re: devd based AUTOMOUNTER

2012-02-17 Thread vermaden

... even newer version, seems to have all 'problems' fixed now ;)

#! /bin/sh

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
LOG="/var/log/automount.log"
STATE="/var/run/automount.state"
DATEFMT="%Y-%m-%d %H:%M:%S"

__create_mount_point() { # /* 1=DEV */
  MNT="/mnt/$( basename ${1} )"
  mkdir -p ${MNT}
}

__state_lock() {
  while [ -f ${STATE}.lock ]; do sleep 0.5; done
  :> ${STATE}.lock
}

__state_unlock() {
  rm ${STATE}.lock
}

__state_add() { # /* 1=DEV 2=PROVIDER 3=MNT */
  __state_lock
  grep -E "${3}" ${STATE} 1> /dev/null 2> /dev/null && {
    __log "${1}:duplicated '${STATE}'"
    __state_unlock # /* release the lock on the early-return path too */
    return 1
  }
  echo "${1} ${2} ${3}" >> ${STATE}
  __state_unlock
}

__state_remove() { # /* 1=MNT 2=STATE 3=LINE */
  BSMNT=$( echo ${1} | sed 's/\//\\\//g' )
  sed -i '' "/${BSMNT}\$/d" ${2}
}

__log() { # /* @=MESSAGE */
  echo $( date +"${DATEFMT}" ) ${@} >> ${LOG}
}

case ${2} in
  (attach)
for I in /dev/${1}*
do
  case $( file -L -s ${I} ) in
(*NTFS*)
  __create_mount_point ${I}
  ntfs-3g ${I} ${MNT} # /* sysutils/fusefs-ntfs */
  __log "${I}:mount (ntfs)"
  ;;
(*FAT*)
  __create_mount_point ${I}
  fsck_msdosfs -y ${I}
  mount_msdosfs -o large -l -L pl_PL.ISO8859-2 -D cp852 ${I} ${MNT}
  __log "${I}:mount (fat)"
  ;;
(*ext2*)
  __create_mount_point ${I}
  fsck.ext2 -y ${I}
  mount -t ext2fs ${I} ${MNT}
  __log "${I}:mount (ext2)"
  ;;
(*ext3*)
  __create_mount_point ${I}
  fsck.ext3 -y ${I}
  mount -t ext2fs ${I} ${MNT}
  __log "${I}:mount (ext3)"
  ;;
(*ext4*)
  __create_mount_point ${I}
  fsck.ext4 -y ${I}
  ext4fuse ${I} ${MNT} # /* sysutils/fusefs-ext4fuse */
  __log "${I}:mount (ext4)"
  ;;
(*Unix\ Fast\ File*)
  __create_mount_point ${I}
  fsck_ufs -y ${I}
  mount ${I} ${MNT}
  __log "${I}:mount (ufs)"
  ;;
(*)
  case $( dd < ${I} count=1 | strings | head -1 ) in
(EXFAT)
  __create_mount_point ${I}
  mount.exfat ${I} ${MNT} # /* sysutils/fusefs-exfat */
  __log "${I}:mount (exfat)"
  ;;
(*) continue ;;
  esac
  ;;
  esac
  __state_add ${I} $( mount | grep -m 1 " ${MNT} " | awk '{printf $1}' ) \
${MNT} || continue
done
;;

  (detach)
MOUNT=$( mount )
__state_lock
grep ${1} ${STATE} \
  | while read DEV PROVIDER MNT
do
  TARGET=$( echo "${MOUNT}" | grep -E "^${PROVIDER} " | awk '{print $3}' )
  [ -z ${TARGET} ] && {
__state_remove ${MNT} ${STATE} ${LINE}
continue
  }
  umount -f ${TARGET} &
  unset TARGET
  __state_remove ${MNT} ${STATE} ${LINE}
  __log "${DEV}:umount"
done
__state_unlock
__log "/dev/${1}:detach"
;;

esac
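
One thing to watch in the state handling: matching the mount point as a regular expression means /mnt/da0s1 can also match /mnt/da0s10, and a "." in a path matches any character. A possible alternative, sketched here with a hypothetical state_remove helper that is not part of the script above, compares the third field as a plain string with awk:

```shell
#!/bin/sh
# Remove the state entry whose third field (the mount point) equals $1.
# state_remove is a hypothetical helper; the state file format
# "DEV PROVIDER MNT" is the one used by the script above.
# Note: awk -v interprets backslash escapes, so this assumes mount points
# without backslashes.

state_remove() { # 1=MNT 2=STATE
  tmp="${2}.tmp.$$"
  awk -v mnt="$1" '$3 != mnt' "$2" > "$tmp" && mv "$tmp" "$2"
}

STATE=$( mktemp /tmp/automount.XXXXXX )
printf '%s\n' '/dev/da0s1 /dev/da0s1 /mnt/da0s1' \
              '/dev/da0s10 /dev/da0s10 /mnt/da0s10' > "$STATE"
state_remove /mnt/da0s1 "$STATE"
cat "$STATE"
rm -f "$STATE"
```

Only the /mnt/da0s1 entry is dropped; the /mnt/da0s10 entry stays because the comparison is on the whole field.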


Re: New BSD Installer

2012-02-17 Thread Freddie Cash

On Fri, Feb 17, 2012 at 1:38 PM, Andriy Gapon  wrote:
> And just in case:
> Unified Extensible Firmware Interface Specification Version 2.3.1, Errata A
> September 7, 2011 says:
> [snip]
>> Two GPT Header structures are stored on the device: the primary and the
>> backup. The primary GPT Header must be located in LBA 1 (i.e., the second
>> logical block), and the backup GPT Header must be located in the last LBA
>> of the device.
>
> I can not see any ambiguity or openness to interpretation in this paragraph.

Unless it's specified somewhere else (which is possible), in this
paragraph, "device" does not necessarily mean "physical disk".  "Last
LBA of the device" could be interpreted as "last LBA of the GEOM
provider".

The beauty of GEOM is that "device" is whatever logical mapping it provides.

After all, LBAs are logical addresses (it's right there in the name!),
not hardwired physical sector addresses.  ;)  If they were hardwired,
then how would internal sector remapping work?  ;)

-- 
Freddie Cash
fjwc...@gmail.com

Re: New BSD Installer

2012-02-17 Thread Andriy Gapon

on 17/02/2012 16:28 Hiroki Sato said the following:
> Andriy Gapon  wrote in <4f3e3000.9000...@freebsd.org>:
> 
> av> on 17/02/2012 09:04 Hiroki Sato said the following:
> av> > No, the issue is our gptloader assumes the backup header is always
> av> > located at the (physical) last sector while this is not mandatory
> av> > in the UEFI specification.
> av>
> av> Are you sure?
> 
> Yes, sure.  In the gm0->md0+md1 case, the last LBA of the "device" is 
> changed (growed in size) but they can still have a valid backup header at
> "the last LBA - 1" before an attempt to grow the size of the volume as the
> last paragraph of your excerpts says.  If we *choose* to grow the device
> size permanently, the backup header must be relocated at the new last LBA.
> However, before the relocation happens, the specification says both the
> primary and secondary header must be valid in the previous device size.
> This is my understanding.

No, not before the relocation, but before the resizing.
The specification just tries to protect against unexpected situations during
the resizing, which otherwise should be a quick "almost-atomic" operation.

> This means software should assume the device size can grow and should not
> assume the backup header is always located at the last possible LBA on the
> device.  If AlternateLBA does not match "the device size - 1", the software
> should recognize the location of the backup header based on the information
> in the primary header first.

Nowhere in the specification is this requirement placed on software.  The
specification is quite explicit in using the word "must" when referring to the
placement of the backup header at the last LBA.

I don't follow your suggestion of putting FreeBSD into a position of
permanently living "in the moment" between now and potentially resizing the
volume in the undetermined future.

And just in case:
Unified Extensible Firmware Interface Specification Version 2.3.1, Errata A
September 7, 2011 says:
[snip]
> Two GPT Header structures are stored on the device: the primary and the 
> backup. The primary GPT Header must be located in LBA 1 (i.e., the second 
> logical block), and the backup GPT Header must be located in the last LBA 
> of the device.

I can not see any ambiguity or openness to interpretation in this paragraph.
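
Read literally, the quoted paragraph fixes both header locations once the size of the "device" is known, which is the point of disagreement above. A small sketch of the arithmetic; the function name gpt_header_lbas is made up, and the 1 GiB size and 512-byte sectors are example values only (on FreeBSD the real numbers could be taken from diskinfo(8)):

```shell
#!/bin/sh
# Where the quoted paragraph puts the two GPT headers for a device of a
# given size: primary in LBA 1, backup in the last LBA.
# The size and sector size passed below are example values, not measurements.

gpt_header_lbas() { # 1=size in bytes 2=sector size in bytes
  sectors=$(( $1 / $2 ))
  echo "primary=1 backup=$(( sectors - 1 ))"
}

gpt_header_lbas 1073741824 512
```

If the "device" is a gmirror provider rather than the raw disk, the same formula gives a different last LBA, because the provider is smaller than the underlying disk.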

- -- 
Andriy Gapon

Re: New BSD Installer

2012-02-17 Thread Nikola Pavlović

On Fri, Feb 17, 2012 at 06:09:55PM +0100, Miroslav Lachman wrote:
> Pete French wrote:
> >
> >Should this not be the recommended way of doing things even for MBR
> >disks ? I have a lot of machines booting from gmirror, but we always
> >do it by mirroring MBR partitions (or GPT ones). I cant see why you would
> >want to do it the other way round in fact. It doesnt gain you anything
> >does it ?
> 
> Yes it does? Am I the only one person on the whole earth seeing the
> big difference in easy setup of mirroring two drives instead of many
> individual partitions?
> 

You are not.  In fact, the current situation is ironic considering the
following passage from geom(4):

"Compared to traditional “volume management”, GEOM differs from most and
in some cases all previous implementations in the following ways:

[...]

" ·  GEOM is topologically agnostic.  Most volume management
 implementations have very strict notions of how classes can fit
 together, very often one fixed hierarchy is provided, for instance,
 subdisk - plex - volume.

[...]

"Fixed hierarchies are bad because they make it impossible to express
the intent efficiently.  In the fixed hierarchy above, it is not possible to
mirror two physical disks and then partition the mirror into subdisks,
instead one is forced to make subdisks on the physical volumes and to
mirror these two and two, resulting in a much more complex configuration.
GEOM on the other hand does not care in which order things are done, the
only restriction is that cycles in the graph will not be allowed."

So there, even the docs agree that mirror-partition ordering is not so
outlandish as some are suggesting.  IIRC, that's the way gmirror-ing is
described in the Handbook as well.

I would like to be understood that I didn't write this just to make a
smartass comment--I understand the difficulty and that the regression is
unintentional (as they all are).  But on the other hand, I don't think
it's now OK to just tell people something like "oh well, you are all
better off with partition-mirror order anyway, problem solved".  It's true
that it can be better sometimes, but that's not the point.  The point
is, specifically, you are now forced to set up mirroring in a way that
may not suit your needs or you have to start jumping through hoops (the
workaround with one big GPT and bsdlabel inside that doesn't seem *too*
bad though), and generally, an important aspect of GEOM is now formally
broken.

It must be fixed IMO, no ifs and buts, but OTOH people affected by
this should also have a certain degree of patience and understanding as
long as the whole thing is not swept under the rug.

-- 
He that is giddy thinks the world turns round.
-- William Shakespeare, "The Taming of the Shrew"


Re: devd based AUTOMOUNTER

2012-02-17 Thread vermaden

I already made some changes for the 'better' ...

Here is the latest version:

#! /bin/sh

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
LOG="/var/log/automount.log"
STATE="/var/run/automount.state"
DATEFMT="%Y-%m-%d %H:%M:%S"

__create_mount_point() { # /* 1=DEV */
  MNT="/mnt/$( basename ${1} )"
  mkdir -p ${MNT}
}

__state_lock() {
  while [ -f ${STATE}.lock ]; do sleep 0.5; done
  :> ${STATE}.lock
}

__state_unlock() {
  rm ${STATE}.lock
}

__state_add() { # /* 1=DEV 2=PROVIDER 3=MNT */
  __state_lock
  grep -E "${3}$" ${STATE} 1> /dev/null 2> /dev/null && {
    __log "${1}:duplicated '${STATE}'"
    __state_unlock # /* release the lock on the early-return path too */
    return 1
  }
  echo "${1} ${2} ${3}" >> ${STATE}
  __state_unlock
}

__state_remove() { # /* 1=MNT 2=STATE 3=LINE */
  BSMNT=$( echo ${1} | sed 's/\//\\\//g' )
  sed -i '' "/${BSMNT}\$/d" ${2}
}

__log() { # /* @=MESSAGE */
  echo $( date +"${DATEFMT}" ) ${@} >> ${LOG}
}

case ${2} in
  (attach)
for I in /dev/${1}*
do
  case $( file -L -s ${I} ) in
(*NTFS*)
  __create_mount_point ${I}
  ntfs-3g ${I} ${MNT} # /* sysutils/fusefs-ntfs */
  __log "${I}:mount (ntfs)"
  ;;
(*FAT*)
  __create_mount_point ${I}
  fsck_msdosfs -y ${I}
  mount_msdosfs -o large -o longnames -l -L pl_PL.ISO8859-2 -D cp852 ${I} ${MNT}
  __log "${I}:mount (fat)"
  ;;
(*ext2*)
  __create_mount_point ${I}
  fsck.ext2 -y ${I}
  mount -t ext2fs ${I} ${MNT}
  __log "${I}:mount (ext2)"
  ;;
(*ext3*)
  __create_mount_point ${I}
  fsck.ext3 -y ${I}
  mount -t ext2fs ${I} ${MNT}
  __log "${I}:mount (ext3)"
  ;;
(*ext4*)
  __create_mount_point ${I}
  fsck.ext4 -y ${I}
  ext4fuse ${I} ${MNT} # /* sysutils/fusefs-ext4fuse */
  __log "${I}:mount (ext4)"
  ;;
(*Unix\ Fast\ File*)
  __create_mount_point ${I}
  fsck_ufs -y ${I}
  mount ${I} ${MNT}
  __log "${I}:mount (ufs)"
  ;;
(*)
  case $( dd < ${I} count=1 | strings | head -1 ) in
(EXFAT)
  __create_mount_point ${I}
  mount.exfat ${I} ${MNT} # /* sysutils/fusefs-exfat */
  __log "${I}:mount (exfat)"
  ;;
(*) continue ;;
  esac
  ;;
  esac
  __state_add ${I} $( mount | grep " ${MNT} " | awk '{printf $1}' ) ${MNT} || continue
done
;;

  (detach)
__state_lock
grep ${1} ${STATE} \
  | while read DEV PROVIDER MNT
do
  TARGET=$( mount | grep -E "^${PROVIDER} " | awk '{print $3}' )
  [ -z ${TARGET} ] && {
__state_remove ${MNT} ${STATE} ${LINE}
continue
  }
  umount -f ${TARGET} &
  unset TARGET
  __state_remove ${MNT} ${STATE} ${LINE}
  __log "${DEV}:umount"
done
__state_unlock
__log "/dev/${1}:detach"
;;

esac



I have run tests with 3 different USB drives, with different and identical
filesystems, connecting them all at once, disconnecting them all at once, random
connect, disconnect etc.

Currently it seems to work ok but suggestions are very welcome.
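
One suggestion, since several devices attaching at once means several copies of the script running at once: the "check for the lock file, then create it" sequence leaves a small window in which two copies can both see no lock. A possible alternative is to take the lock with mkdir, which either creates the directory or fails in a single step. This is only a sketch; the names state_lock and state_unlock, the LOCKDIR path and the retry limit are assumptions, not part of the script:

```shell
#!/bin/sh
# Sketch: directory-based lock. mkdir fails if the directory already exists,
# so testing for the lock and taking it happen in one operation.
# LOCKDIR and the default of 10 attempts are assumptions for this example.

LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/automount-demo.lock}"

state_lock() { # 1=maximum number of attempts (default 10)
  max=${1:-10}
  n=0
  until mkdir "$LOCKDIR" 2> /dev/null; do
    n=$(( n + 1 ))
    [ "$n" -ge "$max" ] && return 1   # give up instead of waiting forever
    sleep 1
  done
  return 0
}

state_unlock() {
  rmdir "$LOCKDIR"
}

state_lock && echo "lock taken"
state_lock 1 || echo "second attempt refused while the lock is held"
state_unlock
```

The retry limit also means that a stale lock left behind by a killed script makes later runs fail after a while instead of hanging in the sleep loop.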


> Some things to consider/test:
> 
> How do I set custom flags, like nosuid,noatime,nodev,noexec,async (or
> sync) for mounts?
Currently you have to add these options to the filesystem-specific mount command
in the script, but making them configurable is definitely possible.
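
One way this could look, purely as a sketch: keep the extra flags in a single variable and pass it with -o to every mount command. MNTOPTS and mount_cmd are hypothetical names, and the function only prints the command so it can be tried without a device:

```shell
#!/bin/sh
# Sketch: extra mount flags kept in one place. MNTOPTS and mount_cmd are
# hypothetical; the posted script has no such setting.

MNTOPTS="nosuid,noexec,noatime"

mount_cmd() { # 1=FSTYPE 2=DEV 3=MNT; prints the command instead of running it
  echo "mount -t ${1} -o ${MNTOPTS} ${2} ${3}"
}

mount_cmd ext2fs /dev/da0s1 /mnt/da0s1
```

The FUSE helpers (ntfs-3g, ext4fuse, mount.exfat) take their options in their own way, so they would need separate handling.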

> What if make a usb drive with an illegal name, existing name or other 
> dangerous values?
The filesystem label is not used at all. I just use the device names reported by
FreeBSD, so it is quite bulletproof here, and then create the appropriate
/mnt/da0s1 directories.

> Can I use the automounter to either mount over another mount to
> impersonate it, or can I overwrite arbitrary files or directories?
I have done everything I could to check for that and to prevent it; if it is
still possible, please submit a bug report ;)

Thanks for suggestions Matt.

Regards,
vermaden



The "New BSD Installer" thread has shown me that I am totally obsolete in disk partitioning.

2012-02-17 Thread Edwin L. Culp W.

I've been following the above mentioned thread because I
wasn't convinced by the new bsd installer on my latest installation. Now,
the problem that I am seeing is no longer the new installer but that I am
obsolete in modern freebsd disk partitioning options and reliability of
each. I've been doing it the same way since before "the turn of the
century" (13 or 14 years). Hopefully I'm not alone.

If such a thing exists, I need a howto on mixing and matching all the
different partitioning options and combinations, pros and cons, for as
many modern situations as possible. Any suggestions appreciated.

I did look at the handbook but it seems to have changed little and uses
sysinstall for the examples at:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/install-steps.html#SYSINSTALL-FDISK2

Thanks for any suggestions. I apologize for my ignorance.

P.S. I have wanted to understand and try things like the following comment
to the thread, but I have no idea where to begin or options for doing it.

Sorry, I wasn't suggesting that you should always mirror
the individual partitions - just I happen to do that where
I am mixing ZFS and gmirror. Obviously you don't want to create
lots of little mirrors if you don't have to. But even with
one mirror, you can mirror a big partition covering the whole
drive, and then carve that up with bsdlabel. No need to ever
mirror the actual raw discs, and it works with GPT.

Re: Custom kernel poll summary (was: Re: Reducing the need to compile a custom kernel)

2012-02-17 Thread Stefan Bethke

Am 14.02.2012 um 12:37 schrieb Alexander Leidinger:

> 1 FLOWTABLE

The last time I included this in a kernel it seemed to have odd effects on TCP 
connections.  Admittedly, that was probably two years or so ago, and I never 
bothered to find out what was happening in detail.  Is it safe now?


Stefan

-- 
Stefan Bethke   Fon +49 151 14070811




devd based AUTOMOUNTER

2012-02-17 Thread vermaden

Hi,

I have finally put some effort into writing a flexible yet very simple automounter
for the FreeBSD desktop.

Feel free to send me bug reports ;)

It currently supports these filesystems:
-- NTFS (rw), requires sysutils/fusefs-ntfs
-- FAT/FAT32
-- exFAT, requires sysutils/fusefs-exfat
-- EXT2
-- EXT3
-- EXT4, requires sysutils/fusefs-ext4fuse
-- UFS (DOH!)

It keeps the state of the mounted devices in /var/run/automount.state and logs
all activity to /var/log/automount.log.

The script lives at /usr/local/sbin/automount.sh and must be executable.

The only additional configuration it requires is these lines at the end of
/etc/devd.conf, followed by a restart of the devd daemon (/etc/rc.d/devd restart).

notify 200 {
  match "system" "DEVFS";
  match "type" "CREATE";
  match "cdev" "(da|mmcsd)[0-9]+";
  action "/usr/local/sbin/automount.sh $cdev attach";
};

notify 200 {
  match "system" "DEVFS";
  match "type" "DESTROY";
  match "cdev" "(da|mmcsd)[0-9]+";
  action "/usr/local/sbin/automount.sh $cdev detach";
};

The /usr/local/sbin/automount.sh executable is here:

#! /bin/sh

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
LOG="/var/log/automount.log"
STATE="/var/run/automount.state"
DATEFMT="%Y-%m-%d %H:%M:%S"

__create_mount_point() { # /* 1=DEV */
  MNT="/mnt/$( basename ${1} )"
  mkdir -p ${MNT}
}

__state_lock() {
  while [ -f ${STATE}.lock ]; do sleep 0.5; done
  :> ${STATE}.lock
}

__state_unlock() {
  rm ${STATE}.lock
}

__state_add() { # /* 1=DEV 2=PROVIDER 3=MNT */
  __state_lock
  echo "${1} ${2} ${3}" >> ${STATE}
  __state_unlock
}

__state_remove() { # /* 1=MNT 2=STATE */
  # look up the line number here instead of trusting a caller-supplied one
  LINE=$( grep -n -E "${1}$" ${2} | cut -d : -f 1 | head -1 )
  [ -n "${LINE}" ] && sed -i '' ${LINE}d ${2}
}

__log() { # /* @=MESSAGE */
  echo $( date +"${DATEFMT}" ) ${@} >> ${LOG}
}

case ${2} in
  (attach)
    for I in /dev/${1}*
    do
      case $( file -L -s ${I} ) in
        (*NTFS*)
          __create_mount_point ${I}
          ntfs-3g ${I} ${MNT} # /* sysutils/fusefs-ntfs */
          __log "${I}:mount (ntfs)"
          ;;
        (*FAT*)
          __create_mount_point ${I}
          fsck_msdosfs -y ${I}
          mount_msdosfs -o large -o longnames -l -L pl_PL.ISO8859-2 -D cp852 \
            ${I} ${MNT}
          __log "${I}:mount (fat)"
          ;;
        (*ext2*)
          __create_mount_point ${I}
          fsck.ext2 -y ${I}
          mount -t ext2fs ${I} ${MNT}
          __log "${I}:mount (ext2)"
          ;;
        (*ext3*)
          __create_mount_point ${I}
          fsck.ext3 -y ${I}
          mount -t ext2fs ${I} ${MNT}
          __log "${I}:mount (ext3)"
          ;;
        (*ext4*)
          __create_mount_point ${I}
          fsck.ext4 -y ${I}
          ext4fuse ${I} ${MNT} # /* sysutils/fusefs-ext4fuse */
          __log "${I}:mount (ext4)"
          ;;
        (*Unix\ Fast\ File*)
          __create_mount_point ${I}
          fsck_ufs -y ${I}
          mount ${I} ${MNT}
          __log "${I}:mount (ufs)"
          ;;
        (*)
          # file(1) did not recognise it; peek at the first sector for exFAT
          case $( dd < ${I} count=1 2> /dev/null | strings | head -1 ) in
            (EXFAT)
              __create_mount_point ${I}
              mount.exfat ${I} ${MNT} # /* sysutils/fusefs-exfat */
              __log "${I}:mount (exfat)"
              ;;
            (*) continue ;;
          esac
          ;;
      esac
      __state_add ${I} $( mount | grep " ${MNT} " | awk '{printf $1}' ) ${MNT}
    done
    ;;

  (detach)
    MOUNTED=$( mount )
    __state_lock
    while read DEV PROVIDER MNT
    do
      TARGET=$( echo "${MOUNTED}" | grep -E "^${PROVIDER} " | awk '{print $3}' )
      [ -z "${TARGET}" ] && {
        __state_remove ${MNT} ${STATE}
        continue
      }
      umount -f ${TARGET} &
      unset TARGET
      __state_remove ${MNT} ${STATE}
      __log "${DEV}:umount"
    done < ${STATE}
    __state_unlock
    __log "/dev/${1}:detach"
    ;;

esac


PS. Below are links for 'mirror' threads.
http://forums.freebsd.org/showthread.php?t=29895
http://daemonforums.org/showthread.php?t=6838

Regards,
vermaden

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

2012-02-17 Thread Julian Elischer


On 2/17/12 3:28 AM, David Xu wrote:

On 2012/2/17 16:06, Julian Elischer wrote:

On 2/16/12 11:41 PM, Julian Elischer wrote:

adding jkim as he seems to be the last person working with TSC.


On 2/16/12 6:42 PM, David Xu wrote:

On 2012/2/17 10:19, Julian Elischer wrote:

On 2/16/12 5:56 PM, David Xu wrote:

On 2012/2/17 8:42, Julian Elischer wrote:
Adding David Xu for his thoughts since he rewrote the code in 
question in revision 213098


On 2/16/12 2:57 PM, Julian Elischer wrote:

On 2/16/12 1:06 PM, Julian Elischer wrote:

On 2/16/12 9:34 AM, Andriy Gapon wrote:

on 15/02/2012 23:41 Julian Elischer said the following:

The program fio (an IO test in ports) uses pthreads

the following code (from fio-2.0.3, but its in earlier 
code too)

has suddenly started misbehaving.

 clock_gettime(CLOCK_REALTIME, &t);
 t.tv_sec += seconds + 10;

 pthread_mutex_lock(&mutex->lock);

 while (!mutex->value && !ret) {
         mutex->waiters++;
         ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
         mutex->waiters--;
 }

 if (!ret) {
         mutex->value--;
         pthread_mutex_unlock(&mutex->lock);
 }


It turns out that 'ret' sometimes comes back instantly (on my machine)
with a value of 60 (ETIMEDOUT) despite the fact that we set the timeout
10 seconds into the future.

Has anyone else seen anything like this?
(and yes, the condition variable attributes have been set to use the
REALTIME clock).

But why?

Just a hypothesis that maybe there is some issue with time 
keeping on that system.

How would that code work out for you with MONOTONIC?


Jens Axboe (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, and
they both had the same problem, i.e. random early returns with ETIMEDOUT.

I think we will try to move our machine forward to a newer -stable to
see if it resolves.

Kan upgraded the machine today to today's 9.x branch tip and the
problem still occurs.

8.x does not have this problem.

I have not got a 9-RELEASE machine to test on, so I cannot tell if this
came in with the burst of stuff that came in after the 9.x branch was
unfrozen after the release of 9.0.





I am trying to reproduce the problem, do you have complete sample code
to test?


I'm still looking for the exact set, but on my machine (4 CPUs) the
program from ports sysutils/fio exhibits the problem when used with
kern.timecounter.hardware=TSC-low and with the following config file:


pu05 # cat config.fio

[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=6
randrepeat=0

[file1]
filename=/dev/ada0

pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes, 
forcing use of threads. Use the 'thread' option to get rid of 
this warning.
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, 
iodepth=16

...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, 
iodepth=16

fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#


The reason 5 jobs failed to start is because the parent timed out on
them immediately.

It didn't time out on 10 of them apparently.


If I set the timer to ACPI-fast it works as expected.

Maybe the following code can check whether TSC-low works by letting the
thread run on each CPU.

gettimeofday(&prev, NULL);
int cpu = 0;
for (;;) {
        cpuset_t set;
        cpu = (cpu + 1) % 4;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        gettimeofday(&cur, NULL);
        if (timercmp(&prev, &cur, >=)) {
                abort();
        }
}




pu05# sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast -> TSC-low
pu05# ./test
^C
pu05# cat test.c

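#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/time.h>
#include <stdlib.h>

#include <pthread.h>
#include <pthread_np.h>

int
main(void)
{
        int cpu = 0;
        struct timeval prev, cur;

        gettimeofday(&prev, NULL);
        for (;;) {
                cpuset_t set;

                /* hop to the next of the 4 CPUs before each reading */
                cpu = (cpu + 1) % 4;
                CPU_ZERO(&set);
                CPU_SET(cpu, &set);
                pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
                gettimeofday(&cur, NULL);
                /* abort if time went backwards between CPUs */
                if (timercmp(&prev, &cur, >)) {
                        abort();
                }
                prev = cur;
        }
}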

pu05# ./test

minutes pass...

^C
pu05#

so it looks as if the TSC is working ok..
I'm just going to check that the program is actually moving CPU...
yes it is moving around but I can't tell at what speed. (according 
to top).


so we are still left with a question of "where is the problem?"

kernel TSC driver?
generic gettimeofday() code?
pthreads cond code?
the application?



I am running the fio test on my notebook which is using TSC-low,
it is on 9.0-RC3, I can not reproduce the problem for
minutes, then I interrupt it with ctrl-c: looks mot

http://people.freebsd.org/~da

Re: New BSD Installer

2012-02-17 Thread Pete French

> Yes it does? Am I the only one person on the whole earth seeing the big 
> difference in easy setup of mirroring two drives instead of many 
> individual partitions?

Sorry, I wasn't suggesting that you should always mirror
the individual partitions - just I happen to do that where
I am mixing ZFS and gmirror. Obviously you don't want to create
lots of little mirrors if you don't have to. But even with
one mirror, you can mirror a big partition covering the whole
drive, and then carve that up with bsdlabel. No need to ever mirror
the actual raw discs, and it works with GPT.

-pete.

Re: New BSD Installer

2012-02-17 Thread Miroslav Lachman

Pete French wrote:

I wasn't aware you could do that.  I was only aware that it was the
other way around.  That (my) misconception seems to also be relayed
by others such as Miroslav who said:

Should this not be the recommended way of doing things even for MBR
disks ? I have a lot of machines booting from gmirror, but we always
do it by mirroring MBR partitions (or GPT ones). I cant see why you would
want to do it the other way round in fact. It doesnt gain you anything
does it ?

Yes it does? Am I the only one person on the whole earth seeing the big 
difference in easy setup of mirroring two drives instead of many 
individual partitions?

Freddie Cash already wrote about disk thrashing on rebuild after power 
failure, but initial setup or repair after disk replacement is also a pain 
with mirroring individual partitions.

> As someone else pointed out, you do need to partition the two drives
> to match, and add bootloaders to them. But thats not really a great
> hardship is it, and everything just works properly. You dont need
> any different bootloader (as it sees the start of the drive and the
> gmirror stuff is at the end).
>

Comparing our usual setup with 7 partitions after disk failure and 
replacement:

Mirroring whole drives (after failed disk replacement):

1) gmirror forget -v gm0
2) gmirror insert -v gm0 ada1

And I am done!

Mirroring individual partitions (maintenance nightmare):

 1) find sizes of partitions
 2) create partitions on new drive
 3) install boot sector
 4) gmirror forget -v gm0p1
 5) gmirror insert -v gm0p1 ada1p1 (and wait til synchronized)
 6) gmirror forget -v gm0p2
 7) gmirror insert -v gm0p2 ada1p2 (and wait til synchronized)
 8) gmirror forget -v gm0p3
 9) gmirror insert -v gm0p3 ada1p3 (and wait til synchronized)
10) gmirror forget -v gm0p4
11) gmirror insert -v gm0p4 ada1p4 (and wait til synchronized)
12) gmirror forget -v gm0p5
13) gmirror insert -v gm0p5 ada1p5 (and wait til synchronized)
14) gmirror forget -v gm0p6
15) gmirror insert -v gm0p6 ada1p6 (and wait til synchronized)
16) gmirror forget -v gm0p7
17) gmirror insert -v gm0p7 ada1p7

And after 15 more steps, you are done too.

I think you cannot compare mirrored partitions to what can be done by 
ZFS mirror or gmirror on whole drives and I am not willing to go this 
way. I will use gmirror and MBR where possible.

Miroslav Lachman

kerberized NFS

2012-02-17 Thread Giulio Ferro


Thanks everybody again for your help with setting up a working
kerberized nfsv4 system.

I was able to user-mount a nfsv4 share with krb5 security, and I was
trying to do the same as root.

Unfortunately the patch I found here:
http://people.freebsd.org/~rmacklem/rpcsec_gss.patch

fails to apply cleanly on a 9 stable system.

Is there a more recent patch available or some better way to automatically
mount the share at boot time?

Thanks again.

Re: pthread_cond_timedwait() broken in 9-stable? [possible answer]

2012-02-17 Thread Jung-uk Kim

On Thursday 16 February 2012 08:55 pm, Julian Elischer wrote:
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950)
> ACPI-fast(900) dummy(-100)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
>
> switching the machine from TSC_low to ACPI-fast  fixes the problem.
>
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well synchronised as they were in 8.x?

Can you please show us verbose dmesg output?

FYI, TSC and TSC-low are not very different.  TSC-low is just a lower 
resolution version of TSC for SMP.  The only difference is that we have 
automated your timecounter choice, i.e., if the TSCs seem reasonably 
well-synchronized, TSC-low is selected by default but with lower resolution.  
In other words, if your TSC timecounter was never going backwards 
previously, the TSC-low timecounter won't either, guaranteed.  So, the root 
cause should be somewhere else.

Jung-uk Kim

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

2012-02-17 Thread Jung-uk Kim

On Friday 17 February 2012 06:28 am, David Xu wrote:
> On 2012/2/17 16:06, Julian Elischer wrote:
> > On 2/16/12 11:41 PM, Julian Elischer wrote:
> >> adding jkim as he seems to be the last person working with TSC.
> >>
> >> On 2/16/12 6:42 PM, David Xu wrote:
> >>> On 2012/2/17 10:19, Julian Elischer wrote:
>  On 2/16/12 5:56 PM, David Xu wrote:
> > On 2012/2/17 8:42, Julian Elischer wrote:
> > >> Adding David Xu for his thoughts since he rewrote the code
> > >> in question in revision 213098
> >>
> >> On 2/16/12 2:57 PM, Julian Elischer wrote:
> >>> On 2/16/12 1:06 PM, Julian Elischer wrote:
>  On 2/16/12 9:34 AM, Andriy Gapon wrote:
> > on 15/02/2012 23:41 Julian Elischer said the following:
> >> The program fio (an IO test in ports) uses pthreads
> >>
> >> the following code (from fio-2.0.3, but its in earlier
> >> code too) has suddenly started misbehaving.
> >>
> >>  clock_gettime(CLOCK_REALTIME,&t);
> >>  t.tv_sec += seconds + 10;
> >>
> >>  pthread_mutex_lock(&mutex->lock);
> >>
> >>  while (!mutex->value&&  !ret) {
> >>  mutex->waiters++;
> >>  ret =
> >> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t);
> >>  mutex->waiters--;
> >>  }
> >>
> >>  if (!ret) {
> >>  mutex->value--;
> >>  pthread_mutex_unlock(&mutex->lock);
> >>  }
> >>
> >>
> >> It turns out that 'ret' sometimes comes back instantly
> >> (on my machine) with a
> >> value of 60 (ETIMEDOUT)
> >> despite the fact that we set the timeout 10 seconds into
> >> the future.
> >>
> >> Has anyone else seen anything like this?
> >> (and yes the condition variable attribute have been set
> >> to use the REALTIME clock).
> >
> > But why?
> >
> > Just a hypothesis that maybe there is some issue with
> > time keeping on that system.
> > How would that code work out for you with MONOTONIC?
> 
>  Jens Axboe, (CC'd) tried both CLOCK_REALTIME and
>  CLOCK_MONOTONIC, and they both had the same problem..
>  i.e. random early returns with ETIMEDOUT.
> 
>  I think we will try move out machine forward to a newer
>  -stable to see if it resolves.
> >>>
> >>> Kan upgraded the machine today to today's 9.x branch tip
> >>> and the problem still occurs.
> >>> 8.x does not have this problem.
> >>>
> >>> I have not got a 9-RELEASE machine to test on.. so I can
> >>> not tell if this came in with the burst of stuff
> >>> that came in after the 9.x branch was unfrozen after the
> >>> release of 9.0.
> >
> > I am trying to reproduce the problem,  do you have complete
> > sample code to test ?
> 
>  I'm still looking the exact set
>  but on my machine (4 cpus) the program from ports sysutils/fio
>  exhibits the problem when used with
>  kern.timecounter.hardware=TSC-low and with the following
>  config file:
> 
>  pu05 # cat config.fio
> 
>  [global]
>  #clocksource=cpu
>  direct=1
>  rw=randread
>  bs=4096
>  fill_device=1
>  numjobs=16
>  iodepth=16
>  #ioengine=posixaio
>  #ioengine=psync
>  ioengine=psync
>  group_reporting
>  norandommap
>  time_based
>  runtime=6
>  randrepeat=0
> 
>  [file1]
>  filename=/dev/ada0
> 
>  pu05 #
>  pu05 # fio config.fio
>  fio: this platform does not support process shared mutexes,
>  forcing use of threads. Use the 'thread' option to get rid of
>  this warning. file1: (g=0): rw=randread, bs=4K-4K/4K-4K,
>  ioengine=psync, iodepth=16 ...
>  file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync,
>  iodepth=16 fio 2.0.3
>  Starting 15 threads and 1 process
>  fio: job startup hung? exiting.
>  fio: 5 jobs failed to start
>  Segmentation fault (core dumped)
>  pu05#
> 
> 
>  The reason 5 jobs failed to start is because the parent timed
>  out on them immediately.
>  It didn't time out on 10 of them apparently.
> 
> 
>  if I set the timer to ACPI-fast it works as expected..
> >>>
> >>> maybe following code can check to see if TSC-LOW works by let
> >>> the thread run
> >>> on each cpu.
> >>>
> >>> gettimeofday(&prev, NULL);
> >>> int cpu = 0;
> >>> for (;;) {
> >>>  cpuset_t set;
> >>>  cpu = ++cpu % 4;
> >>>  CPU_ZERO(&set);
> >>>  CPU_SET(cpu, &set);
> >>>  pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
> >>>  gettimeofday(&cur, NULL);
> >>>  if ( timercmp(&prev, &cur, >=)) {
> >>> abort();
> >>>}
>

Re: New BSD Installer

2012-02-17 Thread Michiel Boland


On 02/17/2012 16:21, Freddie Cash wrote:
[...]

The problem with mirroring partitions is that you thrash the disk
during the rebuild after replacing a failed disk.  And the more
partitions you have, the worse it gets.


I guess that if you do per-slice mirroring you should turn off autosync, right?

Cheers
Michiel

Re: New BSD Installer

2012-02-17 Thread Warren Block


On Fri, 17 Feb 2012, Freddie Cash wrote:


On Fri, Feb 17, 2012 at 3:12 AM, Pete French  wrote:

I wasn't aware you could do that.  I was only aware that it was the
other way around.  That (my) misconception seems to also be relayed
by others such as Miroslav who said:


Should this not be the recommended way of doing things even for MBR
disks ? I have a lot of machines booting from gmirror, but we always
do it by mirroring MBR partitions (or GPT ones). I cant see why you would
want to do it the other way round in fact. It doesnt gain you anything
does it ?


The problem with mirroring partitions is that you thrash the disk
during the rebuild after replacing a failed disk.


Potentially, yes.


And the more partitions you have, the worse it gets.


One big mirrored partition avoids it, but then there's only one 
partition.  (ad0p2a?  Forget I mentioned that.)



If you mirror the device, then the rebuild process only has to rebuild
a single "thing".

If you mirror 4 partitions on a device, then there will be four
simultaneous, parallel rebuild processes running, thrashing the drive
heads on both devices, killing your I/O throughput and extending the
length of the rebuild.


Some queuing logic in the mirror rebuild could avoid that.  I am 
blissfully unaware of how complicated that might be.

Re: New BSD Installer

2012-02-17 Thread Andriy Gapon

on 17/02/2012 16:28 Hiroki Sato said the following:
> Andriy Gapon  wrote
>   in <4f3e3000.9000...@freebsd.org>:
> 
> av> -BEGIN PGP SIGNED MESSAGE-
> av> Hash: SHA1
> av>
> av> on 17/02/2012 09:04 Hiroki Sato said the following:
> av> > No, the issue is our gptloader assumes the backup header is always 
> located
> av> > at the (physical) last sector while this is not mandatory in the UEFI
> av> > specification.
> av>
> av> Are you sure?
> 
>  Yes, sure.

Sorry, I haven't really given a thought to what you wrote below.
You said that "this is not mandatory in the UEFI specification" and I gave the
quotes which say that it is.  Also keep in mind that BIOS and other OSs know
nothing about FreeBSD GEOM.

> In the gm0->md0+md1 case, the last LBA of the "device" is
>  changed (grown in size) but they can still have a valid backup
>  header at "the last LBA - 1" before an attempt to grow the size of
>  the volume as the last paragraph of your excerpts says.  If we
>  *choose* to grow the device size permanently, the backup header must
>  be relocated at the new last LBA.  However, before the relocation
>  happens, the specification says both the primary and secondary header
>  must be valid in the previous device size.  This is my understanding.
> 
>  This means software should assume the device size can grow and should
>  not assume the backup header is always located at the last possible
>  LBA on the device.  If AlternateLBA does not match "the device size -
>  1", the software should recognize the location of the backup header
>  based on the information in the primary header first.  The gptboot
>  does not do so currently.  I didn't give it a try actually but the
>  attached patch is what I want to say.
> 
> -- Hiroki


-- 
Andriy Gapon

Re: New BSD Installer

2012-02-17 Thread Pete French

> The problem with mirroring partitions is that you thrash the disk
> during the rebuild after replacing a failed disk.  And the more
> partitions you have, the worse it gets.

yes, this is true - actually I have had this on older
machines, and have had to stop the rebuilds of each bit until
the other completes for this reason. had forgotten that

am about to replace a failed drive in one of my 2 gmirror ++ zfs boxes,
so the reminder comes at a good time ;)

-pete.

Re: New BSD Installer

2012-02-17 Thread Freddie Cash

On Fri, Feb 17, 2012 at 3:12 AM, Pete French  wrote:
>> I wasn't aware you could do that.  I was only aware that it was the
>> other way around.  That (my) misconception seems to also be relayed
>> by others such as Miroslav who said:
>
> Should this not be the recommended way of doing things even for MBR
> disks ? I have a lot of machines booting from gmirror, but we always
> do it by mirroring MBR partitions (or GPT ones). I cant see why you would
> want to do it the other way round in fact. It doesnt gain you anything
> does it ?

The problem with mirroring partitions is that you thrash the disk
during the rebuild after replacing a failed disk.  And the more
partitions you have, the worse it gets.

If you mirror the device, then the rebuild process only has to rebuild
a single "thing".

If you mirror 4 partitions on a device, then there will be four
simultaneous, parallel rebuild processes running, thrashing the drive
heads on both devices, killing your I/O throughput and extending the
length of the rebuild.

And if you mix your redundancy technologies (like gmirror and zfs
mirror) it gets even worse due to competing rebuild schedulers.

-- 
Freddie Cash
fjwc...@gmail.com

Re: Custom kernel poll summary (was: Re: Reducing the need to compile a custom kernel)

2012-02-17 Thread Freddie Cash

On Fri, Feb 17, 2012 at 3:21 AM, Alexander Leidinger
 wrote:
> Quoting Freddie Cash  (from Tue, 14 Feb 2012 08:26:54
> -0800):
>
>> On Tue, Feb 14, 2012 at 7:43 AM, Ian Smith  wrote:
>>>
>>> On Tue, 14 Feb 2012 2:37:55 +0100, Alexander Leidinger wrote:
>>>  > 1 IPSTEALTH                      -> changes ipfw module only?
>>>
>>> I don't think this is specific to ipfw.  From /sys/conf/NOTES:
>>>
>>> # IPSTEALTH enables code to support stealth forwarding (i.e., forwarding
>>> # packets without touching the TTL).  This can be useful to hide
>>> firewalls
>>> # from traceroute and similar tools.
>>>
>>> But can it be disabled once added to kernel?  It's no good as a default.
>>
>>
>> It's controllable via sysctl once it's compiled into the kernel.  If
>> it's not compiled into the kernel, then the sysctl doesn't exist.
>
>
> Is it the following?
> net.inet.ip.stealth=0

Yeah, that's the one.
-- 
Freddie Cash
fjwc...@gmail.com

Re: New BSD Installer

2012-02-17 Thread Hiroki Sato

Andriy Gapon  wrote
  in <4f3e3000.9000...@freebsd.org>:

av> -BEGIN PGP SIGNED MESSAGE-
av> Hash: SHA1
av>
av> on 17/02/2012 09:04 Hiroki Sato said the following:
av> > No, the issue is our gptloader assumes the backup header is always located
av> > at the (physical) last sector while this is not mandatory in the UEFI
av> > specification.
av>
av> Are you sure?

 Yes, sure.  In the gm0->md0+md1 case, the last LBA of the "device" is
 changed (grown in size) but they can still have a valid backup
 header at "the last LBA - 1" before an attempt to grow the size of
 the volume as the last paragraph of your excerpts says.  If we
 *choose* to grow the device size permanently, the backup header must
 be relocated at the new last LBA.  However, before the relocation
 happens, the specification says both the primary and secondary header
 must be valid in the previous device size.  This is my understanding.

 This means software should assume the device size can grow and should
 not assume the backup header is always located at the last possible
 LBA on the device.  If AlternateLBA does not match "the device size -
 1", the software should recognize the location of the backup header
 based on the information in the primary header first.  The gptboot
 does not do so currently.  I didn't give it a try actually but the
 attached patch is what I want to say.

-- Hiroki
Index: sys/boot/common/gpt.c
===
--- sys/boot/common/gpt.c	(revision 230616)
+++ sys/boot/common/gpt.c	(working copy)
@@ -333,24 +333,26 @@
 	gptread_table("primary", uuid, dskp, &hdr_primary,
 	table_primary) == 0) {
 		hdr_primary_lba = hdr_primary.hdr_lba_self;
+		/* Use AlternateLBA if valid.  If not, use LastUsableLBA+34. */
+		if (hdr_primary_lba < hdr_primary.hdr_lba_alt)
+			altlba = hdr_primary.hdr_lba_alt;
+		else if (hdr_primary.hdr_lba_end != 0)
+			altlba = hdr_primary.hdr_lba_end + 34;
 		gpthdr = &hdr_primary;
 		gpttable = table_primary;
 	}

-	altlba = drvsize(dskp);
-	if (altlba > 0)
-		altlba--;
-	else if (hdr_primary_lba > 0) {
-		/*
-		 * If we cannot obtain disk size, but primary header
-		 * is valid, we can get backup header location from
-		 * there.
-		 */
-		altlba = hdr_primary.hdr_lba_alt;
+	/*
+	 * Try to locate the backup header from the media size if no primary
+	 * header found.
+	 */
+	if (hdr_primary_lba == 0) {
+		altlba = drvsize(dskp);
+		if (altlba > 0)
+			altlba--;
 	}
-	if (altlba == 0)
-		printf("%s: unable to locate backup GPT header\n", BOOTPROG);
-	else if (gptread_hdr("backup", dskp, &hdr_backup, altlba) == 0 &&
+	if (altlba != 0 &&
+	gptread_hdr("backup", dskp, &hdr_backup, altlba) == 0 &&
 	gptread_table("backup", uuid, dskp, &hdr_backup,
 	table_backup) == 0) {
 		hdr_backup_lba = hdr_backup.hdr_lba_self;
@@ -359,7 +361,8 @@
 			gpttable = table_backup;
 			printf("%s: using backup GPT\n", BOOTPROG);
 		}
-	}
+	} else
+		printf("%s: unable to locate backup GPT header\n", BOOTPROG);

 	/*
 	 * Convert all BOOTONCE without BOOTME flags into BOOTFAILED.



Re: Custom kernel poll summary (was: Re: Reducing the need to compile a custom kernel)

2012-02-17 Thread Alexander Leidinger


Quoting Freddie Cash  (from Tue, 14 Feb 2012
08:26:54 -0800):


On Tue, Feb 14, 2012 at 7:43 AM, Ian Smith  wrote:

On Tue, 14 Feb 2012 2:37:55 +0100, Alexander Leidinger wrote:
 > 1 IPSTEALTH                      -> changes ipfw module only?

I don't think this is specific to ipfw.  From /sys/conf/NOTES:

# IPSTEALTH enables code to support stealth forwarding (i.e., forwarding
# packets without touching the TTL).  This can be useful to hide firewalls
# from traceroute and similar tools.

But can it be disabled once added to kernel?  It's no good as a default.


It's controllable via sysctl once it's compiled into the kernel.  If
it's not compiled into the kernel, then the sysctl doesn't exist.


Is it the following?
net.inet.ip.stealth=0

Bye,
Alexander.

--
BOFH excuse #152:

My pony-tail hit the on/off switch on the power strip

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org      netchild @ FreeBSD.org  : PGP ID = 72077137


Re: Reducing the need to compile a custom kernel

2012-02-17 Thread Alexander Leidinger

Quoting Nenhum_de_Nos  (from Tue, 14 Feb  
2012 10:49:56 -0200):



On Tue, February 14, 2012 08:31, Alexander Leidinger wrote:



Embedded devices are out of the scope of this, normally you do a lot
of other modifications to such systems anyway, so a custom kernel
should be not a big problem.

I will also not touch the dual-stack part of the kernel config (it
shall still allow the generic purpose computing like the GENERIC
config).


I'm really curious why, if they are the piece of hardware that usually
are worse to compile things on, for access issues to poor hardware
(great to compile kernel+world on i7, pain to do so in my net5501-70).


Typically embedded environments have a different goal regarding the  
kernel, than a normal server/desktop. In an embedded system RAM and  
disc space may be very limited and as such you need to get rid of a  
lot of things you want to have in a server kernel. A server is also  
some kind of generic purpose device, whereas an embedded system is a  
special purpose device. If we do not know the special purpose of the  
device, we can not provide a suitable kernel for it (a NAS has other  
requirements than a router, firewall, WLAN access point or multimedia  
system).


Regarding the compile time issue you talked about: cross compiling a  
world/kernel is supported by FreeBSD.


It may be not a bad idea to provide examples of special purpose  
kernels with FreeBSD, but this is a completely different topic from the  
one I (as the thread starter) want to discuss in this thread, which is  
about the work _I_ want to do and need input from the community for.  
You are of course free to open a new thread to discuss the  
kernel-config of special purpose devices.


Bye,
Alexander.

--
Bender: Bite my shiny, metal ass!

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org      netchild @ FreeBSD.org  : PGP ID = 72077137


Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

2012-02-17 Thread David Xu


On 2012/2/17 16:06, Julian Elischer wrote:

On 2/16/12 11:41 PM, Julian Elischer wrote:

adding jkim as he seems to be the last person working with TSC.


On 2/16/12 6:42 PM, David Xu wrote:

On 2012/2/17 10:19, Julian Elischer wrote:

On 2/16/12 5:56 PM, David Xu wrote:

On 2012/2/17 8:42, Julian Elischer wrote:
Adding David Xu for his thoughts since he rewrote the code in
question in revision 213098


On 2/16/12 2:57 PM, Julian Elischer wrote:

On 2/16/12 1:06 PM, Julian Elischer wrote:

On 2/16/12 9:34 AM, Andriy Gapon wrote:

on 15/02/2012 23:41 Julian Elischer said the following:

The program fio (an IO test in ports) uses pthreads

the following code (from fio-2.0.3, but it's in earlier code too)
has suddenly started misbehaving.

    clock_gettime(CLOCK_REALTIME, &t);
    t.tv_sec += seconds + 10;

    pthread_mutex_lock(&mutex->lock);

    while (!mutex->value && !ret) {
        mutex->waiters++;
        ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
        mutex->waiters--;
    }

    if (!ret) {
        mutex->value--;
        pthread_mutex_unlock(&mutex->lock);
    }


It turns out that 'ret' sometimes comes back instantly (on my machine)
with a value of 60 (ETIMEDOUT), despite the fact that we set the
timeout 10 seconds into the future.

Has anyone else seen anything like this?
(And yes, the condition variable attributes have been set to use the
REALTIME clock.)

But why?

Just a hypothesis that maybe there is some issue with time 
keeping on that system.

How would that code work out for you with MONOTONIC?


Jens Axboe, (CC'd) tried both CLOCK_REALTIME and 
CLOCK_MONOTONIC, and they both had the same problem..

i.e. random early returns with ETIMEDOUT.

I think we will try to move our machine forward to a newer -stable
to see if it resolves.
Kan upgraded the machine today to today's 9.x branch tip and the 
problem still occurs.

8.x does not have this problem.

I have not got a 9-RELEASE machine to test on.. so I can not 
tell if this came in with the burst of stuff
that came in after the 9.x branch was unfrozen after the release 
of 9.0.





I am trying to reproduce the problem. Do you have complete sample
code to test?


I'm still looking for the exact set, but on my machine (4 CPUs) the
program from ports sysutils/fio exhibits the problem when used with
kern.timecounter.hardware=TSC-low and with the following config file:

pu05 # cat config.fio

[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=6
randrepeat=0

[file1]
filename=/dev/ada0

pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes, forcing 
use of threads. Use the 'thread' option to get rid of this warning.

file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#


The reason 5 jobs failed to start is because the parent timed out 
on them immediately.

It didn't time out on 10 of them apparently.


if I set the timer to ACPI-fast it works as expected..
Maybe the following code can check whether TSC-low works, by letting
the thread run on each CPU.

gettimeofday(&prev, NULL);
int cpu = 0;
for (;;) {
    cpuset_t set;
    cpu = (cpu + 1) % 4;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    gettimeofday(&cur, NULL);
    if (timercmp(&prev, &cur, >=)) {
        abort();
    }
}




pu05# sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast -> TSC-low
pu05# ./test
^C
pu05# cat test.c

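#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/time.h>
#include <pthread.h>
#include <pthread_np.h>

#include <stdlib.h>

int
main(void)
{
    int cpu = 0;
    struct timeval prev, cur;

    gettimeofday(&prev, NULL);
    for (;;) {
        cpuset_t set;

        /* Hop to the next of the 4 CPUs before each reading. */
        cpu = (cpu + 1) % 4;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        gettimeofday(&cur, NULL);
        /* Abort if time appears to have gone backwards. */
        if (timercmp(&prev, &cur, >)) {
            abort();
        }
        prev = cur;
    }
}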

pu05# ./test

minutes pass...

^C
pu05#

so it looks as if the TSC is working ok..
I'm just going to check that the program is actually moving CPU...
yes it is moving around but I can't tell at what speed. (according to 
top).


so we are still left with a question of "where is the problem?"

kernel TSC driver?
generic gettimeofday() code?
pthreads cond code?
the application?
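
For the "pthreads cond code" candidate, a later reply in this thread proposes a patch to the remaining-time check in libthr's thr_umtx.c. The sketch below only illustrates how the two forms of that check differ. The helper names timespec_sub, expired_old and expired_new are hypothetical stand-ins, not libthr functions, and whether this difference explains the fio failure is not established here.

```c
#include <assert.h>
#include <time.h>

/* Stand-in for a normalizing timespec subtraction: dst = a - b. */
static void
timespec_sub(struct timespec *dst, const struct timespec *a,
    const struct timespec *b)
{
	assert(a->tv_nsec >= 0 && a->tv_nsec < 1000000000L);
	assert(b->tv_nsec >= 0 && b->tv_nsec < 1000000000L);
	dst->tv_sec = a->tv_sec - b->tv_sec;
	dst->tv_nsec = a->tv_nsec - b->tv_nsec;
	if (dst->tv_nsec < 0) {
		dst->tv_sec--;
		dst->tv_nsec += 1000000000L;
	}
}

/* Form of the check before the patch: fires whenever tv_nsec is 0. */
static int
expired_old(const struct timespec *abstime, const struct timespec *now)
{
	struct timespec left;

	timespec_sub(&left, abstime, now);
	return (left.tv_sec < 0 || left.tv_nsec <= 0);
}

/* Form proposed in the patch: tv_nsec only matters when tv_sec is 0. */
static int
expired_new(const struct timespec *abstime, const struct timespec *now)
{
	struct timespec left;

	timespec_sub(&left, abstime, now);
	return (left.tv_sec < 0 || (left.tv_sec == 0 && left.tv_nsec <= 0));
}
```

With a deadline exactly 10 seconds after "now" (identical tv_nsec in both), the old form reports the wait as already expired while the patched form does not. That is the case to look for if two clock readings can return the same nanosecond value.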



I am running the fio test on my notebook, which uses TSC-low and
runs 9.0-RC3. I could not reproduce the problem over several
minutes, after which I interrupted it with Ctrl-C:

http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
http://people.freebsd.org/

Re: New BSD Installer

2012-02-17 Thread Pete French

> I wasn't aware you could do that.  I was only aware that it was the
> other way around.  That (my) misconception seems to also be relayed
> by others such as Miroslav who said:

Should this not be the recommended way of doing things even for MBR
disks? I have a lot of machines booting from gmirror, but we always
do it by mirroring MBR partitions (or GPT ones). I can't see why you would
want to do it the other way round, in fact. It doesn't gain you anything,
does it?

As someone else pointed out, you do need to partition the two drives to
match, and add bootloaders to them. But that's not really a great
hardship, is it, and everything just works properly. You don't need
a different bootloader (as it sees the start of the drive, and the gmirror
stuff is at the end).

An example I have here is a machine set up with a pair of drives, each
with 3 partitions on it: ada0s1, ada0s2 and ada0s3, plus the corresponding
ones on ada1. The two s1 partitions are gmirrored for boot, the two
s2 partitions are configured for swap, and the two s3 partitions are also
mirrored, but using ZFS instead of gmirror.

That kind of configuration is only really possible if you put the mirroring
inside the external partition table.

-pete.

Re: New BSD Installer

2012-02-17 Thread Andriy Gapon

on 17/02/2012 09:04 Hiroki Sato said the following:
> No, the issue is our gptloader assumes the backup header is always located
> at the (physical) last sector while this is not mandatory in the UEFI
> specification.

Are you sure?

Unified Extensible Firmware Interface Specification Version 2.3.1, Errata A
September 7, 2011 says:
[snip]
> Two GPT Header structures are stored on the device: the primary and the
> backup. The primary GPT Header must be located in LBA 1 (i.e., the second
> logical block), and the backup GPT Header must be located in the last LBA
> of the device. Within the GPT Header the My LBA field contains the
[snip]
> If the primary GPT is corrupt, software must check the last LBA of the
> device to see if it has a valid GPT Header and point to a valid GPT
> Partition Entry Array. If it points to a valid GPT Partition Entry Array,
> then software should restore the primary GPT if allowed by platform policy
> settings (e.g. a platform may require a user to provide confirmation before
> restoring the table, or may allow the table to be restored automatically).
> Software must report whenever it restores a GPT.
[snip]
> Both the primary and backup GPTs must be valid before an attempt is made to
> grow the size of a physical volume. This is due to the GPT recovery scheme
> depending on locating the backup GPT at the end of the device. A volume may
> grow in size when disks are added to a RAID device. As soon as the volume
> size is increased the backup GPT must be moved to the end of the volume and
> the primary and backup GPT Headers must be updated to reflect the new
> volume size.
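
The placement rule quoted above can be written down as a small check. This is only a sketch: the function names are made up for illustration, and the AlternateLBA header field and the one-sector gmirror metadata at the end of the provider are assumptions not stated in the quoted text.

```c
#include <assert.h>
#include <stdint.h>

/* LBA at which the quoted text places the primary GPT header. */
#define GPT_PRIMARY_LBA 1ULL

/* Last addressable LBA of a device with the given size and block size. */
static uint64_t
last_lba(uint64_t media_bytes, uint32_t sector_size)
{
	assert(sector_size != 0 && media_bytes >= sector_size);
	return (media_bytes / sector_size - 1);
}

/*
 * Returns 1 if a primary header's MyLBA and AlternateLBA values agree
 * with the quoted placement rules for a device whose last LBA is
 * dev_last_lba, 0 otherwise.
 */
static int
gpt_primary_placement_ok(uint64_t my_lba, uint64_t alternate_lba,
    uint64_t dev_last_lba)
{
	return (my_lba == GPT_PRIMARY_LBA && alternate_lba == dev_last_lba);
}
```

For a 1000-sector disk, last_lba() gives 999. If a GPT is created inside a mirror that reserves the final sector for its own metadata, the backup header lands at LBA 998, so gpt_primary_placement_ok(1, 998, 999) fails when evaluated against the raw disk but passes against the 999-sector mirror provider.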

-- 
Andriy Gapon

Re: New BSD Installer

2012-02-17 Thread Andriy Gapon

on 17/02/2012 07:37 Freddie Cash said the following:
> Seems to me that we need a GEOM-aware loader

I am also adding a GEOM-aware BIOS/firmware to the wish-list.

-- 
Andriy Gapon

Re: pthread_cond_timedwait() broken in 9-stable? [possible answer]

2012-02-17 Thread Andriy Gapon

on 17/02/2012 03:55 Julian Elischer said the following:
> 
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950) ACPI-fast(900)
> dummy(-100)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
> 
> switching the machine from TSC-low to ACPI-fast fixes the problem.
> 
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
> 
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well synchronised as they were in 8.x?
> maybe the pthreads code didn't get the memo about changing timers?

More useful information that you can provide:
- C-states configuration
- CPU identification

I see that you've already contacted jkim, that's useful too.

-- 
Andriy Gapon

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

2012-02-17 Thread Julian Elischer


On 2/16/12 11:41 PM, Julian Elischer wrote:

adding jkim as he seems to be the last person working with TSC.


On 2/16/12 6:42 PM, David Xu wrote:

On 2012/2/17 10:19, Julian Elischer wrote:

On 2/16/12 5:56 PM, David Xu wrote:

On 2012/2/17 8:42, Julian Elischer wrote:
Adding David Xu for his thoughts since he rewrote the code in
question in revision 213098


On 2/16/12 2:57 PM, Julian Elischer wrote:

On 2/16/12 1:06 PM, Julian Elischer wrote:

On 2/16/12 9:34 AM, Andriy Gapon wrote:

on 15/02/2012 23:41 Julian Elischer said the following:

The program fio (an IO test in ports) uses pthreads

the following code (from fio-2.0.3, but it's in earlier code too)
has suddenly started misbehaving.

    clock_gettime(CLOCK_REALTIME, &t);
    t.tv_sec += seconds + 10;

    pthread_mutex_lock(&mutex->lock);

    while (!mutex->value && !ret) {
        mutex->waiters++;
        ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
        mutex->waiters--;
    }

    if (!ret) {
        mutex->value--;
        pthread_mutex_unlock(&mutex->lock);
    }


It turns out that 'ret' sometimes comes back instantly (on my machine)
with a value of 60 (ETIMEDOUT), despite the fact that we set the
timeout 10 seconds into the future.

Has anyone else seen anything like this?
(And yes, the condition variable attributes have been set to use the
REALTIME clock.)

But why?

Just a hypothesis that maybe there is some issue with time 
keeping on that system.

How would that code work out for you with MONOTONIC?


Jens Axboe, (CC'd) tried both CLOCK_REALTIME and 
CLOCK_MONOTONIC, and they both had the same problem..

i.e. random early returns with ETIMEDOUT.

I think we will try to move our machine forward to a newer
-stable to see if it resolves.
Kan upgraded the machine today to today's 9.x branch tip and 
the problem still occurs.

8.x does not have this problem.

I have not got a 9-RELEASE machine to test on.. so I can not 
tell if this came in with the burst of stuff
that came in after the 9.x branch was unfrozen after the 
release of 9.0.





I am trying to reproduce the problem. Do you have complete
sample code to test?


I'm still looking for the exact set, but on my machine (4 CPUs) the
program from ports sysutils/fio exhibits the problem when used with
kern.timecounter.hardware=TSC-low and with the following config file:

pu05 # cat config.fio

[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=6
randrepeat=0

[file1]
filename=/dev/ada0

pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes, 
forcing use of threads. Use the 'thread' option to get rid of this 
warning.

file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#


The reason 5 jobs failed to start is because the parent timed out 
on them immediately.

It didn't time out on 10 of them apparently.


if I set the timer to ACPI-fast it works as expected..
Maybe the following code can check whether TSC-low works, by letting
the thread run on each CPU.

gettimeofday(&prev, NULL);
int cpu = 0;
for (;;) {
    cpuset_t set;
    cpu = (cpu + 1) % 4;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    gettimeofday(&cur, NULL);
    if (timercmp(&prev, &cur, >=)) {
        abort();
    }
}




pu05# sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast -> TSC-low
pu05# ./test
^C
pu05# cat test.c

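#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/time.h>
#include <pthread.h>
#include <pthread_np.h>

#include <stdlib.h>

int
main(void)
{
    int cpu = 0;
    struct timeval prev, cur;

    gettimeofday(&prev, NULL);
    for (;;) {
        cpuset_t set;

        /* Hop to the next of the 4 CPUs before each reading. */
        cpu = (cpu + 1) % 4;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        gettimeofday(&cur, NULL);
        /* Abort if time appears to have gone backwards. */
        if (timercmp(&prev, &cur, >)) {
            abort();
        }
        prev = cur;
    }
}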

pu05# ./test

minutes pass...

^C
pu05#

so it looks as if the TSC is working ok..
I'm just going to check that the program is actually moving CPU...
yes it is moving around but I can't tell at what speed. (according to 
top).


so we are still left with a question of "where is the problem?"

kernel TSC driver?
generic gettimeofday() code?
pthreads cond code?
the application?






