nanosleep - does it make sense with tv_sec < 0?

2010-07-29 Thread Garrett Cooper
Hi Hackers,
I ran into an oddity with the POSIX spec that seems a bit unrealistic:

[EINVAL]
The rqtp argument specified a nanosecond value less than zero or
greater than or equal to 1000 million.

Seems like it should also apply for seconds < 0. We currently
silently accept this argument in kern/kern_time.c:kern_nanosleep:

int
kern_nanosleep(struct thread *td, struct timespec *rqt, struct timespec *rmt)
{
	struct timespec ts, ts2, ts3;
	struct timeval tv;
	int error;

	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000)
		return (EINVAL);
	if (rqt->tv_sec < 0 || (rqt->tv_sec == 0 && rqt->tv_nsec == 0)) // <-- first clause here
		return (0);

but I'm wondering whether or not it makes logical sense for us to
do this (sleep for a negative amount of time?)...
FWIW Linux returns -1 and sets EINVAL in this case, which makes
more sense to me.
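A minimal user-space sketch of the stricter check being proposed: validate tv_nsec as POSIX describes, and additionally reject a negative tv_sec with EINVAL (the Linux behavior mentioned above) instead of returning success. The helper names check_rqtp and is_zero_sleep are hypothetical; this is not the kern_nanosleep code itself.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical validation helper (not kern_nanosleep itself).
 * Returns 0 if rqt is an acceptable sleep request, EINVAL otherwise.
 */
static int
check_rqtp(const struct timespec *rqt)
{
	/* POSIX: nanoseconds must lie in [0, 1000 million). */
	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000L)
		return (EINVAL);
	/* Proposed: a negative number of seconds is invalid too. */
	if (rqt->tv_sec < 0)
		return (EINVAL);
	return (0);
}

/* Nonzero if an already validated request asks for no sleep at all. */
static int
is_zero_sleep(const struct timespec *rqt)
{
	return (rqt->tv_sec == 0 && rqt->tv_nsec == 0);
}
```

With this ordering, a zero-length request is still valid and can return immediately, while tv_sec = -1 fails with EINVAL rather than silently succeeding.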
Thanks,
-Garrett
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Deterministic failure to meet sysconf(_SC_TIMER_MAX) for CLOCK_REALTIME

2010-07-29 Thread Garrett Cooper
Hi,
Running the test noted below [1], I always run into EAGAIN on the 29th
iteration:

$ conformance/behavior/timers/1-1.run-test
timer_create() did not return success for iteration 29: Resource temporarily unavailable
$ conformance/behavior/timers/1-1.run-test
timer_create() did not return success for iteration 29: Resource temporarily unavailable
$ conformance/behavior/timers/1-1.run-test
timer_create() did not return success for iteration 29: Resource temporarily unavailable
$ conformance/behavior/timers/1-1.run-test
timer_create() did not return success for iteration 29: Resource temporarily unavailable

Interestingly enough, sysconf(_SC_TIMER_MAX) returns 54; this is
the requirement that the test is attempting to validate (that at least
_SC_TIMER_MAX timers can be created via timer_create).
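A minimal probe of that requirement, assuming a POSIX system: create CLOCK_REALTIME timers until timer_create() fails, then delete them again. This only approximates what the LTP test does; the helper name and the use of SIGEV_NONE notification are assumptions, not taken from the test source.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical probe: create up to `limit` CLOCK_REALTIME timers, stopping
 * at the first failure, then delete every timer that was created.
 * Returns how many were created; *err gets the errno of the failing
 * timer_create() call, or 0 if all `limit` succeeded.
 */
static int
count_creatable_timers(int limit, int *err)
{
	struct sigevent sev;
	timer_t *ids;
	int n;

	*err = 0;
	if (limit <= 0)
		return (0);
	ids = malloc((size_t)limit * sizeof(*ids));
	if (ids == NULL) {
		*err = ENOMEM;
		return (0);
	}
	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_NONE;	/* assumption: no expiry notification needed */

	for (n = 0; n < limit; n++) {
		if (timer_create(CLOCK_REALTIME, &sev, &ids[n]) == -1) {
			*err = errno;	/* e.g. EAGAIN, as in the runs above */
			break;
		}
	}
	for (int i = 0; i < n; i++)
		timer_delete(ids[i]);
	free(ids);
	return (n);
}
```

sysconf(_SC_TIMER_MAX) returns -1 when the limit is indeterminate; otherwise, passing its value as `limit` and getting back fewer timers with err == EAGAIN corresponds to the failure shown above. Older systems may need -lrt to link timer_create().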
The timers kernel code is capped at 25 by default by a
preprocessor define in .../sys/sysctl.h:

/sys/sys/sysctl.h:#define CTL_P1003_1B_TIMER_MAX	25	/* int */

It doesn't make sense that an additional 4 timers could be created.
Oh, and the sysctl reports something else entirely:

p1003_1b.timers: 200112
p1003_1b.delaytimer_max: 2147483647
p1003_1b.timer_max: 32

So, what number is the source of truth and why don't they all match?
Thanks!
-Garrett

PS I'm still running a CURRENT kernel based off of r206173...

[1] 
http://ltp.git.sourceforge.net/git/gitweb.cgi?p=ltp/ltp-dev.git;a=blob;f=testcases/open_posix_testsuite/conformance/behavior/timers/1-1.c;h=ac043b0913e93f8db93cc74e249316f5ff82bdc8;hb=HEAD


(no subject)

2010-07-29 Thread rhfb
I have a similar problem.

I have an NFS server (8.0, upgraded a couple of times since Feb 2010) that
locks up and requires a reboot.

The clients are busy VMs on VMware ESXi that use the NFS server for vmdk
virtual disk storage.

ESXi reports the NFS server as inactive, and all the VMs post disk write
errors when trying to write to their disks.

/etc/rc.d/nfsd restart fails to work (it cannot kill the nfsd process).

The nfsd process runs at 100% CPU in the rc_lo state in top.

Rebooting is the only fix.

It has only happened under two circumstances:
1) Installing a VM running Windows 2008.
2) Migrating 16 million mail messages from a physical server to a FreeBSD VM
with a ZFS file system, running on the ESXi box that uses NFS to store the
VM's ZFS disk.

The NFS server uses ZFS also.


coherence-problem on the mapped memory buffer

2010-07-29 Thread Alexander Fiveg
Hello hackers, 

While working on the ringmap project, I've run into a coherency problem in
memory regions mapped from the kernel into user space.

Details: 
While integrating ringmap with the ixgbe driver, I've made some
changes to ixgbe:

1. The mbufs for received packets are allocated only once.

2. The allocated mbufs are reused one after the other, as in a ring buffer
(no new mbufs are allocated again).

3. Packet buffers (mbuf->m_data) are mapped into user space, so the
user-space process has access to the packets after they are DMA-transferred
from the network adapter into RAM.
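The allocate-once, reuse-in-order scheme of points 1 and 2 can be modelled in user space as a fixed ring of preallocated buffers. This is a hypothetical illustration (struct ring, ring_init, ring_next, ring_free are invented names); it does not involve mbufs, DMA, or the ringmap code itself.

```c
#include <stdlib.h>

/* Hypothetical model of the receive ring: buffers allocated once, then reused. */
struct ring {
	size_t		 nslots;
	size_t		 next;		/* slot the next packet will land in */
	unsigned char	**slots;
};

static void
ring_free(struct ring *r)
{
	if (r->slots == NULL)
		return;
	for (size_t i = 0; i < r->nslots; i++)
		free(r->slots[i]);	/* free(NULL) is a no-op */
	free(r->slots);
	r->slots = NULL;
}

/* Allocate every buffer up front (point 1); 0 on success, -1 on error. */
static int
ring_init(struct ring *r, size_t nslots, size_t slot_size)
{
	r->nslots = nslots;
	r->next = 0;
	r->slots = NULL;
	if (nslots == 0 || slot_size == 0)
		return (-1);
	r->slots = calloc(nslots, sizeof(*r->slots));
	if (r->slots == NULL)
		return (-1);
	for (size_t i = 0; i < nslots; i++) {
		r->slots[i] = malloc(slot_size);
		if (r->slots[i] == NULL) {
			ring_free(r);
			return (-1);
		}
	}
	return (0);
}

/* Return the next buffer in ring order, wrapping around (point 2). */
static unsigned char *
ring_next(struct ring *r)
{
	unsigned char *buf = r->slots[r->next];

	r->next = (r->next + 1) % r->nslots;
	return (buf);
}
```

Because slots are recycled, any reader with a stale view of a slot would see the previous packet stored there, which matches the symptom described next.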

Problem: 
Sometimes the user-space process sees not the newly DMAed data in the mapped
packet buffer but the OLD data that was previously stored in the same packet
buffer. If I monitor the received data in the kernel, the kernel sees
the data correctly. But sometimes it is the other way around: the user-space
process sees the correct new data while the kernel sees the old data in the
buffer.

It seems that the packet memory buffer is not kept coherent with all the
CPUs' caches. Probably the [user|kernel] thread sometimes reads old, stale
data from the cache of the CPU it is running on, while the other thread
sees the new data in the same mapped buffer.

Could you please provide me with some information that would help me
avoid this unexpected coherence problem?

Alex

P.S. Details about hardware and used software:
1. /var/run/dmesg.boot :
...
CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
  Origin = AuthenticAMD  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
  AMD Features2=0x3<LAHF,CMP>
real memory  = 3758030848 (3583 MB)
avail memory = 3677495296 (3507 MB)
ACPI APIC Table: <A M I  OEMAPIC >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 4 package(s) x 2 core(s)
...

2. uname -v
FreeBSD 9.0-CURRENT #3

3. sysctl kern.osreldate
kern.osreldate: 900014

4. //depot/projects/soc2010/ringmap/


Re: coherence-problem on the mapped memory buffer

2010-07-29 Thread Andriy Gapon
on 29/07/2010 17:13 Alexander Fiveg said the following:
 [...]

No help, but just curious: do you use the amd64 variant?
If yes, can you reproduce the problem with i386?

-- 
Andriy Gapon


Re: coherence-problem on the mapped memory buffer

2010-07-29 Thread Alexander Fiveg
On Thursday 29 July 2010 18:13:23 Andriy Gapon wrote:
 on 29/07/2010 17:13 Alexander Fiveg said the following:
  [...]

 No help, but just curious: do you use the amd64 variant?
 If yes, can you reproduce the problem with i386?

No, my kernel is i386, but I will try testing it with amd64.

Thanks 
Alex


Re: coherence-problem on the mapped memory buffer

2010-07-29 Thread Andriy Gapon
on 29/07/2010 19:13 Andriy Gapon said the following:
 on 29/07/2010 17:13 Alexander Fiveg said the following:
 [...]

In fact I have a suspicion that the problem might have to do with multiple
mappings of the shared pages, but far from sure...
Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the
PAT; starting at the following words:
«The PAT allows any memory type to be specified in the page tables, and
therefore it is possible to have a single physical page mapped to two or more
different linear addresses, each with different memory types. Intel does not
support this practice...»



-- 
Andriy Gapon


Re: coherence-problem on the mapped memory buffer

2010-07-29 Thread Andriy Gapon
on 29/07/2010 19:45 Alexander Fiveg said the following:
 On Thursday 29 July 2010 18:13:23 Andriy Gapon wrote:
 on 29/07/2010 17:13 Alexander Fiveg said the following:
 [...]
 No help, but just curious: do you use the amd64 variant?
 If yes, can you reproduce the problem with i386?
 
 No, my kernel is i386, but I will try testing it with amd64.

Oh, never mind, actually.

-- 
Andriy Gapon


Re: Re: coherence-problem on the mapped memory buffer

2010-07-29 Thread Sergey Babkin
Jul 29, 2010 12:58:07 PM, a...@icyb.net.ua wrote:

on 29/07/2010 19:13 Andriy Gapon said the following:
 on 29/07/2010 17:13 Alexander Fiveg said the following:
In fact I have a suspicion that the problem might have to do with multiple
mappings of the shared pages, but far from sure...
[...]

My guess would be that the memory type is not marked as DMA-capable. AFAIK
the Intel CPUs do hardware snooping on the physical addresses, so they have no
coherency issues between themselves. However, if a DMA writer changes the
memory, I think this does not normally get propagated to the front-side bus,
and the CPUs would not see it. You may need to either explicitly flush the CPU
cache before accessing these pages or mark them as non-cacheable.
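Sergey's first option, flushing cache lines before reading a buffer, can be sketched from user space on x86 with the SSE2 intrinsics _mm_clflush and _mm_mfence. This only illustrates that suggestion under stated assumptions: a 64-byte cache line (not queried from the CPU) and an SSE2-capable x86 target; elsewhere the hypothetical helper flush_range does nothing. Whether flushing actually cures the ringmap symptom is not established in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define ASSUMED_CACHE_LINE 64	/* assumption; real code should query the CPU */

/*
 * Flush every cache line overlapping [buf, buf + len), then fence, so that
 * later loads miss in the cache and are served from memory.
 * Returns the number of lines flushed (always 0 without SSE2).
 */
static size_t
flush_range(const void *buf, size_t len)
{
	size_t n = 0;

	if (len == 0)
		return (0);
#if defined(__SSE2__)
	uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(ASSUMED_CACHE_LINE - 1);
	uintptr_t end = (uintptr_t)buf + len;

	for (; p < end; p += ASSUMED_CACHE_LINE, n++)
		_mm_clflush((const void *)p);
	_mm_mfence();	/* CLFLUSH is ordered by MFENCE */
#else
	(void)buf;
#endif
	return (n);
}
```

The other option, mapping the pages non-cacheable, has to be done on the kernel side and cannot be shown from user space like this.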

-SB


Re: coherence-problem on the mapped memory buffer

2010-07-29 Thread Andriy Gapon
on 29/07/2010 23:02 Sergey Babkin said the following:
 Jul 29, 2010 12:58:07 PM, a...@icyb.net.ua wrote:
 
 [...]
 
 My guess would be that the memory type is not marked as DMA-capable. AFAIK
 the Intel CPUs do hardware snooping on the physical addresses, so they have no
 coherency issues between themselves. However, if a DMA writer changes the
 memory, I think this does not normally get propagated to the front-side bus,
 and the CPUs would not see it. You may need to either explicitly flush the CPU
 cache before accessing these pages or mark them as non-cacheable.

My guess was approximately the same - if one mapping is done in kernel for DMA
purposes, then the memory type is, most likely, set to uncached.  But the
userland mapping of the same pages most likely marks the same pages (via
different virtual addresses) as cached.  Depending on the hardware and on what
mappings were used on a particular CPU (core) to access that memory, there could
be differences in interaction with DMA.

-- 
Andriy Gapon


Improvement for Distributed Audit Project

2010-07-29 Thread Sergio Ligregni
I am Sergio Ligregni, from Mexico. I am currently working on the Distributed
Audit Project for GSoC 2010, and I would like to ask for your help with these
things:

HELP NEEDED:

/*++*/

- Which code should I base my development on for reading parameters from a
file? (I've searched audit.c, auditd_fbsd.c and auditd.c but didn't find a
function that does that; maybe I missed something.) Currently I have files like:
/var/audit
/var2/audit
1000
yes
53686

and I read the parameters with sscanf, but the right way (the one for which I
want to know which code to take as a baseline) is:

dir:/var/audit /var2/audit
time: 1000
slave_dir: yes
port: 53686

without using sscanf (avoiding that function is a security concern raised
by my mentor). I think I can write an algorithm to implement that, but
maybe there is a better/safer way to do it that keeps to the standard.
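One way to avoid sscanf for the key: value format is to split each line at the first ':' by hand, trim whitespace, and convert numbers with strtol() while checking its end pointer, errno, and range. This is a minimal sketch, not taken from the audit sources (OpenBSM's audit_control(5) file uses a similar key:value layout); struct disaudit_conf, MAX_DIRS, MAX_PATH_LEN and the function names are hypothetical and cover only the four keys shown above.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIRS	8	/* hypothetical limits */
#define MAX_PATH_LEN	256

/* Hypothetical container for the four keys shown above. */
struct disaudit_conf {
	char	dirs[MAX_DIRS][MAX_PATH_LEN];
	int	ndirs;
	long	time;
	int	slave_dir;	/* 1 for "yes", 0 for "no" */
	int	port;
};

/* Strip leading/trailing whitespace in place; returns the new start. */
static char *
trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		*--end = '\0';
	return (s);
}

/* Strict decimal conversion into [min, max]; 0 on success, -1 on error. */
static int
parse_long(const char *s, long min, long max, long *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0' || v < min || v > max)
		return (-1);
	*out = v;
	return (0);
}

/* Parse one "key: value" line (modified in place); 0 on success, -1 on error. */
static int
parse_line(char *line, struct disaudit_conf *conf)
{
	char *sep, *key, *val;
	long v;

	if ((sep = strchr(line, ':')) == NULL)
		return (-1);
	*sep = '\0';
	key = trim(line);
	val = trim(sep + 1);

	if (strcmp(key, "dir") == 0) {
		/* Whitespace-separated list of directories. */
		while (*val != '\0') {
			size_t len = strcspn(val, " \t");

			if (conf->ndirs >= MAX_DIRS || len >= MAX_PATH_LEN)
				return (-1);
			memcpy(conf->dirs[conf->ndirs], val, len);
			conf->dirs[conf->ndirs++][len] = '\0';
			val += len;
			val += strspn(val, " \t");
		}
		return (0);
	}
	if (strcmp(key, "time") == 0) {
		if (parse_long(val, 0, LONG_MAX, &v) != 0)
			return (-1);
		conf->time = v;
		return (0);
	}
	if (strcmp(key, "slave_dir") == 0) {
		if (strcmp(val, "yes") == 0)
			conf->slave_dir = 1;
		else if (strcmp(val, "no") == 0)
			conf->slave_dir = 0;
		else
			return (-1);
		return (0);
	}
	if (strcmp(key, "port") == 0) {
		if (parse_long(val, 1, 65535, &v) != 0)
			return (-1);
		conf->port = (int)v;
		return (0);
	}
	return (-1);	/* unknown key */
}
```

Each line would typically come from fgets() into a fixed-size buffer; every copy is bounded, and malformed numbers are rejected instead of being partially parsed.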

/*++*/
Currently I have this function to verify whether a file is a trail, given its
name. It is very poor and needs to be improved; any ideas?

#include <ctype.h>
#include <string.h>

/*
 * When exploring /var/audit/ (or the directory where the trails are), not
 * all files are trails, so we must ensure we only deal with the ones
 * that are trails.
 */
static int
is_audit_trail(const char *path)
{
	/*
	 * We have these possibilities; only the first one is allowed:
	 *   20100619223115.20100619223131
	 *   20100619223131.not_terminated
	 *   current
	 */
	if (strlen(path) == 29 && path[14] == '.' &&
	    isdigit((unsigned char)path[15])) {
		/* XXX To improve this checking later */
		return 1;
	}
	return 0;
}
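A stricter check could verify every character instead of only position 15. This sketch assumes the argument is the bare file name and that a finished trail is always two 14-digit YYYYMMDDhhmmss timestamps joined by '.', as in the comment above; it also assumes, as an extra sanity rule, that the end time must not precede the start. Because both halves are fixed-width digit strings, strncmp() orders them chronologically. The name is_audit_trail_strict is hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define TS_LEN		14			/* YYYYMMDDhhmmss */
#define TRAIL_LEN	(2 * TS_LEN + 1)	/* "<start>.<end>" */

/* Return 1 if s[0..n) consists only of decimal digits. */
static int
all_digits(const char *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!isdigit((unsigned char)s[i]))
			return (0);
	return (1);
}

/*
 * Accept only "<start>.<end>" where both parts are 14-digit timestamps and
 * end >= start; rejects "*.not_terminated", "current" and anything else.
 */
static int
is_audit_trail_strict(const char *name)
{
	if (strlen(name) != TRAIL_LEN || name[TS_LEN] != '.')
		return (0);
	if (!all_digits(name, TS_LEN) || !all_digits(name + TS_LEN + 1, TS_LEN))
		return (0);
	/* Same-width digit strings compare in chronological order. */
	return (strncmp(name, name + TS_LEN + 1, TS_LEN) <= 0);
}
```

A further step, not shown, would be to check that each timestamp is a real calendar date (for example month 01 to 12).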
/*++*/

By the way the Wiki and the Perforce Repository for this project are:

http://wiki.freebsd.org/SOC2010SergioLigregni
http://p4db.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/projects/soc2010/disaudit&HIDEDEL=NO

Thanks!
-- 
---
Sergio Andrés Ligregni Arredondo

Computer Systems Engineering student, ITQ.
Is UNIX Hot Enough for You? | FreeBSD


Re: svn commit: r210561 - projects/sv/sys/net

2010-07-29 Thread Ed Maste
On Wed, Jul 28, 2010 at 03:10:31PM +, Attilio Rao wrote:

 Log:
   Initial import of the netdump files.
   They still need a lot of polishing and cleanup so they might not be
   considered definitive at all.

This code is a port to recent FreeBSD of Darrell Anderson's network
crashdump support, which was done in the 4.x days.  I can't find a
current website with the original versions but archive.org has a cache
of course:

http://web.archive.org/web/20041204223729/http://www.cs.duke.edu/~anderson/freebsd/netdump/

Quoting from the old readme:

  Netdump provides FreeBSD kernel crash dumping over the network.
  Netdump is a FreeBSD kernel module client and user-level server.

  A normal kernel crash writes a raw dump of memory to a dedicated
  partition (usually the swap partition) using a low-level disk routine,
  and then copies that raw dump into a file (via savecore) during the
  following boot process.

  Netdump replaces the standard dump routine. During a crash, a netdump
  client broadcasts to locate a netdump server, then sends the dump as
  UDP/IP packets (with retransmission after loss). The netdump server
  creates a dump file suitable for gdb. If netdump fails (for example,
  no netdump server is located), a normal disk dump is performed. 

There is cleanup work to be done still, but we plan to have this in
shape for 9.0.

-Ed


Re: coherence-problem on the mapped memory buffer

2010-07-29 Thread Alexander Fiveg
On Thursday 29 July 2010 22:16:24 Andriy Gapon wrote:
 on 29/07/2010 23:02 Sergey Babkin said the following:
  Jul 29, 2010 12:58:07 PM, a...@icyb.net.ua wrote:
  [...]
 
  My guess would be that the memory type is not marked as DMA-capable.
  AFAIK the Intel CPUs do hardware snooping on the physical addresses,
  so they have no coherency issues between themselves. However, if a DMA
  writer changes the memory, I think this does not normally get propagated
  to the front-side bus, and the CPUs would not see it. You may need to
  either explicitly flush the CPU cache before accessing these pages or
  mark them as non-cacheable.

 My guess was approximately the same - if one mapping is done in kernel for
 DMA purposes, then the memory type is, most likely, set to uncached.  But
 the userland mapping of the same pages most likely marks the same pages
 (via different virtual addresses) as cached.  Depending on the hardware and
 on what mappings were used on a particular CPU (core) to access that
 memory, there could be differences in interaction with DMA.

Thanks a lot for your answers. But I am afraid I do not have enough
experience to solve these tasks. Could you please give me some helpful
information on how to:
- get access to the pages associated with a certain memory buffer?
I mean, I want to get the structures that describe the page properties I
should change (for instance, in order to make the page non-cacheable).

If you are aware of any good papers or examples in the system code where
these topics are covered, I would appreciate it if you could give me the
references.

Alex


Re: coherence-problem on the mapped memory buffer

2010-07-29 Thread Andriy Gapon
on 30/07/2010 00:41 Alexander Fiveg said the following:
 Thanks a lot for your answers. But I am afraid I do not have enough
 experience to solve these tasks. Could you please give me some helpful
 information on how to:
 - get access to the pages associated with a certain memory buffer?
 I mean, I want to get the structures that describe the page properties I
 should change (for instance, in order to make the page non-cacheable).
 
 If you are aware of any good papers or examples in the system code where
 these topics are covered, I would appreciate it if you could give me the
 references.

I don't have a recipe, but some pointers to get you started:
1. investigate BUS_DMA_NOCACHE, see bus_dma(9)
2. check sys/dev/sound/pci/hda/hdac.c for HDAC_F_DMA_NOCACHE and the comment
about PCIe snoop - this might be relevant
3. see pmap_change_attr for a way to change the caching type of a memory mapping
4. hope that more knowledgeable people (experts) provide their advice, keep
nudging them via mailing list(s) :-)

-- 
Andriy Gapon


sched_pin() versus PCPU_GET

2010-07-29 Thread mdf
We've seen a few instances at work where witness_warn() in ast()
indicates the sched lock is still held, but the place where it claims the
lock was taken is one where the lock cannot possibly have been kept, e.g.:

        thread_lock(td);
        td->td_flags &= ~TDF_SELECT;
        thread_unlock(td);

What I was wondering is, even though the assembly I see in objdump -S
for witness_warn has the increment of td_pinned before the PCPU_GET:

802db210:       65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
802db217:       00 00
802db219:       ff 83 04 01 00 00       incl   0x104(%rbx)
         * Pin the thread in order to avoid problems with thread migration.
         * Once that all verifies are passed about spinlocks ownership,
         * the thread is in a safe path and it can be unpinned.
         */
        sched_pin();
        lock_list = PCPU_GET(spinlocks);
802db21f:       65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
802db226:       00 00
        if (lock_list != NULL && lock_list->ll_count != 0) {
802db228:       48 85 c0                test   %rax,%rax
         * Pin the thread in order to avoid problems with thread migration.
         * Once that all verifies are passed about spinlocks ownership,
         * the thread is in a safe path and it can be unpinned.
         */
        sched_pin();
        lock_list = PCPU_GET(spinlocks);
802db22b:       48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
802db232:       48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
        if (lock_list != NULL && lock_list->ll_count != 0) {
802db239:       0f 84 ff 00 00 00       je     802db33e <witness_warn+0x30e>
802db23f:       44 8b 60 50             mov    0x50(%rax),%r12d

is it possible for the hardware to do any re-ordering here?

The reason I'm suspicious is not just that the code doesn't have a
lock leak at the indicated point, but also that in one instance I can see
in the dump that the lock_list local from witness_warn is from the pcpu
structure for CPU 0 (and I was warned about sched lock 0), but the
thread id in panic_cpu is 2.  So clearly the thread was being migrated
right around panic time.

This is the amd64 kernel on stable/7.  I'm not sure exactly what kind
of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.

So... do we need some kind of barrier in the code for sched_pin() for
it to really do what it claims?  Could the hardware have reordered
the mov %gs:0x48,%rax load (the PCPU_GET) to before the sched_pin()
increment?

Thanks,
matthew


Re: sched_pin() versus PCPU_GET

2010-07-29 Thread mdf
On Thu, Jul 29, 2010 at 4:39 PM,  m...@freebsd.org wrote:
 [...]
 So... do we need some kind of barrier in the code for sched_pin() for
 it to really do what it claims?  Could the hardware have re-ordered
 the mov    %gs:0x48,%rax PCPU_GET to before the sched_pin()
 increment?

So after some research, the answer I'm getting is "maybe".  What I'm
concerned about is whether the h/w reordered the PCPU_GET load ahead of
the preceding store that increments td_pinned.  While not the ultimate
authority,
http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems
implies that stores can be reordered after loads on both Intel and
amd64 chips, which I believe would account for the behavior seen here.
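The store-then-load reordering in question is what the classic store-buffering litmus test exhibits; here is a sketch with C11 atomics and POSIX threads. Each thread stores to one variable and then loads the other. Without a full fence both loads may return 0, because each store can still sit in its CPU's store buffer; a seq_cst fence (an MFENCE or locked instruction on x86) rules that outcome out. Note that this reordering is only observable by another CPU; whether it can affect sched_pin(), where td_pinned is consulted on the same CPU, is the open question here. The helper names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;		/* shared flags, reset before each run */
static int r1, r2;		/* what each thread observed */
static int use_fence;		/* set before the threads start */

/* Thread A: store x, optionally fence, then load y. */
static void *
thread_a(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	if (use_fence)
		atomic_thread_fence(memory_order_seq_cst);
	r1 = atomic_load_explicit(&y, memory_order_relaxed);
	return (NULL);
}

/* Thread B: mirror image of thread A. */
static void *
thread_b(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	if (use_fence)
		atomic_thread_fence(memory_order_seq_cst);
	r2 = atomic_load_explicit(&x, memory_order_relaxed);
	return (NULL);
}

/*
 * Run the test `iters` times and count runs where both loads saw 0, which
 * requires a store to have been delayed past the following load.
 * Returns -1 if a thread could not be created.
 */
static long
count_both_zero(long iters, int fence)
{
	long hits = 0;

	use_fence = fence;
	for (long i = 0; i < iters; i++) {
		pthread_t a, b;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		if (pthread_create(&a, NULL, thread_a, NULL) != 0)
			return (-1);
		if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
			pthread_join(a, NULL);
			return (-1);
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (r1 == 0 && r2 == 0)
			hits++;
	}
	return (hits);
}
```

Spawning fresh threads per iteration keeps the sketch short but means the unfenced outcome shows up rarely; a serious harness would start both threads once and line up each round with a spin barrier so the stores and loads actually overlap.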

Thanks,
matthew