Re: enable ECC in OS code?

2009-08-26 Thread Steve Watt
In <4a954a35.4030...@icyb.net.ua>, a...@icyb.net.ua wrote:
>
>Here is a question that I am afraid I know the answer to.
>I have some ECC-capable hardware:
>1) Athlon II with embedded memory controller that can do ECC
>2) DRAM modules with ECC
>Assuming that ECC data lanes are connected between the two on the motherboard, and
>given that the BIOS doesn't perform any ECC setup (nor is there any option to
>control that) - would it be possible to turn on ECC from OS code?
>Or is it too late in the game already?

It's about 100 times easier to have the BIOS do this.  First off, exactly
how to do it is usually quite specific to the chipset.  Next, if ECC
wasn't enabled previously, the ECC bytes will all be wrong, which means
that you'll have to rewrite all of memory after you've turned it on.
Oh, and to do that you have to fetch the code that rewrites the ECC from
memory that still has incorrect ECC.

If the BIOS is broken to the extent that it doesn't enable ECC on a
system where it should be available, whine at the vendor.

-- 
Steve Watt KD6GGD  PP-ASEL-IA  ICBM: 121W 56' 57.5" / 37N 20' 15.3"
 Internet: steve @ Watt.COM  Whois: SW32-ARIN
   Free time?  There's no such thing.  It just comes in varying prices...
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-08-26 Thread Nate Eldredge

On Wed, 26 Aug 2009, Linda Messerschmidt wrote:

> I'm trying to troubleshoot an intermittent Apache performance problem,
> and I've narrowed it down to what appears to be a brief
> whole-system hang that lasts from 0.5 to 3 seconds.  They occur every
> few minutes.

One thought would be to use "ps" to try to determine which process, if 
any, is charged with CPU time during the hang.

If you could afford a little downtime, it would be worth seeing if the 
hang occurs in single-user mode (perhaps with a simple program that loops 
calling gettimeofday() and warns when the time between successive 
iterations is large).
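Nate's single-user test could be as simple as the sketch below. The function name watch_gaps() and its threshold and iteration-count parameters are illustrative choices, not something from the thread; it spins on gettimeofday() and prints a line to stderr whenever two successive readings are further apart than the threshold.

```c
#include <stdio.h>
#include <sys/time.h>

/* Microseconds elapsed from *a to *b. */
static long long tv_diff_us(const struct timeval *a, const struct timeval *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000LL
         + (long long)(b->tv_usec - a->tv_usec);
}

/*
 * Hypothetical hang detector: call gettimeofday() in a tight loop and
 * report every gap longer than threshold_us.  Runs max_iters iterations
 * (0 means forever) and returns the number of gaps reported.
 */
static long watch_gaps(long long threshold_us, unsigned long max_iters)
{
    struct timeval prev, now;
    long reported = 0;

    gettimeofday(&prev, NULL);
    for (unsigned long i = 0; max_iters == 0 || i < max_iters; i++) {
        gettimeofday(&now, NULL);
        long long gap = tv_diff_us(&prev, &now);
        if (gap > threshold_us) {
            fprintf(stderr, "gap of %lld us ending at %lld.%06ld\n",
                gap, (long long)now.tv_sec, (long)now.tv_usec);
            reported++;
        }
        prev = now;
    }
    return reported;
}
```

For example, watch_gaps(100000, 0) would run forever and report any stall longer than 100 ms. Because gettimeofday() follows the wall clock, a step by ntpd could also show up as a gap; clock_gettime(CLOCK_MONOTONIC, ...) avoids that where it is available.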

I once had a problem like this that I eventually traced to a power 
management problem.  (Specifically, the machine had a modem, and would 
hang for a few seconds whenever the line would ring.  It was apparently 
related to the Wake-On-Ring feature.)  If I remember correctly, disabling 
ACPI made it go away.  So that might be something to try, if rebooting is 
an option.

What are the similarities and differences in hardware and software among 
the affected machines (you mentioned there were several)?

-- 
Nate Eldredge
neldre...@math.ucsd.edu


MBR hack for serial console

2009-08-26 Thread remodeler
I am hoping for input on a patch I want to apply to the MBR of a FreeBSD
8-BETA3 AMD64 server. I need a serial console on this server. The ASUS
motherboard (AMI BIOS) has PCI and PCI-e expansion slots, and a MosChip MCS9820
UART (serial board) is installed at pci0:3:5:0. The AMI BIOS can be configured
to do the plug-and-play enumeration or to have this feature turned off, but there
is no way to assign a particular I/O port to a PCI device in the BIOS, and I
cannot get the BIOS source to change this behavior. The serial board has a
single Base Address Register at 10h in its PCI configuration space. Whether
the PCI bus is probed by the BIOS or by FreeBSD, the UART BAR is assigned the
i386 I/O port address 0xe800. It must be one of COM1-COM4 (e.g. 0x3F8) to work
in the boot sequence. I need access to the serial console before the loader runs.

I do not expect the hardware configuration to change, so a hack is OK. My plan
is to patch the MBR to override the serial card's BAR with 0x3F8. My reasoning
is that the CPU is still in real mode (allowing direct hardware access) until
the loader executes, and the serial console would then work for the boot0 and
boot2 calls to the terminal. I have experimented with using pciconf to change
the BAR from the command line; curiously, the command:

 pciconf -w pci0:3:5:0 16 1016

loads 0x3F9 into the serial card's PCI configuration space instead of 0x3F8,
and I don't understand why. I've worked up this patch and hope someone can
tell me why this would or wouldn't work:

/usr/src/sys/boot/i386/mbr/mbr.s

41,57d40
< # Patch to reconfigure PCI UART's Base Address to COM1
< # About 26 bytes of code when assembled as .code16
< #
< startcon: .set PCIADD_PORT,0xcf8      # PCI config address port
<       .set PCIDATA_PORT,0xcfc         # PCI config data port
<       .set PCIADD,0x80032810          # enable|bus 3|dev 5|fn 0|reg 0x10
<       .set PCIDATA,0x3f8              # new BAR value (COM1)
<
<       pushal                          # save 32-bit registers
<       movl $PCIADD,%eax               # config address in eax
<       movw $PCIADD_PORT,%dx           # config address port in dx
<       outl %eax,%dx                   # select the BAR
<       movl $PCIDATA,%eax              # new BAR value in eax
<       movw $PCIDATA_PORT,%dx          # config data port in dx
<       outl %eax,%dx                   # write the BAR
<       popal                           # restore registers
< #
166,171c149,151
< #
< # Instruction messages reduced to numbers, saves 60 bytes
< #
< msg_pt:   .asciz "1"  # "Invalid partition table"
< msg_rd:   .asciz "2"  # "Error loading operating system"
< msg_os:   .asciz "3"  # "Missing operating system"
---
> msg_pt:   .asciz "Invalid partition table"
> msg_rd:   .asciz "Error loading operating system"
> msg_os:   .asciz "Missing operating system"

Thanks in advance for any help. I am not an assembly coder, so I am really
uncertain about my patch.
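Two facts about PCI configuration mechanism #1 bear on this patch and on the pciconf puzzle. The dword written to port 0xCF8 packs an enable bit (bit 31), the bus number (bits 23:16), the device (bits 15:11), the function (bits 10:8) and a dword-aligned register offset (bits 7:2). Separately, bit 0 of an I/O-space BAR is read-only and always reads back as 1 to flag the BAR as I/O, which is very likely why writing 1016 (0x3F8) reads back as 0x3F9. The helper names below are hypothetical, and the code only does the arithmetic; it does not touch hardware.

```c
#include <stdint.h>

/* Hypothetical helper: build the CONFIG_ADDRESS dword for port 0xCF8. */
static uint32_t pci_cfg_addr(unsigned bus, unsigned dev, unsigned func,
                             unsigned reg)
{
    return 0x80000000u                      /* enable bit */
         | ((uint32_t)(bus  & 0xff) << 16)  /* bus number */
         | ((uint32_t)(dev  & 0x1f) << 11)  /* device number */
         | ((uint32_t)(func & 0x07) << 8)   /* function number */
         | ((uint32_t)reg & 0xfc);          /* dword-aligned register */
}

/* Bit 0 of a BAR: 1 = I/O space, 0 = memory space. */
static int bar_is_io(uint32_t bar)
{
    return (int)(bar & 0x1);
}

/* For an I/O BAR, bits 1:0 are flag bits, not address bits. */
static uint32_t bar_io_base(uint32_t bar)
{
    return bar & ~(uint32_t)0x3;
}
```

For the card at pci0:3:5:0, pci_cfg_addr(3, 5, 0, 0x10) gives 0x80032810, and bar_io_base(0x3f9) recovers the intended port 0x3f8.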


Re: Intermittent system hangs on 7.2-RELEASE-p1

2009-08-26 Thread John Baldwin
On Wednesday 26 August 2009 3:03:13 pm Linda Messerschmidt wrote:
> I'm trying to troubleshoot an intermittent Apache performance problem,
> and I've narrowed it down to what appears to be a brief
> whole-system hang that lasts from 0.5 to 3 seconds.  They occur every
> few minutes.

One thing to note is that ktrace only logs voluntary context switches (i.e.
calls to tsleep() or waits on a condition variable).  It specifically does not
log preemptions or blocking on a mutex, so in theory, if your machine was
temporarily livelocked, that might explain this.

-- 
John Baldwin


Intermittent system hangs on 7.2-RELEASE-p1

2009-08-26 Thread Linda Messerschmidt
I'm trying to troubleshoot an intermittent Apache performance problem,
and I've narrowed it down to what appears to be a brief
whole-system hang that lasts from 0.5 to 3 seconds.  They occur every
few minutes.

I took the rather extreme step of doing "ktrace -t cnisuwt -i -d -p 1"
and then I waited for the hang.

This is what I got:

 54937 httpd    1251302859.375313 CALL  shutdown(0x3,)
 54937 httpd    1251302859.375333 RET   shutdown 0
 54937 httpd    1251302859.375348 CALL  select(0x4,0xbfbfe92c,0,0,0xbfbfe9ac)
 54937 httpd    1251302859.375363 CSW   stop kernel
 54937 httpd    1251302859.376402 CSW   resume kernel
 54937 httpd    1251302859.376439 RET   select 1
 54937 httpd    1251302859.376453 CALL  read(0x3,0xbfbfe9b4,0x200)
 54937 httpd    1251302859.376470 GIO   fd 3 read 0 bytes
 54937 httpd    1251302859.376482 RET   read 0
 54937 httpd    1251302859.376495 CALL  close(0x3)
 54937 httpd    1251302859.376511 RET   close 0
 54937 httpd    1251302859.376525 CALL  sigaction(SIGUSR1,0xbfbfebb0,0xbfbfeb98)
 54937 httpd    1251302859.376538 RET   sigaction 0
 54937 httpd    1251302859.376552 CALL  munmap(0x282ff000,0x11)
 54937 httpd    1251302859.376607 RET   munmap 0
 54937 httpd    1251302859.376633 CALL  accept(0x11,0xbfbfebf0,0xbfbfec10)
 54937 httpd    1251302859.376649 CSW   stop kernel
   796 svscan   1251302859.481064 CSW   resume kernel
 54937 httpd    1251302859.489374 CSW   resume kernel
 54937 httpd    1251302859.489391 STRU  struct sockaddr { AF_INET, 172.17.0.143:61610 }
 98229 httpd    1251302859.601850 CSW   resume kernel
 46517 httpd    1251302859.601900 CSW   resume kernel
 98202 httpd    1251302859.611661 CSW   resume kernel
   837 nrpe2    1251302859.622681 CSW   resume kernel
 54454 httpd    1251302859.655422 CSW   resume kernel
 54454 httpd    1251302859.655443 STRU  struct sockaddr { AF_INET, 172.17.0.131:59011 }
  7182 httpd    1251302859.722381 CSW   resume kernel
 98178 httpd    1251302859.722438 CSW   resume kernel
   858 gmond    1251302859.794996 CSW   resume kernel
   858 gmond    1251302859.794998 GIO   fd 5 wrote 0 bytes
   770 ntpd     1251302860.076501 CSW   resume kernel
 98346 httpd    1251302860.086261 CSW   resume kernel
 65277 httpd    1251302860.086300 CSW   resume kernel
 98514 httpd    1251302860.106849 CSW   resume kernel
  7191 httpd    1251302860.106894 CSW   resume kernel
   796 svscan   1251302861.403335 RET   nanosleep 0
   796 svscan   1251302861.403370 CALL  wait4(0x,0xbfbfee18,WNOHANG,0)
   796 svscan   1251302861.403405 RET   wait4 0
 54454 httpd    1251302861.403481 RET   accept 3
 98229 httpd    1251302861.403532 RET   select 0
   796 svscan   1251302861.403553 CALL  stat(0x804a3bb,0xbfbfed6c)
   858 gmond    1251302861.403601 GIO   fd 5 read 20 bytes
 54454 httpd    1251302861.403619 CSW   stop user
 46517 httpd    1251302861.403647 RET   select 0
   858 gmond    1251302861.403674 RET   kevent 1
   858 gmond    1251302861.403710 CALL  socket(PF_INET,SOCK_DGRAM,IPPROTO_IP)
 98202 httpd    1251302861.403714 RET   select 0
   858 gmond    1251302861.403752 RET   socket 9
   837 nrpe2    1251302861.403756 RET   select 0


There is a gap between 1251302860.106894 and 1251302861.403335 of over
one second, and the "effective gap" starts around 1251302859.376649
and thus lasts for about two seconds.
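The gap arithmetic above can be checked mechanically. The sketch below is hypothetical (ts_to_us() and max_gap_us() are not kdump features); it assumes absolute timestamps in the seconds.microseconds form used in this trace, with exactly six fractional digits, and works in integer microseconds to avoid floating-point rounding at this magnitude.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse "SSSSSSSSSS.UUUUUU" into microseconds.  Assumes exactly six
 * fractional digits, as in the trace above; returns -1 on malformed input.
 */
static long long ts_to_us(const char *s)
{
    char *end;
    long long sec = strtoll(s, &end, 10);
    if (*end != '.')
        return -1;
    const char *frac = end + 1;
    long long usec = strtoll(frac, &end, 10);
    if (end - frac != 6)
        return -1;
    return sec * 1000000LL + usec;
}

/*
 * Largest gap between consecutive timestamps (in microseconds); *at
 * receives the index of the entry that ends that gap.
 */
static long long max_gap_us(const long long *ts, size_t n, size_t *at)
{
    long long best = 0;
    *at = 0;
    for (size_t i = 1; i < n; i++) {
        long long g = ts[i] - ts[i - 1];
        if (g > best) {
            best = g;
            *at = i;
        }
    }
    return best;
}
```

Fed the timestamps 1251302859.376649, 1251302860.106894 and 1251302861.403335, max_gap_us() reports 1296441 microseconds ending at the last entry, while the span from the accept() context switch to that point is 2026686 microseconds, about two seconds.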

This machine runs Apache and during this sample it was being hit every
0.1 seconds with a test request for a simple static file (in addition
to production traffic).  It is a 2-processor machine that is 85-95%
idle; there's nothing in userspace that runs that long without
yielding.  According to systat, it handles 5000+ syscalls every
second.  But according to ktrace, nothing happens at all during the
hang.  This matches user experience.  (The static file request, which
usually completes in <0.01s, suddenly takes 2 seconds as observed from
the remote machine issuing the requests.)

Here's the relevant snip from the httpd process handling that static
file at the time of the hang:

 54937 httpd    1251302859.376633 CALL  accept(0x11,0xbfbfebf0,0xbfbfec10)
 54937 httpd    1251302859.376649 CSW   stop kernel
 54937 httpd    1251302859.489374 CSW   resume kernel
 54937 httpd    1251302859.489391 STRU  struct sockaddr { AF_INET, 172.17.0.143:61610 }
 54937 httpd    1251302861.403862 RET   accept 3

It's stuck in accept, but does *not* get context-switched away from
during the delay.  (The earlier context switch corresponds to the 0.1
seconds between requests; there is an Apache instance configured to
handle just the test requests with one child process; that process has
nothing else to do or block on.)

I'll include some other processes below.

I think it's weird that all these processes get context-switched-into
before/during the hang, and I wonder if it's a clue.  The kernel is
obviously still running, since it wakes these processes up, but
nothing is happening.  That and the fact that it happens on multiple
machines (though we've only tested this one) 

Re: Deprecating ps(1)s -w switch

2009-08-26 Thread Alex Goncharov
,--- You/Dag-Erling (Wed, 26 Aug 2009 16:20:59 +0200) *
| Tim Kientzle  writes:
| > The difference between "ps", "ps -w", and "ps -ww" is pretty
| > significant for Java, in particular.  Java command lines
| > are typically enormous (thank you, CLASSPATH) which makes
| > "ps -ww" often more annoying than it's worth.
| 
| Java command lines aren't necessarily enormous.  If they are, it is
| because whoever invoked Java didn't know that it respects the CLASSPATH
| environment variable, and that setting -classpath on the command line
| f*s up the user's preferences (e.g. the user may want to replace a
| particular set of classes with an alternative implementation).

Using either the `-classpath' option to `java' or the `CLASSPATH'
environment variable is a pretty obsolete practice (anyone who does
either these days should stop and rethink, IMHO).

The deficiency of the above, in either variation, is the need to list
every `jar' file used, which gets ugly with more than a few files.

A person who keeps up with modern Java will call it with one or
several of these options:

-Djava.ext.dirs
-Djava.library.path
-Djava.endorsed.dirs

The Java Virtual Machine will internally list the files in each of the
directories (specified on the command line, or the default ones), saving
the user the effort of mentioning them explicitly in `CLASSPATH'.

This cuts the length of the command line dramatically, but `java'
processes' command lines are still typically enormously long: even
the lists of directories, with their absolute paths, are
significant; on top of that, `java' is usually invoked with a gazillion
options modifying the JVM's runtime behaviour.

It's a fact of life that for real-life applications, `java' command
lines are *long* -- you can't change that by moving from `-classpath'
to `CLASSPATH'.

(That said, I am not in favor of modifying `ps' in the manner
proposed, as my previous message indicated.)

-- Alex -- alex-goncha...@comcast.net --


enable ECC in OS code?

2009-08-26 Thread Andriy Gapon

Here is a question that I am afraid I know the answer to.
I have some ECC-capable hardware:
1) Athlon II with embedded memory controller that can do ECC
2) DRAM modules with ECC
Assuming that ECC data lanes are connected between the two on the motherboard, and
given that the BIOS doesn't perform any ECC setup (nor is there any option to
control that) - would it be possible to turn on ECC from OS code?
Or is it too late in the game already?

-- 
Andriy Gapon


Re: Need some help understanding a jail system call.

2009-08-26 Thread Dag-Erling Smørgrav
bert wiley  writes:
> Nowhere in the code do I ever see any access to the jail.h-type system
> calls

Because at that stage in the development process, the system calls in
 belong to the old implementation.

> so does the syscall(375, JAIL_CREATE, argv[1]); actually access the
> jail subsystem and create a jail?

It calls the new system call, which at that stage hasn't been added to
libc yet, because it would conflict with the existing system calls.
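What DES describes, entering the kernel by system call number because no libc wrapper exists yet, can be illustrated with an existing call. The sketch below invokes getpid by number through syscall(), the same mechanism jailctl used for its call 375; getpid is chosen only because it is harmless. SYS_getpid comes from <sys/syscall.h>, and the _DEFAULT_SOURCE define is an assumption needed only on glibc systems to expose syscall().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose syscall(); no effect on FreeBSD */
#endif
#include <sys/syscall.h>    /* SYS_getpid and the other SYS_* numbers */
#include <sys/types.h>
#include <unistd.h>         /* syscall(), getpid() */

/* Enter the kernel by system call number, bypassing the getpid() wrapper. */
static pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

raw_getpid() and getpid() return the same value; the only difference is that the former does not depend on libc knowing about the call.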

> Here is the link i used to find this code
> http://www.watson.org/~robert/freebsd/jailng/

You realize that this is eight years old, right?  And that the jail
infrastructure has been extensively modified since then, and is
currently being rewritten again?

DES
-- 
Dag-Erling Smørgrav - d...@des.no


Need some help understanding a jail system call.

2009-08-26 Thread bert wiley
Hello,
   I found this code in a project called jailNG, which has some system
calls for doing jail stuff. I'm still new to FreeBSD and I'm stumped on what
this code is actually doing. In the source from the project there are a few
function calls that look like they create and access the jail layer. Here is
an example:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define JAIL_CREATE   1
#define JAIL_DESTROY  2
#define JAIL_JOIN     3

extern char **environ;

static void
usage(void)
{

	fprintf(stderr, "usage:\n");
	fprintf(stderr, "  jailctl create [jailname]\n");
	fprintf(stderr, "  jailctl destroy [jailname]\n");
	fprintf(stderr, "  jailctl join [jailname] [-c chrootpath] [path] "
	    "[cmd] [args...]\n");

	exit(-1);
}

static int
jail_create(int argc, char *argv[])
{
	int error;

	if (argc < 2)
		usage();

	/*
	 * 375 is presumably the number the jailNG prototype kernel
	 * assigned to its new system call; libc had no wrapper for it.
	 */
	error = syscall(375, JAIL_CREATE, argv[1]);
	if (error)
		perror("jailconf().create");
	return (error);
}

Nowhere in the code do I ever see any access to the jail.h-type system
calls, so does syscall(375, JAIL_CREATE, argv[1]); actually access the
jail subsystem and create a jail?

Here is the link i used to find this code
http://www.watson.org/~robert/freebsd/jailng/

Any help on this question is appreciated. Thanks.


Re: Deprecating ps(1)s -w switch

2009-08-26 Thread Dag-Erling Smørgrav
Ivan Radovanovic  writes:
> I think software should evolve to be better rather than stick with
> something done the wrong way, even if it was done maybe 30 years
> ago - that is why the behavior should be changed. It is never too late to
> do the right thing ;-)

Are you also going to rewrite 30 years' worth of scripts that expect
ps(1) to have a -w option which behaves in a particular manner?

DES
-- 
Dag-Erling Smørgrav - d...@des.no


Re: Deprecating ps(1)s -w switch

2009-08-26 Thread Dag-Erling Smørgrav
Tim Kientzle  writes:
> The difference between "ps", "ps -w", and "ps -ww" is pretty
> significant for Java, in particular.  Java command lines
> are typically enormous (thank you, CLASSPATH) which makes
> "ps -ww" often more annoying than it's worth.

Java command lines aren't necessarily enormous.  If they are, it is
because whoever invoked Java didn't know that it respects the CLASSPATH
environment variable, and that setting -classpath on the command line
f*s up the user's preferences (e.g. the user may want to replace a
particular set of classes with an alternative implementation).

DES
-- 
Dag-Erling Smørgrav - d...@des.no


NMI running X with dual-monitor

2009-08-26 Thread ivan anyukov
Hi guys,
I'm running 7.2-STABLE on a ThinkPad T60.
Sometimes FreeBSD freezes when I connect a second monitor to my docking
station.
kgdb on the vmcore file says "non-maskable interrupt trap".
Some details:
X.Org 1.5.3 using the radeon driver
I think the problem appears when moving xterms from the first to the second
monitor (or back). The mouse cursor then looks _very_ strange, and after a few
minutes the whole system freezes.
Does anyone know about this problem? Is it definitely a hardware failure?
Thanks a lot!


Re: AMD SB700 SMBus controller driver

2009-08-26 Thread Andriy Gapon
on 26/08/2009 01:27  said the following:
> Could you please forward me the patch to make it work in polling mode? I'd
> like to test it, as I've been trying to make intpm work with an SB400 (which
> should be much the same as yours), but the system hangs when I try to force
> polling mode (I didn't have the specs, nor all the differences you just
> presented). And btw, I didn't find any implementation using interrupts
> either, but I'm ready to test your updated version.

[what charset/encoding was your email?]

Please see:
http://people.freebsd.org/~avg/ga-ma780g-ud3h/intpm.diff

The patch is a work in progress and is therefore not clean (style
violations, experimental hacks).

What the patch does:
1. redefine PCI_INTR_SMB_IRQ9 to 2 (bit 1)
2. disable writing to PCIR_INTLINE
3. add PCI id of my hardware
4. attempt to use IRQ mode with interrupt 20 - doesn't work
5. force polling mode - seems to work

-- 
Andriy Gapon


Re: GA-MA780G-UD3H motherboard

2009-08-26 Thread Andriy Gapon
on 25/08/2009 21:34 Sam Fourman Jr. said the following:
>> Meanwhile, if you are interested in any information about this motherboard
>> (data dumps, outputs from tools, etc.), please let me know; I will try my
>> best to provide it.
>> that.
> 
> it would be interesting to see a dmesg as a starting point.

Please see http://people.freebsd.org/~avg/ga-ma780g-ud3h/
Replying to the other email: I use the amd64 architecture.


-- 
Andriy Gapon


Re: Partial kvm dumps

2009-08-26 Thread Bruce Cran
On Mon, 24 Aug 2009 10:45:58 +0300
Mikolaj Golub  wrote:

> http://code.google.com/p/trociny/downloads/list
> 
> I would like to hear what other people think about this. It looks
> very useful for me. At least as a first step it would be nice to
> extend KVM to work with partial dumps so the users could try this and
> see if it turned out to be useful.

Having recently been debugging core dump support in the base system
utilities, I spotted what looks like a bug in your code: the 'execfile'
parameter to kvm_open or kvm_openfiles should be NULL if you want to
use the kernel from the running system; some people may not be running
a kernel from "/boot/kernel/kernel" by default.

-- 
Bruce


Re: Deprecating ps(1)s -w switch

2009-08-26 Thread Jonathan McKeown
On Tuesday 25 August 2009 22:51:43 Rick C. Petty wrote:
> On Tue, Aug 25, 2009 at 04:09:09PM +0200, Jonathan McKeown wrote:
> > I usually want to see ps(1) output in easily-read columns. Without width
> > limits, this can't be guaranteed.
> >
> > I would strongly object to the complete removal of any option to limit
> > the output width of ps(1) and make it easily human-readable.
> >
> > I'm also astonished at the suggestion that not using -ww is ``a
> > mistake''. I very seldom need to see the whole commandline for every
> > process.
>
> Then you must not use Java much.  I almost always need the -ww option.
> I'm fine with the default being "fit into my terminal width", but I'd be
> for one option to specify limited width and another option (-w) to
> specify "as wide as possible".

As it happens, you're right: I don't use Java at all. Neither do I object 
(much) to a change in the default behaviour such that wide output is the norm 
and restricted-width an option.

In the original message, Brian Somers wrote:

> The suggestion is that ps's -w switch is a strange artifact that can
> be safely deprecated.  ps goes to great lengths to implement width
> limitations, and any time I've seen people not use -ww, it has either
> been a mistake or it hasn't mattered.  Using 'cut -c1-N' is also a great
> way of limiting widths if people really want that...
>
> I'd like to propose changing ps so that width limits are removed and
> '-w' is deprecated - ignored for now with a note in the man page
> saying that it will be removed in a future release.

The suggestion seems to be to remove the width-limiting code altogether, and 
make people who want width-restricted output (for example to keep it in 
columns which are easily scanned by eye) pipe the output through another 
command. That I do object to.
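For reference, the per-line clipping at stake in this argument is simple to express. The sketch below is a hypothetical helper, not code from ps(1): it truncates one line to a byte width, with 0 meaning unlimited, which for plain ASCII is what piping through `cut -c1-N` does. A real implementation would also have to consider multibyte characters, which this ignores.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most `width` bytes of `line` into `out`
 * (0 = unlimited) and NUL-terminate it.  `out` must hold width + 1
 * bytes, or strlen(line) + 1 when width is 0.
 */
static void clip_line(const char *line, size_t width, char *out)
{
    size_t len = strlen(line);

    if (width != 0 && len > width)
        len = width;            /* keep only the first `width` bytes */
    memcpy(out, line, len);
    out[len] = '\0';
}
```

Whether this belongs inside ps or in a downstream `cut` is exactly the disagreement above; the operation itself is the same either way.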