Re: FreeBSD for serious performance?

2012-12-26 Thread Dieter BSD
 If the driver is doing something daft like DELAY(x) in a fast
 interrupt handler which would lead to that behaviour, it should be
 fixed.

 If it's doing a DELAY(x) in a critical section, it should be fixed.

They are doing *something* that completely locks out everything else.
It is always a device driver.

 Now, it's quite likely you hit some kind of ata(4) bug which kept it
 in a tight loop

Hard to imagine locking everything out for 19 minutes without being
in a loop.

 So it was likely just spun
 in some high priority loop that nothing lower-priority could really do
 anything about.

Would several different drivers have this same bug?

 The next time it happens, please break into the debugger and grab some
 debugging output. Show alllocks, ps, should be a good couple of things
 to start with.

I've only caught it hanging forever once. It only takes a few
milliseconds to cause incoming data to be lost, so I usually
don't know about it until looking at the log file later. Not
that I could jump into the debugger and gather data in a few
milliseconds even if I knew when it was happening.

BTW, how do I break into the debugger and gather data when all of
the devices are locked out, including the console?

I assume that once it recovers, there is no point in gathering data.

 Alternately - please find a currently actively maintained SATA chipset.

The ata controller is soldered to the mainboard, a gazillion pins
I'm sure, and no doubt requires very specialized equipment to replace,
and I don't know of any pin-compatible replacements. Besides the
hardware itself has never caused any problems. The problem is caused
by the software, it is the software that needs to be fixed.

Ata isn't maintained? Why the bleep not? Disk drivers are essential.

I was under the impression that siis(4) and ahci(4) were actively
maintained? I'm running four sata controllers using three different
drivers and all three drivers lock out other drivers for too long
when something unusual happens.

And other, non-disk drivers have the same problem of locking out
other drivers, even during normal operation. And this happens on
yet other drivers on other people's hardware, not just mine.

 help migrate the nvidia chipset support out of ata(4)

I've looked at several of FreeBSD's device drivers (including,
as you might expect, ata, siis, and ahci) and I can't make
heads or tails out of any of them. Back before FreeBSD existed, I
did manage to make a significant improvement to a driver in a
BSD-derived system, so I'm not a complete idiot.

Several different drivers cause the same problem. Are they all
making the same mistake? Or is there a problem in something
they all use? Whether a design problem or an implementation bug.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: looking for someone to fix humanize_number (test cases included)

2012-12-26 Thread Clifton Royston
On Wed, Dec 26, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org 
wrote:
 Date: Tue, 25 Dec 2012 14:52:09 -0500
 From: Eitan Adler li...@eitanadler.com
 To: freebsd-hackers@freebsd.org, John-Mark Gurney j...@funkthat.com
 Subject: Re: looking for someone to fix humanize_number (test cases
   included)
 Message-ID:
   caf6rxgkcodg2ep2pdxjkjcyqzbynre_tpt3cqeygwrtz6ak...@mail.gmail.com
 Content-Type: text/plain; charset=UTF-8
 
 On 25 December 2012 14:46, Clifton Royston clift...@volcano.org wrote:
I correct myself: the function works fine, and there are no bugs I
  could find, though it's clear the man page could emphasize the correct
  usage a bit more.
 
 Can you submit a diff to the man page as well? I figure if you got
 confused at least 10 others got even more confused.

  I'd be happy to, and will do so soon.  I would like to finish rereading
and poking the code a little more first, so I understand and can document how
scale actually works and what it's doing without autoscale set, which is
the actual case which John's tests first brought up.  Right now its results
for some test cases I'm writing don't make much sense to me, particularly
with HN_DIVISOR_1000.

  So far from find+grep under /usr/src it appears to me that every call
to humanize_number() in the code base is correctly passing HN_AUTOSCALE
- i.e. the ability to pass a specific scale is unused - which may be
why this never came up. 

  -- Clifton

-- 
   Clifton Royston  --  clift...@iandicomputing.com / clift...@volcano.org
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services


Re: FreeBSD for serious performance?

2012-12-26 Thread Adrian Chadd
Hm, can you come up with a reproducible scenario where this happens?

A lot of the time, the issues with disk drivers being upset are due to
bad or incorrectly seated SATA cables.

We're willing to help you out if you're willing to delve into the
driver. Just ask questions about how it works and you'll likely get
help :)



Adrian


Re: FreeBSD for serious performance?

2012-12-26 Thread Peter Jeremy
On 2012-Dec-25 21:51:14 -0500, Dieter BSD dieter...@engineer.com wrote:
ata(4) completely hung the system for 19 minutes (at which point
I manually intervened, see the PR), probably an infinite loop.

http://www.freebsd.org/cgi/query-pr.cgi?pr=170675

Which contains no useful information.  You've even edited out
system details that are automatically inserted by send-pr.

Please provide a dmesg from a verbose boot of that machine.  What
brand/model motherboard?  What add-in cards do you have?  What do you
mean by completely hung?  What did you try to do to provoke a
response?  Are you running a GENERIC kernel?  If not, please provide
your kernel configuration.  Please provide the SMART data for ad6
(smartctl -a /dev/ad6).  Where does ad6 connect to the controller?  Do
you use any port-multipliers?  What was the system doing when ad6
detached?  Since the system ran for 24 hours, apparently without you
noticing that ad6 had detached, is ad6 part of a RAID?  If so, what is
the RAID configuration and technology?

Siis(4) and ahci(4) have also caused data loss, presumably by
blocking interrupts for too long.

You're still refusing to provide any useful information that might
allow us to locate the supposed problem.

Improving these drivers would be wonderful. But better yet,
can we please find a way to fix the underlying problem?

What underlying problem?

When a device driver handles an interrupt, it needs to block
further interrupts while it modifies its data structures. Otherwise
another interrupt coming in might cause it to mangle the data.
Right? But! Why does it need to block interrupts for everything?

That depends on how the interrupts are laid out in the hardware.  One
popular approach on cheap motherboards is to have lots of different
devices sharing the same interrupt.  In this case, an interrupt
generated by one device can block interrupts by all other devices
sharing that interrupt.
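
As a rough illustration of that last point, here is a toy model of one
shared interrupt line. It is not FreeBSD kernel code: the device names
and per-handler costs are invented, and it assumes the handlers on the
line simply run one after another.

```python
# Toy model of a shared interrupt line; not FreeBSD kernel code.
# Handler names and costs are invented for illustration.

def dispatch_shared_irq(handlers):
    """Run every handler registered on one shared line, in order.

    handlers: list of (name, cost_us) pairs, where cost_us is how long
    that handler keeps the line busy.  Returns {name: wait_us}, the time
    each handler waited behind the ones ahead of it.
    """
    waited = {}
    elapsed_us = 0
    for name, cost_us in handlers:
        waited[name] = elapsed_us   # time spent queued behind earlier handlers
        elapsed_us += cost_us       # the line stays busy while this one runs
    return waited


shared_line = [("disk", 5000), ("nic", 20), ("uart", 5)]
latency = dispatch_shared_irq(shared_line)
print(latency)
```

In this model a single slow handler at the front of the line delays
every device behind it, which is why knowing how the interrupts are
laid out on a given board matters.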

Alternately, why couldn't the data structures be protected with
a mutex? Then the drivers shouldn't have to block even themselves.

Alternately, why can't drivers have a polling option?

Your patches implementing this functionality appear to have gotten
detached from your mail.  Could you please resend them.  Note that
several ethernet drivers already have a polling option (intended to
avoid livelock issues at high traffic levels on primitive NICs).

Current machines can have multiple disks, multiple Ethernets,
multiple pretty-much-any-device, multiple CPUs, etc. etc.

Which is why it's important to have complete details of the system
when reporting issues since the problem may be caused by an
unexpected interaction between the components.

have this absurd bottleneck where the device drivers bring
everything to a screeching halt every time an interrupt happens.

So you keep claiming without producing any evidence.  Can you please
point to the code that does this.

On 2012-Dec-26 03:48:04 -0500, Dieter BSD dieter...@engineer.com wrote:
They are doing *something* that completely locks out everything else.
It is always a device driver.

So far, you have failed to provide any details to back this claim up.

Hard to imagine locking everything out for 19 minutes without being
in a loop.

I can think of several possibilities:
- broken controller locking up the bus
- deadlock
- clocks stopping (I've seen this in a different scenario)

Would several different drivers have this same bug?

You haven't provided any evidence of a software bug.  If you're seeing
the same problem across lots of different devices, it suggests a
hardware problem.

I've only caught it hanging forever once. It only takes a few
milliseconds to cause incoming data to be lost,

I'm not sure what you mean by this.  FreeBSD is not a real-time
operating system and so offers no guarantees on how long it will
take before incoming data will be processed.  If you have an
application that relies on incoming data being processed within
milliseconds, you may need to do some redesign.

BTW, how do I break into the debugger and gather data when all of
the devices are locked out, including the console?

Firewire?  Have you verified that the console is locked up and
you can't enter the debugger?

The ata controller is soldered to the mainboard, a gazillion pins
I'm sure, and no doubt requires very specialized equipment to replace,
and I don't know of any pin-compatible replacements. Besides the
hardware itself has never caused any problems. The problem is caused
by the software, it is the software that needs to be fixed.

The limited information you have provided points to a hardware fault,
not a software bug.  If you have evidence that it's a software bug,
please provide it.

Ata isn't maintained? Why the bleep not? Disk drivers are essential.

ata(4) _is_ maintained.  Your particular obsolete ATA controller may
not be.

I was under the impression that siis(4) and ahci(4) were actively
maintained? I'm running four sata controllers using three different
drivers and all 

Cross Compiling of ports Makefiles.

2012-12-26 Thread Michael Vale
Hi, 

For those of you who are aware, I’ve been implementing a complete series of 
cross-compiling functions for the ports makefiles.

I had a good 3+ week break since my last email with a patch to show, and I’ve 
totally rewritten it, starting from scratch.  It does not include any of 
Ray’s Zrouter code either.

While it’s still a work in progress, I have outlined the entire system to 
produce target installs into the same staging directory as a BSD system ready 
to be flashed onto NAND for embedded use, complete with pkg registry and 
ldconfig; everything has been thought of.  The reason I have chosen this 
method for installing the ports into a tree is so they can be compiled after 
build/install kernel/world and be combined into one firmware image seamlessly.  
Some ports won’t just be optional applications for future embedded firmware 
images, they’ll be an integral part of it.  The goal here is to be able to 
build complete firmware images in one fell swoop.  This is perhaps beyond the 
scope of most of you out there, but I may wish to pick and choose, excluding 
required parts of the BSD system and replacing them with the busybox port, and 
replacing libc with Google’s Bionic, uClibc or even musl.  This cannot be 
achieved currently with the likes of tinderbox and poudriere.

It will still be possible to build packages though.

Due to the nature of cross building, first I’ll lay out the options and then 
tell you which one I am implementing first, as there are reasons for having 
different build environments/toolchains.

OK, firstly I was going to give you all the details of all possible 
cross-compiling scenarios as I outline them, but I’ll have you know it’s much 
of a muchness: there are pros and cons to each and every different step, and 
the one I’m about to put to you now is the most feature-complete and quickest 
to implement.  That doesn’t mean that building without a DESTDIR JAIL in the 
future, just using the build system and its tools without a new toolchain, 
doesn’t make sense (sometimes it does!), or that I’m not going to do it, or 
that I’m not going to do a full ‘Canadian Cross’.

Ultimately, as a goal, the minimal command to invoke cross-compilation is 
TARGET(_ARCH)=${ARCH} make.

This could go on for hours, so after just deleting two extra paragraphs, I’m 
going to summarise.

First we check for CLANG (as the cross-compiler), or whether we need to install 
xdev (a BSD make of gcc compiled for the target arch).
(OK, so some of this won’t be in Makefile order (upside down and back to front), 
but I’m just spitting it out as it comes.)
If GNU configure is used, it is usually pretty good at detecting the compiler’s 
executable path from the TARGET triple alone; for the worst-case scenario, also 
put ${CC}’s path at the beginning of the global env ${PATH} to override any 
subsequent entries.
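
A minimal sketch of that PATH override, written in Python rather than 
make purely for illustration. The directory names are hypothetical, and 
it relies only on the usual rule that PATH is searched left to right, so 
the first match wins.

```python
import os


def prepend_to_path(env, tool_dir):
    """Return a copy of env with tool_dir placed first in PATH."""
    new_env = dict(env)
    old_path = new_env.get("PATH", "")
    new_env["PATH"] = tool_dir + (os.pathsep + old_path if old_path else "")
    return new_env


# Hypothetical locations, used only for illustration.
base_env = {"PATH": "/usr/bin" + os.pathsep + "/bin"}
cross_env = prepend_to_path(base_env, "/usr/cross/bin")
print(cross_env["PATH"])
```

The original environment is left untouched, so the host build can keep 
using its normal compiler while the cross build gets the modified copy.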

pre-chroot: is mostly used to declare global env variables to keep the build 
from failing and to make sure the install will complete.

do-chroot: here we first have to install any BUILD_DEPENDS.  Remember these can 
be libraries too, and they have to be built with the build machine’s usual stuff 
and installed in their usual place (lucky we are using a CHROOTED JAIL here! we 
could easily make a mess otherwise), remembering that sometimes a depend can be 
both a BUILD dep AND a RUN dep of the TARGET.  That’s okay: they should always 
be declared correctly, and we should never have to cross-compile a BUILD depend.  
However, a BUILD depend can be built twice: once for the build system, and 
again for the TARGET as a RUN depend.
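
The rule above can be sketched as a small planning function. This is 
only a model of the idea, not part of any bsd.*.mk file; the function 
name, port names and architectures are all made up for the example.

```python
def plan_builds(build_depends, run_depends, target_arch, host_arch):
    """Return a sorted list of (port, arch) builds.

    BUILD depends are built for the host only; RUN depends are built for
    the target.  A port in both lists is therefore built twice.
    """
    plan = {(port, host_arch) for port in build_depends}
    plan |= {(port, target_arch) for port in run_depends}
    return sorted(plan)


# Port names and architectures below are made up for the example.
plan = plan_builds(
    build_depends=["gmake", "libfoo"],
    run_depends=["libfoo", "busybox"],
    target_arch="mips",
    host_arch="amd64",
)
for port, arch in plan:
    print(port, arch)
```

Here libfoo appears twice in the plan, once per architecture, while 
gmake is never cross-compiled because it is only a BUILD depend.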

The beauty of doing this work is that we can now treat the lib and run depends 
more suitably.  During this process we can strip the libs, exclude the headers 
and change the directory structure: first, to save on inodes, and second, because 
pkg register, libtool and ld require the files to be installed into the root 
tree correctly in order to build valid databases and register them.  Now, the 
BUILD/HOST system has already had its tail cut off by DESTDIR.  There are plenty 
of ways we can install everything into a valid sub-directory and have DESTDIR 
still considered ROOT, so that PREFIX or LOCALDIR doesn’t have some obscure 
prepended directory that doesn’t exist in the CROSS_STAGING_ROOT.  Some ways 
include adding a variable in bsd.lib.mk and in every single one of make’s 
install targets between ${DESTDIR} and ${LOCALBASE} or ${PREFIX}.  We could also 
include if statements for cross; this would leave it at that, and we could go 
ahead and simply install into a sub-directory before pkg, ldconfig and firmware 
image packing occurs.  But I’d rather keep all cross-building in bsd.cross.mk, 
include it in bsd.port.mk, and instead, within DESTDIR do-chroot:, re-define 
${DESTDIR} as ${_bldroot}${DESTDIR} and have all TARGET_LIBS, RUN_DEPENDS and 
TARGET installs happen in a CHROOTED=no chroot.

Doing the same thing could also prevent the need for a DESTDIR JAIL install at 
all and just use the real build machine’s build env, rather than a jail.  
Regardless.