Re: FreeBSD for serious performance?
> If the driver is doing something daft like DELAY(x) in a fast interrupt handler which would lead to that behaviour, it should be fixed. If it's doing a DELAY(x) in a critical section, it should be fixed.

They are doing *something* that completely locks out everything else. It is always a device driver.

> Now, it's quite likely you hit some kind of ata(4) bug which kept it in a tight loop.

Hard to imagine locking everything out for 19 minutes without being in a loop.

> So it was likely just spun in some high priority loop that nothing lower-priority could really do anything about.

Would several different drivers have this same bug?

> The next time it happens, please break into the debugger and grab some debugging output. show alllocks and ps should be a good couple of things to start with.

I've only caught it hanging forever once. It only takes a few milliseconds to cause incoming data to be lost, so I usually don't know about it until looking at the log file later. Not that I could jump into the debugger and gather data in a few milliseconds even if I knew when it was happening. BTW, how do I break into the debugger and gather data when all of the devices are locked out, including the console? I assume that once it recovers, there is no point in gathering data.

> Alternately - please find a currently actively maintained SATA chipset.

The ata controller is soldered to the mainboard, a gazillion pins I'm sure, and no doubt requires very specialized equipment to replace, and I don't know of any pin-compatible replacements. Besides, the hardware itself has never caused any problems. The problem is caused by the software; it is the software that needs to be fixed.

Ata isn't maintained? Why the bleep not? Disk drivers are essential. I was under the impression that siis(4) and ahci(4) were actively maintained? I'm running four sata controllers using three different drivers and all three drivers lock out other drivers for too long when something unusual happens.
And other, non-disk drivers have the same problem of locking out other drivers, even during normal operation. And this happens on yet other drivers on other people's hardware, not just mine.

> help migrate the nvidia chipset support out of ata(4)

I've looked at several of FreeBSD's device drivers (including, as you might expect, ata, siis, and ahci) and I can't make heads or tails out of any of them. Back before FreeBSD existed, I did manage to make a significant improvement to a driver in a BSD-derived system, so I'm not a complete idiot.

Several different drivers cause the same problem. Are they all making the same mistake? Or is there a problem in something they all use, whether a design problem or an implementation bug?

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: looking for someone to fix humanize_number (test cases included)
On Wed, Dec 26, 2012 at 12:00:01PM +, freebsd-hackers-requ...@freebsd.org wrote:
> Date: Tue, 25 Dec 2012 14:52:09 -0500
> From: Eitan Adler li...@eitanadler.com
> To: freebsd-hackers@freebsd.org, John-Mark Gurney j...@funkthat.com
> Subject: Re: looking for someone to fix humanize_number (test cases included)
> Message-ID: caf6rxgkcodg2ep2pdxjkjcyqzbynre_tpt3cqeygwrtz6ak...@mail.gmail.com
> Content-Type: text/plain; charset=UTF-8
>
> On 25 December 2012 14:46, Clifton Royston clift...@volcano.org wrote:
> > I correct myself: the function works fine, and there are no bugs I could find, though it's clear the man page could emphasize the correct usage a bit more.
>
> Can you submit a diff to the man page as well? I figure if you got confused at least 10 others got even more confused.

I'd be happy to, and will do so soon. I would like to finish rereading and poking the code a little more first, so I understand and can document how scale actually works and what it's doing without autoscale set, which is the actual case which John's tests first brought up. Right now its results for some test cases I'm writing don't make much sense to me, particularly with HN_DIVISOR_1000.

So far, from find+grep under /usr/src, it appears to me that every call to humanize_number() in the code base is correctly passing HN_AUTOSCALE - i.e. the ability to pass a specific scale is unused - which may be why this never came up.

-- Clifton

--
Clifton Royston -- clift...@iandicomputing.com / clift...@volcano.org
President - I and I Computing * http://www.iandicomputing.com/
Custom programming, network design, systems and network consulting services
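The difference between autoscaling and a caller-chosen scale, which the message above is trying to pin down, can be illustrated with a minimal C sketch. This is not the libutil implementation: sketch_humanize(), SKETCH_AUTOSCALE, SKETCH_MAXSCALE and the truncating division are all assumptions made for illustration, and the real humanize_number() also takes suffix and flags arguments that are left out here.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sentinel standing in for the role HN_AUTOSCALE plays. */
#define SKETCH_AUTOSCALE (-1)
#define SKETCH_MAXSCALE  6

/* Unit prefixes indexed by scale: 0 = none, 1 = k, 2 = M, and so on. */
static const char sketch_prefix[] = " kMGTPE";

/*
 * Format `number` into `buf` using either an automatically chosen scale
 * (scale == SKETCH_AUTOSCALE) or exactly `scale` divisions by `divisor`.
 * Division truncates; this is a simplification made for the sketch.
 * Returns the scale used, or -1 on invalid arguments.
 */
static int
sketch_humanize(char *buf, size_t len, int64_t number, int scale, int divisor)
{
	int64_t value = number;
	int used = 0;

	if (number < 0 || (divisor != 1000 && divisor != 1024))
		return (-1);
	if (scale != SKETCH_AUTOSCALE && (scale < 0 || scale > SKETCH_MAXSCALE))
		return (-1);

	if (scale == SKETCH_AUTOSCALE) {
		/* Autoscale: keep dividing while a whole unit still fits. */
		while (value >= divisor && used < SKETCH_MAXSCALE) {
			value /= divisor;
			used++;
		}
	} else {
		/* Fixed scale: divide exactly `scale` times, whatever the size. */
		for (used = 0; used < scale; used++)
			value /= divisor;
	}

	if (used == 0)
		snprintf(buf, len, "%lld", (long long)value);
	else
		snprintf(buf, len, "%lld%c", (long long)value, sketch_prefix[used]);
	return (used);
}
```

Under these assumptions, 5000000 with a divisor of 1000 formats as "5M" when autoscaled but as "0G" when the caller forces scale 3, because a fixed scale divides regardless of magnitude. That is one way a caller-chosen scale can give output that looks wrong at first glance; whether it matches the behaviour seen in the test cases mentioned above is not something this sketch can confirm.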
Re: FreeBSD for serious performance?
Hm, can you come up with a reproducible scenario where this happens? A lot of the time, disk drivers getting upset is due to bad or incorrectly seated SATA cables.

We're willing to help you out if you're willing to delve into the driver. Just ask questions about how it works and you'll likely get help :)

Adrian
Re: FreeBSD for serious performance?
On 2012-Dec-25 21:51:14 -0500, Dieter BSD dieter...@engineer.com wrote:
> ata(4) completely hung the system for 19 minutes (at which point I manually intervened, see the PR), probably an infinite loop.
> http://www.freebsd.org/cgi/query-pr.cgi?pr=170675

Which contains no useful information. You've even edited out system details that are automatically inserted by send-pr. Please provide a dmesg from a verbose boot of that machine. What brand/model motherboard? What add-in cards do you have? What do you mean by completely hung? What did you try to do to provoke a response? Are you running a GENERIC kernel? If not, please provide your kernel configuration. Please provide the SMART data for ad6 (smartctl -a /dev/ad6). Where does ad6 connect to the controller? Do you use any port-multipliers? What was the system doing when ad6 detached? Since the system ran for 24 hours, apparently without you noticing that ad6 had detached, is ad6 part of a RAID? If so, what is the RAID configuration and technology?

> Siis(4) and ahci(4) have also caused data loss, presumably by blocking interrupts for too long.

You're still refusing to provide any useful information that might allow us to locate the supposed problem.

> Improving these drivers would be wonderful. But better yet, can we please find a way to fix the underlying problem?

What underlying problem?

> When a device driver handles an interrupt, it needs to block further interrupts while it modifies its data structures. Otherwise another interrupt coming in might cause it to mangle the data. Right? But! Why does it need to block interrupts for everything?

That depends on how the interrupts are laid out in the hardware. One popular approach on cheap motherboards is to have lots of different devices sharing the same interrupt. In this case, an interrupt generated by one device can block interrupts by all other devices sharing that interrupt.

> Alternately, why couldn't the data structures be protected with a mutex?
> Then the drivers shouldn't have to block even themselves. Alternately, why can't drivers have a polling option?

Your patches implementing this functionality appear to have gotten detached from your mail. Could you please resend them. Note that several ethernet drivers already have a polling option (intended to avoid livelock issues at high traffic levels on primitive NICs).

> Current machines can have multiple disks, multiple Ethernets, multiple pretty-much-any-device, multiple CPUs, etc. etc.

Which is why it's important to have complete details of the system when reporting issues, since the problem may be caused by an unexpected interaction between the components.

> have this absurd bottleneck where the device drivers bring everything to a screeching halt every time an interrupt happens.

So you keep claiming without producing any evidence. Can you please point to the code that does this.

On 2012-Dec-26 03:48:04 -0500, Dieter BSD dieter...@engineer.com wrote:
> They are doing *something* that completely locks out everything else. It is always a device driver.

So far, you have failed to provide any details to back this claim up.

> Hard to imagine locking everything out for 19 minutes without being in a loop.

I can think of several possibilities:
- broken controller locking up the bus
- deadlock
- clocks stopping (I've seen this in a different scenario)

> Would several different drivers have this same bug?

You haven't provided any evidence of a software bug. If you're seeing the same problem across lots of different devices, it suggests a hardware problem.

> I've only caught it hanging forever once. It only takes a few milliseconds to cause incoming data to be lost,

I'm not sure what you mean by this. FreeBSD is not a real-time operating system and so offers no guarantees on how long it will take before incoming data will be processed. If you have an application that relies on incoming data being processed within milliseconds, you may need to do some redesign.
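The idea raised in the quoted text, protecting a driver's data structures with a mutex instead of masking interrupts, can be sketched in userland C. This is an analogy only, not code from any FreeBSD driver: it uses POSIX pthread_mutex_t as a stand-in for the kernel's mutex(9) primitives (mtx_lock(), mtx_unlock()), and struct sketch_softc, sketch_intr(), sketch_read() and RING_SLOTS are hypothetical names invented for the example.

```c
#include <pthread.h>

#define RING_SLOTS 8	/* hypothetical buffer size; a power of two */

/* Hypothetical per-device state shared by the interrupt path and readers. */
struct sketch_softc {
	pthread_mutex_t	lock;		/* protects every field below */
	unsigned int	head;		/* count of items ever written */
	unsigned int	tail;		/* count of items ever read */
	int		ring[RING_SLOTS];
	unsigned long	dropped;	/* items lost because the ring was full */
};

static void
sketch_init(struct sketch_softc *sc)
{
	pthread_mutex_init(&sc->lock, NULL);
	sc->head = sc->tail = 0;
	sc->dropped = 0;
}

/*
 * Interrupt-side path: queue one incoming item. Only this device's lock is
 * held, and only while the ring is updated. Returns 1 if stored, 0 if the
 * item was dropped because the consumer had fallen too far behind.
 */
static int
sketch_intr(struct sketch_softc *sc, int data)
{
	int stored = 0;

	pthread_mutex_lock(&sc->lock);
	if (sc->head - sc->tail < RING_SLOTS) {
		sc->ring[sc->head % RING_SLOTS] = data;
		sc->head++;
		stored = 1;
	} else {
		sc->dropped++;
	}
	pthread_mutex_unlock(&sc->lock);
	return (stored);
}

/* Consumer-side path: fetch the oldest queued item, if there is one. */
static int
sketch_read(struct sketch_softc *sc, int *out)
{
	int got = 0;

	pthread_mutex_lock(&sc->lock);
	if (sc->head != sc->tail) {
		*out = sc->ring[sc->tail % RING_SLOTS];
		sc->tail++;
		got = 1;
	}
	pthread_mutex_unlock(&sc->lock);
	return (got);
}
```

The point of the design is that the lock belongs to one device's state, so contention on it delays only code that touches that state. The dropped counter also shows how a consumer that is held off for too long turns into lost incoming data once the buffer fills, which is the symptom described earlier in the thread. Whether any of the drivers under discussion behave differently from this is not something the sketch establishes.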
> BTW, how do I break into the debugger and gather data when all of the devices are locked out, including the console?

Firewire? Have you verified that the console is locked up and you can't enter the debugger?

> The ata controller is soldered to the mainboard, a gazillion pins I'm sure, and no doubt requires very specialized equipment to replace, and I don't know of any pin-compatible replacements. Besides, the hardware itself has never caused any problems. The problem is caused by the software; it is the software that needs to be fixed.

The limited information you have provided points to a hardware fault, not a software bug. If you have evidence that it's a software bug, please provide it.

> Ata isn't maintained? Why the bleep not? Disk drivers are essential.

ata(4) _is_ maintained. Your particular obsolete ATA controller may not be.

> I was under the impression that siis(4) and ahci(4) were actively maintained? I'm running four sata controllers using three different drivers and all
Cross Compiling of ports Makefiles.
Hi,

For those of you who are aware, I’ve been implementing a complete series of cross-compiling functions for the ports makefiles. I had a good 3+ week break since my last email with a patch to show, and I’ve totally rewritten it, starting from scratch and not including any of Ray’s Zrouter code either.

While it’s still a work in progress, I have outlined the entire system to produce target installs into the same staging directory as a BSD system ready to be flashed onto NAND for embedded use, complete with pkg registry and ldconfig; everything has been thought of.

- The reason I have chosen this method for the ports to be installed into a tree is so they can be compiled after build/install kernel/world and be combined into one firmware image seamlessly. Some ports won’t just be optional applications for future embedded firmware images, they’ll be an integral part of it.

The goal here is to be able to build complete firmware images in one fell swoop. Perhaps this is beyond the scope of most of you out there, but I may wish to pick and choose, excluding required parts of the BSD system and replacing them with the busybox port, and replacing libc with Google’s Bionic, uClibc or even musl. This cannot be achieved currently with the likes of tinderbox and poudriere. It will still be possible to build packages though.

Due to the nature of cross building, first I’ll lay out the options and then tell you which one I am implementing first, as there are reasons for having different build environments/toolchains.

OK, firstly I was going to give you all the detail of all possible cross-compiling scenarios as I outline them, but it’s much of a muchness: there are pros and cons to each and every different step, and the one I’m about to put to you now is the most feature complete and quickest to implement. That doesn’t mean building without a DESTDIR JAIL in the future and just using the build system and its tools without a new toolchain doesn’t make sense (sometimes it does!)
and that I’m not going to do it, or that I’m not going to do a full ‘Canadian Cross’. Ultimately, as a goal, the minimal command to invoke cross compilation is TARGET(_ARCH)=${ARCH} make.

This could go on for hours, so after just deleting two extra paragraphs, I’m going to summarise.

First we check for CLANG (as the x-compiler) or if we need to install xdev (bsd make of gcc compiled for target arch). (OK, so some of this won’t be in Makefile order (upside down and back to front), but I’m just spitting it out as it comes.) If GNU configure is used, it’s usually pretty good at detecting the compiler’s executable path from the TARGET triple alone; for the worst-case scenario, also set ${CC}’s path at the beginning of the global env ${PATH} to override any subsequent one.

pre-chroot: is mostly used to declare global env variables to keep the build from failing and to make sure the install will complete.

do-chroot: we firstly have to install any BUILD_DEPENDS. Remember these can be libraries too, and they have to be built with the build machine’s usual stuff and installed in their usual place (lucky we are using a CHROOTED JAIL here! We could easily make a mess otherwise), remembering that sometimes a depend can be both a BUILD dep AND a RUN dep of the TARGET. That’s okay: they should always be declared correctly, and we should never have to cross-compile a BUILD depend. However, a BUILD depend can be built twice: once for the build system, and again for the TARGET as a RUN depend.

The beauty of doing this work is we can now treat the lib and run depends more suitably. During this process we can strip the libs, exclude the headers and change the directory structure: first, to save on inodes, and second, because pkg register, libtool and ld require that the files are installed into the root tree correctly in order for them to build valid databases and register them.

Now, the BUILD/HOST system has already had its tail cut off by DESTDIR.
Now there are plenty of ways we can install everything into a valid sub-directory and have DESTDIR still considered ROOT, so that PREFIX or LOCALDIR doesn’t have some obscure prepended directory that doesn’t exist in the CROSS_STAGING_ROOT. Some ways include adding a variable in bsd.lib.mk and in every single one of make’s install targets between ${DESTDIR} and ${LOCALBASE} or ${PREFIX}. We could also include if statements for cross; this would leave it at that, and we could go ahead and simply install into a sub-directory before pkg, ldconfig and firmware image packing occur. But I’d rather keep all cross-building in bsd.cross.mk and include it in bsd.port.mk, and instead, within DESTDIR do-chroot:, re-define ${DESTDIR} as ${_bldroot}${DESTDIR} and have all TARGET_LIBS, RUN_DEPENDS and the TARGET install in a CHROOTED=no chroot.

Doing the same thing could also remove the need for a DESTDIR JAIL install at all and just use the real build machine’s build env, rather than a jail.

Regardless.