Re: SATA-performance: Linux vs. FreeBSD
Hello, Martin.

Martin A. Fink wrote:
> Test              OpenSuSE (AHCI)              FreeBSD (AHCI)
> -----------------------------------------------------------------------
> SSD (vfat 25GB)   41+/-2 MB/s at 4-10%         15+/-0 MB/s at 2% CPU
> SSD (raw 25GB)    26+/-1 MB/s at 4-10% CPU     48+/-0 MB/s at 1% CPU
> SSD (ext3 25GB)   39+/-5 MB/s at 10-15% CPU    34+/-0 MB/s at 14% CPU
> SSD (ext2 25GB)   42+/-1 MB/s at 10-15% CPU    32+/-0 MB/s at 10% CPU
> -----------------------------------------------------------------------
> Test              OpenSuSE (AHCI off)          FreeBSD (AHCI off)
> -----------------------------------------------------------------------
> SSD (vfat 25GB)   22+/-4 MB/s at 6-19% CPU     --
> SSD (raw 25GB)    33+/-4 MB/s at 7-14% CPU     41+/-0 MB/s at 1% CPU
> SSD (ext2 25GB)   27+/-6 MB/s at 6-14% CPU     --
> -----------------------------------------------------------------------
>
> Question 1:
> Can anybody explain to me, why writing to a SATA-I device with AHCI
> consumes so much CPU time using Linux, while it takes almost no CPU time
> on FreeBSD 6.2 ? Especially comparing values of writing to the raw
> device?

Can't tell. AHCI needs very few MMIOs to perform each request. As Andi
suggested, please do oprofile. It's easy.

> Question 2:
> Can anybody explain to me, why writing to a solid state disk (a kind of
> memory that always has the same constant bandwidth) has such big standard
> errors in writing rate using Linux (between 1 to 6 MB/s error) while
> FreeBSD gives an almost constant writing rate (as one would expect it for
> a SSD) ?

The default iosched is heavily optimized for regular disks with a moving
head and for more usual workloads. Requests are sometimes paused to wait
for requests in an adjacent area. Use deadline or noop for ssd.

Also, try turning off NCQ. Some early drives from major disk vendors had
all kinds of issues with their NCQ implementation. SSD firmwares don't
tend to be of high quality.

> Question 3:
> Why is writing to a raw device in Linux slower than using e.g. ext2 ?
> And why is Linux writing rate much lower (-12.5 % for the best case)
> compared to writing rate of FreeBSD?

As written above, the first thing I can think of is interaction with
iosched. SSD and your workload are pretty unusual.

> Question 4:
> When writing to the SATA-II HDD Linux is around 10% slower than FreeBSD
> when using ext3, but around as fast as FreeBSD when writing raw. Why?

Dunno much about that. Where's the test result?

> How can I improve the speed of Linux,

Other people have pointed this out already, but use /dev/sdX, not the raw
devices. If you use raw, you end up writing each chunk synchronously.

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
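The "use deadline or noop" advice above comes down to reading and writing one sysfs file per disk. What follows is a minimal sketch of that step, not anything posted in the thread. It assumes the 2.6-era layout in which /sys/block/DEVICE/queue/scheduler holds a line such as "noop anticipatory deadline [cfq]" with the active scheduler in brackets (newer kernels offer different names). The function names and the `sysfs_root` parameter are hypothetical; the parameter exists only so the code can be tried against a scratch directory instead of the real /sys.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (available, active); the active scheduler is the bracketed one.
    """
    available = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def select_scheduler(device, wanted, sysfs_root="/sys/block"):
    """Select `wanted` for `device` and return the previously active one.

    `sysfs_root` is an assumption made for testability; writing to the
    real /sys/block requires root privileges.
    """
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    available, active = parse_schedulers(path.read_text())
    if wanted not in available:
        raise ValueError(f"{wanted!r} not offered by {path}: {available}")
    if wanted != active:
        path.write_text(wanted)
    return active
```

Called as `select_scheduler("sdX", "noop")` (as root, with sdX standing for the real device), this should have the same effect as echoing the name into the file by hand. Returning the old value makes it easy to restore the previous scheduler after a benchmark run.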
Re: SATA-performance: Linux vs. FreeBSD
Arjan van de Ven wrote:
>> The problem is: FreeBSD is fast, but lacks of some special drivers.
>> Linux has all drivers but access to harddisk is unpredictable and thus
>> unreliable! What can I do??
>
> there's several tunables you can do;
>
> 1) increase /sys/block//queue/nr_requests
>    the linux default is on the low side
> 2) investigate other elevators; cfq is great for interactive use but not
>    so great for max throughput. you can do this by echo'ing "deadline"
>    into /sys/block//scheduler

I'd suggest trying the noop scheduler with your ram based devices. I
don't see why these devices would need clever scheduling. ...but prove me
wrong if you will. I haven't tested this.

echo noop > /sys/block//queue/scheduler

If you don't need journaling, EXT2 might be a good choice. But I'd also
like to re-iterate the XFS filesystem recommendation given several times
now as well. There are many tunables that /may/ help during filesystem
creation. Block size (-b) set to its maximum would prob. help. If you're
sure you can not encounter power issues:

mount -t xfs -o nobarrier /dev/ /mount-point

Here's some more general reading for ya:

Troubleshooting Linux Performance Issues:
http://www.phptr.com/articles/article.asp?p=481867&seqNum=2&rl=1

--
Jeffrey Hundstad
Re: SATA-performance: Linux vs. FreeBSD
On 02/12/07 08:37, Martin A. Fink wrote:
> :~> strace -c -T -o trace.out dd if=/dev/zero of=test.txt bs=10MB count=200
>
> 200+0 records in
> 200+0 records out
> 2000000000 bytes (2,0 GB) copied, 52,8632 seconds, 37,8 MB/s

You might want to check the raw write & read speed to the device without
a filesystem. Also, your previous email didn't include xfs. xfs has very
good sustained write performance.

dd if=/dev/zero of=/dev/sdX bs=10MB count=200
dd of=/dev/null if=/dev/sdX bs=10MB count=200
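The rate dd prints is just bytes divided by seconds, and the same measurement can be repeated from a script when many runs are needed to get the +/- spread quoted elsewhere in the thread. The sketch below is only an approximation of the dd run above, with hypothetical function names. It writes zero-filled chunks to a path, calls fsync once at the end, and reports decimal MB/s. It is not a byte-for-byte replacement for dd, and on a filesystem the page cache will still smooth the individual writes.

```python
import os
import time


def throughput_mb_s(total_bytes, seconds):
    """Decimal megabytes per second, as dd reports it (1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return total_bytes / seconds / 1e6


def timed_write(path, chunk_bytes=10_000_000, count=200):
    """Write `count` zero-filled chunks to `path`, fsync, return MB/s."""
    chunk = bytes(chunk_bytes)  # zero-filled, like reading /dev/zero
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(count):
            view = memoryview(chunk)
            while view:  # handle short writes
                written = f.write(view)
                view = view[written:]
        os.fsync(f.fileno())  # include the flush in the measured time
    elapsed = time.perf_counter() - start
    return throughput_mb_s(chunk_bytes * count, elapsed)
```

With the figures from the quoted run, `throughput_mb_s(2_000_000_000, 52.8632)` gives about 37.8, matching what dd printed. Pointing `timed_write` at a block device such as /dev/sdX destroys its contents, exactly as the dd command does.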
Re: SATA-performance: Linux vs. FreeBSD
On Tue, Feb 13, 2007 at 11:25:27AM +, Alan wrote:
> > So where is the difference between SATA-I and SATA-II ?
>
> All physical side if they are on the same controller when you do the
> tests. Mostly latency,

SATA-II is a highly confusing marketing term. It is /not/ a technical
term.

In some cases there are NO differences between SATA-I and SATA-II. You
can find 1.5Gbps non-NCQ-supporting devices claiming SATA-II.

Similarly, there is no "SATA version" word in the IDENTIFY DEVICE page,
like there are "ATA version" words.

	Jeff
Re: SATA-performance: Linux vs. FreeBSD
> data actually did make it to the media, I wouldn't necessary assume
> that it has. Given that it sounds like you really care about this,
> I'd suggest that you explicitly testing this before making
> assumptions.

FreeBSD 6.1 appears to get it right for some subsets of devices so it
seems a reasonable assumption at first glance - I did actually look the
BSD bits up to check.
Re: SATA-performance: Linux vs. FreeBSD
On Tue, Feb 13, 2007 at 01:32:34PM +0100, Martin A. Fink wrote:
> > Does the FreeBSD fsync sync to media ? Also what controller is being
> > used here, and do you have EHCI USB support running ?
>
> Manual of FreeBSD fsync says it syncs to media.

That didn't answer the question. With SATA in particular, just because
you flush it to the *disk* doesn't mean that you've flushed it to the
*media*, unless the OS is explicitly giving a command to the disk to do
so.

If you haven't done any tests where you sync a huge amount of data on
FreeBSD, then immediately kick the power plug out of the wall by hand,
and then check to make sure all of the data actually did make it to the
media, I wouldn't necessarily assume that it has. Given that it sounds
like you really care about this, I'd suggest that you explicitly test
this before making assumptions.

						- Ted
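The pull-the-plug test described above needs two small pieces: something that writes a known pattern and calls fsync before the power is cut, and something that checks the pattern after the machine comes back. The following is a rough sketch of both, with hypothetical function names and a SHA-256 digest chosen here purely for illustration. Note that `os.fsync` only asks the operating system to flush. Whether that request reaches the platters or flash cells is exactly the open question in this thread, so the code can support the experiment but cannot answer it on its own.

```python
import hashlib
import os


def write_and_fsync(path, payload):
    """Write payload to path and call fsync() before returning.

    Returns the SHA-256 hex digest of the payload so it can be recorded
    somewhere that survives the power cut (paper, another machine).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # request a flush; the drive's cache is another matter
    finally:
        os.close(fd)
    return hashlib.sha256(payload).hexdigest()


def survived(path, expected_digest):
    """After power-up: does the file content match the recorded digest?"""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return False
    return hashlib.sha256(data).hexdigest() == expected_digest
```

A run would call `write_and_fsync` with a large payload, note the digest, cut power as soon as the call returns, and later call `survived` with the noted digest. Repeating this a number of times says more than a single pass.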
Re: SATA-performance: Linux vs. FreeBSD
Martin A. Fink wrote:
>> The needed total bandwidth may be too high and at least the incoming
>> part via GigE may have serious overhead.
>> 150MB/s in via (at least 2) GigE, without Zero-Copy there is another
>> 150MB/s memory to memory.
>> Then there is the next 150MB/s memory to the discs, without Zero-Copy
>> there is also another 150MB/s memory to memory.
>> In total that's 300MB/s to 600MB/s without any processing.
>
> I dont understand your calculation: from 3 GE ports come around 50
> MB/each. These altogether 150MB/s have to be copied to memory. From
> there they will be copied to disk. So we talk about 2x150 MB/s running
> through my system. That is less than 2 PCIe lanes can handle...
> And there are more than 2 lanes between north and south bridge

It may be that the TCP/IP-Stack has to copy the data around. But someone
who knows the inner workings would have to answer this. That may also
depend on the NIC used.

Also the data doesn't appear 'en bloc', but arrives over a period of
time, so you have more or less big "gaps" in the processing. These
"gaps" in particular can considerably lower the total achievable
bandwidth.

A little naive fallacy (According to dict.leo.org a translation for:
Milchmädchenrechnung):
You get a package of work every (say) 1ms and you (say) need .2ms for
processing, shoveling and writing to disc. Then there is no way you can
saturate more than 1/5 of total theoretical bandwidth, because 80% of the
time you are waiting for more work to come.

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.
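The back-of-the-envelope argument above (work arrives every 1 ms, handling it takes 0.2 ms, so at most 1/5 of the peak is used) can be written down as two small functions. This is only an illustration of that arithmetic with hypothetical names. It assumes strictly periodic arrivals and no queueing or overlap, which is the same simplification the message itself makes.

```python
def busy_fraction(arrival_period_ms, work_ms):
    """Fraction of wall-clock time spent working when one unit of work
    arrives every arrival_period_ms and takes work_ms to handle."""
    if arrival_period_ms <= 0 or work_ms < 0:
        raise ValueError("period must be positive and work non-negative")
    return min(1.0, work_ms / arrival_period_ms)


def effective_bandwidth(peak_mb_s, arrival_period_ms, work_ms):
    """Bandwidth actually used if the path runs at peak only while busy."""
    return peak_mb_s * busy_fraction(arrival_period_ms, work_ms)
```

For the numbers in the message, `busy_fraction(1.0, 0.2)` is 0.2, so 80% of the time is spent waiting. Applied to a 48 MB/s device, the same ratio would leave roughly 9.6 MB/s. Buffering several frames before writing is the usual way to shrink the gaps.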
Re: SATA-performance: Linux vs. FreeBSD
On Tuesday, 13 February 2007 13:24 you wrote:
> Martin A. Fink wrote:
>>> Also you have skipped the information how the images "arrive" on the
>>> system (PCI(e) card?), that may be important for an "end to end" view
>>> of the problem.
>>
>> Images arrive via Gigabit Ethernet. GigE Vision standard. (PCIe x4)
>
> Then the next question is: ChipSet/Used Protocol/JumboFrames/(NAPI)/... .
>
> Have you already determined the load caused by this part?
> Depending on the GigE-Chipset, and Protocol/JumboFrames/(NAPI)/..., the
> involved overhead can be quite serious.
>
>>> And what's also missing. What is "a long period of time".
>>> Calculating best-case with the SSD:
>>> 27GB divided by 30MB/s only gives a bit more than 15 Minutes.
>>> And worst case with 50MB/s is less than 10 Minutes.
>>
>> Well. The testdrive has 27GB. The final drive will have 225 GB. And
>> there will be 3 cameras and thus 3 disks. This means we talk about 140
>> MB/s for around 90 minutes.
>> For space applications with low power but high performance this is a
>> long time... ;-)
>
> The MB/CPU/RAM will be the one specified in the first mail?
> My gut feeling says: Forget it.
>
> The needed total bandwidth may be too high and at least the incoming
> part via GigE may have serious overhead.
> 150MB/s in via (at least 2) GigE, without Zero-Copy there is another
> 150MB/s memory to memory.
> Then there is the next 150MB/s memory to the discs, without Zero-Copy
> there is also another 150MB/s memory to memory.
> In total that's 300MB/s to 600MB/s without any processing.

I don't understand your calculation: from 3 GE ports come around 50 MB/s
each. These altogether 150MB/s have to be copied to memory. From there
they will be copied to disk. So we talk about 2x150 MB/s running through
my system. That is less than 2 PCIe lanes can handle...
And there are more than 2 lanes between north and south bridge

> But on the other hand, hdparm -T says my system (Core2Duo E6700,
> FSB1066, 2GB DDR2-800 RAM, 32Bit) has a buffer-cache bandwidth around
> 4000MB/s.
> As you didn't say which FSB and Memory-Type you have i would guess that
> your system should reach between 2000MB/s and 3500MB/s of LINEAR(!)
> memory bandwidth.
> (Total usable Memory-Bandwidth is unfortunately also dependent on usage
> pattern. Large & linear is not as important as with a rotating HDD, but
> it factors in)
>
> Btw. On the topic of filesystem and Linux performance:
> SGI did a "really big" test some time ago with a big iron having 24
> Itanium2-CPUs in 12 nodes, and 12*2 GB of ram and having 256 discs using
> XFS (which is from SGI!).
> The pdf-file is here:
> http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-paper.pdf
>
> According to the paper the system had a theoretical peak IO-performance
> of 11.5 GB/s and practically peaked at 10.7GB/s reading and 8.9GB/s
> writing.
> IOW Linux and XFS CAN perform quite well, but the system has to have
> enough muscle for the job.
> And since the paper (and Kernel 2.6.5) the development of Linux hasn't
> stopped.

--
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany

Tel. +49-(0)89-3-3645
Fax. +49-(0)89-3-3569
Re: SATA-performance: Linux vs. FreeBSD
On Tuesday, 13 February 2007 12:25 you wrote:
> > Well they do. The Flash disk I have (SATA-I) is capable of 48 MB/s and
> > this value is reached over the whole disk size by windows as well as
> > by FreeBSD. See my test results in the first thread.
>
> Ok a flash disk should be more stable
>
> > My Seagate Barracuda Harddisk drive (SATA-II) starts with 76 MB/s and
> > decreases linearly to 35 MB/s due to the fact that it has to write to
> > a rotating disk. But on a flash disk there is nothing rotating...
>
> The hard disk one isn't guaranteed or stable but the flash especially if
> it is aimed at it ought to behave.
>
> > So where is the difference between SATA-I and SATA-II ?
>
> All physical side if they are on the same controller when you do the
> tests. Mostly latency,
>
> > And why is FreeBSD able to write with constant rates (the complete 25
> > GB, all with 48+/-0.1 MB/s) but Linux 2.6.18 not ?
>
> Does the FreeBSD fsync sync to media ? Also what controller is being
> used here, and do you have EHCI USB support running ?

Manual of FreeBSD fsync says it syncs to media.
I used the same controller: same computer, same harddisk. Two partitions
on the system disk, one for linux, one for freebsd.

EHCI:
ehci_hcd :00:1d.7: EHCI Host Controller
ehci_hcd :00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: Product: EHCI Host Controller

AHCI:
ahci :00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode

> > With a dedicated (rotating) SATA II device, using the first 70% of
> > disk space no problem -- tested ! With a SATA-I device only a problem
> > with Linux 2.6.18
>
> I suspect the SATA-1 itself may not be the decider but something else -
> eg the hard disk using NCQ, which would cover up any latency related
> problems.
>
> > Journaling of data: you are right, ext2 performs better than ext3.
>
> And ext3 in writeback mode ought in theory (but practice is always
> harder ;)) be faster than ext2.

--
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany

Tel. +49-(0)89-3-3645
Fax. +49-(0)89-3-3569
Re: SATA-performance: Linux vs. FreeBSD
Martin A. Fink wrote:
>> Also you have skipped the information how the images "arrive" on the
>> system (PCI(e) card?), that may be important for an "end to end" view
>> of the problem.
>
> Images arrive via Gigabit Ethernet. GigE Vision standard. (PCIe x4)

Then the next question is: ChipSet/Used Protocol/JumboFrames/(NAPI)/... .

Have you already determined the load caused by this part?
Depending on the GigE-Chipset, and Protocol/JumboFrames/(NAPI)/..., the
involved overhead can be quite serious.

>> And what's also missing. What is "a long period of time".
>> Calculating best-case with the SSD:
>> 27GB divided by 30MB/s only gives a bit more than 15 Minutes.
>> And worst case with 50MB/s is less than 10 Minutes.
>
> Well. The testdrive has 27GB. The final drive will have 225 GB. And
> there will be 3 cameras and thus 3 disks. This means we talk about 140
> MB/s for around 90 minutes.
> For space applications with low power but high performance this is a
> long time... ;-)

The MB/CPU/RAM will be the one specified in the first mail?
My gut feeling says: Forget it.

The needed total bandwidth may be too high and at least the incoming part
via GigE may have serious overhead.
150MB/s in via (at least 2) GigE, without Zero-Copy there is another
150MB/s memory to memory.
Then there is the next 150MB/s memory to the discs, without Zero-Copy
there is also another 150MB/s memory to memory.
In total that's 300MB/s to 600MB/s without any processing.

But on the other hand, hdparm -T says my system (Core2Duo E6700, FSB1066,
2GB DDR2-800 RAM, 32Bit) has a buffer-cache bandwidth around 4000MB/s.
As you didn't say which FSB and Memory-Type you have i would guess that
your system should reach between 2000MB/s and 3500MB/s of LINEAR(!)
memory bandwidth.
(Total usable Memory-Bandwidth is unfortunately also dependent on usage
pattern. Large & linear is not as important as with a rotating HDD, but
it factors in)

Btw. On the topic of filesystem and Linux performance:
SGI did a "really big" test some time ago with a big iron having 24
Itanium2-CPUs in 12 nodes, and 12*2 GB of ram and having 256 discs using
XFS (which is from SGI!).
The pdf-file is here:
http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-paper.pdf

According to the paper the system had a theoretical peak IO-performance
of 11.5 GB/s and practically peaked at 10.7GB/s reading and 8.9GB/s
writing.
IOW Linux and XFS CAN perform quite well, but the system has to have
enough muscle for the job.
And since the paper (and Kernel 2.6.5) the development of Linux hasn't
stopped.
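The two estimates in this message (300 to 600 MB/s of memory traffic for a 150 MB/s stream, and 10 to 15 minutes to fill the 27 GB test drive) are plain arithmetic and can be checked with a few lines. The sketch below uses hypothetical function names and makes two assumptions that are not stated in the thread: each side of the path (network in, disk out) costs one transfer plus one extra memory-to-memory copy when zero-copy is not available, and sizes are decimal (1 GB = 1000 MB).

```python
def memory_traffic_mb_s(stream_mb_s, zero_copy_in=False, zero_copy_out=False):
    """Memory traffic for one stream travelling NIC -> RAM -> disk.

    Each side costs one transfer; a side without zero-copy costs one
    additional memory-to-memory copy.
    """
    transfers = 2  # into memory, then out to the discs
    if not zero_copy_in:
        transfers += 1
    if not zero_copy_out:
        transfers += 1
    return stream_mb_s * transfers


def fill_time_minutes(capacity_gb, rate_mb_s):
    """Minutes to fill capacity_gb at rate_mb_s (1 GB taken as 1000 MB)."""
    return capacity_gb * 1000 / rate_mb_s / 60
```

With these assumptions, `memory_traffic_mb_s(150, True, True)` is 300 and `memory_traffic_mb_s(150)` is 600, the two ends of the range quoted above. `fill_time_minutes(27, 30)` is 15 and `fill_time_minutes(27, 50)` is 9. Both traffic figures sit well below the 2000 to 4000 MB/s of linear memory bandwidth mentioned in the message, which is why the per-packet overhead, rather than raw copying, is the part worth measuring.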
Re: SATA-performance: Linux vs. FreeBSD
On Tue, 13 February 2007 11:29:18 +0100, Martin A. Fink wrote:
>
> Please Read Carefully! I talk about flash disk, not normal harddisks.
> There are no mechanical parts in flash disks, only flash memory. And
> therefore 48MB/s is excellent (compared to all other available disks)
>
> [...]
>
> Well. The testdrive has 27GB. The final drive will have 225 GB. And
> there will be 3 cameras and thus 3 disks. This means we talk about 140
> MB/s for around 90 minutes.

Do you have any numbers on the performance for the final drive? Single
flash chips are relatively slow, the high bandwidth is usually achieved
by writing in parallel to several of them. With the bigger drive you get
more chips and the manufacturer could run more of them in parallel.

Jörn

--
With a PC, I always felt limited by the software available. On Unix,
I am limited only by my knowledge.
-- Peter J. Schoenster
Re: SATA-performance: Linux vs. FreeBSD
On Tue, 13 February 2007 11:27:58 +, Alan wrote:
>
> isn't yet a heavily optimised libata path. Secondly erase block size
> matters with flash drives so the bigger each I/O the better erase block
> behaviour we should get.

Although that should max out somewhere between 16KiB and 128KiB,
depending on the chips being used.

Jörn

--
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack
Re: SATA-performance: Linux vs. FreeBSD
> there's several tunables you can do;
> 1) increase /sys/block//queue/nr_requests
>    the linux default is on the low side

> 5) echo a larger value into /sys/block//queue/max_sectors_kb
>    the default seems to be 512 which is... really low. The hw max is in
>    another file in that directory; if you want max throughput set the
>    max_sectors_kb value to the hw max. (you pay in terms of fairness for

There are two more factors that play into #1 and #5. Firstly there is a
per command completion overhead in ATA without NCQ being active and that
isn't yet a heavily optimised libata path. Secondly erase block size
matters with flash drives so the bigger each I/O the better erase block
behaviour we should get.
Re: SATA-performance: Linux vs. FreeBSD
> Well they do. The Flash disk I have (SATA-I) is capable of 48 MB/s and
> this value is reached over the whole disk size by windows as well as by
> FreeBSD. See my test results in the first thread.

Ok a flash disk should be more stable

> My Seagate Barracuda Harddisk drive (SATA-II) starts with 76 MB/s and
> decreases linearly to 35 MB/s due to the fact that it has to write to a
> rotating disk. But on a flash disk there is nothing rotating...

The hard disk one isn't guaranteed or stable but the flash especially if
it is aimed at it ought to behave.

> So where is the difference between SATA-I and SATA-II ?

All physical side if they are on the same controller when you do the
tests. Mostly latency,

> And why is FreeBSD able to write with constant rates (the complete 25
> GB, all with 48+/-0.1 MB/s) but Linux 2.6.18 not ?

Does the FreeBSD fsync sync to media ? Also what controller is being used
here, and do you have EHCI USB support running ?

> With a dedicated (rotating) SATA II device, using the first 70% of disk
> space no problem -- tested ! With a SATA-I device only a problem with
> Linux 2.6.18

I suspect the SATA-1 itself may not be the decider but something else -
eg the hard disk using NCQ, which would cover up any latency related
problems.

> Journaling of data: you are right, ext2 performs better than ext3.

And ext3 in writeback mode ought in theory (but practice is always
harder ;)) be faster than ext2.
Re: SATA-performance: Linux vs. FreeBSD
On Tuesday, 13 February 2007 11:16 you wrote:
> Martin A. Fink wrote:
>> On Tuesday, 13 February 2007 00:31 you wrote:
>>> Martin A. Fink wrote:
>>>> I have to store big amounts of data coming from 2 digital cameras to
>>>> disk. Thus I have to write blocks of around 1 MB at 30 to 50 frames
>>>> per second for a long period of time. So it is important for me that
>>>> the harddisk drive is reliable in the sense of "if it is capable of
>>>> 50 MB/s then it should operate at this speed. Constantly."
>>> The good old handful of suggestions:
>>>
>>> - Use a dedicated disc for the task.
>>
>> I used a dedicated disk for this task. No one else besides the task is
>> writing to it!
>
> OK.
>
>>> - Use an empty disc so there is no fragmentation.
>>
>> All tests were performed on empty disk!
>
> OK.
>
>>> - Buy a bigger disk, they have high bandwidths.
>>
>> I have a flash disk from a manufacturer who grants me 48 MB/s. And
>> FreeBSD as well as Windows reach this value. Only Linux 2.6.18 is far
>> away from it (42 MB/s)
>
> Even 48MB/s is quite low.
> I've reached up to 70MB/s with a single 500GB Seagate model and even my
> older HDDs all reach 60MB/s (at least on the outer cylinders)
> But i haven't tested any "sync/fsync" in between, only after.

Please Read Carefully! I talk about flash disk, not normal harddisks.
There are no mechanical parts in flash disks, only flash memory. And
therefore 48MB/s is excellent (compared to all other available disks)

>>> - Buy a more "specialized" disc.
>>
>> see above
>>
>>> for e.x.: Western Digital Raptor X(*) a 150GB, 10-KRPM S-ATA disc.
>>> - Buy several discs and use RAID 0
>>>   or alternate between discs when writing.
>>
>> What I have to build is an application for the International Space
>> Station ISS. I am limited with power and space. So If the disk is able
>> to write constantly 48 MB/s then the Operating System should do this!
>
> OK. That appears to be a serious constraint.
> Do HDDs cope well with zero gravity?

Yes and no. Yes: standard desktop HDDs are unproblematic. Laptop HDDs
have g-force shock hardware that works on zero-g detection and thus
Laptop HDDs can't be used in space. At least modern ones can't...

> At least the SSD won't have a problem with that. ;-)
>
>> The problem is: FreeBSD is fast, but lacks of some special drivers.
>> Linux has all drivers but access to harddisk is unpredictable and thus
>> unreliable! What can I do??
>
> Personally i haven't had such bad write speeds in years. Taking USB
> connected and/or encrypted partitions aside.
> But on the other hand: I don't sync(fsync) until i have to.

If you don't have to - no problem. But if you use a filesystem you do an
fsync every time you close the file (and the file size is less than 1-2
GB)

> And personally i have good (and constant bandwidth) experience using XFS
> as a filesystem.
> (I have 41 HDDs with a total capacity of 10.5 TB, performance is quite
> important for me.)
>
> Also you have skipped the information how the images "arrive" on the
> system (PCI(e) card?), that may be important for an "end to end" view of
> the problem.

Images arrive via Gigabit Ethernet. GigE Vision standard. (PCIe x4)

> And what's also missing. What is "a long period of time".
> Calculating best-case with the SSD:
> 27GB divided by 30MB/s only gives a bit more than 15 Minutes.
> And worst case with 50MB/s is less than 10 Minutes.

Well. The testdrive has 27GB. The final drive will have 225 GB. And there
will be 3 cameras and thus 3 disks. This means we talk about 140 MB/s for
around 90 minutes.
For space applications with low power but high performance this is a long
time... ;-)

--
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany

Tel. +49-(0)89-3-3645
Fax. +49-(0)89-3-3569
Re: SATA-performance: Linux vs. FreeBSD
On Tue, 2007-02-13 at 12:18 +0100, Andi Kleen wrote:
> Arjan van de Ven <[EMAIL PROTECTED]> writes:
> > > The problem is: FreeBSD is fast, but lacks of some special drivers.
> > > Linux has all drivers but access to harddisk is unpredictable and
> > > thus unreliable! What can I do??
> >
> > there's several tunables you can do;
>
> [...] Well Linux certainly should perform better out of the box
> on such a simple configuration.

no argument from me there; first need to find out which piece is wrong

> Something is wrong especially when the CPU usage is so high.

I'll buy that, yet there's plenty of cpu time available so that shouldn't
be all that much of a limit on the throughput... there's still headroom

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via
http://www.linuxfirmwarekit.org
Re: SATA-performance: Linux vs. FreeBSD
Arjan van de Ven <[EMAIL PROTECTED]> writes:
> > The problem is: FreeBSD is fast, but lacks of some special drivers.
> > Linux has all drivers but access to harddisk is unpredictable and thus
> > unreliable! What can I do??
>
> there's several tunables you can do;

[...] Well Linux certainly should perform better out of the box
on such a simple configuration.

Something is wrong especially when the CPU usage is so high.
That is why I suggested oprofile.

Perhaps contact linux-ide@vger.kernel.org (if the results show driver
problems) and [EMAIL PROTECTED] (otherwise) with the results.

-Andi
Re: SATA-performance: Linux vs. FreeBSD
Martin A. Fink wrote:
> On Tuesday, 13 February 2007 00:31 you wrote:
>> Martin A. Fink wrote:
>>> I have to store big amounts of data coming from 2 digital cameras to
>>> disk. Thus I have to write blocks of around 1 MB at 30 to 50 frames
>>> per second for a long period of time. So it is important for me that
>>> the harddisk drive is reliable in the sense of "if it is capable of 50
>>> MB/s then it should operate at this speed. Constantly."
>> The good old handful of suggestions:
>>
>> - Use a dedicated disc for the task.
>
> I used a dedicated disk for this task. No one else besides the task is
> writing to it!

OK.

>> - Use an empty disc so there is no fragmentation.
>
> All tests were performed on empty disk!

OK.

>> - Buy a bigger disk, they have high bandwidths.
>
> I have a flash disk from a manufacturer who grants me 48 MB/s. And
> FreeBSD as well as Windows reach this value. Only Linux 2.6.18 is far
> away from it (42 MB/s)

Even 48MB/s is quite low.
I've reached up to 70MB/s with a single 500GB Seagate model and even my
older HDDs all reach 60MB/s (at least on the outer cylinders).
But i haven't tested any "sync/fsync" in between, only after.

>> - Buy a more "specialized" disc.
>
> see above
>
>> for e.x.: Western Digital Raptor X(*) a 150GB, 10-KRPM S-ATA disc.
>> - Buy several discs and use RAID 0
>>   or alternate between discs when writing.
>
> What I have to build is an application for the International Space
> Station ISS. I am limited with power and space. So If the disk is able
> to write constantly 48 MB/s then the Operating System should do this!

OK. That appears to be a serious constraint.
Do HDDs cope well with zero gravity?
At least the SSD won't have a problem with that. ;-)

> The problem is: FreeBSD is fast, but lacks of some special drivers.
> Linux has all drivers but access to harddisk is unpredictable and thus
> unreliable! What can I do??

Personally i haven't had such bad write speeds in years, leaving USB
connected and/or encrypted partitions aside.
But on the other hand: I don't sync(fsync) until i have to.
And personally i have good (and constant bandwidth) experience using XFS
as a filesystem.
(I have 41 HDDs with a total capacity of 10.5 TB, performance is quite
important for me.)

Also you have skipped the information how the images "arrive" on the
system (PCI(e) card?), that may be important for an "end to end" view of
the problem.

And what's also missing: what is "a long period of time"?
Calculating best-case with the SSD:
27GB divided by 30MB/s only gives a bit more than 15 Minutes.
And worst case with 50MB/s is less than 10 Minutes.
Re: SATA-performance: Linux vs. FreeBSD
> The problem is: FreeBSD is fast, but lacks of some special drivers. Linux
> has all drivers but access to harddisk is unpredictable and thus unreliable!
> What can I do??

There are several tunables you can try (in the paths below, the device name goes between the two slashes):

1) Increase /sys/block//queue/nr_requests; the Linux default is on the low side.

2) Investigate other elevators; cfq is great for interactive use but not so great for max throughput. You can do this by echo'ing "deadline" into /sys/block//queue/scheduler

3) Make sure ext3 is set to "data=writeback"; the default journalling mode is very strict, which is fine for smallish files, but for multi-gigabyte files it'll start to hurt.

4) Try to use iostat -x /dev/ 1 to see what values avg-rq and avg-qu are. avg-rq should be at least several hundred, if not more.

5) Echo a larger value into /sys/block//queue/max_sectors_kb; the default seems to be 512, which is... really low. The hw max is in another file in that directory; if you want max throughput, set the max_sectors_kb value to the hw max. (You pay in terms of fairness for this; it's the eternal fairness/latency versus throughput tradeoff.)
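As a rough illustration of tunables 1, 2 and 5, the sysfs files can be read and written from a small script instead of with echo. This is a sketch under assumptions: the helper names and the `sysfs_root` parameter are invented here (the parameter only exists so the code can be tried against a scratch directory), the device name `sdb` is just an example, and `max_hw_sectors_kb` is assumed to be the "other file in that directory" that holds the hardware maximum. Writing to the real files requires root.

```python
from pathlib import Path


def queue_dir(device, sysfs_root="/sys"):
    """Directory holding the block-queue tunables for one device, e.g. 'sdb'."""
    return Path(sysfs_root) / "block" / device / "queue"


def read_tunable(device, name, sysfs_root="/sys"):
    return (queue_dir(device, sysfs_root) / name).read_text().strip()


def write_tunable(device, name, value, sysfs_root="/sys"):
    (queue_dir(device, sysfs_root) / name).write_text(f"{value}\n")


def active_scheduler(device, sysfs_root="/sys"):
    """The scheduler file lists all elevators and brackets the active one."""
    for word in read_tunable(device, "scheduler", sysfs_root).split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def raise_max_sectors_to_hw_limit(device, sysfs_root="/sys"):
    """Copy the hardware limit into max_sectors_kb (assumed file names)."""
    hw_max = int(read_tunable(device, "max_hw_sectors_kb", sysfs_root))
    write_tunable(device, "max_sectors_kb", hw_max, sysfs_root)
    return hw_max


# Example (as root, on a real system):
#   write_tunable("sdb", "scheduler", "deadline")
#   write_tunable("sdb", "nr_requests", 512)   # 512 is an arbitrary example value
#   raise_max_sectors_to_hw_limit("sdb")
```

Settings written this way do not survive a reboot, so a recording system would have to reapply them at startup.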
Re: SATA-performance: Linux vs. FreeBSD
On Monday, 12 February 2007 20:08, you wrote:
> On Mon, 12 Feb 2007 18:56:29 +0100
> "Martin A. Fink" <[EMAIL PROTECTED]> wrote:
>
> > I have to store big amounts of data coming from 2 digital cameras to disk.
> > Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for
> > a long period of time. So it is important for me that the harddisk drive is
> > reliable in the sense of "if it is capable of 50 MB/s then it should operate
> > at this speed. Constantly."
>
> Hard disks don't do this. They support operations/second based upon
> physical and rotational latency constraints, vibration levels, mechanism,
> internal layout policy and the need to do housekeeping.

Well, they do. The flash disk I have (SATA-I) is capable of 48 MB/s, and this value is reached over the whole disk size by Windows as well as by FreeBSD. See my test results in the first message of this thread. My Seagate Barracuda hard disk drive (SATA-II) starts at 76 MB/s and decreases linearly to 35 MB/s because it has to write to a rotating disk. But on a flash disk there is nothing rotating... So where is the difference between SATA-I and SATA-II? And why is FreeBSD able to write at a constant rate (the complete 25 GB, all at 48+/-0.1 MB/s) while Linux 2.6.18 is not?

> If you have an ATA7 drive with suitable firmware sets you can talk to it
> directly via the SG_IO interface and use the streaming feature set which
> is quite different to filesystem type operations and lets you ask the
> drive to do this sort of stuff - if you can find any general PC firmware
> ones that support it anyway.
>
> I'm not sure you'll get 50MB/sec sustained to work although you might
> with a good current drive used for nothing else, a linear stream of data
> (no seeking and file system overhead), and a non PCI controller (PCI
> Express, host chipset bus etc).

With a dedicated (rotating) SATA-II device, using the first 70% of the disk space, this is no problem -- tested! With a SATA-I device it is a problem only with Linux 2.6.18.

> If you are using a file system then the more you fsync the more I'd
> expect you to see stalling as you keep draining whats effectively an 8MB
> plus pipeline on a modern drive precisely because fsync does "hitting
> disk" guarantees. You also want to be sure you are not journalling data.

That is true. Thus I do the sync only after every 1 GB of written data. That is not too often in my eyes...

Journaling of data: you are right, ext2 performs better than ext3.

Martin

> Alan

-- 
Dipl. Physiker Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-3-3645
Fax. +49-(0)89-3-3569
Re: SATA-performance: Linux vs. FreeBSD
On Tuesday, 13 February 2007 00:31, you wrote:
> Martin A. Fink wrote:
> > I have to store big amounts of data coming from 2 digital cameras to disk.
> > Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for
> > a long period of time. So it is important for me that the harddisk drive is
> > reliable in the sense of "if it is capable of 50 MB/s then it should operate
> > at this speed. Constantly."
>
> The good old handful of suggestions:
>
> - Use a dedicated disc for the task.

I used a dedicated disk for this task. Nothing besides the task is writing to it!

> - Use an empty disc so there is no fragmentation.

All tests were performed on an empty disk!

> - Buy a bigger disk, they have high bandwidths.

I have a flash disk from a manufacturer who guarantees me 48 MB/s. And FreeBSD as well as Windows reach this value. Only Linux 2.6.18 is far away from it (42 MB/s).

> - Buy a more "specialized" disc.

See above.

> for e.x.: Western Digital Raptor X(*) a 150GB, 10-KRPM S-ATA disc.
> - Buy several discs and use RAID 0
> or alternate between discs when writing.

What I have to build is an application for the International Space Station (ISS). I am limited in power and space. So if the disk is able to write at a constant 48 MB/s, then the operating system should do this!

> - use XFS. AFAIK XFS has about the best "large file" and "high
> bandwidth" characteristics.
> - that with XFS you can preallocate the files doesn't seem relevant in
> this case. It's more for the case that you write several files
> simultaneously over a longer period of time.
> - Write to one large file and separate the individual files later.
>
> if you are sure that you don't get a power-failure:
> - Disable Write-Barriers, especially on a logging-filesystem.
> - Enable write-caching.
> (hdparm doesn't appear to be able to do that with a SATA-disc, but
> blktool appears to be able to)
> The later has a good chance of corrupting your filesystem when you do
> get a power-failure!!!
>
> *:
> I don't think you want something from the server-line,
> SCSI/FibreChannel/...?
> IIRC i read a something about the first 100MB/s disc with in the 15-KRPM
> league.

Power consumption! See above.

> See you

The problem is: FreeBSD is fast, but lacks some special drivers. Linux has all the drivers, but access to the hard disk is unpredictable and thus unreliable! What can I do?

> --
> Real Programmers consider "what you see is what you get" to be just as
> bad a concept in Text Editors as it is in women. No, the Real Programmer
> wants a "you asked for it, you got it" text editor -- complicated,
> cryptic, powerful, unforgiving, dangerous.

-- 
Dipl. Physiker Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-3-3645
Fax. +49-(0)89-3-3569
Re: SATA-performance: Linux vs. FreeBSD
Martin A. Fink wrote:
> I have to store big amounts of data coming from 2 digital cameras to disk.
> Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for
> a long period of time. So it is important for me that the harddisk drive is
> reliable in the sense of "if it is capable of 50 MB/s then it should operate
> at this speed. Constantly."

The good old handful of suggestions:

- Use a dedicated disc for the task.
- Use an empty disc so there is no fragmentation.
- Buy a bigger disk; they have higher bandwidths.
- Buy a more "specialized" disc,
  e.g. the Western Digital Raptor X(*), a 150GB, 10-KRPM S-ATA disc.
- Buy several discs and use RAID 0,
  or alternate between discs when writing.
- Use XFS. AFAIK XFS has about the best "large file" and "high bandwidth" characteristics.
- That you can preallocate files with XFS doesn't seem relevant in this case. It's more for the case where you write several files simultaneously over a longer period of time.
- Write to one large file and separate the individual files later.

If you are sure that you won't get a power failure:
- Disable write barriers, especially on a logging filesystem.
- Enable write caching.
  (hdparm doesn't appear to be able to do that with a SATA disc, but blktool appears to be able to.)
  The latter has a good chance of corrupting your filesystem when you do get a power failure!!!

*:
I don't think you want something from the server line, SCSI/FibreChannel/...?
IIRC I read something about the first 100 MB/s disc being in the 15-KRPM league.

See you

-- 
Real Programmers consider "what you see is what you get" to be just as bad a concept in Text Editors as it is in women. No, the Real Programmer wants a "you asked for it, you got it" text editor -- complicated, cryptic, powerful, unforgiving, dangerous.
Re: SATA-performance: Linux vs. FreeBSD
Hi Alan et al.

On Mon, 2007-02-12 at 19:08 +, Alan wrote:
> I'm not sure you'll get 50MB/sec sustained to work although you might
> with a good current drive used for nothing else, a linear stream of data
> (no seeking and file system overhead), and a non PCI controller (PCI
> Express, host chipset bus etc).

That's Suspend2's usage pattern when given a whole partition, so I can state without reservation you can get maximum throughput under those circumstances, even with a PCI controller. Swsusp should do about the same too.

Nigel
Re: SATA-performance: Linux vs. FreeBSD
On Mon, 12 Feb 2007 18:56:29 +0100
"Martin A. Fink" <[EMAIL PROTECTED]> wrote:

> I have to store big amounts of data coming from 2 digital cameras to disk.
> Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for
> a long period of time. So it is important for me that the harddisk drive is
> reliable in the sense of "if it is capable of 50 MB/s then it should operate
> at this speed. Constantly."

Hard disks don't do this. They support operations/second based upon physical and rotational latency constraints, vibration levels, mechanism, internal layout policy and the need to do housekeeping.

If you have an ATA7 drive with suitable firmware sets you can talk to it directly via the SG_IO interface and use the streaming feature set, which is quite different to filesystem type operations and lets you ask the drive to do this sort of stuff - if you can find any general PC firmware ones that support it anyway.

I'm not sure you'll get 50MB/sec sustained to work, although you might with a good current drive used for nothing else, a linear stream of data (no seeking and file system overhead), and a non PCI controller (PCI Express, host chipset bus etc).

If you are using a file system then the more you fsync the more I'd expect you to see stalling, as you keep draining what's effectively an 8MB plus pipeline on a modern drive, precisely because fsync does "hitting disk" guarantees. You also want to be sure you are not journalling data.

Alan
Re: SATA-performance: Linux vs. FreeBSD
Martin A. Fink wrote:
> This means, that the CPU is only 7.3 of 52.8 seconds working.
...
> It looks like
> the SATA driver simply blocks the CPU while doing whatever...

The system sleeps while waiting for the disk (actually, for the SATA host port) to be done with its work.

As Andi explained, if the system gives the disk a small task, waits for the task to be completed, then gives it a next task and so on, latencies add up and eat into effective bandwidth.

Give the disk a whole set of tasks so that
- it has immediately something new to do when it finished one task,
- deep pipes are not mostly empty due to "bubbles" in the pipe,
- tasks can be reordered to be executed in optimized manner for good bandwidth utilization (if software/ firmware/ hardware is present which supports this; e.g. the Linux kernel itself),
etc.

Also make each task large so that the ratio of protocol overhead to net data payload stays minimal.

-- 
Stefan Richter
-=-=-=== --=- -==--
http://arcgraph.de/sr/
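The effect of "bubbles" in the pipe can be illustrated with a toy model. This is only a sketch, not a description of how the kernel or the drive actually schedules work: the function name, the fixed per-request latency, and the example numbers (5 ms latency, 48 MB/s device rate) are all assumptions chosen for illustration.

```python
def effective_bandwidth(request_mb, latency_s, device_mb_s, queue_depth=1):
    """Throughput in MB/s under a simple pipelining model.

    Each request costs a fixed latency_s (during which the device idles if
    nothing else is queued) plus request_mb / device_mb_s of transfer time.
    With queue_depth requests in flight, the idle latency of one request
    overlaps with the transfers of the others, up to the device limit.
    """
    transfer_s = request_mb / device_mb_s
    per_request_s = latency_s + transfer_s
    return min(device_mb_s, queue_depth * request_mb / per_request_s)


# Hypothetical numbers: 5 ms of latency per request, 48 MB/s device.
for size_mb in (0.008, 0.1, 1.0, 10.0):
    one = effective_bandwidth(size_mb, 0.005, 48.0, queue_depth=1)
    four = effective_bandwidth(size_mb, 0.005, 48.0, queue_depth=4)
    print(f"{size_mb:6.3f} MB requests: {one:5.1f} MB/s alone, {four:5.1f} MB/s with 4 queued")
```

In this model both remedies from the message above help: larger requests shrink the share of time lost to latency, and a deeper queue hides it entirely once enough work is in flight.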
Re: SATA-performance: Linux vs. FreeBSD
On 2/12/07, Martin A. Fink <[EMAIL PROTECTED]> wrote:
> On Monday, 12 February 2007 19:41, you wrote:
>
> I have to store big amounts of data coming from 2 digital cameras to disk.
> Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for
> a long period of time. So it is important for me that the harddisk drive is
> reliable in the sense of "if it is capable of 50 MB/s then it should operate
> at this speed. Constantly."

Ah, here is a misunderstanding, I think.

By default, Linux won't start writing out dirty buffers until something like 40% of memory is used. This is to help common workloads where many temporary files are created and destroyed, or even data that gets written then overwritten shortly after. If the kernel were to immediately write out that dirty data, it would be slower than leaving it in memory for those workloads.

But since that isn't best for everyone, there's a parameter that controls that dirty threshold. Setting that to a lower value will help even out the writeout, and start it early, just as you seem to be requesting.

Hmm, it may be one of:
/proc/sys/vm/dirty_ratio
/proc/sys/vm/dirty_background_ratio

Try tweaking those to much lower values and see if that helps.

Ray
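To see what the two thresholds named above are currently set to, and roughly how much dirty data they allow, something like the following could be used. It is a sketch under assumptions: the helper names and the `proc_root` parameter are invented here (the parameter only makes the code testable against a scratch directory), and the megabyte estimate simply applies the percentage to total RAM, which is a simplification of what the kernel really computes.

```python
from pathlib import Path

DIRTY_KNOBS = ("dirty_ratio", "dirty_background_ratio")


def read_dirty_ratios(proc_root="/proc"):
    """Return the current values (in percent) of the two writeback thresholds."""
    vm = Path(proc_root) / "sys" / "vm"
    return {name: int((vm / name).read_text()) for name in DIRTY_KNOBS}


def set_dirty_ratio(name, percent, proc_root="/proc"):
    """Write a new threshold; needs root on a real system."""
    if name not in DIRTY_KNOBS:
        raise ValueError(f"unknown knob: {name}")
    (Path(proc_root) / "sys" / "vm" / name).write_text(f"{percent}\n")


def approx_dirty_limit_mb(ram_mb, percent):
    """Rough amount of dirty data allowed, treating the ratio as a share of RAM."""
    return ram_mb * percent / 100


# With the 1 GB of RAM mentioned in the original post and a 40% threshold,
# several seconds' worth of 48 MB/s camera data can pile up before writeout.
print(approx_dirty_limit_mb(1024, 40))
```

Lowering the values trades the caching benefit Ray describes for a smoother, earlier writeout.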
Re: SATA-performance: Linux vs. FreeBSD
On Monday, 12 February 2007 19:41, you wrote:
> "Martin A. Fink" <[EMAIL PROTECTED]> writes:
>
> Your mailer seems to be broken. It drops cc.
>
> > If you call fsync in BSD then you get what you expect. anything that is still
> > not on disk will be written. Afterwards fsync returns... So this should be
> > the same like with linux?!
>
> Not necessarily. The disk may buffer additionally. Handling that
> differs widely, but modern Linux forces flushes to platter if the hardware
> support it.
>
> > But the big question still is -- buffered or not -- where do the big
> > variations within linux come frome? I am not writing small blocks. I write
> > huge amounts of data.
>
> 1MB is nowhere near huge by modern standards. Many IO subsystems are
> only happy with multi MB requests.
>
> > So the buffer will always be full.
>
> Hardly. Especially not if you do synchronous fsync inbetween.

Well, no. I write 1 GB in blocks of 1 MB. After that I call fsync. Then I process the next gigabyte...

> > If I use a normal SATA-II disk, there are no differences between BSD and Linux
> > when writing to the raw device... So it cant be a buffer-problem alone.
>
> Yes that is something that needs to be investigated. That is why I suggested
> oprofile if your assertation of a more CPU overhead on Linux is true.
>
> > I still don't understand the buffer argument. If one writes 25 GB in blocks of
> > 1 MB your buffer should be always full...
>
> Your mental model of a IO subsystem seems to be quite off.
> Think what happens when you fsync and submit synchronously.

See above for how I do the writing.

> It's like sending something down a long pipe and waiting until it arrives
> at the bottom and you hear the echo of the impact. Then only then you send again.
> There will be always long periods when the pipe will be empty.
>
> If you use large enough blocks these gaps will be quite small and
> might effectively become unimportant, but 1MB is nowhere near big enough
> for that.

I tested this: when I write in blocks of 8 kB or less, the effect you describe happens. But above a block size of 100 kB there is no further increase in speed.

> > Is there a buffered io device that I can use, but that does not use a
> > filesystem?
>
> /dev/sdX*. However it has some other issues that also don't make
> it ideal. File systems are usually best.

My experience with filesystems is this: I write some data and the write function returns nearly immediately. So I write again. Sometimes it returns only after some 100-300 ms. I think this always happens when the buffer is full and Linux starts to write to disk. After this has happened, it returns nearly immediately again, and after a while the same trouble happens again. But not at regular intervals...

I have to store big amounts of data coming from 2 digital cameras to disk. Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for a long period of time. So it is important for me that the hard disk drive is reliable in the sense of "if it is capable of 50 MB/s then it should operate at this speed. Constantly."

> -Andi
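The occasional 100-300 ms stalls described above can be made visible by timing every individual write call. The following is a minimal sketch, not the test program used in the thread: the function names, the 0.1 s threshold, and the example path are assumptions.

```python
import os
import time


def time_writes(path, block_size, n_blocks):
    """Write n_blocks blocks of zeros to path and return the latency of each write."""
    block = b"\0" * block_size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(n_blocks):
            start = time.perf_counter()
            written = 0
            while written < block_size:  # os.write may write fewer bytes than asked
                written += os.write(fd, block[written:])
            latencies.append(time.perf_counter() - start)
        os.fsync(fd)
    finally:
        os.close(fd)
    return latencies


def find_stalls(latencies, threshold_s=0.1):
    """Return (index, seconds) for every write that took at least threshold_s."""
    return [(i, t) for i, t in enumerate(latencies) if t >= threshold_s]


# Example (writes 1 GiB; the path is hypothetical):
#   stalls = find_stalls(time_writes("/mnt/ssd/test.bin", 1 << 20, 1024))
```

Plotting the stall indices against the number of bytes written would show whether the pauses line up with the buffer filling, as suspected in the message.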
Re: SATA-performance: Linux vs. FreeBSD
System Details:

dmesg: (parts)

Bootdata ok (command line is root=/dev/sda7 vga=0x31a resume=/dev/sda5 splash=silent)
Linux version 2.6.18.2-34-default ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Nov 27 11:46:27 UTC 2006
...
Using ACPI (MADT) for SMP configuration information
...
Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz stepping 06
Brought up 2 CPUs
...
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: Processor [CPU2] (supports 8 throttling states)
...
ICH7: IDE controller at PCI slot :00:1f.1
GSI 18 sharing vector 0xD9 and IRQ 18
ACPI: PCI Interrupt :00:1f.1[A] -> GSI 22 (level, low) -> IRQ 217
ICH7: chipset revision 1
ICH7: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: HL-DT-ST DVD-RAM GSA-H22N, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
libata version 2.00 loaded.
ahci :00:1f.2: version 2.0
GSI 19 sharing vector 0xE1 and IRQ 19
ACPI: PCI Interrupt :00:1f.2[B] -> GSI 23 (level, low) -> IRQ 225
PCI: Setting latency timer of device :00:1f.2 to 64
ahci :00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci :00:1f.2: flags: 64bit ncq led clo pio slum part
ata1: SATA max UDMA/133 cmd 0xC2026D00 ctl 0x0 bmdma 0x0 irq 233
ata2: SATA max UDMA/133 cmd 0xC2026D80 ctl 0x0 bmdma 0x0 irq 233
ata3: SATA max UDMA/133 cmd 0xC2026E00 ctl 0x0 bmdma 0x0 irq 233
ata4: SATA max UDMA/133 cmd 0xC2026E80 ctl 0x0 bmdma 0x0 irq 233
scsi0 : ahci
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7, max UDMA/133, 156301488 sectors: LBA48 NCQ (depth 31/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ahci
ata2: SATA link down (SStatus 0 SControl 300)
scsi2 : ahci
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-6, max UDMA/100, 57337056 sectors: LBA
ata3.00: ata3: dev 0 multi count 1
ata3.00: applying bridge limits
ata3.00: configured for UDMA/100
scsi3 : ahci
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48 NCQ (depth 31/32)
ata4.00: ata4: dev 0 multi count 16
ata4.00: configured for UDMA/133
Vendor: ATA Model: ST380811AS Rev: 3.AA
Losing some ticks... checking if CPU frequency changed.
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
sda2: sd 0:0:0:0: Attached scsi disk sda
Vendor: ATA Model: Adtron A25FB-28G Rev: BF22
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdb: 57337056 512-byte hdwr sectors (29357 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write through
SCSI device sdb: 57337056 512-byte hdwr sectors (29357 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write through
sdb: sdb1
sd 2:0:0:0: Attached scsi disk sdb
Vendor: ATA Model: ST3250820AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 2:0:0:0: Attached scsi generic sg1 type 0
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
sdc:
sd 3:0:0:0: Attached scsi disk sdc
sd 3:0:0:0: Attached scsi generic sg2 type 0
...

strace output:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.73   49.904049        1947     25627           write
 25.66   17.365062      694602        25           fsync
  0.62    0.416500       59500         7           close
  0.00    0.000000           0         4           read
  0.00    0.000000           0         7           open
  0.00    0.000000           0         5           fstat
  0.00    0.000000           0        16           mmap
  0.00    0.000000           0         7           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0
Re: SATA-performance: Linux vs. FreeBSD
"Martin A. Fink" <[EMAIL PROTECTED]> writes: Your mailer seems to be broken. It drops cc. > > If you call fsync in BSD then you get what you expect. anything that is still > not on disk will be written. Afterwards fsync returns... So this should be > the same like with linux?! Not necessarily. The disk may buffer additionally. Handling that differs widely, but modern Linux forces flushes to platter if the hardware support it. > But the big question still is -- buffered or not -- where do the big > variations within linux come frome? I am not writing small blocks. I write > huge amounts of data. 1MB is nowhere near huge by modern standards. Many IO subsystems are only happy with multi MB requests. > So the buffer will always be full. Hardly. Especially not if you do synchronous fsync inbetween. > If I use a normal SATA-II disk, there are no differences between BSD and > Linux > when writing to the raw device... So it cant be a buffer-problem alone. Yes that is something that needs to be investigated. That is why I suggested oprofile if your assertation of a more CPU overhead on Linux is true. > I still don't understand the buffer argument. If one writes 25 GB in blocks > of > 1 MB your buffer should be always full... Your mental model of a IO subsystem seems to be quite off. Think what happens when you fsync and submit synchronously. It's like sending something down a long pipe and waiting until it arrives at the bottom and you hear the echo of the impact. Then only then you send again. There will be always long periods when the pipe will be empty. If you use large enough blocks these gaps will be quite small and might effectively become unimportant, but 1MB is nowhere near big enough for that. > Is there a buffered io device that I can use, but that does not use a > filesystem? /dev/sdX*. However it has some other issues that also don't make it ideal. File systems are usually best. 
-Andi
Re: SATA-performance: Linux vs. FreeBSD
Some more info:

:~> strace -c -T -o trace.out dd if=/dev/zero of=test.txt bs=10MB count=200
200+0 records in
200+0 records out
2000000000 bytes (2,0 GB) copied, 52,8632 seconds, 37,8 MB/s

test.txt:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.26    6.845265       33555       204           write
  6.41    0.470283       11757        40        18 open
  0.32    0.023687         116       205           read
  0.00    0.000149           9        16           mmap2
  0.00    0.000119          40         3           munmap
  0.00    0.000081           3        24           close
  0.00    0.000068           6        11           old_mmap
  0.00    0.000064           3        20           fstat64
  0.00    0.000040           4        10           rt_sigaction
  0.00    0.000036          12         3           madvise
  0.00    0.000014           7         2           clock_gettime
  0.00    0.000010           3         3           brk
  0.00    0.000008           8         1           _sysctl
  0.00    0.000007           7         1         1 access
  0.00    0.000006           6         1           mprotect
  0.00    0.000005           5         1           futex
  0.00    0.000004           4         1           uname
  0.00    0.000004           4         1           _llseek
  0.00    0.000003           3         1           rt_sigprocmask
  0.00    0.000003           3         1           getrlimit
  0.00    0.000003           3         1           set_thread_area
  0.00    0.000003           3         1           set_tid_address
------ ----------- ----------- --------- --------- ----------------
100.00    7.339862                   551        19 total

This means that the CPU is working for only 7.3 of the 52.8 seconds. You can even hear this: if I run programs whose run time matches what strace reports, I have 100% CPU load and the CPU fan starts to blow heavily. In the case here, the fan does not do anything. It looks like the SATA driver simply blocks the CPU while doing whatever...
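The two figures in the message above (time accounted to system calls versus elapsed time, and the resulting throughput) follow directly from the numbers that dd and strace printed. A small sketch, with helper names invented for illustration:

```python
def syscall_share(syscall_seconds, wall_seconds):
    """Fraction of the elapsed time that strace attributed to system calls."""
    return syscall_seconds / wall_seconds


def throughput_mb_s(total_bytes, wall_seconds):
    """Throughput in decimal megabytes per second, as dd reports it."""
    return total_bytes / wall_seconds / 1_000_000


# Figures from the run above: 200 blocks of 10 MB in 52.8632 s,
# with 7.339862 s accounted to system calls.
print(f"share of time in syscalls: {syscall_share(7.339862, 52.8632):.1%}")
print(f"throughput: {throughput_mb_s(200 * 10_000_000, 52.8632):.1f} MB/s")
```

So roughly one seventh of the elapsed time shows up in the strace summary; the rest is time the process spent waiting, which is consistent with the idle fan.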
Re: SATA-performance: Linux vs. FreeBSD
On Monday, 12 February 2007 18:04, Andi Kleen wrote:
> "Martin A. Fink" <[EMAIL PROTECTED]> writes:
> >
> > What I did:
> > I wrote blocks of 1 MB size to file. Each 1 GB I made a fsync and took the
> > time. For those tests with filesystems I wrote files of 1 GB size, otherwise
> > I just wrote to the raw device.
>
> Newer Linux versions depending on the disk and the file system will tell
> the disk to flush the buffers to disk on fsync. FreeBSD might or might not
> do that, but if it doesn't it would explain the difference.

If you call fsync in BSD then you get what you expect: anything that is still not on disk will be written. Afterwards fsync returns... So this should be the same as with Linux?!

> > Results: -1-
> >
> > Test              OpenSuSE (AHCI)              FreeBSD (AHCI)
> > ---
> > SSD (vfat 25GB)   41+/-2 MB/s at 4-10%         15+/-0 MB/s at 2% CPU
>
> vfat is certainly not a performance optimized file system.

That is just a minor test.

> > SSD (raw 25GB)    26+/-1 MB/s at 4-10% CPU     48+/-0 MB/s at 1% CPU

The above line is what makes me wonder!!!

> > SSD (ext3 25GB)   39+/-5 MB/s at 10-15% CPU    34+/-0 MB/s at 14% CPU
> > SSD (ext2 25GB)   42+/-1 MB/s at 10-15% CPU    32+/-0 MB/s at 10% CPU
>
> You could use oprofile (http://oprofile.sourceforge.net) to find out
> where the CPU is being used.
>
> > ---
> >
> > Test              OpenSuSE (AHCI off)          FreeBSD (AHCI off)
> > ---
> > SSD (vfat 25GB)   22+/-4 MB/s at 6-19% CPU     --
> > SSD (raw 25GB)    33+/-4 MB/s at 7-14% CPU     41+/-0 MB/s at 1% CPU
>
> I remember vaguely (but I might be wrong here) the standard block
> character devices on FreeBSD are buffered, while raw is truly
> unbuffered on Linux. Naive programs (no optimized IO threads or aio)
> on truly unbuffered devices tend to perform poorly because they
> don't do any write behind.

But the big question still is -- buffered or not -- where do the big variations within Linux come from? I am not writing small blocks. I write huge amounts of data. So the buffer will always be full. And: Linux is slower than BSD even when it can use a buffer. The maximum performance of Linux is 42 MB/s (buffered), while the maximum performance of BSD is 48 MB/s (buffered or not -- I don't know).

If I use a normal SATA-II disk, there are no differences between BSD and Linux when writing to the raw device... So it can't be a buffer problem alone.

> It might also useful if you post the libata related parts of your
> boot log.
>
> > Question 2:
> > Can anybody explain to me, why writing to a solid state disk (a kind of
> > memory that always has the same constant bandwidth) has such big standard
> > errors in writing rate using Linux (between 1 to 6 MB/s error) while
> > FreeBSD gives an almost constant writing rate (as one would expect it for
> > a SSD) ?
>
> Could be buffered vs unbuffered. Unbuffered single threaded writes
> tend to be quite variable.

This does not explain the big variation of +/- 5 MB/s when writing with ext3. I still don't understand the buffer argument. If one writes 25 GB in blocks of 1 MB, the buffer should always be full...

> > Question 3:
> > Why is writing to a raw device in Linux slower than using e.g. ext2 ? And why
> > is Linux writing rate much lower (-12.5 % for the best case) compared to
> > writing rate of FreeBSD?
>
> It's really hard to make raw io perform well without complicated
> efforts because nobody will hide the IO latencies. That is why
> buffered IO is normally recommend

Is there a buffered I/O device that I can use, but that does not use a filesystem?

> -Andi

-- 
Dipl. Physiker Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-3-3645
Fax. +49-(0)89-3-3569
Re: SATA-performance: Linux vs. FreeBSD
"Martin A. Fink" <[EMAIL PROTECTED]> writes:

> What I did:
> I wrote blocks of 1 MB size to file. Each 1 GB I made a fsync and took the
> time. For those tests with filesystems I wrote files of 1 GB size, otherwise
> I just wrote to the raw device.

Newer Linux versions, depending on the disk and the file system, will tell the disk to flush its buffers to disk on fsync. FreeBSD might or might not do that, but if it doesn't, that would explain the difference.

> Results: -1-
>
> Test              OpenSuSE (AHCI)                FreeBSD (AHCI)
> ---
> SSD (vfat 25GB)   41+/-2 MB/s at 4-10%           15+/-0 MB/s at 2% CPU

vfat is certainly not a performance-optimized file system.

> SSD (raw 25GB)    26+/-1 MB/s at 4-10% CPU       48+/-0 MB/s at 1% CPU
> SSD (ext3 25GB)   39+/-5 MB/s at 10-15% CPU      34+/-0 MB/s at 14% CPU
> SSD (ext2 25GB)   42+/-1 MB/s at 10-15% CPU      32+/-0 MB/s at 10% CPU

You could use oprofile (http://oprofile.sourceforge.net) to find out where the CPU is being used.

> ---
>
> Test              OpenSuSE (AHCI off)            FreeBSD (AHCI off)
> ---
> SSD (vfat 25GB)   22+/-4 MB/s at 6-19% CPU       --
> SSD (raw 25GB)    33+/-4 MB/s at 7-14% CPU       41+/-0 MB/s at 1% CPU

I remember vaguely (but I might be wrong here) that the standard block character devices on FreeBSD are buffered, while raw is truly unbuffered on Linux. Naive programs (no optimized IO threads or aio) on truly unbuffered devices tend to perform poorly because they don't do any write-behind.

It might also be useful if you posted the libata-related parts of your boot log.

> Question 2:
> Can anybody explain to me, why writing to a solid state disk (a kind of
> memory that always has the same constant bandwidth) has such big standard
> errors in writing rate using Linux (between 1 to 6 MB/s error) while FreeBSD
> gives an almost constant writing rate (as one would expect it for a SSD) ?

Could be buffered vs. unbuffered. Unbuffered single-threaded writes tend to be quite variable.

> Question 3:
> Why is writing to a raw device in Linux slower than using e.g. ext2 ? And why
> is Linux writing rate much lower (-12.5 % for the best case) compared to
> writing rate of FreeBSD?

It's really hard to make raw IO perform well without complicated efforts, because nobody will hide the IO latencies. That is why buffered IO is normally recommended.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
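To make the buffered vs. unbuffered point a bit more concrete, here is a minimal sketch that times each individual write() call and records the fastest and slowest one. It is not the code used in the thread. The helper names (seconds_between, write_latency_range) are hypothetical, and O_SYNC on a regular file is used only as a rough stand-in for a truly unbuffered raw device: in both cases the caller waits for the IO instead of having the page cache hide the latency.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Seconds elapsed between two gettimeofday() samples. */
double seconds_between(struct timeval a, struct timeval b)
{
    return (double)(b.tv_sec - a.tv_sec)
         + (double)(b.tv_usec - a.tv_usec) / 1e6;
}

/*
 * Hypothetical helper: write nblocks blocks of block_size bytes to path and
 * record the fastest and slowest single write() call in *min_s and *max_s.
 * extra_flags is 0 for ordinary buffered writes, or O_SYNC so that every
 * write() waits for the data to reach the device.
 * Returns 0 on success, -1 on any error.
 */
int write_latency_range(const char *path, int extra_flags,
                        long nblocks, size_t block_size,
                        double *min_s, double *max_s)
{
    assert(nblocks > 0 && block_size > 0);

    char *block = malloc(block_size);
    if (block == NULL)
        return -1;
    memset(block, 0xA5, block_size);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0666);
    if (fd < 0) {
        free(block);
        return -1;
    }

    int rc = 0;
    *min_s = 1e30;
    *max_s = 0.0;
    for (long i = 0; i < nblocks; ++i) {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        ssize_t n = write(fd, block, block_size);
        gettimeofday(&t1, NULL);
        if (n != (ssize_t)block_size) {   /* this sketch treats short writes as errors */
            rc = -1;
            break;
        }
        double dt = seconds_between(t0, t1);
        if (dt < *min_s) *min_s = dt;
        if (dt > *max_s) *max_s = dt;
    }

    if (close(fd) != 0)
        rc = -1;
    free(block);
    return rc;
}
```

Calling it once with extra_flags set to 0 and once with O_SYNC on the same file gives a rough feel for how much of the per-write latency the page cache normally absorbs. The numbers depend heavily on the device and file system, so treat them as illustration only, not as a benchmark.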
SATA-performance: Linux vs. FreeBSD
Dear all,

I did some performance tests that made me really wonder:

My Hardware:
  Asus P5LD2 board with Intel i945P chipset, ICH7R southbridge
  CPU Intel Core 2 Duo E6300 at 1.86 GHz, 2 MB Cache
  1 GB RAM

My Software:
  OpenSuSE 10.2 with Linux kernel 2.6.18, x86-64 architecture
  FreeBSD 6.2

Test drives:
  1. HDD: Seagate ST3250820AS RPM 7200.9, 8 MB Cache, 250 GB, SATA-II (Harddisk Drive)
  2. SSD: Adtron AF25FB, 27GB, SATA Revision 1.0a (Solid State Disk)

What I did:
I wrote blocks of 1 MB size to file. Each 1 GB I made a fsync and took the
time. For those tests with filesystems I wrote files of 1 GB size, otherwise
I just wrote to the raw device.

Results: -1-

Test              OpenSuSE (AHCI)                FreeBSD (AHCI)
---
SSD (vfat 25GB)   41+/-2 MB/s at 4-10%           15+/-0 MB/s at 2% CPU
SSD (raw 25GB)    26+/-1 MB/s at 4-10% CPU       48+/-0 MB/s at 1% CPU
SSD (ext3 25GB)   39+/-5 MB/s at 10-15% CPU      34+/-0 MB/s at 14% CPU
SSD (ext2 25GB)   42+/-1 MB/s at 10-15% CPU      32+/-0 MB/s at 10% CPU
---

Test              OpenSuSE (AHCI off)            FreeBSD (AHCI off)
---
SSD (vfat 25GB)   22+/-4 MB/s at 6-19% CPU       --
SSD (raw 25GB)    33+/-4 MB/s at 7-14% CPU       41+/-0 MB/s at 1% CPU
SSD (ext2 25GB)   27+/-6 MB/s at 6-14% CPU       --
---

Question 1:
Can anybody explain to me, why writing to a SATA-I device with AHCI consumes
so much CPU time using Linux, while it takes almost no CPU time on FreeBSD
6.2 ? Especially comparing values of writing to the raw device?

Question 2:
Can anybody explain to me, why writing to a solid state disk (a kind of memory
that always has the same constant bandwidth) has such big standard errors in
writing rate using Linux (between 1 to 6 MB/s error) while FreeBSD gives an
almost constant writing rate (as one would expect it for a SSD) ?

Question 3:
Why is writing to a raw device in Linux slower than using e.g. ext2 ? And why
is Linux writing rate much lower (-12.5 % for the best case) compared to
writing rate of FreeBSD?
Question 4:
When writing to the SATA-II HDD, Linux is around 10% slower than FreeBSD when
using ext3, but about as fast as FreeBSD when writing raw. Why?

How can I improve the speed of Linux?

Thanks for any advice,
Martin

PS: part of my test code:

    int fd=open(fileName, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    (void)gettimeofday(&start, 0);
    for (long bl=0; bl < blocksPerGigaByte; ++bl)
        write(fd, block, blockSize);
    fsync(fd);
    (void)gettimeofday(&ende, 0);
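The fragment in the PS leaves out its declarations and ignores the return values of write() and fsync(), so a short write or an IO error would silently inflate the measured rate. Below is a minimal, self-contained sketch of the same measurement with those checks added. It is an illustration, not the original test program: the helper names (write_fully, timed_chunk_mb_per_s) are hypothetical, and 1 MB is assumed to mean 1024 * 1024 bytes.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes and EINTR. */
int write_fully(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: write `blocks` blocks of `block_size` bytes to
 * file_name, fsync once at the end, and return the rate in MB/s
 * (assuming 1 MB = 1024 * 1024 bytes), or -1.0 on any error.
 */
double timed_chunk_mb_per_s(const char *file_name, long blocks, size_t block_size)
{
    assert(blocks > 0 && block_size > 0);

    char *block = calloc(1, block_size);
    if (block == NULL)
        return -1.0;

    int fd = open(file_name, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        free(block);
        return -1.0;
    }

    struct timeval start, end;
    int ok = 1;
    gettimeofday(&start, NULL);
    for (long bl = 0; ok && bl < blocks; ++bl)
        ok = (write_fully(fd, block, block_size) == 0);
    if (ok && fsync(fd) != 0)   /* the fsync is part of the timed region, as in the PS */
        ok = 0;
    gettimeofday(&end, NULL);

    if (close(fd) != 0)
        ok = 0;
    free(block);
    if (!ok)
        return -1.0;

    double secs = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_usec - start.tv_usec) / 1e6;
    if (secs <= 0.0)            /* clock too coarse or stepped backwards */
        return -1.0;

    double mbytes = (double)blocks * (double)block_size / (1024.0 * 1024.0);
    return mbytes / secs;
}
```

With blocks set to 1024 and block_size to 1024 * 1024 this corresponds to one 1 GB chunk as described in the message. Calling it once per chunk and collecting the returned rates would give the samples behind a "mean +/- error" figure.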