-----Original Message-----
From: Gilad Ben-Yossef [mailto:[EMAIL PROTECTED]
Sent: Monday, December 06, 2004 12:02 PM
To: Gershon Geva
Subject: Re: bdflush

Gershon Geva wrote:

> regarding http://www-106.ibm.com/developerworks/linux/library/l-fs8.html :
> Thanks, it's a very interesting article, though the data=journal
> performance abnormality is quite disturbing - it means we completely
> misunderstand something about ext3.

Well, not really. I am willing to bet that the entire "loss" from
writing the journal is covered tenfold by a better utilized (or "hot",
as it's called) cache.

This occurs in many other areas. For example, Intel designed a network
card and a patch to Linux that allowed sending packets directly from
the card to a user space buffer, without going via kernel buffers and
getting copied to user space. They expected to gain performance because
they saved copying the entire data twice (from card to kernel buffer
and from kernel buffer to user space), but in reality they got worse
performance: copying the data around made the cache hot with the data,
which turned out to compensate for the price of copying the data twice,
and then some.

> Regarding the bdflush values, we want to join as many writes as
> possible, obviously. We currently put in /etc/sysctl.conf :
> vm.bdflush=200 1200 1024 1024 15 5000 1000 1884 2.

These values don't make sense to me at all. You're sure we're talking
about a 2.4 kernel?

For example, the first value is a percentage between 0 and 100, so 200
doesn't make any sense, and I'm guessing it will actually get set to
the default value, which is 30.

The fifth value asks kupdate to wake up and try to trim the cache every
15 jiffies (about 150ms at HZ=100), which is VERY aggressive, but then
the very next parameter (the 6th one) allows a page to get 50
seconds(!) old before kupdate will evict it from the cache.

The 7th parameter is yet again a percentage whose range is from 0 to
100, where you have 1000 (which I'm guessing would mean the default of
60). The 8th parameter is again a percentage, where you have 1884.
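To make the range check above concrete, here is a minimal sketch in
Python. It is not part of any kernel tooling: the helper name
check_bdflush is hypothetical, it encodes only what is said above (the
1st, 7th and 8th values are percentages, the 5th and 6th are times),
and it assumes the times are counted in jiffies with HZ=100, which is
the reading under which 5000 comes out as 50 seconds.

```python
# Hypothetical helper for illustration; not part of any kernel tooling.
HZ = 100  # assumption: timer ticks per second on a typical 2.4 x86 build

# 1-based positions described above as percentages (0 to 100).
PERCENT_FIELDS = (1, 7, 8)
# 1-based positions described above as times (assumed to be in jiffies).
TIME_FIELDS = {5: "kupdate wake-up interval", 6: "maximum buffer age"}


def check_bdflush(line):
    """Return (problems, times) for a space-separated bdflush value string.

    problems: list of (position, value) for percentages outside 0..100.
    times:    dict mapping a label to the value converted to seconds.
    """
    values = [int(v) for v in line.split()]
    problems = [
        (pos, values[pos - 1])
        for pos in PERCENT_FIELDS
        if pos <= len(values) and not 0 <= values[pos - 1] <= 100
    ]
    times = {
        label: values[pos - 1] / HZ
        for pos, label in TIME_FIELDS.items()
        if pos <= len(values)
    }
    return problems, times


if __name__ == "__main__":
    problems, times = check_bdflush("200 1200 1024 1024 15 5000 1000 1884 2")
    for pos, value in problems:
        print(f"value {pos} = {value}: outside the 0-100 percentage range")
    for label, seconds in times.items():
        print(f"{label}: {seconds} s")
```

Run against the values quoted above, it flags the 1st, 7th and 8th
entries and shows how far apart the two time settings are. The real
limits for your kernel are in fs/buffer.c, so treat this only as a
sanity check.
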
In short, to paraphrase one of my favorite movies - I don't think these
values mean what you think they mean... ;-)

You might want to consult the file
usr/src/linux-2.4/Documentation/sysctl/vm.txt from the Linux source.

> An unwanted behavior we saw was that each KB we write translates into 2
> blocks. We detected it using iostat and iostat -k.
>
> The default block size should be 4KB, so what is going on?

Pure guess - one block for data and the other for metadata/journal?

Hope this helps,
Gilad

-- 
Gilad Ben-Yossef <[EMAIL PROTECTED]>
Codefidence. A name you can trust(tm)
Web: http://codefidence.com | SIP: [EMAIL PROTECTED]
Tel: +972.9.8650475 ext. 201 | Fax: +972.9.8850643

"I am Jack's Overwritten Stack Pointer"
  -- Hackers Club, the movie
