Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, 13 May 2015 at 13:38:24, Daniel Phillips wrote:
> On Wednesday, May 13, 2015 1:25:38 PM PDT, Martin Steigerwald wrote:
> > On Wednesday, 13 May 2015 at 12:37:41, Daniel Phillips wrote:
> > > On 05/13/2015 12:09 PM, Martin Steigerwald wrote: ...
> >
> > Daniel, if you want to change the process of patch review and inclusion
> > into the kernel, model an example of how you would like it to be. This
> > has way better chances to inspire others to change their behaviors
> > themselves than accusing them of bad faith.
> >
> > It's yours to choose.
> >
> > What outcome do you want to create?
>
> The outcome I would like is:
>
> * Everybody has a good think about what has gone wrong in the past,
>   not only with troublesome submitters, but with mutual respect and
>   collegial conduct.
>
> * Tux3 is merged on its merits so we get more developers and
>   testers and move it along faster.
>
> * I left LKML better than I found it.
>
> * Group hugs
>
> Well, group hugs are optional, that one would be situational.

Great stuff! Looking forward to it.

Thank you,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, May 13, 2015 1:25:38 PM PDT, Martin Steigerwald wrote:
> On Wednesday, 13 May 2015 at 12:37:41, Daniel Phillips wrote:
> > On 05/13/2015 12:09 PM, Martin Steigerwald wrote: ...
>
> Daniel, if you want to change the process of patch review and inclusion
> into the kernel, model an example of how you would like it to be. This
> has way better chances to inspire others to change their behaviors
> themselves than accusing them of bad faith.
>
> It's yours to choose.
>
> What outcome do you want to create?

The outcome I would like is:

* Everybody has a good think about what has gone wrong in the past,
  not only with troublesome submitters, but with mutual respect and
  collegial conduct.

* Tux3 is merged on its merits so we get more developers and
  testers and move it along faster.

* I left LKML better than I found it.

* Group hugs

Well, group hugs are optional, that one would be situational.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, 13 May 2015 at 12:37:41, Daniel Phillips wrote:
> On 05/13/2015 12:09 PM, Martin Steigerwald wrote:
> > Daniel, what are you trying to achieve here?
> >
> > I thought you wanted to create interest for your filesystem and
> > acceptance for merging it.
> >
> > What I see you are actually creating though is something different.
> >
> > Is what you see after you send your mails really what you want to see?
> > If not… why not? And if you seek change, where can you create change?
>
> That is the question indeed, whether to try and change the system
> while merging, or just keep smiling and get the job done. The problem
> is, I am just too stupid to realize that I can't change the system,
> which is famously unpleasant for submitters.
>
> > I really like to see Tux3 inside the kernel for easier testing, yet I
> > also see that the way you, in your opinion, "defend" it, does not seem
> > to move that goal any closer, quite the opposite. It triggers polarity
> > and resistance.
> >
> > I believe it to be more productive to work together with the people
> > who will decide about what goes into the kernel and the people whose
> > opinions are respected by them, instead of against them.
>
> Obviously true.
>
> > "Assume good faith" can help here. No amount of accusing people of bad
> > intention will change them. The only thing you have the power to
> > change is your approach. You absolutely and ultimately do not have the
> > power to change other people. You can't force Tux3 in by sheer
> > willpower or attacking people.
> >
> > On any account for anyone discussing here: I believe that any personal
> > attacks, counter-attacks or "you are wrong" kind of speech will not
> > help to move this discussion out of the circling it seems to be in at
> > the moment.
>
> Thanks for the sane commentary. I have the power to change my behavior.
> But if nobody else changes their behavior, the process remains just as
> unpleasant for us as it ever was (not just me!). Obviously, this is
> not the first time I have been through this, and it has never been
> pleasant. After a while, contributors just get tired of the grind and
> move on to something more fun. I know I did, and I am far from the
> only one.

Daniel, if you want to change the process of patch review and inclusion
into the kernel, model an example of how you would like it to be. This has
way better chances to inspire others to change their behaviors themselves
than accusing them of bad faith.

It's yours to choose.

What outcome do you want to create?
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, May 13, 2015 1:02:34 PM PDT, Jeremy Allison wrote:
> On Wed, May 13, 2015 at 12:37:41PM -0700, Daniel Phillips wrote:
> > On 05/13/2015 12:09 PM, Martin Steigerwald wrote: ...
>
> Daniel, please listen to Martin. He speaks a fundamental truth here.
>
> As you know, I am also interested in Tux3, and would love to see it
> as a filesystem option for NAS servers using Samba. But please think
> about the way you're interacting with people on the list, and whether
> that makes this outcome more or less likely.

Thanks Jeremy, that means more from you than anyone.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wed, May 13, 2015 at 12:37:41PM -0700, Daniel Phillips wrote:
> On 05/13/2015 12:09 PM, Martin Steigerwald wrote:
> > "Assume good faith" can help here. No amount of accusing people of bad
> > intention will change them. The only thing you have the power to
> > change is your approach. You absolutely and ultimately do not have the
> > power to change other people. You can't force Tux3 in by sheer
> > willpower or attacking people.
> >
> > On any account for anyone discussing here: I believe that any personal
> > attacks, counter-attacks or "you are wrong" kind of speech will not
> > help to move this discussion out of the circling it seems to be in at
> > the moment.
>
> Thanks for the sane commentary. I have the power to change my behavior.
> But if nobody else changes their behavior, the process remains just as
> unpleasant for us as it ever was (not just me!). Obviously, this is
> not the first time I have been through this, and it has never been
> pleasant. After a while, contributors just get tired of the grind and
> move on to something more fun. I know I did, and I am far from the
> only one.

Daniel, please listen to Martin. He speaks a fundamental truth here.

As you know, I am also interested in Tux3, and would love to see it
as a filesystem option for NAS servers using Samba. But please think
about the way you're interacting with people on the list, and whether
that makes this outcome more or less likely.

Cheers,

Jeremy.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/13/2015 12:09 PM, Martin Steigerwald wrote:
> Daniel, what are you trying to achieve here?
>
> I thought you wanted to create interest for your filesystem and
> acceptance for merging it.
>
> What I see you are actually creating though is something different.
>
> Is what you see after you send your mails really what you want to see?
> If not… why not? And if you seek change, where can you create change?

That is the question indeed, whether to try and change the system
while merging, or just keep smiling and get the job done. The problem
is, I am just too stupid to realize that I can't change the system,
which is famously unpleasant for submitters.

> I really like to see Tux3 inside the kernel for easier testing, yet I
> also see that the way you, in your opinion, "defend" it, does not seem
> to move that goal any closer, quite the opposite. It triggers polarity
> and resistance.
>
> I believe it to be more productive to work together with the people who
> will decide about what goes into the kernel and the people whose
> opinions are respected by them, instead of against them.

Obviously true.

> "Assume good faith" can help here. No amount of accusing people of bad
> intention will change them. The only thing you have the power to change
> is your approach. You absolutely and ultimately do not have the power
> to change other people. You can't force Tux3 in by sheer willpower or
> attacking people.
>
> On any account for anyone discussing here: I believe that any personal
> attacks, counter-attacks or "you are wrong" kind of speech will not
> help to move this discussion out of the circling it seems to be in at
> the moment.

Thanks for the sane commentary. I have the power to change my behavior.
But if nobody else changes their behavior, the process remains just as
unpleasant for us as it ever was (not just me!). Obviously, this is
not the first time I have been through this, and it has never been
pleasant. After a while, contributors just get tired of the grind and
move on to something more fun. I know I did, and I am far from the
only one.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tuesday, 12 May 2015 at 18:26:28, Daniel Phillips wrote:
> On 05/12/2015 03:35 PM, David Lang wrote:
> > On Tue, 12 May 2015, Daniel Phillips wrote:
> > > On 05/12/2015 02:30 PM, David Lang wrote:
> > > > You need to get out of the mindset that Ted and Dave are Enemies
> > > > that you need to overcome, they are friendly competitors, not
> > > > Enemies.
> > >
> > > You are wrong about Dave. These are not the words of any friend:
> > >
> > >    "I don't think I'm alone in my suspicion that there was something
> > >    stinky about your numbers." -- Dave Chinner
> >
> > you are looking for offense. That just means that something is wrong
> > with them, not that they were deliberately falsified.
>
> I am not mistaken. Dave made sure to eliminate any doubt about
> what he meant. He said "Oh, so nicely contrived. But terribly
> obvious now that I've found it" among other things.

Daniel, what are you trying to achieve here?

I thought you wanted to create interest for your filesystem and acceptance
for merging it.

What I see you are actually creating though is something different.

Is what you see after you send your mails really what you want to see? If
not… why not? And if you seek change, where can you create change?

I really like to see Tux3 inside the kernel for easier testing, yet I also
see that the way you, in your opinion, "defend" it, does not seem to move
that goal any closer, quite the opposite. It triggers polarity and
resistance.

I believe it to be more productive to work together with the people who
will decide about what goes into the kernel and the people whose opinions
are respected by them, instead of against them.

"Assume good faith" can help here. No amount of accusing people of bad
intention will change them. The only thing you have the power to change is
your approach. You absolutely and ultimately do not have the power to
change other people. You can't force Tux3 in by sheer willpower or
attacking people.

On any account for anyone discussing here: I believe that any personal
attacks, counter-attacks or "you are wrong" kind of speech will not help
to move this discussion out of the circling it seems to be in at the
moment.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/13/2015 06:08 AM, Mike Galbraith wrote:
> On Wed, 2015-05-13 at 04:31 -0700, Daniel Phillips wrote:
> > Third possibility: build from our repository, as Mike did.
>
> Sorry about that folks. I've lost all interest, it won't happen again.

Thanks for your valuable contribution. Now we are seeing a steady stream
of people heading to the repository, after you showed it could be done.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wed, 2015-05-13 at 04:31 -0700, Daniel Phillips wrote:
> Third possibility: build from our repository, as Mike did.

Sorry about that folks. I've lost all interest, it won't happen again.

	-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/13/2015 04:31 AM, Daniel Phillips wrote:

Let me be the first to catch that arithmetic error:

> Let's say our delta size is 400MB (typical under load) and we leave
> a "nice big gap" of 112 MB after flushing each one. Let's say we do
> two thousand of those before deciding that we have enough information
> available to switch to some smarter strategy. We used one GB of a
> 4TB disk, say. The media transfer rate decreased by a factor of:
>
>    (1 - 2/1000) = .2%.

Ahem, no, we used 1/8th of the disk. The time/data rate increased from
unity to 1.125, for an average of 1.0625 across the region. If we only
use 1/10th of the disk instead, by not leaving gaps, then the average
time/data across the region is 1.05. The difference is 1.0625 - 1.05, so
the gap strategy increases media transfer time by 1.25%, which is not
significant compared to the performance deficit in question of 400%.

So, same argument: change in media transfer rate is just a distraction
from the original question. In any case, we probably want to start using
a smarter strategy sooner than 1000 commits, maybe after ten or a hundred
commits, which would make the change in media transfer rate even less
relevant.

The thing is, when data first starts landing on media, we do not have
much information about what the long term load will be. So just analyze
the clues we have in the early commits and put those early deltas onto
disk in the most efficient format, which for Tux3 seems to be linear per
delta. There would be exceptions, but that is the common case. Then get
smarter later. The intent is to get the best of both: early efficiency,
and long term nice aging behavior. I do not accept the proposition that
one must be sacrificed for the other, I find that reasoning faulty.

> The performance deficit in question and the difference in media rate
> are three orders of magnitude apart, does that justify the term
> "similar or identical"?

Regards,

Daniel
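The corrected figures above are easy to check by spelling out the model
they imply. The sketch below rests on assumptions inferred from the post,
not on anything in the Tux3 code: transfer time per unit of data grows
linearly from 1.0 at the start of the disk to 2.0 at the end (the
factor-of-two media rate variation discussed in this thread), and a
thousand 400 MB deltas with 112 MB gaps span roughly 1/8 of a 4 TB disk,
versus roughly 1/10 without gaps.

    # Back-of-envelope check of the gap-allocation cost (assumptions as
    # stated above; the linear slowdown model and delta count are
    # inferred from the post, not taken from Tux3).
    def mean_time_factor(fraction_of_disk):
        # Transfer time per unit of data at fractional position f is
        # modeled as (1 + f), so the mean over [0, fraction] is 1 + f/2.
        return 1 + fraction_of_disk / 2

    with_gaps = mean_time_factor(1 / 8)      # deltas plus gaps: ~1/8 of disk
    without_gaps = mean_time_factor(1 / 10)  # deltas alone: ~1/10 of disk

    print(with_gaps)                  # 1.0625
    print(without_gaps)               # 1.05
    print(with_gaps - without_gaps)   # 0.0125 -> the 1.25% quoted above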
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/13/2015 12:25 AM, Pavel Machek wrote:
> On Mon 2015-05-11 16:53:10, Daniel Phillips wrote:
> > Hi Pavel,
> >
> > On 05/11/2015 03:12 PM, Pavel Machek wrote:
> > > > > It is a fact of life that when you change one aspect of an
> > > > > intimately interconnected system, something else will change as
> > > > > well. You have naive/nonexistent free space management now; when
> > > > > you design something workable there it is going to impact
> > > > > everything else you've already done. It's an easy bet that the
> > > > > impact will be negative, the only question is to what degree.
> > > >
> > > > You might lose that bet. For example, suppose we do strictly
> > > > linear allocation each delta, and just leave nice big gaps between
> > > > the deltas for future expansion. Clearly, we run at similar or
> > > > identical speed to the current naive strategy until we must start
> > > > filling in the gaps, and at that point our layout is not any worse
> > > > than XFS, which started bad and stayed that way.
> > >
> > > Umm, are you sure? If "some areas of disk are faster than others" is
> > > still true on todays harddrives, the gaps will decrease the
> > > performance (as you'll "use up" the fast areas more quickly).
> >
> > That's why I hedged my claim with "similar or identical". The
> > difference in media speed seems to be a relatively small effect
>
> When you knew it can't be identical? That's rather confusing, right?

Maybe. The top of thread is about a measured performance deficit of a
factor of five. Next to that, a media transfer rate variation by a factor
of two already starts to look small, and gets smaller when scrutinized.

Let's say our delta size is 400MB (typical under load) and we leave a
"nice big gap" of 112 MB after flushing each one. Let's say we do two
thousand of those before deciding that we have enough information
available to switch to some smarter strategy. We used one GB of a 4TB
disk, say. The media transfer rate decreased by a factor of:

   (1 - 2/1000) = .2%.

The performance deficit in question and the difference in media rate are
three orders of magnitude apart, does that justify the term "similar or
identical"?

> Perhaps you should post more details how your benchmark is structured
> next time, so we can see you did not make any trivial mistakes...?

Makes sense to me, though I do take considerable care to ensure that my
results are reproducible. That is borne out by the fact that Mike did
reproduce, albeit from the published branch, which is a bit behind
current work. And he went on to do some original testing of his own. I
had no idea Tux3 was so much faster than XFS on the Git self test,
because we never specifically tested anything like that, or optimized for
it. Of course I was interested in why.

And that was not all, Mike also noticed a really interesting fact about
latency that I failed to reproduce. That went on to the list of things to
investigate as time permits.

I reproduced Mike's results according to his description, by actually
building Git in the VM and running the selftests just to see if the same
thing happened, which it did. I didn't think that was worth mentioning at
the time, because if somebody publishes benchmarks, my first instinct is
to trust them. Trust and verify.

> Or just clean the code up so that it can get merged, so that we can
> benchmark ourselves...

Third possibility: build from our repository, as Mike did. Obviously, we
need to merge to master so the build process matches the Wiki. But
Hirofumi is busy with other things, so please be patient.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon 2015-05-11 16:53:10, Daniel Phillips wrote:
> Hi Pavel,
>
> On 05/11/2015 03:12 PM, Pavel Machek wrote:
> > > > It is a fact of life that when you change one aspect of an
> > > > intimately interconnected system, something else will change as
> > > > well. You have naive/nonexistent free space management now; when
> > > > you design something workable there it is going to impact
> > > > everything else you've already done. It's an easy bet that the
> > > > impact will be negative, the only question is to what degree.
> > >
> > > You might lose that bet. For example, suppose we do strictly linear
> > > allocation each delta, and just leave nice big gaps between the
> > > deltas for future expansion. Clearly, we run at similar or identical
> > > speed to the current naive strategy until we must start filling in
> > > the gaps, and at that point our layout is not any worse than XFS,
> > > which started bad and stayed that way.
> >
> > Umm, are you sure? If "some areas of disk are faster than others" is
> > still true on todays harddrives, the gaps will decrease the
> > performance (as you'll "use up" the fast areas more quickly).
>
> That's why I hedged my claim with "similar or identical". The
> difference in media speed seems to be a relatively small effect

When you knew it can't be identical? That's rather confusing, right?

Perhaps you should post more details how your benchmark is structured
next time, so we can see you did not make any trivial mistakes...?

Or just clean the code up so that it can get merged, so that we can
benchmark ourselves...

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue 2015-05-12 13:54:58, Daniel Phillips wrote:
> On 05/12/2015 11:39 AM, David Lang wrote:
> > On Mon, 11 May 2015, Daniel Phillips wrote:
> > > > ...it's the mm and core kernel developers that need to
> > > > review and accept that code *before* we can consider merging tux3.
> > >
> > > Please do not say "we" when you know that I am just as much a "we"
> > > as you are. Merging Tux3 is not your decision. The people whose
> > > decision it actually is are perfectly capable of recognizing your
> > > agenda for what it is.
> > >
> > > http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
> > > "XFS Developer Takes Shots At Btrfs, EXT4"
> >
> > umm, Phoronix has no input on what gets merged into the kernel. they
> > also have a reputation for trying to turn anything into click-bait by
> > making it sound like a fight when it isn't.
>
> Perhaps you misunderstood. Linus decides what gets merged. Andrew
> decides. Greg decides. Dave Chinner does not decide, he just does
> his level best to create the impression that our project is unfit
> to merge. Any chance there might be an agenda?

Dunno. _Your_ agenda seems to be "attack other maintainers so much that
you can later claim they are biased". Not going to work, sorry.

> > As Dave says above, it's not the other filesystem people you have to
> > convince, it's the core VFS and Memory Management folks you have to
> > convince. You may need a little benchmarking to show that there is a
> > real advantage to be gained, but the real discussion is going to be
> > on the impact that page forking is going to have on everything else
> > (both in complexity and in performance impact to other things)
>
> Yet he clearly wrote "we" as if he believes he is part of it.
>
> Now that ENOSPC is done to a standard way beyond what Btrfs had
> when it was merged, the next item on the agenda is writeback. That
> involves us and VFS people as you say, and not Dave Chinner, who
> only intends to obstruct the process as much as he possibly can. He

Why would he do that? Aha, maybe because you keep attacking him all the
time. Or maybe because your code is not up to the kernel standards. You
want to claim it is the former, but it really looks like the latter.

Just stop doing that. You are not creating a nice atmosphere and you are
not getting tux3 merged in any way.

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/12/2015 03:35 PM, David Lang wrote:
> On Tue, 12 May 2015, Daniel Phillips wrote:
> > On 05/12/2015 02:30 PM, David Lang wrote:
> > > You need to get out of the mindset that Ted and Dave are Enemies
> > > that you need to overcome, they are friendly competitors, not
> > > Enemies.
> >
> > You are wrong about Dave. These are not the words of any friend:
> >
> >    "I don't think I'm alone in my suspicion that there was something
> >    stinky about your numbers." -- Dave Chinner
>
> you are looking for offense. That just means that something is wrong
> with them, not that they were deliberately falsified.

I am not mistaken. Dave made sure to eliminate any doubt about what he
meant. He said "Oh, so nicely contrived. But terribly obvious now that
I've found it" among other things. Good work, Dave. Never mind that we
did not hide it.

Let's look at some more of the story. Hirofumi ran the test and I posted
the results and explained the significance. I did not even know that
dbench had fsyncs at that time, since I had never used it myself, nor
that Hirofumi had taken them out in order to test the things he was
interested in. Which turned out to be very interesting, don't you agree?
Anyway, Hirofumi followed up with a clear explanation, here:

   http://phunq.net/pipermail/tux3/2013-May/002022.html

Instead of accepting that, Dave chose to ride right over it and carry on
with his thinly veiled allegations of intellectual fraud, using such
words as "it's deceptive at best." Dave managed to insult two people that
day.

Dave dismissed the basic breakthrough we had made as "silly marketing
fluff". By now I hope you understand that the result in question was
anything but silly marketing fluff. There are real, technical reasons
that Tux3 wins benchmarks, and the specific detail that Dave attacked so
ungraciously is one of them. Are you beginning to see who the victim of
this mugging was?

> > Basically allegations of cheating. And wrong. Maybe Dave just
> > lives in his own dreamworld where everybody is out to get him, so
> > he has to attack people he views as competitors first.
>
> you are the one doing the attacking.

Defending, not attacking. There is a distinction.

> Please stop. Take a break if needed, and then get back to producing
> software rather than complaining about how everyone is out to get you.

Dave is not "everyone", and a "shut up" will not fix this. What will fix
this is a simple, professional statement that an error was made, that
there was no fraud or anything even remotely resembling it, and that
instead a technical contribution was made. It is not even important that
it come from Dave. But it is important that the aspersions that were cast
be recognized for what they were.

By the way, do you remember the scene from "Unforgiven" where the sheriff
is kicking the guy on the ground and saying "I'm not kicking you?" It
feels like that.

As far as who should take a break goes, note that either of us can stop
the thread. Does it necessarily have to be me? If you would prefer some
light reading, you could read "How fast can we fail?", which I believe is
relevant to the question of whether Tux3 is mergeable or not.

   https://lkml.org/lkml/2015/5/12/663

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/12/2015 02:30 PM, David Lang wrote:
> On Tue, 12 May 2015, Daniel Phillips wrote:
> > Phoronix published a headline that identifies Dave Chinner as
> > someone who takes shots at other projects. Seems pretty much on
> > the money to me, and it ought to be obvious why he does it.
>
> Phoronix turns any correction or criticism into an attack.

Phoronix gets attacked in an unseemly way by a number of people in the
developer community who should behave better. You are doing it yourself,
seemingly oblivious to the valuable role that the publication plays in
our community. Google for filesystem benchmarks. Where do you find them?
Right. Not to mention the Xorg coverage, community issues, etc etc. The
last thing we need is a monoculture in Linux news, and we are dangerously
close to that now.

So, how is "EXT4 is not as stable or as well tested as most people think"
not a cheap shot? By my first hand experience, that claim is absurd. Add
to that the first hand experience of roughly two billion other people.
Seems to be a bit self serving too, or was that just an accident?

> You need to get out of the mindset that Ted and Dave are Enemies that
> you need to overcome, they are friendly competitors, not Enemies.

You are wrong about Dave. These are not the words of any friend:

   "I don't think I'm alone in my suspicion that there was something
   stinky about your numbers." -- Dave Chinner

Basically allegations of cheating. And wrong. Maybe Dave just lives in
his own dreamworld where everybody is out to get him, so he has to attack
people he views as competitors first.

Ted has more taste and his FUD attack was more artful, but it still
amounted to nothing more than piling on. He just picked up Dave's straw
man uncritically and proceeded to knock it down some more. Nice way of
distracting attention from the fact that we actually did what we claimed,
and instead of getting the appropriate recognition for it, we were called
cheaters. More or less in so many words by Dave, and more subtly by Ted,
but the intent is clear and unmistakable. Apologies from both are still
in order, but it will be a rainy day in that hot place before we ever see
either of them do the right thing.

That said, Ted is no enemy, he is brilliant and usually conducts himself
admirably. Except sometimes. I wish I could say the same about Dave, but
what I see there is a guy who has invested his entire identity in his XFS
career and is insecure that something might conspire against him to
disrupt it. I mean, come on, if you convince Redhat management to elevate
your life's work to the status of something that most of the paid-for
servers in the world are going to run, do you continue attacking your
peers or do you chill a bit?

> They assume that you are working in good faith (but are inexperienced
> compared to them), and you need to assume that they are working in good
> faith. If they ever do resort to underhanded means to sabotage you,
> Linus and the other kernel developers will take action. But pointing
> out limits in your current implementation, problems in your benchmarks
> based on how they are run, and concepts that are going to be difficult
> to merge is not underhanded, it's exactly the type of assistance that
> you should be grateful for in friendly competition.
>
> You were the one who started crowing about how badly XFS performed.

Not at all, somebody else posted the terrible XFS benchmark result, then
Dave put up a big smokescreen to try to deflect attention from it. There
is a term for that kind of logical fallacy:

   http://en.wikipedia.org/wiki/Proof_by_intimidation

Seems to have worked well on you. But after all those words, XFS does not
run any faster, and it clearly needs to.

> Dave gave a long and detailed explanation of the reasons for the
> differences, showing benchmarks on other hardware where XFS works very
> well. That's not an attack on EXT4 (or Tux3), it's an explanation.

Long, detailed, and bogus. Summary: "oh, XFS doesn't work well on that
hardware? Get new hardware." Excuse me, but other filesystems do work
well on that hardware, the problem is not with the hardware.

> I have my own concerns about how things are going to work (I've voiced
> some of them), but no, I haven't tried running Tux3 because you say
> it's not ready yet.

I did not say that. I said it is not ready for users. It is more than
ready for anybody who wants to develop it, or benchmark it, or put test
data on it, and has been for a long time. Except for enospc, and that was
apparently not an issue for Btrfs, was it.

> > You know what to do about checking for faulty benchmarks.
>
> That requires that the code be readily available, which last I heard,
> Tux3 wasn't. Has this been fixed?

You heard wrong. The code is readily available and you can clone it from
here:

   https://github.com/OGAWAHirofumi/linux-tux3.git

The hirofumi-user branch has the user tools including mkfs and basic
fsck.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, May 12, 2015 at 03:35:43PM -0700, David Lang wrote:
> I happen to think that it's correct. It's not that Ext4 isn't tested,
> but that people's expectations of how much it's been tested, and at
> what scale, don't match the reality.

Ext4 is used at Google, on a very large number of disks. Exactly how
large is not something I'm allowed to say, but there's a very amusing Ted
Talk by Randall Munroe (of xkcd fame) on that topic:

   http://tedsummaries.com/2014/05/14/randall-munroe-comics-that-ask-what-if/

One thing I can say is that shortly after we deployed ext4 at Google,
thanks to having a very large number of disks, and because we have very
good system monitoring, we detected a file system corruption problem that
happened with a very low probability, but we had enough disks that we
could detect the pattern. (Fortunately, because Google's cluster file
system has replication and/or erasure coding, no user data was lost.)

Even though we could notice the problem, it took us several months to
track it down. When we finally did, it turned out to be a race condition
which only took place under high memory pressure. What was *very*
amusing was that after fixing the problem for ext4, I looked at ext3,
and discovered that (a) the bug ext4 had inherited was also in ext3, and
(b) the bug in ext3 had not been noticed in several enterprise
distribution testing runs done by Red Hat, SuSE, and IBM --- for well
over a **decade**.

What this means is that it's hard for *any* file system to be that well
tested; it's hard to substitute for years and years of production use,
hopefully in systems that have very rigorous monitoring so you would
notice if data or file system metadata is getting corrupted in ways that
can't be explained as hardware errors. The fact that we found a bug that
was never discovered in ext3 after years and years of use in many
enterprises is a testimony to that fact.

(This is also why the fact that Facebook has started using btrfs in
production is going to be a very good thing for btrfs. I'm sure they will
find all sorts of problems once they start running at large scale, which
is a _good_ thing; that's how those problems get fixed.)

Of course, using xfstests certainly helps a lot, and so in my opinion all
serious file system developers should be regularly using xfstests as a
part of the daily development cycle, and should be extremely ruthless
about not allowing any test regressions.

Best regards,

						- Ted
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, 12 May 2015, Daniel Phillips wrote:
> On 05/12/2015 02:30 PM, David Lang wrote:
> > On Tue, 12 May 2015, Daniel Phillips wrote:
> > > Phoronix published a headline that identifies Dave Chinner as
> > > someone who takes shots at other projects. Seems pretty much on
> > > the money to me, and it ought to be obvious why he does it.
> >
> > Phoronix turns any correction or criticism into an attack.
>
> Phoronix gets attacked in an unseemly way by a number of people in the
> developer community who should behave better. You are doing it
> yourself, seemingly oblivious to the valuable role that the publication
> plays in our community. Google for filesystem benchmarks. Where do you
> find them? Right. Not to mention the Xorg coverage, community issues,
> etc etc. The last thing we need is a monoculture in Linux news, and we
> are dangerously close to that now.

It's on my 'sites to check daily' list, but they have also had some
pretty nasty errors in their benchmarks, some of which have been pointed
out repeatedly over the years (doing fsync dependent workloads in
situations where one FS actually honors the fsyncs and another doesn't is
a classic)

> So, how is "EXT4 is not as stable or as well tested as most people
> think" not a cheap shot? By my first hand experience, that claim is
> absurd. Add to that the first hand experience of roughly two billion
> other people. Seems to be a bit self serving too, or was that just an
> accident?

I happen to think that it's correct. It's not that Ext4 isn't tested, but
that people's expectations of how much it's been tested, and at what
scale, don't match the reality.

> > You need to get out of the mindset that Ted and Dave are Enemies that
> > you need to overcome, they are friendly competitors, not Enemies.
>
> You are wrong about Dave. These are not the words of any friend:
>
>    "I don't think I'm alone in my suspicion that there was something
>    stinky about your numbers." -- Dave Chinner

you are looking for offense. That just means that something is wrong with
them, not that they were deliberately falsified.

> Basically allegations of cheating. And wrong. Maybe Dave just lives in
> his own dreamworld where everybody is out to get him, so he has to
> attack people he views as competitors first.

you are the one doing the attacking. Please stop. Take a break if needed,
and then get back to producing software rather than complaining about how
everyone is out to get you.

David Lang
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 12.05.2015 22:54, Daniel Phillips wrote:
> On 05/12/2015 11:39 AM, David Lang wrote:
> > On Mon, 11 May 2015, Daniel Phillips wrote:
> > > > ...it's the mm and core kernel developers that need to
> > > > review and accept that code *before* we can consider merging tux3.
> > >
> > > Please do not say "we" when you know that I am just as much a "we"
> > > as you are. Merging Tux3 is not your decision. The people whose
> > > decision it actually is are perfectly capable of recognizing your
> > > agenda for what it is.
> > >
> > > http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
> > > "XFS Developer Takes Shots At Btrfs, EXT4"
> >
> > umm, Phoronix has no input on what gets merged into the kernel. they
> > also have a reputation for trying to turn anything into click-bait by
> > making it sound like a fight when it isn't.
>
> Perhaps you misunderstood. Linus decides what gets merged. Andrew
> decides. Greg decides. Dave Chinner does not decide, he just does his
> level best to create the impression that our project is unfit to merge.
> Any chance there might be an agenda?
>
> Phoronix published a headline that identifies Dave Chinner as someone
> who takes shots at other projects. Seems pretty much on the money to
> me, and it ought to be obvious why he does it.

Maybe Dave has convincing arguments that have been misinterpreted by that
website, which is an interesting but also highly manipulative
publication.

> > > The real question is, has the Linux development process become so
> > > political and toxic that worthwhile projects fail to benefit from
> > > supposed grassroots community support. You are the poster child for
> > > that.
> >
> > The linux development process is making code available, responding to
> > concerns from the experts in the community, and letting the code talk
> > for itself.
>
> Nice idea, but it isn't working. Did you let the code talk to you?
> Right, you let the code talk to Dave Chinner, then you listen to what
> Dave Chinner has to say about it. Any chance that there might be some
> creative licence acting somewhere in that chain?

We are missing the complete usable thing.

> > There have been many people pushing code for inclusion that has not
> > gotten into the kernel, or has not been used by any distros after
> > it's made it into the kernel, in spite of benchmarks being posted
> > that seem to show how wonderful the new code is. ReiserFS was one of
> > the first, and part of what tarnished its reputation with many people
> > was how much they were pushing the benchmarks that were shown to be
> > faulty (the one I remember most vividly was that the entire benchmark
> > completed in <30 seconds, and they had the FS tuned to not start
> > flushing data to disk for 30 seconds, so the entire 'benchmark' ran
> > out of ram without ever touching the disk)
>
> You know what to do about checking for faulty benchmarks.
>
> > So when Ted and Dave point out problems with the benchmark (the
> > difference in behavior between a single spinning disk, different
> > partitions on the same disk, SSDs, and ramdisks), you would be better
> > off acknowledging them and if you can't adjust and re-run the
> > benchmarks, don't start attacking them as a result.
>
> Ted and Dave failed to point out any actual problem with any benchmark.
> They invented issues with benchmarks and promoted those as FUD.

In general, benchmarks are a critical issue. In this relation, let me
quote Churchill in a derived way: do not trust a benchmark that you have
not forged yourself.

> > As Dave says above, it's not the other filesystem people you have to
> > convince, it's the core VFS and Memory Management folks you have to
> > convince. You may need a little benchmarking to show that there is a
> > real advantage to be gained, but the real discussion is going to be
> > on the impact that page forking is going to have on everything else
> > (both in complexity and in performance impact to other things)
>
> Yet he clearly wrote "we" as if he believes he is part of it.
>
> Now that ENOSPC is done to a standard way beyond what Btrfs had when it
> was merged, the next item on the agenda is writeback. That involves us
> and VFS people as you say, and not Dave Chinner, who only intends to
> obstruct the process as much as he possibly can. He should get back to
> work on his own project. Nobody will miss his posts if he doesn't make
> them. They contribute nothing of value, create a lot of bad blood, and
> just serve to further besmirch the famously tarnished reputation of
> LKML.

At least, I would miss his contributions, specifically his technical
explanations but also his opinions.

> > > You know that Tux3 is already fast. Not just that of course. It has
> > > a higher standard of data integrity than your metadata-only
> > > journalling filesystem and a small enough code base that it can be
> > > reasonably expected to reach the quality expected of an enterprise
> > > class filesystem, quite possibly before XFS gets there.
> >
> > We wouldn't expect anyone developing a new filesystem to believe any
> > differently.
>
> It is not a matter of belief, it is a matter of testable fact. For
> example, you can count the lines. You can run the same benchmarks.
> Proving the data consistency claims would be a little harder, you need
> tools for that, and some of those aren't built yet.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, 12 May 2015, Daniel Phillips wrote:
> On 05/12/2015 11:39 AM, David Lang wrote:
> > On Mon, 11 May 2015, Daniel Phillips wrote:
> > > > ...it's the mm and core kernel developers that need to
> > > > review and accept that code *before* we can consider merging tux3.
> > >
> > > Please do not say "we" when you know that I am just as much a "we"
> > > as you are. Merging Tux3 is not your decision. The people whose
> > > decision it actually is are perfectly capable of recognizing your
> > > agenda for what it is.
> > >
> > > http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
> > > "XFS Developer Takes Shots At Btrfs, EXT4"
> >
> > umm, Phoronix has no input on what gets merged into the kernel. they
> > also have a reputation for trying to turn anything into click-bait by
> > making it sound like a fight when it isn't.
>
> Perhaps you misunderstood. Linus decides what gets merged. Andrew
> decides. Greg decides. Dave Chinner does not decide, he just does his
> level best to create the impression that our project is unfit to merge.
> Any chance there might be an agenda?
>
> Phoronix published a headline that identifies Dave Chinner as someone
> who takes shots at other projects. Seems pretty much on the money to
> me, and it ought to be obvious why he does it.

Phoronix turns any correction or criticism into an attack.

You need to get out of the mindset that Ted and Dave are Enemies that you
need to overcome, they are friendly competitors, not Enemies. They assume
that you are working in good faith (but are inexperienced compared to
them), and you need to assume that they are working in good faith. If
they ever do resort to underhanded means to sabotage you, Linus and the
other kernel developers will take action. But pointing out limits in your
current implementation, problems in your benchmarks based on how they are
run, and concepts that are going to be difficult to merge is not
underhanded, it's exactly the type of assistance that you should be
grateful for in friendly competition.

You were the one who started crowing about how badly XFS performed. Dave
gave a long and detailed explanation of the reasons for the differences,
showing benchmarks on other hardware where XFS works very well. That's
not an attack on EXT4 (or Tux3), it's an explanation.

> > > The real question is, has the Linux development process become so
> > > political and toxic that worthwhile projects fail to benefit from
> > > supposed grassroots community support. You are the poster child for
> > > that.
> >
> > The linux development process is making code available, responding to
> > concerns from the experts in the community, and letting the code talk
> > for itself.
>
> Nice idea, but it isn't working. Did you let the code talk to you?
> Right, you let the code talk to Dave Chinner, then you listen to what
> Dave Chinner has to say about it. Any chance that there might be some
> creative licence acting somewhere in that chain?

I have my own concerns about how things are going to work (I've voiced
some of them), but no, I haven't tried running Tux3 because you say it's
not ready yet.

> > There have been many people pushing code for inclusion that has not
> > gotten into the kernel, or has not been used by any distros after
> > it's made it into the kernel, in spite of benchmarks being posted
> > that seem to show how wonderful the new code is. ReiserFS was one of
> > the first, and part of what tarnished its reputation with many people
> > was how much they were pushing the benchmarks that were shown to be
> > faulty (the one I remember most vividly was that the entire benchmark
> > completed in <30 seconds, and they had the FS tuned to not start
> > flushing data to disk for 30 seconds, so the entire 'benchmark' ran
> > out of ram without ever touching the disk)
>
> You know what to do about checking for faulty benchmarks.

That requires that the code be readily available, which last I heard,
Tux3 wasn't. Has this been fixed?

> > So when Ted and Dave point out problems with the benchmark (the
> > difference in behavior between a single spinning disk, different
> > partitions on the same disk, SSDs, and ramdisks), you would be better
> > off acknowledging them and if you can't adjust and re-run the
> > benchmarks, don't start attacking them as a result.
>
> Ted and Dave failed to point out any actual problem with any benchmark.
> They invented issues with benchmarks and promoted those as FUD.

They pointed out problems with using ramdisk to simulate an SSD and huge
differences between spinning rust and an SSD (or disk array). Those
aren't FUD.

> > As Dave says above, it's not the other filesystem people you have to
> > convince, it's the core VFS and Memory Management folks you have to
> > convince. You may need a little benchmarking to show that there is a
> > real advantage to be gained, but the real discussion is going to be
> > on the impact that page forking is going to have on everything else
> > (both in complexity and in performance impact to other things)
>
> Yet he clearly wrote "we" as if he believes he is part of it.

He is part of the group of people who use
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/12/2015 11:39 AM, David Lang wrote: > On Mon, 11 May 2015, Daniel Phillips wrote: >>> ...it's the mm and core kernel developers that need to >>> review and accept that code *before* we can consider merging tux3. >> >> Please do not say "we" when you know that I am just as much a "we" >> as you are. Merging Tux3 is not your decision. The people whose >> decision it actually is are perfectly capable of recognizing your >> agenda for what it is. >> >> http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM >> "XFS Developer Takes Shots At Btrfs, EXT4" > > umm, Phoronix has no input on what gets merged into the kernel. they also hae > a reputation for > trying to turn anything into click-bait by making it sound like a fight when > it isn't. Perhaps you misunderstood. Linus decides what gets merged. Andrew decides. Greg decides. Dave Chinner does not decide, he just does his level best to create the impression that our project is unfit to merge. Any chance there might be an agenda? Phoronix published a headline that identifies Dave Chinner as someone who takes shots at other projects. Seems pretty much on the money to me, and it ought to be obvious why he does it. >> The real question is, has the Linux development process become >> so political and toxic that worthwhile projects fail to benefit >> from supposed grassroots community support. You are the poster >> child for that. > > The linux development process is making code available, responding to > concerns from the experts in > the community, and letting the code talk for itself. Nice idea, but it isn't working. Did you let the code talk to you? Right, you let the code talk to Dave Chinner, then you listen to what Dave Chinner has to say about it. Any chance that there might be some creative licence acting somewhere in that chain? > There have been many people pushing code for inclusion that has not gotten > into the kernel, or has > not been used by any distros after it's made it into the kernel, in spite of > benchmarks being posted > that seem to show how wonderful the new code is. ReiserFS was one of the > first, and part of what > tarnished it's reputation with many people was how much they were pushing the > benchmarks that were > shown to be faulty (the one I remember most vividly was that the entire > benchmark completed in <30 > seconds, and they had the FS tuned to not start flushing data to disk for 30 > seconds, so the entire > 'benchmark' ran out of ram without ever touching the disk) You know what to do about checking for faulty benchmarks. > So when Ted and Dave point out problems with the benchmark (the difference in > behavior between a > single spinning disk, different partitions on the same disk, SSDs, and > ramdisks), you would be > better off acknowledging them and if you can't adjust and re-run the > benchmarks, don't start > attacking them as a result. Ted and Dave failed to point out any actual problem with any benchmark. They invented issues with benchmarks and promoted those as FUD. > As Dave says above, it's not the other filesystem people you have to > convince, it's the core VFS and > Memory Mangement folks you have to convince. You may need a little > benchmarking to show that there > is a real advantage to be gained, but the real discussion is going to be on > the impact that page > forking is going to have on everything else (both in complexity and in > performance impact to other > things) Yet he clearly wrote "we" as if he believes he is part of it. 
Now that ENOSPC is done to a standard way beyond what Btrfs had when it was merged, the next item on the agenda is writeback. That involves us and VFS people as you say, and not Dave Chinner, who only intends to obstruct the process as much as he possibly can. He should get back to work on his own project. Nobody will miss his posts if he doesn't make them. They contribute nothing of value, create a lot of bad blood, and just serve to further besmirch the famously tarnished reputation of LKML. >> You know that Tux3 is already fast. Not just that of course. It >> has a higher standard of data integrity than your metadata-only >> journalling filesystem and a small enough code base that it can >> be reasonably expected to reach the quality expected of an >> enterprise class filesystem, quite possibly before XFS gets >> there. > > We wouldn't expect anyone developing a new filesystem to believe any > differently. It is not a matter of belief, it is a matter of testable fact. For example, you can count the lines. You can run the same benchmarks. Proving the data consistency claims would be a little harder; you need tools for that, and some of those aren't built yet. Or, if you have technical ability, you can read the code and the copious design material that has been posted and convince yourself that, yes, there is something cool here, why didn't anybody do it that way before? But of course that starts to sound like work.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, 11 May 2015, Daniel Phillips wrote: On Monday, May 11, 2015 10:38:42 PM PDT, Dave Chinner wrote: I think Ted and I are on the same page here. "Competitive benchmarks" only matter to the people who are trying to sell something. You're trying to sell Tux3, but By "same page", do you mean "transparently obvious about obstructing other projects"? The "except page forking design" statement is your biggest hurdle for getting tux3 merged, not performance. No, the "except page forking design" is because the design is already good and effective. The small adjustments needed in core are well worth merging because the benefits are proved by benchmarks. So benchmarks are key and will not stop just because you don't like the attention they bring to XFS issues. Without page forking, tux3 cannot be merged at all. But it's not filesystem developers you need to convince about the merits of the page forking design and implementation - it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3. Please do not say "we" when you know that I am just as much a "we" as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM "XFS Developer Takes Shots At Btrfs, EXT4" umm, Phoronix has no input on what gets merged into the kernel. they also have a reputation for trying to turn anything into click-bait by making it sound like a fight when it isn't. The real question is, has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support. You are the poster child for that. The Linux development process is making code available, responding to concerns from the experts in the community, and letting the code talk for itself. There have been many people pushing code for inclusion that has not gotten into the kernel, or has not been used by any distros after it's made it into the kernel, in spite of benchmarks being posted that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what tarnished its reputation with many people was how much they were pushing the benchmarks that were shown to be faulty (the one I remember most vividly was that the entire benchmark completed in <30 seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of ram without ever touching the disk) So when Ted and Dave point out problems with the benchmark (the difference in behavior between a single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be better off acknowledging them and if you can't adjust and re-run the benchmarks, don't start attacking them as a result. As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and Memory Management folks you have to convince. You may need a little benchmarking to show that there is a real advantage to be gained, but the real discussion is going to be on the impact that page forking is going to have on everything else (both in complexity and in performance impact to other things) IOWs, you need to focus on the important things needed to achieve your stated goal of getting tux3 merged. 
New filesystems should be faster than those based on 20-25 year old designs, so you don't need to waste time trying to convince people that tux3, when complete, will be fast. You know that Tux3 is already fast. Not just that of course. It has a higher standard of data integrity than your metadata-only journalling filesystem and a small enough code base that it can be reasonably expected to reach the quality expected of an enterprise class filesystem, quite possibly before XFS gets there. We wouldn't expect anyone developing a new filesystem to believe any differently. If they didn't believe this, why would they be working on the filesystem instead of just using an existing filesystem? The ugly reality is that everyone's early versions of their new filesystem look really good. The problem is when they extend it to cover the corner cases and when it gets stressed by real-world (as opposed to benchmark) workloads. This isn't saying that you are wrong in your belief, just that you may not be right, and nobody will know until you get to a usable state and other people can start beating on it. David Lang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 12.05.2015 06:36, Daniel Phillips wrote: Hi David, On 05/11/2015 05:12 PM, David Lang wrote: On Mon, 11 May 2015, Daniel Phillips wrote: On 05/11/2015 03:12 PM, Pavel Machek wrote: It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Umm, are you sure. If "some areas of disk are faster than others" is still true on todays harddrives, the gaps will decrease the performance (as you'll "use up" the fast areas more quickly). That's why I hedged my claim with "similar or identical". The difference in media speed seems to be a relatively small effect compared to extra seeks. It seems that XFS puts big spaces between new directories, and suffers a lot of extra seeks because of it. I propose to batch new directories together initially, then change the allocation goal to a new, relatively empty area if a big batch of files lands on a directory in a crowded region. The "big" gaps would be on the order of delta size, so not really very big. This is an interesting idea, but what happens if the files don't arrive as a big batch, but rather trickle in over time (think of a logserver that is putting files into a bunch of directories at a fairly modest rate per directory) If files are trickling in then we can afford to spend a lot more time finding nice places to tuck them in. Log server files are an especially irksome problem for a redirect-on-write filesystem because the final block tends to be rewritten many times and we must move it to a new location each time, so every extent ends up as one block. Oh well. If we just make sure to have some free space at the end of the file that only that file can use (until everywhere else is full) then the long term result will be slightly ravelled blocks that nonetheless tend to be on the same track or flash block as their logically contiguous neighbours. There will be just zero or one empty data blocks mixed into the file tail as we commit the tail block over and over with the same allocation goal. Sometimes there will be a block or two of metadata as well, which will eventually bake themselves into the middle of contiguous data and stop moving around. Putting this together, we have:

* At delta flush, break out all the log type files
* Dedicate some block groups to append type files
* Leave lots of space between files in those block groups
* Peek at the last block of the file to set the allocation goal

Something like that. What we don't want is to throw those files into the middle of a lot of rewrite-all files, messing up both kinds of file. We don't care much about keeping these files near the parent directory because one big seek per log file in a grep is acceptable, we just need to avoid thousands of big seeks within the file, and not dribble single blocks all over the disk. It would also be nice to merge together extents somehow as the final block is rewritten. 
One idea is to retain the final block dirty until the next delta, and write it again into a contiguous position, so the final block is always flushed twice. We already have the opportunistic merge logic, but the redirty behavior and making sure it only happens to log files would be a bit fiddly. We will also play the incremental defragmentation card at some point, but first we should try hard to control fragmentation in the first place. Tux3 is well suited to online defragmentation because the delta commit model makes it easy to move things around efficiently and safely, but it does generate extra IO, so as a basic mechanism it is not ideal. When we get to piling on features, that will be high on the list, because it is relatively easy, and having that fallback gives a certain sense of security. So we are again at some more features of SASOS4Fun. Said this, I can see, as an alleged troll expert, the agenda and strategy behind this and related threads, but still no usable code/filesystem at all, and hence nothing that even might be ready for merging, as I understand the statements of the filesystem gurus. So it is time for the developer(s) to make decisions about what should eventually be implemented and manifested in code, and then show the complete result, so that others can run the tests and the benchmarks. Thanks Best Regards Do not feed the trolls. C.S.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Daniel Phillips wrote: On 05/12/2015 02:03 AM, Pavel Machek wrote: I'd call a system with 65 tasks doing heavy fsync load at the same time "embarrassingly misconfigured" :-). It is nice if your filesystem can stay fast in that case, but... Well, Tux3 wins the fsync race now whether it is 1 task, 64 tasks or 10,000 tasks. At the high end, maybe it is just a curiosity, or maybe it tells us something about how Tux3 will scale on the big machines that XFS currently lays claim to. And Java programmers are busy doing all kinds of wild and crazy things with lots of tasks. Java almost makes them do it. If they need their data durable then they can easily create loads like my test case. Suppose you have a web server meant to serve 10,000 transactions simultaneously and it needs to survive crashes without losing client state. How will you do it? You could install an expensive, finicky database, or you could write some Java code that happens to work well because Linux has a scheduler and a filesystem that can handle it. Oh wait, we don't have the second one yet, but maybe we soon will. I will not claim that stupidly fast and scalable fsync is the main reason that somebody should want Tux3, however, the lack of a high performance fsync was in fact used as a means of spreading FUD about Tux3, so I had some fun going way beyond the call of duty to answer that. By the way, I am still waiting for the original source of the FUD to concede the point politely, but maybe he is waiting for the code to land, which it still has not as of today, so I guess that is fair. Note that it would have landed quite some time ago if Tux3 was already merged. Well, stupidly fast and scalable fsync sounds wonderful to me; it's the primary pain point in LMDB write performance now. http://symas.com/mdb/ondisk/ I look forward to testing Tux3 when usable code shows up in a public repo. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/12/2015 02:03 AM, Pavel Machek wrote: > On Mon 2015-05-11 19:34:34, Daniel Phillips wrote: >> On 05/11/2015 04:17 PM, Theodore Ts'o wrote: >>> and another way that people >>> doing competitive benchmarking can screw up and produce misleading >>> numbers. >> >> If you think we screwed up or produced misleading numbers, could you >> please be up front about it instead of making insinuations and >> continuing your tirade against benchmarking and those who do it. > > Aren't you a little harsh with Ted? He was polite. Polite language does not include words like "screw up" and "misleading numbers", those are combative words intended to undermine and disparage. It is not clear how repeating the same words can be construed as less polite than the original utterance. >> The ram disk removes seek overhead and greatly reduces media transfer >> overhead. This does not change things much: it confirms that Tux3 is >> significantly faster than the others at synchronous loads. This is >> apparently true independently of media type, though to be sure SSD >> remains to be tested. >> >> The really interesting result is how much difference there is between >> filesystems, even on a ram disk. Is it just CPU or is it synchronization >> strategy and lock contention? Does our asynchronous front/back design >> actually help a lot, instead of being a disadvantage as you predicted? >> >> It is too bad that fs_mark caps number of tasks at 64, because I am >> sure that some embarrassing behavior would emerge at high task counts, >> as with my tests on spinning disk. > > I'd call a system with 65 tasks doing heavy fsync load at the same time > "embarrassingly misconfigured" :-). It is nice if your filesystem can > stay fast in that case, but... Well, Tux3 wins the fsync race now whether it is 1 task, 64 tasks or 10,000 tasks. At the high end, maybe it is just a curiosity, or maybe it tells us something about how Tux3 will scale on the big machines that XFS currently lays claim to. And Java programmers are busy doing all kinds of wild and crazy things with lots of tasks. Java almost makes them do it. If they need their data durable then they can easily create loads like my test case. Suppose you have a web server meant to serve 10,000 transactions simultaneously and it needs to survive crashes without losing client state. How will you do it? You could install an expensive, finicky database, or you could write some Java code that happens to work well because Linux has a scheduler and a filesystem that can handle it. Oh wait, we don't have the second one yet, but maybe we soon will. I will not claim that stupidly fast and scalable fsync is the main reason that somebody should want Tux3, however, the lack of a high performance fsync was in fact used as a means of spreading FUD about Tux3, so I had some fun going way beyond the call of duty to answer that. By the way, I am still waiting for the original source of the FUD to concede the point politely, but maybe he is waiting for the code to land, which it still has not as of today, so I guess that is fair. Note that it would have landed quite some time ago if Tux3 was already merged. Historical note: didn't Java motivate the O(1) scheduler? Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
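For concreteness, the kind of load described above, many independent tasks each making small durable writes, can be approximated in a few lines of C. This is a minimal illustrative sketch, not the actual test used in this thread; the file names, record size, and counts are invented:

    /* Minimal sketch of a many-task fsync load; parameters are invented. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define TASKS  64              /* try 1, 16, 64, ... */
    #define ROUNDS 1000            /* durable writes per task */

    int main(void)
    {
        for (int t = 0; t < TASKS; t++) {
            if (fork() == 0) {
                char name[64], buf[256];
                snprintf(name, sizeof(name), "task%d.log", t);
                int fd = open(name, O_CREAT | O_WRONLY | O_APPEND, 0644);
                if (fd < 0) {
                    perror("open");
                    _exit(1);
                }
                memset(buf, 'x', sizeof(buf));
                for (int r = 0; r < ROUNDS; r++) {
                    /* append one record, then force it to stable storage */
                    if (write(fd, buf, sizeof(buf)) != sizeof(buf) ||
                        fsync(fd) != 0) {
                        perror("write/fsync");
                        _exit(1);
                    }
                }
                close(fd);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)     /* reap all children */
            ;
        return 0;
    }

Each task fsyncs independently, so a filesystem that can batch concurrent commits, as Tux3's delta commit is claimed to do, should scale far better on this kind of load than one that serializes them.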
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon 2015-05-11 19:34:34, Daniel Phillips wrote: > > > On 05/11/2015 04:17 PM, Theodore Ts'o wrote: > > On Tue, May 12, 2015 at 12:12:23AM +0200, Pavel Machek wrote: > >> Umm, are you sure. If "some areas of disk are faster than others" is > >> still true on todays harddrives, the gaps will decrease the > >> performance (as you'll "use up" the fast areas more quickly). > > > > It's still true. The difference between O.D. and I.D. (outer diameter > > vs inner diameter) LBA's is typically a factor of 2. This is why > > "short-stroking" works as a technique, > > That is true, and the effect is not dominant compared to introducing > a lot of extra seeks. > > > and another way that people > > doing competitive benchmarking can screw up and produce misleading > > numbers. > > If you think we screwed up or produced misleading numbers, could you > please be up front about it instead of making insinuations and > continuing your tirade against benchmarking and those who do it. Aren't you a little harsh with Ted? He was polite. > The ram disk removes seek overhead and greatly reduces media transfer > overhead. This does not change things much: it confirms that Tux3 is > significantly faster than the others at synchronous loads. This is > apparently true independently of media type, though to be sure SSD > remains to be tested. > > The really interesting result is how much difference there is between > filesystems, even on a ram disk. Is it just CPU or is it synchronization > strategy and lock contention? Does our asynchronous front/back design > actually help a lot, instead of being a disadvantage as you predicted? > > It is too bad that fs_mark caps number of tasks at 64, because I am > sure that some embarrassing behavior would emerge at high task counts, > as with my tests on spinning disk. I'd call a system with 65 tasks doing heavy fsync load at the same time "embarrassingly misconfigured" :-). It is nice if your filesystem can stay fast in that case, but... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Monday, May 11, 2015 10:38:42 PM PDT, Dave Chinner wrote: > I think Ted and I are on the same page here. "Competitive > benchmarks" only matter to the people who are trying to sell > something. You're trying to sell Tux3, but By "same page", do you mean "transparently obvious about obstructing other projects"? > The "except page forking design" statement is your biggest hurdle > for getting tux3 merged, not performance. No, the "except page forking design" is because the design is already good and effective. The small adjustments needed in core are well worth merging because the benefits are proved by benchmarks. So benchmarks are key and will not stop just because you don't like the attention they bring to XFS issues. > Without page forking, tux3 > cannot be merged at all. But it's not filesystem developers you need > to convince about the merits of the page forking design and > implementation - it's the mm and core kernel developers that need to > review and accept that code *before* we can consider merging tux3. Please do not say "we" when you know that I am just as much a "we" as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM "XFS Developer Takes Shots At Btrfs, EXT4" The real question is, has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support. You are the poster child for that. > IOWs, you need to focus on the important things needed to achieve > your stated goal of getting tux3 merged. New filesystems should be > faster than those based on 20-25 year old designs, so you don't need > to waste time trying to convince people that tux3, when complete, > will be fast. You know that Tux3 is already fast. Not just that of course. It has a higher standard of data integrity than your metadata-only journalling filesystem and a small enough code base that it can be reasonably expected to reach the quality expected of an enterprise class filesystem, quite possibly before XFS gets there. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, May 11, 2015 at 07:34:34PM -0700, Daniel Phillips wrote: > Anyway, everybody but you loves competitive benchmarks, that is why I I think Ted and I are on the same page here. "Competitive benchmarks" only matter to the people who are trying to sell something. You're trying to sell Tux3, but > post them. They are not only useful for tracking down performance bugs, > but as you point out, they help us advertise the reasons why Tux3 is > interesting and ought to be merged. benchmarks won't get tux3 merged. Addressing the significant issues that have been raised during previous code reviews is what will get it merged. I posted that list elsewhere in this thread, to which you replied that they were all "on the list of things to do except for the page forking design". The "except page forking design" statement is your biggest hurdle for getting tux3 merged, not performance. Without page forking, tux3 cannot be merged at all. But it's not filesystem developers you need to convince about the merits of the page forking design and implementation - it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3. IOWs, you need to focus on the important things needed to achieve your stated goal of getting tux3 merged. New filesystems should be faster than those based on 20-25 year old designs, so you don't need to waste time trying to convince people that tux3, when complete, will be fast. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Hi David, On 05/11/2015 05:12 PM, David Lang wrote: > On Mon, 11 May 2015, Daniel Phillips wrote: > >> On 05/11/2015 03:12 PM, Pavel Machek wrote: > It is a fact of life that when you change one aspect of an intimately > interconnected system, > something else will change as well. You have naive/nonexistent free space > management now; when you > design something workable there it is going to impact everything else > you've already done. It's an > easy bet that the impact will be negative, the only question is to what > degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. >>> >>> Umm, are you sure. If "some areas of disk are faster than others" is >>> still true on todays harddrives, the gaps will decrease the >>> performance (as you'll "use up" the fast areas more quickly). >> >> That's why I hedged my claim with "similar or identical". The >> difference in media speed seems to be a relatively small effect >> compared to extra seeks. It seems that XFS puts big spaces between >> new directories, and suffers a lot of extra seeks because of it. >> I propose to batch new directories together initially, then change >> the allocation goal to a new, relatively empty area if a big batch >> of files lands on a directory in a crowded region. The "big" gaps >> would be on the order of delta size, so not really very big. > > This is an interesting idea, but what happens if the files don't arrive as a > big batch, but rather > trickle in over time (think of a logserver that is putting files into a bunch > of directories at a > fairly modest rate per directory) If files are trickling in then we can afford to spend a lot more time finding nice places to tuck them in. Log server files are an especially irksome problem for a redirect-on-write filesystem because the final block tends to be rewritten many times and we must move it to a new location each time, so every extent ends up as one block. Oh well. If we just make sure to have some free space at the end of the file that only that file can use (until everywhere else is full) then the long term result will be slightly ravelled blocks that nonetheless tend to be on the same track or flash block as their logically contiguous neighbours. There will be just zero or one empty data blocks mixed into the file tail as we commit the tail block over and over with the same allocation goal. Sometimes there will be a block or two of metadata as well, which will eventually bake themselves into the middle of contiguous data and stop moving around. Putting this together, we have:

* At delta flush, break out all the log type files
* Dedicate some block groups to append type files
* Leave lots of space between files in those block groups
* Peek at the last block of the file to set the allocation goal

Something like that. What we don't want is to throw those files into the middle of a lot of rewrite-all files, messing up both kinds of file. We don't care much about keeping these files near the parent directory because one big seek per log file in a grep is acceptable, we just need to avoid thousands of big seeks within the file, and not dribble single blocks all over the disk. 
It would also be nice to merge together extents somehow as the final block is rewritten. One idea is to retain the final block dirty until the next delta, and write it again into a contiguous position, so the final block is always flushed twice. We already have the opportunistic merge logic, but the redirty behavior and making sure it only happens to log files would be a bit fiddly. We will also play the incremental defragmentation card at some point, but first we should try hard to control fragmentation in the first place. Tux3 is well suited to online defragmentation because the delta commit model makes it easy to move things around efficiently and safely, but it does generate extra IO, so as a basic mechanism it is not ideal. When we get to piling on features, that will be high on the list, because it is relatively easy, and having that fallback gives a certain sense of security. > And when you then decide that you have to move the directory/file info, > doesn't that create a > potentially large amount of unexpected IO that could end up interfering with > what the user is trying > to do? Right, we don't like that and don't plan to rely on it. What we hope for is behavior that, when you slowly stir the pot, tends to improve the layout just as often as it degrades it. It may indeed become harder to find ideal places to put things as time goes by, but we also gain more information to base decisions on.
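To make the tail-slack idea above concrete, here is a toy sketch of the allocation-goal choice for append-type files. The structure and helper names are invented for illustration and are not Tux3 code:

    /* Hypothetical allocation-goal chooser for the scheme described above. */
    typedef unsigned long block_t;

    struct alloc_hint {
        int is_append_type;     /* file detected as log/append style */
        block_t tail_block;     /* physical address of current tail block, 0 if none */
        block_t batch_goal;     /* linear goal shared by this file's batch */
    };

    static block_t choose_goal(const struct alloc_hint *hint)
    {
        /* Append-type files aim just past their own tail, into slack
         * reserved for them, so the rewritten tail block stays on the
         * same track or flash block as its contiguous neighbours. */
        if (hint->is_append_type && hint->tail_block)
            return hint->tail_block + 1;

        /* Everything else streams out linearly with its batch. */
        return hint->batch_goal;
    }

The point of the sketch is only that the goal is a per-file decision made at delta flush time, which is what keeps rewritten tail blocks from dribbling all over the disk.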
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/11/2015 04:17 PM, Theodore Ts'o wrote: > On Tue, May 12, 2015 at 12:12:23AM +0200, Pavel Machek wrote: >> Umm, are you sure. If "some areas of disk are faster than others" is >> still true on todays harddrives, the gaps will decrease the >> performance (as you'll "use up" the fast areas more quickly). > > It's still true. The difference between O.D. and I.D. (outer diameter > vs inner diameter) LBA's is typically a factor of 2. This is why > "short-stroking" works as a technique, That is true, and the effect is not dominant compared to introducing a lot of extra seeks. > and another way that people > doing competitive benchmarking can screw up and produce misleading > numbers. If you think we screwed up or produced misleading numbers, could you please be up front about it instead of making insinuations and continuing your tirade against benchmarking and those who do it. > (If you use partitions instead of the whole disk, you have > to use the same partition in order to make sure you aren't comparing > apples with oranges.) You can rest assured I did exactly that. Somebody complained that things would look much different with seeks factored out, so here are some new "competitive benchmarks" using fs_mark on a ram disk:

    tasks      1     16     64
    ext4:    231   2154   5439
    btrfs:   152    962   2230
    xfs:     268   2729   6466
    tux3:    315   5529  20301

(Files per second, more is better)

The shell commands are:

    fs_mark -dtest -D5 -N100 -L1 -p5 -r5 -s1048576 -w4096 -n1000 -t1
    fs_mark -dtest -D5 -N100 -L1 -p5 -r5 -s65536 -w4096 -n1000 -t16
    fs_mark -dtest -D5 -N100 -L1 -p5 -r5 -s4096 -w4096 -n1000 -t64

The ram disk removes seek overhead and greatly reduces media transfer overhead. This does not change things much: it confirms that Tux3 is significantly faster than the others at synchronous loads. This is apparently true independently of media type, though to be sure SSD remains to be tested. The really interesting result is how much difference there is between filesystems, even on a ram disk. Is it just CPU or is it synchronization strategy and lock contention? Does our asynchronous front/back design actually help a lot, instead of being a disadvantage as you predicted? It is too bad that fs_mark caps number of tasks at 64, because I am sure that some embarrassing behavior would emerge at high task counts, as with my tests on spinning disk. Anyway, everybody but you loves competitive benchmarks, that is why I post them. They are not only useful for tracking down performance bugs, but as you point out, they help us advertise the reasons why Tux3 is interesting and ought to be merged. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
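For readers unfamiliar with fs_mark: as I understand it, each task above does roughly the following per file - create, write up to -s bytes in -w sized chunks, fsync, close - which is what makes it a synchronous-load test. A simplified sketch of that per-file work (not fs_mark source; error handling trimmed):

    /* Approximation of one fs_mark file operation: -s bytes, -w4096 chunks. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    static int write_one_file(const char *name, size_t size)
    {
        char buf[4096];          /* matches the -w4096 write size above */
        int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        memset(buf, 0, sizeof(buf));
        for (size_t done = 0; done < size; done += sizeof(buf))
            if (write(fd, buf, sizeof(buf)) < 0)
                break;
        fsync(fd);               /* the synchronous step being measured */
        return close(fd);
    }

The three command lines above keep total bytes roughly comparable while varying task count, which is why the per-task file size shrinks as -t grows.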
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, 11 May 2015, Daniel Phillips wrote: On 05/11/2015 03:12 PM, Pavel Machek wrote: It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Umm, are you sure. If "some areas of disk are faster than others" is still true on todays harddrives, the gaps will decrease the performance (as you'll "use up" the fast areas more quickly). That's why I hedged my claim with "similar or identical". The difference in media speed seems to be a relatively small effect compared to extra seeks. It seems that XFS puts big spaces between new directories, and suffers a lot of extra seeks because of it. I propose to batch new directories together initially, then change the allocation goal to a new, relatively empty area if a big batch of files lands on a directory in a crowded region. The "big" gaps would be on the order of delta size, so not really very big. This is an interesting idea, but what happens if the files don't arrive as a big batch, but rather trickle in over time (think of a logserver that is putting files into a bunch of directories at a fairly modest rate per directory) And when you then decide that you have to move the directory/file info, doesn't that create a potentially large amount of unexpected IO that could end up interfering with what the user is trying to do? David Lang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Hi Pavel, On 05/11/2015 03:12 PM, Pavel Machek wrote: >>> It is a fact of life that when you change one aspect of an intimately >>> interconnected system, >>> something else will change as well. You have naive/nonexistent free space >>> management now; when you >>> design something workable there it is going to impact everything else >>> you've already done. It's an >>> easy bet that the impact will be negative, the only question is to what >>> degree. >> >> You might lose that bet. For example, suppose we do strictly linear >> allocation >> each delta, and just leave nice big gaps between the deltas for future >> expansion. Clearly, we run at similar or identical speed to the current naive >> strategy until we must start filling in the gaps, and at that point our >> layout >> is not any worse than XFS, which started bad and stayed that way. > > Umm, are you sure. If "some areas of disk are faster than others" is > still true on todays harddrives, the gaps will decrease the > performance (as you'll "use up" the fast areas more quickly). That's why I hedged my claim with "similar or identical". The difference in media speed seems to be a relatively small effect compared to extra seeks. It seems that XFS puts big spaces between new directories, and suffers a lot of extra seeks because of it. I propose to batch new directories together initially, then change the allocation goal to a new, relatively empty area if a big batch of files lands on a directory in a crowded region. The "big" gaps would be on the order of delta size, so not really very big. Anyway, some people seem to have pounced on the words "naive" and "linear allocation" and jumped to the conclusion that our whole strategy is naive. Far from it. We don't just throw files randomly at the disk. We sort and partition files and metadata, and we carefully arrange the order of our allocation operations so that linear allocation produces a nice layout for both read and write. This turned out to be so much better than fiddling with the goal of individual allocations that we concluded we would get best results by sticking with linear allocation, but improve our sort step. The new plan is to partition updates into batches according to some affinity metrics, and set the linear allocation goal per batch. So for example, big files and append-type files can get special treatment in separate batches, while files that seem to be related because of having the same directory parent and being written in the same delta will continue to be streamed out using "naive" linear allocation, which is not necessarily as naive as one might think. It will take time and a lot of performance testing to get this right, but nobody should get the idea that it is any inherent design limitation. The opposite is true: we have no restrictions at all in media layout. Compared to Ext4, we do need to address the issue that data moves around when updated. This can cause rapid fragmentation. Btrfs has shown issues with that for big, randomly updated files. We want to fix it without falling back on update-in-place as Btrfs does. Actually, Tux3 already has update-in-place, and unlike Btrfs, we can switch to it for non-empty files. But we think that perfect data isolation per delta is something worth fighting for, and we would rather not force users to fiddle around with mode settings just to make something work as well as it already does on Ext4. 
We will tackle this issue by partitioning as above, and use a dedicated allocation strategy for such files, which are easy to detect. Metadata moving around per update does not seem to be a problem because it is all single blocks that need very little slack space to stay close to home. > Anyway... you have brand new filesystem. Of course it should be > faster/better/nicer than the existing filesystems. So don't be too > harsh with XFS people. They have done a lot of good work, but they still have a long way to go. I don't see any shame in that. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
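As one concrete way to picture the partitioning described above: classify each dirty inode in a delta, then give each class its own linear allocation goal. The names and the size threshold below are invented assumptions for illustration, not Tux3 code:

    /* Hypothetical batch classifier for per-delta allocation. */
    enum batch_kind { BATCH_NORMAL, BATCH_APPEND, BATCH_BIG };

    struct dirty_inode {
        unsigned long parent;    /* parent directory, for affinity batching */
        unsigned long bytes;     /* current file size */
        int is_append_type;      /* e.g. tail rewritten repeatedly */
    };

    static enum batch_kind classify(const struct dirty_inode *inode)
    {
        if (inode->is_append_type)
            return BATCH_APPEND;          /* dedicated block groups, extra slack */
        if (inode->bytes > (8UL << 20))   /* invented 8 MB threshold */
            return BATCH_BIG;             /* big files get their own region */
        return BATCH_NORMAL;              /* streamed out with directory siblings */
    }

Files in the same normal batch then allocate linearly in sorted order, which is the "not necessarily as naive as one might think" linear allocation the text describes.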
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, May 12, 2015 at 12:12:23AM +0200, Pavel Machek wrote: > Umm, are you sure. If "some areas of disk are faster than others" is > still true on todays harddrives, the gaps will decrease the > performance (as you'll "use up" the fast areas more quickly). It's still true. The difference between O.D. and I.D. (outer diameter vs inner diameter) LBA's is typically a factor of 2. This is why "short-stroking" works as a technique, and another way that people doing competitive benchmarking can screw up and produce misleading numbers. (If you use partitions instead of the whole disk, you have to use the same partition in order to make sure you aren't comparing apples with oranges.) Cheers, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Hi! > > It is a fact of life that when you change one aspect of an intimately > > interconnected system, > > something else will change as well. You have naive/nonexistent free space > > management now; when you > > design something workable there it is going to impact everything else > > you've already done. It's an > > easy bet that the impact will be negative, the only question is to what > > degree. > > You might lose that bet. For example, suppose we do strictly linear allocation > each delta, and just leave nice big gaps between the deltas for future > expansion. Clearly, we run at similar or identical speed to the current naive > strategy until we must start filling in the gaps, and at that point our layout > is not any worse than XFS, which started bad and stayed that way. Umm, are you sure. If "some areas of disk are faster than others" is still true on todays harddrives, the gaps will decrease the performance (as you'll "use up" the fast areas more quickly). Anyway... you have brand new filesystem. Of course it should be faster/better/nicer than the existing filesystems. So don't be too harsh with XFS people. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On the 30th of April 2015 17:14, Daniel Phillips wrote: Hello hardcore coders On 04/30/2015 07:28 AM, Howard Chu wrote: Daniel Phillips wrote: On 04/30/2015 06:48 AM, Mike Galbraith wrote: On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote: On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote: On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote: Lovely sounding argument, but it is wrong because Tux3 still beats XFS even with seek time factored out of the equation. Hm. Do you have big-storage comparison numbers to back that? I'm no storage guy (waiting for holographic crystal arrays to obsolete all this crap;), but Dave's big-storage guy words made sense to me. This has nothing to do with big storage. The proposition was that seek time is the reason for Tux3's fsync performance. That claim was easily falsified by removing the seek time. Dave's big storage words are there to draw attention away from the fact that XFS ran the Git tests four times slower than Tux3 and three times slower than Ext4. Whatever the big storage excuse is for that, the fact is, XFS obviously sucks at little storage. If you allocate spanning the disk from start of life, you're going to eat seeks that others don't until later. That seemed rather obvious and straightforward. It is a logical fallacy. It mixes a grain of truth (spreading all over the disk causes extra seeks) with an obvious falsehood (it is not necessarily the only possible way to avoid long term fragmentation). You're reading into it what isn't there. Spreading over the disk isn't (just) about avoiding fragmentation - it's about delivering consistent and predictable latency. It is undeniable that if you start by only allocating from the fastest portion of the platter, you are going to see performance slow down over time. If you start by spreading allocations across the entire platter, you make the worst-case and average-case latency equal, which is exactly what a lot of folks are looking for. Another fallacy: intentionally running slower than necessary is not necessarily the only way to deliver consistent and predictable latency. Not only that, but intentionally running slower than necessary does not necessarily guarantee performing better than some alternate strategy later. Anyway, let's not be silly. Everybody in the room who wants Git to run 4 times slower with no guarantee of any benefit in the future, please raise your hand. He flat stated that xfs has passable performance on single bit of rust, and openly explained why. I see no misdirection, only some evidence of bad blood between you two. Raising the spectre of theoretical fragmentation issues when we have not even begun that work is a straw man and intellectually dishonest. You have to wonder why he does it. It is destructive to our community image and harmful to progress. It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. 
Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Now here is where you lose the bet: we already know that linear allocation with wrap ends horribly right? However, as above, we start linear, without compromise, but because of the gaps we leave, we are able to switch to a slower strategy, but not nearly as slow as the ugly tangle we get with simple wrap. So impact over the lifetime of the filesystem is positive, not negative, and what seemed to be self evident to you turns out to be wrong. In short, we would rather deliver as much performance as possible, all the time. I really don't need to think about it very hard to know that is what I want, and what most users want. I will make you a bet in return: when we get to doing that part properly, the quality of the work will be just as high as everything else we have completed so far. Why would we suddenly get lazy? Regards, Daniel -- How? Maybe this is explained and discussed in a new thread about allocation or so. Thanks Best Regards Have fun C.S. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thursday, 30 April 2015, 10:57:10, Theodore Ts'o wrote: > One of the problems is that it's *hard* to get good benchmarking > numbers that take into account file system aging and measure how well > the free space has been fragmented over time. Most of the benchmark > results that I've seen do a really lousy job at this, and the vast > majority don't even try. > > This is one of the reasons why I find head-to-head "competitions" > between file systems to be not very helpful for anything other than > benchmarketing. It's almost certain that the benchmark won't be > "fair" in some way, and it doesn't really matter whether the person > doing the benchmark was doing it with malice aforethought, or was just > incompetent and didn't understand the issues --- or did understand the > issues and didn't really care, because what they _really_ wanted to do > was to market their file system. I agree with that. One benchmark measures one thing, and if it's with a fresh filesystem, it does so with a fresh filesystem. Benchmarks aiming at testing an aged filesystem are much more expensive in the time and resources needed, unless one reuses an aged filesystem image again and again. Thanks for your explanations, Ted, Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Daniel Phillips wrote: On 04/30/2015 07:28 AM, Howard Chu wrote: You're reading into it what isn't there. Spreading over the disk isn't (just) about avoiding fragmentation - it's about delivering consistent and predictable latency. It is undeniable that if you start by only allocating from the fastest portion of the platter, you are going to see performance slow down over time. If you start by spreading allocations across the entire platter, you make the worst-case and average-case latency equal, which is exactly what a lot of folks are looking for. Another fallacy: intentionally running slower than necessary is not necessarily the only way to deliver consistent and predictable latency. Totally agree with you there. Not only that, but intentionally running slower than necessary does not necessarily guarantee performing better than some alternate strategy later. True, it's a question of algorithmic efficiency - does the performance decay linearly or logarithmically. Anyway, let's not be silly. Everybody in the room who wants Git to run 4 times slower with no guarantee of any benefit in the future, please raise your hand. git is an important workload for us as developers, but I don't think that's the only workload that's important for us. He flat stated that xfs has passable performance on single bit of rust, and openly explained why. I see no misdirection, only some evidence of bad blood between you two. Raising the spectre of theoretical fragmentation issues when we have not even begun that work is a straw man and intellectually dishonest. You have to wonder why he does it. It is destructive to our community image and harmful to progress. It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Now here is where you lose the bet: we already know that linear allocation with wrap ends horribly right? However, as above, we start linear, without compromise, but because of the gaps we leave, we are able to switch to a slower strategy, but not nearly as slow as the ugly tangle we get with simple wrap. So impact over the lifetime of the filesystem is positive, not negative, and what seemed to be self evident to you turns out to be wrong. In short, we would rather deliver as much performance as possible, all the time. I really don't need to think about it very hard to know that is what I want, and what most users want. I will make you a bet in return: when we get to doing that part properly, the quality of the work will be just as high as everything else we have completed so far. Why would we suddenly get lazy? I never said anything about getting lazy. You're working in a closed system though. If you run today's version on a system, and then you run your future version on that same hardware, you're doing more CPU work and probably more I/O work to do the more complex space management. 
It's not quite zero-sum but close enough, when you're talking about highly optimized designs. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Hi Ted, On 04/30/2015 07:57 AM, Theodore Ts'o wrote: > This is one of the reasons why I find head-to-head "competitions" > between file systems to be not very helpful for anything other than > benchmarketing. It's almost certain that the benchmark won't be > "fair" in some way, and it doesn't really matter whether the person > doing the benchmark was doing it with malice aforethought, or was just > incompetent and didn't understand the issues --- or did understand the > issues and didn't really care, because what they _really_ wanted to do > was to market their file system. Your proposition, as I understand it, is that nobody should ever do benchmarks because any benchmark must be one of: 1) malicious; 2) incompetent; or 3) careless. When in fact, a benchmark may be perfectly honest, competently done, and informative. > And even if the benchmark is fair, it might not match up with the end > user's hardware, or their use case. There will always be some use > case where file system A is better than file system B, for pretty much > any file system. Don't get me wrong --- I will do comparisons between > file systems, but only so I can figure out ways of making _my_ file > system better. And more often than not, it's comparisons of the same > file system before and after adding some new feature which is the most > interesting. I cordially invite you to replicate our fsync benchmarks, or invent your own. I am confident that you will find that the numbers are accurate, that the test cases were well chosen, that the results are informative, and that there is no sleight of hand. As for whether or not people should "market" their filesystems as you put it, that is easy for you to disparage when you are the incumbent. If we don't tell people what is great about Tux3 then how will they ever find out? Sure, it might be "advertising", but the important question is, is it _truthful_ advertising? Surely you remember how Linus got started... that was really blatant, and I am glad he did it. >> Those are the allocation groups. I always wondered how it can be beneficial >> to spread the allocations onto 4 areas of one partition on expensive seek >> media. Now that makes better sense for me. I always had the gut impression >> that XFS may not be the fastest in all cases, but it is one of the >> filesystems with the most consistent performance over time, but never was >> able to fully explain why that is. > > Yep, pretty much all of the traditional update-in-place file systems > since the BSD FFS have done this, and for the same reason. For COW > file systems which are constantly moving data and metadata blocks > around, they will need different strategies for trying to avoid the > free space fragmentation problem as the file system ages. Right, different problems, but I have a pretty good idea how to go about it now. I made a failed attempt a while back and learned a lot, my mistake was to try to give every object a fixed home position based on where it was first written and the result was worse for both read and write. Now the interesting thing is, naive linear allocation is great for both read and write, so my effort now is directed towards ways of doing naive linear allocation but choosing carefully which order we do the allocation in. I will keep you posted on how that progresses of course. Anyway, how did we get onto allocation? I thought my post was about fsync, and after all, you are the guest of honor. 
Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 04/30/2015 07:33 AM, Mike Galbraith wrote: > Well ok, let's forget bad blood, straw men... and answering my question > too I suppose. Not having any sexy IO gizmos in my little desktop box, > I don't care deeply which stomps the other flat on beastly boxen. I'm with you, especially the forget bad blood part. I did my time in big storage and I will no doubt do it again, but right now, what I care about is bringing truth and beauty to small storage, which includes that spinning rust of yours and also the cheap SSD you are about to run out and buy. I hope you caught the bit about how Tux3 is doing really well running in tmpfs? According to my calculations, that means good things for SSD performance. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 04/30/2015 07:28 AM, Howard Chu wrote:
> Daniel Phillips wrote:
>> On 04/30/2015 06:48 AM, Mike Galbraith wrote:
>>> On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
>>>> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>>>>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>>>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>>>>> even with seek time factored out of the equation.
>>>>>
>>>>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>>>>> storage guy (waiting for holographic crystal arrays to obsolete all this
>>>>> crap;), but Dave's big-storage guy words made sense to me.
>>>>
>>>> This has nothing to do with big storage. The proposition was that seek
>>>> time is the reason for Tux3's fsync performance. That claim was easily
>>>> falsified by removing the seek time.
>>>>
>>>> Dave's big storage words are there to draw attention away from the fact
>>>> that XFS ran the Git tests four times slower than Tux3 and three times
>>>> slower than Ext4. Whatever the big storage excuse is for that, the fact
>>>> is, XFS obviously sucks at little storage.
>>>
>>> If you allocate spanning the disk from start of life, you're going to
>>> eat seeks that others don't until later. That seemed rather obvious and
>>> straightforward.
>>
>> It is a logical fallacy. It mixes a grain of truth (spreading all over the
>> disk causes extra seeks) with an obvious falsehood (it is not necessarily
>> the only possible way to avoid long term fragmentation).
>
> You're reading into it what isn't there. Spreading over the disk isn't
> (just) about avoiding fragmentation - it's about delivering consistent
> and predictable latency. It is undeniable that if you start by only
> allocating from the fastest portion of the platter, you are going to
> see performance slow down over time. If you start by spreading
> allocations across the entire platter, you make the worst-case and
> average-case latency equal, which is exactly what a lot of folks are
> looking for.

Another fallacy: intentionally running slower than necessary is not necessarily the only way to deliver consistent and predictable latency. Not only that, but intentionally running slower than necessary does not necessarily guarantee performing better than some alternate strategy later.

Anyway, let's not be silly. Everybody in the room who wants Git to run 4 times slower with no guarantee of any benefit in the future, please raise your hand.

>>> He flat stated that xfs has passable performance on
>>> single bit of rust, and openly explained why. I see no misdirection,
>>> only some evidence of bad blood between you two.
>>
>> Raising the spectre of theoretical fragmentation issues when we have not
>> even begun that work is a straw man and intellectually dishonest. You have
>> to wonder why he does it. It is destructive to our community image and
>> harmful to progress.
>
> It is a fact of life that when you change one aspect of an intimately
> interconnected system, something else will change as well. You have
> naive/nonexistent free space management now; when you design something
> workable there it is going to impact everything else you've already
> done. It's an easy bet that the impact will be negative, the only
> question is to what degree.

You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion.
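As a toy illustration of that strategy (made-up numbers, and nothing like the eventual Tux3 allocator; just the shape of the idea):

/* Toy model: lay each delta down linearly, then skip a gap so later
 * deltas have room to grow near their neighbors. Illustration only. */
#include <stdio.h>

#define GAP_BLOCKS 1024		/* hypothetical slack left after each delta */

static unsigned long cursor;	/* next free block in a toy linear space */

/* Allocate 'count' contiguous blocks for one delta, then leave a gap. */
static unsigned long delta_alloc(unsigned long count)
{
	unsigned long start = cursor;
	cursor += count + GAP_BLOCKS;
	return start;
}

int main(void)
{
	/* Three deltas land one after another, each with slack behind it. */
	for (unsigned long blocks = 100; blocks <= 300; blocks += 100)
		printf("delta of %lu blocks at block %lu\n",
		       blocks, delta_alloc(blocks));
	return 0;
}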
Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way.

Now here is where you lose the bet: we already know that linear allocation with wrap ends horribly, right? However, as above, we start linear, without compromise, but because of the gaps we leave, we are able to switch to a slower strategy, though not nearly as slow as the ugly tangle we get with simple wrap. So the impact over the lifetime of the filesystem is positive, not negative, and what seemed to be self-evident to you turns out to be wrong.

In short, we would rather deliver as much performance as possible, all the time. I really don't need to think about it very hard to know that is what I want, and what most users want.

I will make you a bet in return: when we get to doing that part properly, the quality of the work will be just as high as everything else we have completed so far. Why would we suddenly get lazy?

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, Apr 30, 2015 at 11:00:05AM +0200, Martin Steigerwald wrote:
> > IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> > the problem goes away. :)
>
> I am quite surprised that a traditional filesystem that was created in the
> age of rotating media does not like this kind of media and even seems to
> excel over BTRFS on the new non-rotating media available.

You shouldn't be surprised; XFS was designed in an era where RAID was extremely important. To this day, on very large RAID arrays, I'm pretty sure none of the other file systems will come close to touching XFS, because it was optimized by some really, really good file system engineers for that hardware. And while RAID systems are certainly not identical to SSDs, the fact that you have multiple disk heads means that a good file system will optimize for that parallelism, and that's how SSDs get their speed (individual SSD channels aren't really all that fast; it's the fact that you can be reading or writing large numbers of them in parallel that gives high-end flash its really great performance numbers).

> > Thing is, once you've abused those filesystems for a couple of
> > months, the files in ext4, btrfs and tux3 are not going to be laid
> > out perfectly on the outer edge of the disk. They'll be spread all
> > over the place and so all the filesystems will be seeing large seeks
> > on read. The thing is, XFS will have roughly the same performance as
> > when the filesystem is empty because the spreading of the allocation
> > allows it to maintain better locality and separation and hence
> > doesn't fragment free space nearly as badly as the other filesystems.
> > Free space fragmentation is what leads to performance degradation in
> > filesystems, and all the other filesystems will have degraded to be
> > *much worse* than XFS.

In fact, ext4 doesn't actually lay out things perfectly on the outer edge of the disk either, because we try to do spreading as well. Worse, we use a random algorithm to try to do the spreading, so that means that results from run to run on an empty file system will show a lot more variation. I won't claim that we're best in class with either our spreading techniques or our ability to manage free space fragmentation, although we do a lot of work to manage free space fragmentation as well.

One of the problems is that it's *hard* to get good benchmarking numbers that take into account file system aging and measure how well the free space has been fragmented over time. Most of the benchmark results that I've seen do a really lousy job at this, and the vast majority don't even try.

This is one of the reasons why I find head-to-head "competitions" between file systems to be not very helpful for anything other than benchmarketing. It's almost certain that the benchmark won't be "fair" in some way, and it doesn't really matter whether the person doing the benchmark was doing it with malice aforethought, or was just incompetent and didn't understand the issues --- or did understand the issues and didn't really care, because what they _really_ wanted to do was to market their file system.

And even if the benchmark is fair, it might not match up with the end user's hardware, or their use case. There will always be some use case where file system A is better than file system B, for pretty much any file system. Don't get me wrong --- I will do comparisons between file systems, but only so I can figure out ways of making _my_ file system better.
And more often than not, it's comparisons of the same file system before and after adding some new feature that are the most interesting.

> Those are the allocation groups. I always wondered how it can be beneficial
> to spread the allocations onto 4 areas of one partition on expensive seek
> media. Now that makes better sense to me. I always had the gut impression
> that XFS may not be the fastest in all cases, but that it is one of the
> filesystems with the most consistent performance over time, but never was
> able to fully explain why that is.

Yep, pretty much all of the traditional update-in-place file systems since the BSD FFS have done this, and for the same reason. For COW file systems, which are constantly moving data and metadata blocks around, they will need different strategies for trying to avoid the free space fragmentation problem as the file system ages.

Cheers,

- Ted
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Daniel Phillips wrote:
> On 04/30/2015 06:48 AM, Mike Galbraith wrote:
>> On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
>>> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>>>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>>>> even with seek time factored out of the equation.
>>>>
>>>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>>>> storage guy (waiting for holographic crystal arrays to obsolete all this
>>>> crap;), but Dave's big-storage guy words made sense to me.
>>>
>>> This has nothing to do with big storage. The proposition was that seek
>>> time is the reason for Tux3's fsync performance. That claim was easily
>>> falsified by removing the seek time.
>>>
>>> Dave's big storage words are there to draw attention away from the fact
>>> that XFS ran the Git tests four times slower than Tux3 and three times
>>> slower than Ext4. Whatever the big storage excuse is for that, the fact
>>> is, XFS obviously sucks at little storage.
>>
>> If you allocate spanning the disk from start of life, you're going to
>> eat seeks that others don't until later. That seemed rather obvious and
>> straightforward.
>
> It is a logical fallacy. It mixes a grain of truth (spreading all over the
> disk causes extra seeks) with an obvious falsehood (it is not necessarily
> the only possible way to avoid long term fragmentation).

You're reading into it what isn't there. Spreading over the disk isn't (just) about avoiding fragmentation - it's about delivering consistent and predictable latency. It is undeniable that if you start by only allocating from the fastest portion of the platter, you are going to see performance slow down over time. If you start by spreading allocations across the entire platter, you make the worst-case and average-case latency equal, which is exactly what a lot of folks are looking for.

>> He flat stated that xfs has passable performance on
>> single bit of rust, and openly explained why. I see no misdirection,
>> only some evidence of bad blood between you two.
>
> Raising the spectre of theoretical fragmentation issues when we have not
> even begun that work is a straw man and intellectually dishonest. You have
> to wonder why he does it. It is destructive to our community image and
> harmful to progress.

It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree.

-- 
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, 2015-04-30 at 07:07 -0700, Daniel Phillips wrote:
> On 04/30/2015 06:48 AM, Mike Galbraith wrote:
>> On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
>>> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>>>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>>>> even with seek time factored out of the equation.
>>>>
>>>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>>>> storage guy (waiting for holographic crystal arrays to obsolete all this
>>>> crap;), but Dave's big-storage guy words made sense to me.
>>>
>>> This has nothing to do with big storage. The proposition was that seek
>>> time is the reason for Tux3's fsync performance. That claim was easily
>>> falsified by removing the seek time.
>>>
>>> Dave's big storage words are there to draw attention away from the fact
>>> that XFS ran the Git tests four times slower than Tux3 and three times
>>> slower than Ext4. Whatever the big storage excuse is for that, the fact
>>> is, XFS obviously sucks at little storage.
>>
>> If you allocate spanning the disk from start of life, you're going to
>> eat seeks that others don't until later. That seemed rather obvious and
>> straightforward.
>
> It is a logical fallacy. It mixes a grain of truth (spreading all over the
> disk causes extra seeks) with an obvious falsehood (it is not necessarily
> the only possible way to avoid long term fragmentation).

Shrug, but it seems it is a solution, and more importantly, an implemented solution. What I gleaned as a layman reader is that xfs has no fragmentation issue, but tux3 still does. It doesn't seem right to slam xfs for a conscious design decision unless tux3 can proudly display its superior solution, which I gathered doesn't yet exist.

>> He flat stated that xfs has passable performance on
>> single bit of rust, and openly explained why. I see no misdirection,
>> only some evidence of bad blood between you two.
>
> Raising the spectre of theoretical fragmentation issues when we have not
> even begun that work is a straw man and intellectually dishonest. You have
> to wonder why he does it. It is destructive to our community image and
> harmful to progress.

Well ok, let's forget bad blood, straw men... and answering my question too I suppose. Not having any sexy IO gizmos in my little desktop box, I don't care deeply which stomps the other flat on beastly boxen.

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 04/30/2015 06:48 AM, Mike Galbraith wrote:
> On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
>> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>>> even with seek time factored out of the equation.
>>>
>>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>>> storage guy (waiting for holographic crystal arrays to obsolete all this
>>> crap;), but Dave's big-storage guy words made sense to me.
>>
>> This has nothing to do with big storage. The proposition was that seek
>> time is the reason for Tux3's fsync performance. That claim was easily
>> falsified by removing the seek time.
>>
>> Dave's big storage words are there to draw attention away from the fact
>> that XFS ran the Git tests four times slower than Tux3 and three times
>> slower than Ext4. Whatever the big storage excuse is for that, the fact
>> is, XFS obviously sucks at little storage.
>
> If you allocate spanning the disk from start of life, you're going to
> eat seeks that others don't until later. That seemed rather obvious and
> straightforward.

It is a logical fallacy. It mixes a grain of truth (spreading all over the disk causes extra seeks) with an obvious falsehood (it is not necessarily the only possible way to avoid long term fragmentation).

> He flat stated that xfs has passable performance on
> single bit of rust, and openly explained why. I see no misdirection,
> only some evidence of bad blood between you two.

Raising the spectre of theoretical fragmentation issues when we have not even begun that work is a straw man and intellectually dishonest. You have to wonder why he does it. It is destructive to our community image and harmful to progress.

> No, I won't be switching to xfs any time soon, but then it would take a
> hell of a lot of evidence to get me to move away from ext4. I trust
> ext[n] deeply because it has proven many times over the years that it
> can take one hell of a lot (of self inflicted wounds;).

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>> even with seek time factored out of the equation.
>>
>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>> storage guy (waiting for holographic crystal arrays to obsolete all this
>> crap;), but Dave's big-storage guy words made sense to me.
>
> This has nothing to do with big storage. The proposition was that seek
> time is the reason for Tux3's fsync performance. That claim was easily
> falsified by removing the seek time.
>
> Dave's big storage words are there to draw attention away from the fact
> that XFS ran the Git tests four times slower than Tux3 and three times
> slower than Ext4. Whatever the big storage excuse is for that, the fact
> is, XFS obviously sucks at little storage.

If you allocate spanning the disk from start of life, you're going to eat seeks that others don't until later. That seemed rather obvious and straightforward. He flat stated that xfs has passable performance on single bit of rust, and openly explained why. I see no misdirection, only some evidence of bad blood between you two.

No, I won't be switching to xfs any time soon, but then it would take a hell of a lot of evidence to get me to move away from ext4. I trust ext[n] deeply because it has proven many times over the years that it can take one hell of a lot (of self inflicted wounds;).

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>> even with seek time factored out of the equation.
>
> Hm. Do you have big-storage comparison numbers to back that? I'm no
> storage guy (waiting for holographic crystal arrays to obsolete all this
> crap;), but Dave's big-storage guy words made sense to me.

This has nothing to do with big storage. The proposition was that seek time is the reason for Tux3's fsync performance. That claim was easily falsified by removing the seek time.

Dave's big storage words are there to draw attention away from the fact that XFS ran the Git tests four times slower than Tux3 and three times slower than Ext4. Whatever the big storage excuse is for that, the fact is, XFS obviously sucks at little storage.

He also posted nonsense: "XFS, however, will spread the load across many (if not all) of the disks, and so effectively reduce the average seek time by the number of disks doing concurrent IO." False. No matter how big an array of spinning disks you have, seek latency and synchronous write latency stay the same. It is just an attempt to bamboozle you. If instead he had talked about throughput, he would have a point. But he didn't, because he knows that does not help his argument. If fsync sucks on one disk, it will suck just as much on a thousand disks.

The talk about filling up from the outside of the disk is disingenuous. Dave should know that Ext4 does not do that; it spreads out allocations exactly to give good aging, and it does deliver that - Ext4's aging performance is second to none. What XFS does is just stupid, and instead of admitting that and fixing it, Dave claims it would be great if the disk was an array or an SSD instead of what it actually is.

Regards,

Daniel
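A toy model makes the latency-versus-throughput distinction above concrete (made-up seek and rotation figures; any resemblance to a real drive is accidental). Adding spindles multiplies aggregate IOPS, but a single synchronous write still pays one full access time:

/* More disks raise aggregate random IOPS, but one synchronous fsync
 * still waits a full seek plus rotation on a single disk. */
#include <stdio.h>

int main(void)
{
	const double access_ms = 8.0 + 4.2;	/* hypothetical seek + half rotation */
	const double iops_per_disk = 1000.0 / access_ms;

	for (int disks = 1; disks <= 1000; disks *= 10)
		printf("%4d disk(s): ~%6.0f aggregate random IOPS, "
		       "but one fsync still waits %.1f ms\n",
		       disks, disks * iops_per_disk, access_ms);
	return 0;
}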
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
> even with seek time factored out of the equation.

Hm. Do you have big-storage comparison numbers to back that? I'm no storage guy (waiting for holographic crystal arrays to obsolete all this crap;), but Dave's big-storage guy words made sense to me.

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, April 29, 2015 5:20:08 PM PDT, Dave Chinner wrote:
> It's easy to be fast on empty filesystems. XFS does not aim to be
> fast in such situations - it aims to have consistent performance
> across the life of the filesystem.
>
> In this case, ext4, btrfs and tux3 have optimal allocation filling
> from the outside of the disk, while XFS is spreading the files across
> (at least) 4 separate regions of the whole disk. Hence XFS is seeing
> seek times on read that are much larger than the other filesystems
> when the filesystem is empty, as it is doing full disk seeks rather
> than being confined to the outer edges of the spindle.
>
> Thing is, once you've abused those filesystems for a couple of
> months, the files in ext4, btrfs and tux3 are not going to be laid
> out perfectly on the outer edge of the disk. They'll be spread all
> over the place and so all the filesystems will be seeing large seeks
> on read. The thing is, XFS will have roughly the same performance as
> when the filesystem is empty because the spreading of the allocation
> allows it to maintain better locality and separation and hence
> doesn't fragment free space nearly as badly as the other filesystems.
> Free space fragmentation is what leads to performance degradation in
> filesystems, and all the other filesystems will have degraded to be
> *much worse* than XFS.
>
> Put simply: empty filesystem benchmarking does not show the real
> performance of the filesystem under sustained production workloads.
> Hence benchmarks like this - while interesting from a theoretical
> point of view and widely used for bragging about who's got the
> fastest - are mostly irrelevant to determining how the filesystem
> will perform in production environments.
>
> We can also look at this algorithm in a different way: take a large
> filesystem (say a few hundred TB) across a few tens of disks in a
> linear concat. ext4, btrfs and tux3 will only hit the first disk in
> the concat, and so go no faster because they are still bound by
> physical seek times. XFS, however, will spread the load across many
> (if not all) of the disks, and so effectively reduce the average
> seek time by the number of disks doing concurrent IO. Then you'll
> see that application level IO concurrency becomes the performance
> limitation, not the physical seek time of the hardware.
>
> IOWs, what you don't see here is that the XFS algorithms that make
> your test slow will keep *lots* of disks busy. i.e. testing empty
> filesystem performance on a single, slow disk demonstrates that an
> algorithm designed for scalability isn't designed to achieve
> physical seek distance minimisation. Hence your storage makes XFS
> look particularly poor in comparison to filesystems that are being
> designed and optimised for the limitations of single slow
> spindles...
>
> To further demonstrate that it is physical seek distance that is the
> issue here, let's take the seek time out of the equation (e.g. use an
> SSD). Doing that will result in basically no difference in
> performance between all 4 filesystems, as performance will now be
> determined by application level concurrency and that is the same for
> all tests.

Lovely sounding argument, but it is wrong because Tux3 still beats XFS even with seek time factored out of the equation.

Even with an SSD, if you just go splattering files all over the disk, you will pay for it in latency and lifetime when the disk goes into continuous erase and your messy layout causes write multiplication. But of course you can design your filesystem any way you want. Tux3 is designed to be fast on the hardware that people actually have.
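To put a rough number on "write multiplication": under a simple greedy garbage collection model (a back-of-envelope sketch with assumed parameters, not any particular drive's firmware), amplification grows sharply with the fraction of each erase block that is still live when it is collected, and a scattered layout is exactly what pushes that fraction up:

/* Back-of-envelope SSD write amplification: reclaiming an erase block
 * means copying out whatever fraction of it is still live, so bytes
 * physically written per byte of new data is roughly 1 / (1 - live).
 * Simple greedy-collector model; real firmware is more complicated. */
#include <stdio.h>

int main(void)
{
	for (double live = 0.0; live < 0.95; live += 0.1)
		printf("live fraction %.1f -> write amplification %4.1fx\n",
		       live, 1.0 / (1.0 - live));
	return 0;
}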
Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Am Donnerstag, 30. April 2015, 10:20:08 schrieb Dave Chinner:
> On Wed, Apr 29, 2015 at 09:05:26PM +0200, Mike Galbraith wrote:
> > Here's something that _might_ interest xfs folks.
> >
> > cd git (source repository of git itself)
> > make clean
> > echo 3 > /proc/sys/vm/drop_caches
> > time make -j8 test
> >
> > ext4   2m20.721s
> > xfs    6m41.887s <-- ick
> > btrfs  1m32.038s
> > tux3   1m30.262s
> >
> > Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.
>
> TL;DR: Results are *very different* on a 256GB Samsung 840 EVO SSD
> with slightly slower CPUs (E5-4620 @ 2.20GHz), all filesystems
> using defaults:
>
>         real       user       sys
> xfs     3m16.138s  7m8.341s   14m32.462s
> ext4    3m18.045s  7m7.840s   14m32.994s
> btrfs   3m45.149s  7m10.184s  16m30.498s
>
> What you are seeing is physical seek distances impacting read
> performance. XFS does not optimise for minimal physical seek
> distance, and hence is slower than filesystems that do optimise for
> minimal seek distance. This shows up especially well on slow single
> spindles.
>
> XFS is *adequate* for use on slow single drives, but it is
> really designed for best performance on storage hardware that is not
> seek distance sensitive.
>
> IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> the problem goes away. :)

I am quite surprised that a traditional filesystem that was created in the age of rotating media does not like this kind of media and even seems to excel over BTRFS on the new non-rotating media available. But…

> > And now in more detail.
>
> It's easy to be fast on empty filesystems. XFS does not aim to be
> fast in such situations - it aims to have consistent performance
> across the life of the filesystem.

… this is a quite important addition.

> Thing is, once you've abused those filesystems for a couple of
> months, the files in ext4, btrfs and tux3 are not going to be laid
> out perfectly on the outer edge of the disk. They'll be spread all
> over the place and so all the filesystems will be seeing large seeks
> on read. The thing is, XFS will have roughly the same performance as
> when the filesystem is empty because the spreading of the allocation
> allows it to maintain better locality and separation and hence
> doesn't fragment free space nearly as badly as the other filesystems.
> Free space fragmentation is what leads to performance degradation in
> filesystems, and all the other filesystems will have degraded to be
> *much worse* than XFS.

I still see hangs from what I take to be free space fragmentation in BTRFS. My /home on a dual (!) BTRFS SSD setup can basically grind to a halt when it has reserved all space on the device for chunks. So this

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 129.48GiB
        devid 1 size 170.00GiB used 146.03GiB path /dev/mapper/msata-home
        devid 2 size 170.00GiB used 146.03GiB path /dev/mapper/sata-home

Btrfs v3.18

merkaba:~> btrfs fi df /home
Data, RAID1: total=142.00GiB, used=126.72GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=4.00GiB, used=2.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

is safe, but once I have size 170 GiB, used 170 GiB, even if inside the chunks there is enough free space to allocate from, as much as 30-40 GiB, it can happen that writes are stalled up to the point that applications on the desktop freeze and I see hung task messages in the kernel log. This is the case up to kernel 4.0.
I have seen Chris Mason fixing some write stalls for big Facebook setups; maybe that will help here. But unless this issue is fixed, I think BTRFS is not yet fully production ready, unless you leave a *huge* amount of free space, as in: for 200 GiB of data you want to write, make a 400 GiB volume.

> Put simply: empty filesystem benchmarking does not show the real
> performance of the filesystem under sustained production workloads.
> Hence benchmarks like this - while interesting from a theoretical
> point of view and widely used for bragging about who's got the
> fastest - are mostly irrelevant to determining how the filesystem
> will perform in production environments.
>
> We can also look at this algorithm in a different way: take a large
> filesystem (say a few hundred TB) across a few tens of disks in a
> linear concat. ext4, btrfs and tux3 will only hit the first disk in
> the concat, and so go no faster because they are still bound by
> physical seek times. XFS, however, will spread the load across many
> (if not all) of the disks, and so effectively reduce the average
> seek time by the number of disks doing concurrent IO. Then you'll
> see that application level IO concurrency becomes the performance
> limitation, not the physical seek time of the hardware.

Those are the allocation groups. I always wondered how it can be beneficial to spread the allocations onto 4 areas of one partition on expensive seek media. Now that makes better sense to me. I always had the gut impression that XFS may not be the fastest in all cases, but that it is one of the filesystems with the most consistent performance over time, but never was able to fully explain why that is.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wed, 2015-04-29 at 14:12 -0700, Daniel Phillips wrote:
> Btrfs appears to optimize tiny files by storing them in its big btree,
> the equivalent of our itree, and Tux3 doesn't do that yet, so we are a
> bit hobbled for a make load.

That's not a build load, it's a git load. btrfs beat all others at the various git/quilt things I tried (since that's what I do lots of in real life), but tux3 looked quite good too. As Dave noted though, an orchard produces oodles of apples over its lifetime; these shiny new apples may lose their luster over time ;-)

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, 2015-04-30 at 10:20 +1000, Dave Chinner wrote:
> IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> the problem goes away. :)

I'd love to. Too bad sorry sack of sh*t MB manufacturer only applied _connectors_ to 4 of 6 available ports, and they're all in use :)

> And now in more detail.

Thanks for those details, made perfect sense.

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wed, Apr 29, 2015 at 09:05:26PM +0200, Mike Galbraith wrote:
> Here's something that _might_ interest xfs folks.
>
> cd git (source repository of git itself)
> make clean
> echo 3 > /proc/sys/vm/drop_caches
> time make -j8 test
>
> ext4   2m20.721s
> xfs    6m41.887s <-- ick
> btrfs  1m32.038s
> tux3   1m30.262s
>
> Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.

TL;DR: Results are *very different* on a 256GB Samsung 840 EVO SSD with slightly slower CPUs (E5-4620 @ 2.20GHz), all filesystems using defaults:

        real       user       sys
xfs     3m16.138s  7m8.341s   14m32.462s
ext4    3m18.045s  7m7.840s   14m32.994s
btrfs   3m45.149s  7m10.184s  16m30.498s

What you are seeing is physical seek distances impacting read performance. XFS does not optimise for minimal physical seek distance, and hence is slower than filesystems that do optimise for minimal seek distance. This shows up especially well on slow single spindles.

XFS is *adequate* for use on slow single drives, but it is really designed for best performance on storage hardware that is not seek distance sensitive.

IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and the problem goes away. :)

And now in more detail.

It's easy to be fast on empty filesystems. XFS does not aim to be fast in such situations - it aims to have consistent performance across the life of the filesystem.

In this case, ext4, btrfs and tux3 have optimal allocation filling from the outside of the disk, while XFS is spreading the files across (at least) 4 separate regions of the whole disk. Hence XFS is seeing seek times on read that are much larger than the other filesystems when the filesystem is empty, as it is doing full disk seeks rather than being confined to the outer edges of the spindle.

Thing is, once you've abused those filesystems for a couple of months, the files in ext4, btrfs and tux3 are not going to be laid out perfectly on the outer edge of the disk. They'll be spread all over the place and so all the filesystems will be seeing large seeks on read. The thing is, XFS will have roughly the same performance as when the filesystem is empty because the spreading of the allocation allows it to maintain better locality and separation and hence doesn't fragment free space nearly as badly as the other filesystems. Free space fragmentation is what leads to performance degradation in filesystems, and all the other filesystems will have degraded to be *much worse* than XFS.

Put simply: empty filesystem benchmarking does not show the real performance of the filesystem under sustained production workloads. Hence benchmarks like this - while interesting from a theoretical point of view and widely used for bragging about who's got the fastest - are mostly irrelevant to determining how the filesystem will perform in production environments.

We can also look at this algorithm in a different way: take a large filesystem (say a few hundred TB) across a few tens of disks in a linear concat. ext4, btrfs and tux3 will only hit the first disk in the concat, and so go no faster because they are still bound by physical seek times. XFS, however, will spread the load across many (if not all) of the disks, and so effectively reduce the average seek time by the number of disks doing concurrent IO. Then you'll see that application level IO concurrency becomes the performance limitation, not the physical seek time of the hardware.

IOWs, what you don't see here is that the XFS algorithms that make your test slow will keep *lots* of disks busy. i.e.
testing empty filesystem performance on a single, slow disk demonstrates that an algorithm designed for scalability isn't designed to achieve physical seek distance minimisation. Hence your storage makes XFS look particularly poor in comparison to filesystems that are being designed and optimised for the limitations of single slow spindles...

To further demonstrate that it is physical seek distance that is the issue here, let's take the seek time out of the equation (e.g. use an SSD). Doing that will result in basically no difference in performance between all 4 filesystems, as performance will now be determined by application level concurrency and that is the same for all tests. e.g. on a 16p, 16GB RAM VM with storage on SSDs, a "make -j 8" compile test on a kernel source tree (using my normal test machine .config) gives:

        real       user        sys
xfs:    4m6.723s   26m21.087s  2m49.426s
ext4:   4m11.415s  26m21.122s  2m49.786s
btrfs:  4m8.118s   26m26.440s  2m50.357s

i.e. take seek times out of the picture, and XFS is just as fast as any of the other filesystems. Just about everyone I know uses SSDs in their laptops and machines that build kernels these days, and spinning disks are rapidly disappearing from enterprise and HPC environments, which also happen to be the target markets for XFS. Hence
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, April 29, 2015 12:05:26 PM PDT, Mike Galbraith wrote:
> Here's something that _might_ interest xfs folks.
>
> cd git (source repository of git itself)
> make clean
> echo 3 > /proc/sys/vm/drop_caches
> time make -j8 test
>
> ext4   2m20.721s
> xfs    6m41.887s <-- ick
> btrfs  1m32.038s
> tux3   1m30.262s
>
> Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.
>
> Are defaults for mkfs.xfs such that nobody sane uses them, or does xfs
> really hate whatever git selftests are doing this much?

I'm more interested in the fact that we eked out a win :)

Btrfs appears to optimize tiny files by storing them in its big btree, the equivalent of our itree, and Tux3 doesn't do that yet, so we are a bit hobbled for a make load. Eventually, that gap should widen.

The pattern I noticed where the write-anywhere designs are beating the journal designs seems to continue here. I am sure there are exceptions, but maybe it is a real thing.

Regards,

Daniel
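The tiny-file optimization mentioned above comes down to a threshold test at write time: below some cutoff, keep the file's bytes inside its btree item instead of allocating a separate data block, saving a seek on both write and read. A generic sketch of the idea (illustration only; not the actual btrfs or Tux3 on-disk structures, and INLINE_MAX is a made-up cutoff):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define INLINE_MAX 128	/* hypothetical inlining cutoff */

struct item {
	uint32_t size;
	uint8_t  inline_data[INLINE_MAX];	/* used when size <= INLINE_MAX */
	uint64_t block;				/* used otherwise */
};

/* Stand-in for a real block allocator. */
static uint64_t fake_alloc_block(const uint8_t *data, uint32_t size)
{
	(void)data; (void)size;
	return 4096;	/* pretend disk address */
}

/* Store a file's data, inlining it in the tree leaf when small enough. */
static void store(struct item *it, const uint8_t *data, uint32_t size)
{
	it->size = size;
	if (size <= INLINE_MAX)
		memcpy(it->inline_data, data, size);	/* no extra block, no extra seek */
	else
		it->block = fake_alloc_block(data, size);
}

int main(void)
{
	struct item it;
	store(&it, (const uint8_t *)"hello", 5);
	printf("5-byte file stored inline, size=%u\n", it.size);
	return 0;
}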
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 2015-04-29 15:05, Mike Galbraith wrote:
> Here's something that _might_ interest xfs folks.
>
> cd git (source repository of git itself)
> make clean
> echo 3 > /proc/sys/vm/drop_caches
> time make -j8 test
>
> ext4   2m20.721s
> xfs    6m41.887s <-- ick
> btrfs  1m32.038s
> tux3   1m30.262s
>
> Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.
>
> Are defaults for mkfs.xfs such that nobody sane uses them, or does xfs
> really hate whatever git selftests are doing this much?
>
> -Mike

I've been using the defaults for it and have been perfectly happy, although I do use a few non-default mount options (like noatime and noquota). It may just be a factor of what exactly the tests are doing. Based on my experience, xfs _is_ better performance-wise with a few large files instead of a lot of small ones when used with the default mkfs options. Of course, my uses for it are more focused on stability and reliability than performance (my primary use for XFS is /boot, and I use BTRFS for pretty much everything else).