Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, 13 May 2015 at 13:38:24, Daniel Phillips wrote:
> On Wednesday, May 13, 2015 1:25:38 PM PDT, Martin Steigerwald wrote:
> > On Wednesday, 13 May 2015 at 12:37:41, Daniel Phillips wrote:
> > > On 05/13/2015 12:09 PM, Martin Steigerwald wrote: ...
> >
> > Daniel, if you want to change the process of patch review and inclusion
> > into the kernel, model an example of how you would like it to be. This
> > has way better chances to inspire others to change their behaviors
> > themselves than accusing them of bad faith.
> >
> > It's yours to choose.
> >
> > What outcome do you want to create?
>
> The outcome I would like is:
>
> * Everybody has a good think about what has gone wrong in the past,
>   not only with troublesome submitters, but with mutual respect and
>   collegial conduct.
>
> * Tux3 is merged on its merits so we get more developers and
>   testers and move it along faster.
>
> * I left LKML better than I found it.
>
> * Group hugs
>
> Well, group hugs are optional, that one would be situational.

Great stuff! Looking forward to it.

Thank you,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, May 13, 2015 1:25:38 PM PDT, Martin Steigerwald wrote:
> On Wednesday, 13 May 2015 at 12:37:41, Daniel Phillips wrote:
> > On 05/13/2015 12:09 PM, Martin Steigerwald wrote: ...
>
> Daniel, if you want to change the process of patch review and inclusion
> into the kernel, model an example of how you would like it to be. This
> has way better chances to inspire others to change their behaviors
> themselves than accusing them of bad faith.
>
> It's yours to choose.
>
> What outcome do you want to create?

The outcome I would like is:

* Everybody has a good think about what has gone wrong in the past,
  not only with troublesome submitters, but with mutual respect and
  collegial conduct.

* Tux3 is merged on its merits so we get more developers and
  testers and move it along faster.

* I left LKML better than I found it.

* Group hugs

Well, group hugs are optional, that one would be situational.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, 13 May 2015 at 12:37:41, Daniel Phillips wrote:
> On 05/13/2015 12:09 PM, Martin Steigerwald wrote:
> > Daniel, what are you trying to achieve here?
> >
> > I thought you wanted to create interest for your filesystem and
> > acceptance for merging it.
> >
> > What I see you are actually creating though is something different.
> >
> > Is what you see after you send your mails really what you want to see?
> > If not… why not? And if you seek change, where can you create change?
>
> That is the question indeed, whether to try and change the system
> while merging, or just keep smiling and get the job done. The problem
> is, I am just too stupid to realize that I can't change the system,
> which is famously unpleasant for submitters.
>
> > I really like to see Tux3 inside the kernel for easier testing, yet I
> > also see that the way you, in your opinion, "defend" it, does not seem
> > to move that goal any closer, quite the opposite. It triggers polarity
> > and resistance.
> >
> > I believe it to be more productive to work together with the people
> > who will decide about what goes into the kernel and the people whose
> > opinions are respected by them, instead of against them.
>
> Obviously true.
>
> > "Assume good faith" can help here. No amount of accusing people of bad
> > intention will change them. The only thing you have the power to
> > change is your approach. You absolutely and ultimately do not have the
> > power to change other people. You can't force Tux3 in by sheer
> > willpower or attacking people.
> >
> > On any account for anyone discussing here: I believe that any personal
> > attacks, counter-attacks or "you are wrong" kind of speech will not
> > help to move this discussion out of the circling it seems to be in at
> > the moment.
>
> Thanks for the sane commentary. I have the power to change my behavior.
> But if nobody else changes their behavior, the process remains just as
> unpleasant for us as it ever was (not just me!). Obviously, this is
> not the first time I have been through this, and it has never been
> pleasant. After a while, contributors just get tired of the grind and
> move on to something more fun. I know I did, and I am far from the
> only one.

Daniel, if you want to change the process of patch review and inclusion
into the kernel, model an example of how you would like it to be. This has
way better chances to inspire others to change their behaviors themselves
than accusing them of bad faith.

It's yours to choose.

What outcome do you want to create?
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, May 13, 2015 1:02:34 PM PDT, Jeremy Allison wrote:
> On Wed, May 13, 2015 at 12:37:41PM -0700, Daniel Phillips wrote:
> > On 05/13/2015 12:09 PM, Martin Steigerwald wrote: ...
>
> Daniel, please listen to Martin. He speaks a fundamental truth here.
>
> As you know, I am also interested in Tux3, and would love to see it
> as a filesystem option for NAS servers using Samba. But please think
> about the way you're interacting with people on the list, and whether
> that makes this outcome more or less likely.

Thanks Jeremy, that means more from you than anyone.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wed, May 13, 2015 at 12:37:41PM -0700, Daniel Phillips wrote:
> On 05/13/2015 12:09 PM, Martin Steigerwald wrote:
> > "Assume good faith" can help here. No amount of accusing people of bad
> > intention will change them. The only thing you have the power to
> > change is your approach. You absolutely and ultimately do not have the
> > power to change other people. You can't force Tux3 in by sheer
> > willpower or attacking people.
> >
> > On any account for anyone discussing here: I believe that any personal
> > attacks, counter-attacks or "you are wrong" kind of speech will not
> > help to move this discussion out of the circling it seems to be in at
> > the moment.
>
> Thanks for the sane commentary. I have the power to change my behavior.
> But if nobody else changes their behavior, the process remains just as
> unpleasant for us as it ever was (not just me!). Obviously, this is
> not the first time I have been through this, and it has never been
> pleasant. After a while, contributors just get tired of the grind and
> move on to something more fun. I know I did, and I am far from the
> only one.

Daniel, please listen to Martin. He speaks a fundamental truth here.

As you know, I am also interested in Tux3, and would love to see it
as a filesystem option for NAS servers using Samba. But please think
about the way you're interacting with people on the list, and whether
that makes this outcome more or less likely.

Cheers,

Jeremy.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/13/2015 12:09 PM, Martin Steigerwald wrote:
> Daniel, what are you trying to achieve here?
>
> I thought you wanted to create interest for your filesystem and
> acceptance for merging it.
>
> What I see you are actually creating though is something different.
>
> Is what you see after you send your mails really what you want to see?
> If not… why not? And if you seek change, where can you create change?

That is the question indeed, whether to try and change the system
while merging, or just keep smiling and get the job done. The problem
is, I am just too stupid to realize that I can't change the system,
which is famously unpleasant for submitters.

> I really like to see Tux3 inside the kernel for easier testing, yet I
> also see that the way you, in your opinion, "defend" it, does not seem
> to move that goal any closer, quite the opposite. It triggers polarity
> and resistance.
>
> I believe it to be more productive to work together with the people who
> will decide about what goes into the kernel and the people whose
> opinions are respected by them, instead of against them.

Obviously true.

> "Assume good faith" can help here. No amount of accusing people of bad
> intention will change them. The only thing you have the power to change
> is your approach. You absolutely and ultimately do not have the power
> to change other people. You can't force Tux3 in by sheer willpower or
> attacking people.
>
> On any account for anyone discussing here: I believe that any personal
> attacks, counter-attacks or "you are wrong" kind of speech will not
> help to move this discussion out of the circling it seems to be in at
> the moment.

Thanks for the sane commentary. I have the power to change my behavior.
But if nobody else changes their behavior, the process remains just as
unpleasant for us as it ever was (not just me!). Obviously, this is
not the first time I have been through this, and it has never been
pleasant. After a while, contributors just get tired of the grind and
move on to something more fun. I know I did, and I am far from the
only one.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tuesday, 12 May 2015 at 18:26:28, Daniel Phillips wrote:
> On 05/12/2015 03:35 PM, David Lang wrote:
> > On Tue, 12 May 2015, Daniel Phillips wrote:
> > > On 05/12/2015 02:30 PM, David Lang wrote:
> > > > You need to get out of the mindset that Ted and Dave are Enemies
> > > > that you need to overcome, they are friendly competitors, not
> > > > Enemies.
> > >
> > > You are wrong about Dave. These are not the words of any friend:
> > >
> > >    "I don't think I'm alone in my suspicion that there was something
> > >    stinky about your numbers." -- Dave Chinner
> >
> > you are looking for offense. That just means that something is wrong
> > with them, not that they were deliberately falsified.
>
> I am not mistaken. Dave made sure to eliminate any doubt about
> what he meant. He said "Oh, so nicely contrived. But terribly
> obvious now that I've found it" among other things.

Daniel, what are you trying to achieve here?

I thought you wanted to create interest for your filesystem and acceptance
for merging it.

What I see you are actually creating though is something different.

Is what you see after you send your mails really what you want to see? If
not… why not? And if you seek change, where can you create change?

I really like to see Tux3 inside the kernel for easier testing, yet I also
see that the way you, in your opinion, "defend" it, does not seem to move
that goal any closer, quite the opposite. It triggers polarity and
resistance.

I believe it to be more productive to work together with the people who
will decide about what goes into the kernel and the people whose opinions
are respected by them, instead of against them.

"Assume good faith" can help here. No amount of accusing people of bad
intention will change them. The only thing you have the power to change is
your approach. You absolutely and ultimately do not have the power to
change other people. You can't force Tux3 in by sheer willpower or
attacking people.

On any account for anyone discussing here: I believe that any personal
attacks, counter-attacks or "you are wrong" kind of speech will not help
to move this discussion out of the circling it seems to be in at the
moment.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/13/2015 06:08 AM, Mike Galbraith wrote:
> On Wed, 2015-05-13 at 04:31 -0700, Daniel Phillips wrote:
> > Third possibility: build from our repository, as Mike did.
>
> Sorry about that folks. I've lost all interest, it won't happen again.

Thanks for your valuable contribution. Now we are seeing a steady stream
of people heading to the repository, after you showed it could be done.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wed, 2015-05-13 at 04:31 -0700, Daniel Phillips wrote:
> Third possibility: build from our repository, as Mike did.

Sorry about that folks. I've lost all interest, it won't happen again.

	-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/13/2015 04:31 AM, Daniel Phillips wrote:

Let me be the first to catch that arithmetic error:

> Let's say our delta size is 400MB (typical under load) and we leave
> a "nice big gap" of 112 MB after flushing each one. Let's say we do
> two thousand of those before deciding that we have enough information
> available to switch to some smarter strategy. We used one GB of a
> 4TB disk, say. The media transfer rate decreased by a factor of:
>
>    (1 - 2/1000) = .2%.

Ahem, no, we used 1/8th of the disk. The time/data rate increased from
unity to 1.125, for an average of 1.0625 across the region. If we only
use 1/10th of the disk instead, by not leaving gaps, then the average
time/data across the region is 1.05. The difference is 1.0625 - 1.05, so
the gap strategy increases media transfer time by 1.25%, which is not
significant compared to the performance deficit in question of 400%.

So, same argument: change in media transfer rate is just a distraction
from the original question. In any case, we probably want to start using
a smarter strategy sooner than 1000 commits, maybe after ten or a hundred
commits, which would make the change in media transfer rate even less
relevant.

The thing is, when data first starts landing on media, we do not have
much information about what the long term load will be. So just analyze
the clues we have in the early commits and put those early deltas onto
disk in the most efficient format, which for Tux3 seems to be linear per
delta. There would be exceptions, but that is the common case. Then get
smarter later. The intent is to get the best of both: early efficiency,
and long term nice aging behavior. I do not accept the proposition that
one must be sacrificed for the other, I find that reasoning faulty.

> The performance deficit in question and the difference in media rate
> are three orders of magnitude apart, does that justify the term
> "similar or identical"?

Regards,

Daniel
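The corrected figures above are easy to check by spelling out the model
they imply. The sketch below rests on assumptions inferred from the post,
not on anything in the Tux3 code: transfer time per unit of data grows
linearly from 1.0 at the start of the disk to 2.0 at the end (the
factor-of-two media rate variation discussed in this thread), and a
thousand 400 MB deltas with 112 MB gaps span roughly 1/8 of a 4 TB disk,
versus roughly 1/10 without gaps.

    # Back-of-envelope check of the gap-allocation cost (assumptions as
    # stated above; the linear slowdown model and delta count are
    # inferred from the post, not taken from Tux3).
    def mean_time_factor(fraction_of_disk):
        # Transfer time per unit of data at fractional position f is
        # modeled as (1 + f), so the mean over [0, fraction] is 1 + f/2.
        return 1 + fraction_of_disk / 2

    with_gaps = mean_time_factor(1 / 8)      # deltas plus gaps: ~1/8 of disk
    without_gaps = mean_time_factor(1 / 10)  # deltas alone: ~1/10 of disk

    print(with_gaps)                  # 1.0625
    print(without_gaps)               # 1.05
    print(with_gaps - without_gaps)   # 0.0125 -> the 1.25% quoted above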
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/13/2015 12:25 AM, Pavel Machek wrote:
> On Mon 2015-05-11 16:53:10, Daniel Phillips wrote:
> > Hi Pavel,
> >
> > On 05/11/2015 03:12 PM, Pavel Machek wrote:
> > > > > It is a fact of life that when you change one aspect of an
> > > > > intimately interconnected system, something else will change as
> > > > > well. You have naive/nonexistent free space management now; when
> > > > > you design something workable there it is going to impact
> > > > > everything else you've already done. It's an easy bet that the
> > > > > impact will be negative, the only question is to what degree.
> > > >
> > > > You might lose that bet. For example, suppose we do strictly
> > > > linear allocation each delta, and just leave nice big gaps between
> > > > the deltas for future expansion. Clearly, we run at similar or
> > > > identical speed to the current naive strategy until we must start
> > > > filling in the gaps, and at that point our layout is not any worse
> > > > than XFS, which started bad and stayed that way.
> > >
> > > Umm, are you sure? If "some areas of disk are faster than others" is
> > > still true on todays harddrives, the gaps will decrease the
> > > performance (as you'll "use up" the fast areas more quickly).
> >
> > That's why I hedged my claim with "similar or identical". The
> > difference in media speed seems to be a relatively small effect
>
> When you knew it can't be identical? That's rather confusing, right?

Maybe. The top of thread is about a measured performance deficit of a
factor of five. Next to that, a media transfer rate variation by a factor
of two already starts to look small, and gets smaller when scrutinized.

Let's say our delta size is 400MB (typical under load) and we leave a
"nice big gap" of 112 MB after flushing each one. Let's say we do two
thousand of those before deciding that we have enough information
available to switch to some smarter strategy. We used one GB of a 4TB
disk, say. The media transfer rate decreased by a factor of:

   (1 - 2/1000) = .2%.

The performance deficit in question and the difference in media rate are
three orders of magnitude apart, does that justify the term "similar or
identical"?

> Perhaps you should post more details how your benchmark is structured
> next time, so we can see you did not make any trivial mistakes...?

Makes sense to me, though I do take considerable care to ensure that my
results are reproducible. That is borne out by the fact that Mike did
reproduce, albeit from the published branch, which is a bit behind
current work. And he went on to do some original testing of his own. I
had no idea Tux3 was so much faster than XFS on the Git self test,
because we never specifically tested anything like that, or optimized for
it. Of course I was interested in why.

And that was not all, Mike also noticed a really interesting fact about
latency that I failed to reproduce. That went on to the list of things to
investigate as time permits.

I reproduced Mike's results according to his description, by actually
building Git in the VM and running the selftests just to see if the same
thing happened, which it did. I didn't think that was worth mentioning at
the time, because if somebody publishes benchmarks, my first instinct is
to trust them. Trust and verify.

> Or just clean the code up so that it can get merged, so that we can
> benchmark ourselves...

Third possibility: build from our repository, as Mike did. Obviously, we
need to merge to master so the build process matches the Wiki. But
Hirofumi is busy with other things, so please be patient.

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon 2015-05-11 16:53:10, Daniel Phillips wrote:
> Hi Pavel,
>
> On 05/11/2015 03:12 PM, Pavel Machek wrote:
> > > > It is a fact of life that when you change one aspect of an
> > > > intimately interconnected system, something else will change as
> > > > well. You have naive/nonexistent free space management now; when
> > > > you design something workable there it is going to impact
> > > > everything else you've already done. It's an easy bet that the
> > > > impact will be negative, the only question is to what degree.
> > >
> > > You might lose that bet. For example, suppose we do strictly linear
> > > allocation each delta, and just leave nice big gaps between the
> > > deltas for future expansion. Clearly, we run at similar or identical
> > > speed to the current naive strategy until we must start filling in
> > > the gaps, and at that point our layout is not any worse than XFS,
> > > which started bad and stayed that way.
> >
> > Umm, are you sure? If "some areas of disk are faster than others" is
> > still true on todays harddrives, the gaps will decrease the
> > performance (as you'll "use up" the fast areas more quickly).
>
> That's why I hedged my claim with "similar or identical". The
> difference in media speed seems to be a relatively small effect

When you knew it can't be identical? That's rather confusing, right?

Perhaps you should post more details how your benchmark is structured
next time, so we can see you did not make any trivial mistakes...?

Or just clean the code up so that it can get merged, so that we can
benchmark ourselves...

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue 2015-05-12 13:54:58, Daniel Phillips wrote:
> On 05/12/2015 11:39 AM, David Lang wrote:
> > On Mon, 11 May 2015, Daniel Phillips wrote:
> > > > ...it's the mm and core kernel developers that need to
> > > > review and accept that code *before* we can consider merging tux3.
> > >
> > > Please do not say "we" when you know that I am just as much a "we"
> > > as you are. Merging Tux3 is not your decision. The people whose
> > > decision it actually is are perfectly capable of recognizing your
> > > agenda for what it is.
> > >
> > > http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
> > > "XFS Developer Takes Shots At Btrfs, EXT4"
> >
> > umm, Phoronix has no input on what gets merged into the kernel. they
> > also have a reputation for trying to turn anything into click-bait by
> > making it sound like a fight when it isn't.
>
> Perhaps you misunderstood. Linus decides what gets merged. Andrew
> decides. Greg decides. Dave Chinner does not decide, he just does
> his level best to create the impression that our project is unfit
> to merge. Any chance there might be an agenda?

Dunno. _Your_ agenda seems to be "attack other maintainers so much that
you can later claim they are biased". Not going to work, sorry.

> > As Dave says above, it's not the other filesystem people you have to
> > convince, it's the core VFS and Memory Management folks you have to
> > convince. You may need a little benchmarking to show that there is a
> > real advantage to be gained, but the real discussion is going to be
> > on the impact that page forking is going to have on everything else
> > (both in complexity and in performance impact to other things)
>
> Yet he clearly wrote "we" as if he believes he is part of it.
>
> Now that ENOSPC is done to a standard way beyond what Btrfs had
> when it was merged, the next item on the agenda is writeback. That
> involves us and VFS people as you say, and not Dave Chinner, who
> only intends to obstruct the process as much as he possibly can. He

Why would he do that? Aha, maybe because you keep attacking him all the
time. Or maybe because your code is not up to the kernel standards. You
want to claim it is the former, but it really looks like the latter.

Just stop doing that. You are not creating a nice atmosphere and you are
not getting tux3 merged in any way.

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/12/2015 03:35 PM, David Lang wrote:
> On Tue, 12 May 2015, Daniel Phillips wrote:
> > On 05/12/2015 02:30 PM, David Lang wrote:
> > > You need to get out of the mindset that Ted and Dave are Enemies
> > > that you need to overcome, they are friendly competitors, not
> > > Enemies.
> >
> > You are wrong about Dave. These are not the words of any friend:
> >
> >    "I don't think I'm alone in my suspicion that there was something
> >    stinky about your numbers." -- Dave Chinner
>
> you are looking for offense. That just means that something is wrong
> with them, not that they were deliberately falsified.

I am not mistaken. Dave made sure to eliminate any doubt about what he
meant. He said "Oh, so nicely contrived. But terribly obvious now that
I've found it" among other things. Good work, Dave. Never mind that we
did not hide it.

Let's look at some more of the story. Hirofumi ran the test and I posted
the results and explained the significance. I did not even know that
dbench had fsyncs at that time, since I had never used it myself, nor
that Hirofumi had taken them out in order to test the things he was
interested in. Which turned out to be very interesting, don't you agree?
Anyway, Hirofumi followed up with a clear explanation, here:

   http://phunq.net/pipermail/tux3/2013-May/002022.html

Instead of accepting that, Dave chose to ride right over it and carry on
with his thinly veiled allegations of intellectual fraud, using such
words as "it's deceptive at best." Dave managed to insult two people that
day.

Dave dismissed the basic breakthrough we had made as "silly marketing
fluff". By now I hope you understand that the result in question was
anything but silly marketing fluff. There are real, technical reasons
that Tux3 wins benchmarks, and the specific detail that Dave attacked so
ungraciously is one of them. Are you beginning to see who the victim of
this mugging was?

> > Basically allegations of cheating. And wrong. Maybe Dave just
> > lives in his own dreamworld where everybody is out to get him, so
> > he has to attack people he views as competitors first.
>
> you are the one doing the attacking.

Defending, not attacking. There is a distinction.

> Please stop. Take a break if needed, and then get back to producing
> software rather than complaining about how everyone is out to get you.

Dave is not "everyone", and a "shut up" will not fix this. What will fix
this is a simple, professional statement that an error was made, that
there was no fraud or anything even remotely resembling it, and that
instead a technical contribution was made. It is not even important that
it come from Dave. But it is important that the aspersions that were cast
be recognized for what they were.

By the way, do you remember the scene from "Unforgiven" where the sheriff
is kicking the guy on the ground and saying "I'm not kicking you?" It
feels like that.

As far as who should take a break goes, note that either of us can stop
the thread. Does it necessarily have to be me? If you would prefer some
light reading, you could read "How fast can we fail?", which I believe is
relevant to the question of whether Tux3 is mergeable or not.

   https://lkml.org/lkml/2015/5/12/663

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/12/2015 02:30 PM, David Lang wrote:
> On Tue, 12 May 2015, Daniel Phillips wrote:
> > Phoronix published a headline that identifies Dave Chinner as
> > someone who takes shots at other projects. Seems pretty much on
> > the money to me, and it ought to be obvious why he does it.
>
> Phoronix turns any correction or criticism into an attack.

Phoronix gets attacked in an unseemly way by a number of people in the
developer community who should behave better. You are doing it yourself,
seemingly oblivious to the valuable role that the publication plays in
our community. Google for filesystem benchmarks. Where do you find them?
Right. Not to mention the Xorg coverage, community issues, etc etc. The
last thing we need is a monoculture in Linux news, and we are dangerously
close to that now.

So, how is "EXT4 is not as stable or as well tested as most people think"
not a cheap shot? By my first hand experience, that claim is absurd. Add
to that the first hand experience of roughly two billion other people.
Seems to be a bit self serving too, or was that just an accident?

> You need to get out of the mindset that Ted and Dave are Enemies that
> you need to overcome, they are friendly competitors, not Enemies.

You are wrong about Dave. These are not the words of any friend:

   "I don't think I'm alone in my suspicion that there was something
   stinky about your numbers." -- Dave Chinner

Basically allegations of cheating. And wrong. Maybe Dave just lives in
his own dreamworld where everybody is out to get him, so he has to attack
people he views as competitors first.

Ted has more taste and his FUD attack was more artful, but it still
amounted to nothing more than piling on. He just picked up Dave's straw
man uncritically and proceeded to knock it down some more. Nice way of
distracting attention from the fact that we actually did what we claimed,
and instead of getting the appropriate recognition for it, we were called
cheaters. More or less in so many words by Dave, and more subtly by Ted,
but the intent is clear and unmistakable. Apologies from both are still
in order, but it will be a rainy day in that hot place before we ever see
either of them do the right thing.

That said, Ted is no enemy, he is brilliant and usually conducts himself
admirably. Except sometimes. I wish I could say the same about Dave, but
what I see there is a guy who has invested his entire identity in his XFS
career and is insecure that something might conspire against him to
disrupt it. I mean, come on, if you convince Redhat management to elevate
your life's work to the status of something that most of the paid-for
servers in the world are going to run, do you continue attacking your
peers or do you chill a bit?

> They assume that you are working in good faith (but are inexperienced
> compared to them), and you need to assume that they are working in good
> faith. If they ever do resort to underhanded means to sabotage you,
> Linus and the other kernel developers will take action. But pointing
> out limits in your current implementation, problems in your benchmarks
> based on how they are run, and concepts that are going to be difficult
> to merge is not underhanded, it's exactly the type of assistance that
> you should be grateful for in friendly competition.
>
> You were the one who started crowing about how badly XFS performed.

Not at all, somebody else posted the terrible XFS benchmark result, then
Dave put up a big smokescreen to try to deflect attention from it. There
is a term for that kind of logical fallacy:

   http://en.wikipedia.org/wiki/Proof_by_intimidation

Seems to have worked well on you. But after all those words, XFS does not
run any faster, and it clearly needs to.

> Dave gave a long and detailed explanation of the reasons for the
> differences, showing benchmarks on other hardware where XFS works very
> well. That's not an attack on EXT4 (or Tux3), it's an explanation.

Long, detailed, and bogus. Summary: "oh, XFS doesn't work well on that
hardware? Get new hardware." Excuse me, but other filesystems do work
well on that hardware, the problem is not with the hardware.

> I have my own concerns about how things are going to work (I've voiced
> some of them), but no, I haven't tried running Tux3 because you say
> it's not ready yet.

I did not say that. I said it is not ready for users. It is more than
ready for anybody who wants to develop it, or benchmark it, or put test
data on it, and has been for a long time. Except for enospc, and that was
apparently not an issue for Btrfs, was it.

> > You know what to do about checking for faulty benchmarks.
>
> That requires that the code be readily available, which last I heard,
> Tux3 wasn't. Has this been fixed?

You heard wrong. The code is readily available and you can clone it from
here:

   https://github.com/OGAWAHirofumi/linux-tux3.git

The hirofumi-user branch has the user tools including mkfs and basic
fsck.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, May 12, 2015 at 03:35:43PM -0700, David Lang wrote:
> I happen to think that it's correct. It's not that Ext4 isn't tested,
> but that people's expectations of how much it's been tested, and at
> what scale, don't match the reality.

Ext4 is used at Google, on a very large number of disks. Exactly how
large is not something I'm allowed to say, but there's a very amusing Ted
Talk by Randall Munroe (of xkcd fame) on that topic:

   http://tedsummaries.com/2014/05/14/randall-munroe-comics-that-ask-what-if/

One thing I can say is that shortly after we deployed ext4 at Google,
thanks to having a very large number of disks, and because we have very
good system monitoring, we detected a file system corruption problem that
happened with a very low probability, but we had enough disks that we
could detect the pattern. (Fortunately, because Google's cluster file
system has replication and/or erasure coding, no user data was lost.)

Even though we could notice the problem, it took us several months to
track it down. When we finally did, it turned out to be a race condition
which only took place under high memory pressure. What was *very*
amusing was that after fixing the problem for ext4, I looked at ext3,
and discovered that (a) the bug ext4 had inherited was also in ext3, and
(b) the bug in ext3 had not been noticed in several enterprise
distribution testing runs done by Red Hat, SuSE, and IBM --- for well
over a **decade**.

What this means is that it's hard for *any* file system to be that well
tested; it's hard to substitute for years and years of production use,
hopefully in systems that have very rigorous monitoring so you would
notice if data or file system metadata is getting corrupted in ways that
can't be explained as hardware errors. The fact that we found a bug that
was never discovered in ext3 after years and years of use in many
enterprises is a testimony to that fact.

(This is also why the fact that Facebook has started using btrfs in
production is going to be a very good thing for btrfs. I'm sure they will
find all sorts of problems once they start running at large scale, which
is a _good_ thing; that's how those problems get fixed.)

Of course, using xfstests certainly helps a lot, and so in my opinion all
serious file system developers should be regularly using xfstests as a
part of the daily development cycle, and should be extremely ruthless
about not allowing any test regressions.

Best regards,

						- Ted
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, 12 May 2015, Daniel Phillips wrote:
> On 05/12/2015 02:30 PM, David Lang wrote:
> > On Tue, 12 May 2015, Daniel Phillips wrote:
> > > Phoronix published a headline that identifies Dave Chinner as
> > > someone who takes shots at other projects. Seems pretty much on
> > > the money to me, and it ought to be obvious why he does it.
> >
> > Phoronix turns any correction or criticism into an attack.
>
> Phoronix gets attacked in an unseemly way by a number of people in the
> developer community who should behave better. You are doing it
> yourself, seemingly oblivious to the valuable role that the publication
> plays in our community. Google for filesystem benchmarks. Where do you
> find them? Right. Not to mention the Xorg coverage, community issues,
> etc etc. The last thing we need is a monoculture in Linux news, and we
> are dangerously close to that now.

It's on my 'sites to check daily' list, but they have also had some
pretty nasty errors in their benchmarks, some of which have been pointed
out repeatedly over the years (doing fsync dependent workloads in
situations where one FS actually honors the fsyncs and another doesn't is
a classic)

> So, how is "EXT4 is not as stable or as well tested as most people
> think" not a cheap shot? By my first hand experience, that claim is
> absurd. Add to that the first hand experience of roughly two billion
> other people. Seems to be a bit self serving too, or was that just an
> accident?

I happen to think that it's correct. It's not that Ext4 isn't tested, but
that people's expectations of how much it's been tested, and at what
scale, don't match the reality.

> > You need to get out of the mindset that Ted and Dave are Enemies that
> > you need to overcome, they are friendly competitors, not Enemies.
>
> You are wrong about Dave. These are not the words of any friend:
>
>    "I don't think I'm alone in my suspicion that there was something
>    stinky about your numbers." -- Dave Chinner

you are looking for offense. That just means that something is wrong with
them, not that they were deliberately falsified.

> Basically allegations of cheating. And wrong. Maybe Dave just lives in
> his own dreamworld where everybody is out to get him, so he has to
> attack people he views as competitors first.

you are the one doing the attacking. Please stop. Take a break if needed,
and then get back to producing software rather than complaining about how
everyone is out to get you.

David Lang
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 12.05.2015 22:54, Daniel Phillips wrote:
> On 05/12/2015 11:39 AM, David Lang wrote:
> > On Mon, 11 May 2015, Daniel Phillips wrote:
> > > > ...it's the mm and core kernel developers that need to
> > > > review and accept that code *before* we can consider merging tux3.
> > >
> > > Please do not say "we" when you know that I am just as much a "we"
> > > as you are. Merging Tux3 is not your decision. The people whose
> > > decision it actually is are perfectly capable of recognizing your
> > > agenda for what it is.
> > >
> > > http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
> > > "XFS Developer Takes Shots At Btrfs, EXT4"
> >
> > umm, Phoronix has no input on what gets merged into the kernel. they
> > also have a reputation for trying to turn anything into click-bait by
> > making it sound like a fight when it isn't.
>
> Perhaps you misunderstood. Linus decides what gets merged. Andrew
> decides. Greg decides. Dave Chinner does not decide, he just does his
> level best to create the impression that our project is unfit to merge.
> Any chance there might be an agenda?
>
> Phoronix published a headline that identifies Dave Chinner as someone
> who takes shots at other projects. Seems pretty much on the money to
> me, and it ought to be obvious why he does it.

Maybe Dave has convincing arguments that have been misinterpreted by that
website, which is an interesting but also highly manipulative
publication.

> > > The real question is, has the Linux development process become so
> > > political and toxic that worthwhile projects fail to benefit from
> > > supposed grassroots community support. You are the poster child for
> > > that.
> >
> > The linux development process is making code available, responding to
> > concerns from the experts in the community, and letting the code talk
> > for itself.
>
> Nice idea, but it isn't working. Did you let the code talk to you?
> Right, you let the code talk to Dave Chinner, then you listen to what
> Dave Chinner has to say about it. Any chance that there might be some
> creative licence acting somewhere in that chain?

We are missing the complete usable thing.

> > There have been many people pushing code for inclusion that has not
> > gotten into the kernel, or has not been used by any distros after
> > it's made it into the kernel, in spite of benchmarks being posted
> > that seem to show how wonderful the new code is. ReiserFS was one of
> > the first, and part of what tarnished its reputation with many people
> > was how much they were pushing the benchmarks that were shown to be
> > faulty (the one I remember most vividly was that the entire benchmark
> > completed in <30 seconds, and they had the FS tuned to not start
> > flushing data to disk for 30 seconds, so the entire 'benchmark' ran
> > out of ram without ever touching the disk)
>
> You know what to do about checking for faulty benchmarks.
>
> > So when Ted and Dave point out problems with the benchmark (the
> > difference in behavior between a single spinning disk, different
> > partitions on the same disk, SSDs, and ramdisks), you would be better
> > off acknowledging them and if you can't adjust and re-run the
> > benchmarks, don't start attacking them as a result.
>
> Ted and Dave failed to point out any actual problem with any benchmark.
> They invented issues with benchmarks and promoted those as FUD.

In general, benchmarks are a critical issue. In this relation, let me
quote Churchill in a derived way: do not trust a benchmark that you have
not forged yourself.

> > As Dave says above, it's not the other filesystem people you have to
> > convince, it's the core VFS and Memory Management folks you have to
> > convince. You may need a little benchmarking to show that there is a
> > real advantage to be gained, but the real discussion is going to be
> > on the impact that page forking is going to have on everything else
> > (both in complexity and in performance impact to other things)
>
> Yet he clearly wrote "we" as if he believes he is part of it.
>
> Now that ENOSPC is done to a standard way beyond what Btrfs had when it
> was merged, the next item on the agenda is writeback. That involves us
> and VFS people as you say, and not Dave Chinner, who only intends to
> obstruct the process as much as he possibly can. He should get back to
> work on his own project. Nobody will miss his posts if he doesn't make
> them. They contribute nothing of value, create a lot of bad blood, and
> just serve to further besmirch the famously tarnished reputation of
> LKML.

At least, I would miss his contributions, specifically his technical
explanations but also his opinions.

> > > You know that Tux3 is already fast. Not just that of course. It has
> > > a higher standard of data integrity than your metadata-only
> > > journalling filesystem and a small enough code base that it can be
> > > reasonably expected to reach the quality expected of an enterprise
> > > class filesystem, quite possibly before XFS gets there.
> >
> > We wouldn't expect anyone developing a new filesystem to believe any
> > differently.
>
> It is not a matter of belief, it is a matter of testable fact. For
> example, you can count the lines. You can run the same benchmarks.
> Proving the data consistency claims would be a little harder, you need
> tools for that, and some of those aren't built yet.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, 12 May 2015, Daniel Phillips wrote:
> On 05/12/2015 11:39 AM, David Lang wrote:
> > On Mon, 11 May 2015, Daniel Phillips wrote:
> > > > ...it's the mm and core kernel developers that need to
> > > > review and accept that code *before* we can consider merging tux3.
> > >
> > > Please do not say "we" when you know that I am just as much a "we"
> > > as you are. Merging Tux3 is not your decision. The people whose
> > > decision it actually is are perfectly capable of recognizing your
> > > agenda for what it is.
> > >
> > > http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM
> > > "XFS Developer Takes Shots At Btrfs, EXT4"
> >
> > umm, Phoronix has no input on what gets merged into the kernel. they
> > also have a reputation for trying to turn anything into click-bait by
> > making it sound like a fight when it isn't.
>
> Perhaps you misunderstood. Linus decides what gets merged. Andrew
> decides. Greg decides. Dave Chinner does not decide, he just does his
> level best to create the impression that our project is unfit to merge.
> Any chance there might be an agenda?
>
> Phoronix published a headline that identifies Dave Chinner as someone
> who takes shots at other projects. Seems pretty much on the money to
> me, and it ought to be obvious why he does it.

Phoronix turns any correction or criticism into an attack.

You need to get out of the mindset that Ted and Dave are Enemies that you
need to overcome, they are friendly competitors, not Enemies. They assume
that you are working in good faith (but are inexperienced compared to
them), and you need to assume that they are working in good faith. If
they ever do resort to underhanded means to sabotage you, Linus and the
other kernel developers will take action. But pointing out limits in your
current implementation, problems in your benchmarks based on how they are
run, and concepts that are going to be difficult to merge is not
underhanded, it's exactly the type of assistance that you should be
grateful for in friendly competition.

You were the one who started crowing about how badly XFS performed. Dave
gave a long and detailed explanation of the reasons for the differences,
showing benchmarks on other hardware where XFS works very well. That's
not an attack on EXT4 (or Tux3), it's an explanation.

> > > The real question is, has the Linux development process become so
> > > political and toxic that worthwhile projects fail to benefit from
> > > supposed grassroots community support. You are the poster child for
> > > that.
> >
> > The linux development process is making code available, responding to
> > concerns from the experts in the community, and letting the code talk
> > for itself.
>
> Nice idea, but it isn't working. Did you let the code talk to you?
> Right, you let the code talk to Dave Chinner, then you listen to what
> Dave Chinner has to say about it. Any chance that there might be some
> creative licence acting somewhere in that chain?

I have my own concerns about how things are going to work (I've voiced
some of them), but no, I haven't tried running Tux3 because you say it's
not ready yet.

> > There have been many people pushing code for inclusion that has not
> > gotten into the kernel, or has not been used by any distros after
> > it's made it into the kernel, in spite of benchmarks being posted
> > that seem to show how wonderful the new code is. ReiserFS was one of
> > the first, and part of what tarnished its reputation with many people
> > was how much they were pushing the benchmarks that were shown to be
> > faulty (the one I remember most vividly was that the entire benchmark
> > completed in <30 seconds, and they had the FS tuned to not start
> > flushing data to disk for 30 seconds, so the entire 'benchmark' ran
> > out of ram without ever touching the disk)
>
> You know what to do about checking for faulty benchmarks.

That requires that the code be readily available, which last I heard,
Tux3 wasn't. Has this been fixed?

> > So when Ted and Dave point out problems with the benchmark (the
> > difference in behavior between a single spinning disk, different
> > partitions on the same disk, SSDs, and ramdisks), you would be better
> > off acknowledging them and if you can't adjust and re-run the
> > benchmarks, don't start attacking them as a result.
>
> Ted and Dave failed to point out any actual problem with any benchmark.
> They invented issues with benchmarks and promoted those as FUD.

They pointed out problems with using ramdisk to simulate an SSD and huge
differences between spinning rust and an SSD (or disk array). Those
aren't FUD.

> > As Dave says above, it's not the other filesystem people you have to
> > convince, it's the core VFS and Memory Management folks you have to
> > convince. You may need a little benchmarking to show that there is a
> > real advantage to be gained, but the real discussion is going to be
> > on the impact that page forking is going to have on everything else
> > (both in complexity and in performance impact to other things)
>
> Yet he clearly wrote "we" as if he believes he is part of it.

He is part of the group of people who use
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/12/2015 11:39 AM, David Lang wrote: > On Mon, 11 May 2015, Daniel Phillips wrote: >>> ...it's the mm and core kernel developers that need to >>> review and accept that code *before* we can consider merging tux3. >> >> Please do not say "we" when you know that I am just as much a "we" >> as you are. Merging Tux3 is not your decision. The people whose >> decision it actually is are perfectly capable of recognizing your >> agenda for what it is. >> >> http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM >> "XFS Developer Takes Shots At Btrfs, EXT4" > > umm, Phoronix has no input on what gets merged into the kernel. they also hae > a reputation for > trying to turn anything into click-bait by making it sound like a fight when > it isn't. Perhaps you misunderstood. Linus decides what gets merged. Andrew decides. Greg decides. Dave Chinner does not decide, he just does his level best to create the impression that our project is unfit to merge. Any chance there might be an agenda? Phoronix published a headline that identifies Dave Chinner as someone who takes shots at other projects. Seems pretty much on the money to me, and it ought to be obvious why he does it. >> The real question is, has the Linux development process become >> so political and toxic that worthwhile projects fail to benefit >> from supposed grassroots community support. You are the poster >> child for that. > > The linux development process is making code available, responding to > concerns from the experts in > the community, and letting the code talk for itself. Nice idea, but it isn't working. Did you let the code talk to you? Right, you let the code talk to Dave Chinner, then you listen to what Dave Chinner has to say about it. Any chance that there might be some creative licence acting somewhere in that chain? > There have been many people pushing code for inclusion that has not gotten > into the kernel, or has > not been used by any distros after it's made it into the kernel, in spite of > benchmarks being posted > that seem to show how wonderful the new code is. ReiserFS was one of the > first, and part of what > tarnished it's reputation with many people was how much they were pushing the > benchmarks that were > shown to be faulty (the one I remember most vividly was that the entire > benchmark completed in <30 > seconds, and they had the FS tuned to not start flushing data to disk for 30 > seconds, so the entire > 'benchmark' ran out of ram without ever touching the disk) You know what to do about checking for faulty benchmarks. > So when Ted and Dave point out problems with the benchmark (the difference in > behavior between a > single spinning disk, different partitions on the same disk, SSDs, and > ramdisks), you would be > better off acknowledging them and if you can't adjust and re-run the > benchmarks, don't start > attacking them as a result. Ted and Dave failed to point out any actual problem with any benchmark. They invented issues with benchmarks and promoted those as FUD. > As Dave says above, it's not the other filesystem people you have to > convince, it's the core VFS and > Memory Mangement folks you have to convince. You may need a little > benchmarking to show that there > is a real advantage to be gained, but the real discussion is going to be on > the impact that page > forking is going to have on everything else (both in complexity and in > performance impact to other > things) Yet he clearly wrote "we" as if he believes he is part of it. 
Now that ENOSPC is done to a standard way beyond what Btrfs had when it was merged, the next item on the agenda is writeback. That involves us and VFS people as you say, and not Dave Chinner, who only intends to obstruct the process as much as he possibly can. He should get back to work on his own project. Nobody will miss his posts if he doesn't make them. They contribute nothing of value, create a lot of bad blood, and just serve to further besmirch the famously tarnished reputation of LKML. >> You know that Tux3 is already fast. Not just that of course. It >> has a higher standard of data integrity than your metadata-only >> journalling filesystem and a small enough code base that it can >> be reasonably expected to reach the quality expected of an >> enterprise class filesystem, quite possibly before XFS gets >> there. > > We wouldn't expect anyone developing a new filesystem to believe any > differently. It is not a matter of belief, it is a matter of testable fact. For example, you can count the lines. You can run the same benchmarks. Proving the data consistency claims would be a little harder; you need tools for that, and some of those aren't built yet. Or, if you have technical ability, you can read the code and the copious design material that has been posted and convince yourself that, yes, there is something cool here, why didn't anybody do it that way before? But of course that starts to sound like work.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, 11 May 2015, Daniel Phillips wrote: On Monday, May 11, 2015 10:38:42 PM PDT, Dave Chinner wrote: I think Ted and I are on the same page here. "Competitive benchmarks" only matter to the people who are trying to sell something. You're trying to sell Tux3, but By "same page", do you mean "transparently obvious about obstructing other projects"? The "except page forking design" statement is your biggest hurdle for getting tux3 merged, not performance. No, the "except page forking design" is because the design is already good and effective. The small adjustments needed in core are well worth merging because the benefits are proved by benchmarks. So benchmarks are key and will not stop just because you don't like the attention they bring to XFS issues. Without page forking, tux3 cannot be merged at all. But it's not filesystem developers you need to convince about the merits of the page forking design and implementation - it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3. Please do not say "we" when you know that I am just as much a "we" as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM "XFS Developer Takes Shots At Btrfs, EXT4" umm, Phoronix has no input on what gets merged into the kernel. they also have a reputation for trying to turn anything into click-bait by making it sound like a fight when it isn't. The real question is, has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support. You are the poster child for that. The Linux development process is making code available, responding to concerns from the experts in the community, and letting the code talk for itself. There have been many people pushing code for inclusion that has not gotten into the kernel, or has not been used by any distros after it's made it into the kernel, in spite of benchmarks being posted that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what tarnished its reputation with many people was how much they were pushing the benchmarks that were shown to be faulty (the one I remember most vividly was that the entire benchmark completed in <30 seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of ram without ever touching the disk) So when Ted and Dave point out problems with the benchmark (the difference in behavior between a single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be better off acknowledging them and if you can't adjust and re-run the benchmarks, don't start attacking them as a result. As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and Memory Management folks you have to convince. You may need a little benchmarking to show that there is a real advantage to be gained, but the real discussion is going to be on the impact that page forking is going to have on everything else (both in complexity and in performance impact to other things) IOWs, you need to focus on the important things needed to achieve your stated goal of getting tux3 merged. 
New filesystems should be faster than those based on 20-25 year old designs, so you don't need to waste time trying to convince people that tux3, when complete, will be fast. You know that Tux3 is already fast. Not just that of course. It has a higher standard of data integrity than your metadata-only journalling filesystem and a small enough code base that it can be reasonably expected to reach the quality expected of an enterprise class filesystem, quite possibly before XFS gets there. We wouldn't expect anyone developing a new filesystem to believe any differently. If they didn't believe this, why would they be working on the filesystem instead of just using an existing filesystem? The ugly reality is that everyone's early versions of their new filesystem look really good. The problem is when they extend it to cover the corner cases and when it gets stressed by real-world (as opposed to benchmark) workloads. This isn't saying that you are wrong in your belief, just that you may not be right, and nobody will know until you get to a usable state and other people can start beating on it. David Lang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 12.05.2015 06:36, Daniel Phillips wrote: Hi David, On 05/11/2015 05:12 PM, David Lang wrote: On Mon, 11 May 2015, Daniel Phillips wrote: On 05/11/2015 03:12 PM, Pavel Machek wrote: It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Umm, are you sure. If "some areas of disk are faster than others" is still true on todays harddrives, the gaps will decrease the performance (as you'll "use up" the fast areas more quickly). That's why I hedged my claim with "similar or identical". The difference in media speed seems to be a relatively small effect compared to extra seeks. It seems that XFS puts big spaces between new directories, and suffers a lot of extra seeks because of it. I propose to batch new directories together initially, then change the allocation goal to a new, relatively empty area if a big batch of files lands on a directory in a crowded region. The "big" gaps would be on the order of delta size, so not really very big. This is an interesting idea, but what happens if the files don't arrive as a big batch, but rather trickle in over time (think of a logserver that is putting files into a bunch of directories at a fairly modest rate per directory) If files are trickling in then we can afford to spend a lot more time finding nice places to tuck them in. Log server files are an especially irksome problem for a redirect-on-write filesystem because the final block tends to be rewritten many times and we must move it to a new location each time, so every extent ends up as one block. Oh well. If we just make sure to have some free space at the end of the file that only that file can use (until everywhere else is full) then the long term result will be slightly ravelled blocks that nonetheless tend to be on the same track or flash block as their logically contiguous neighbours. There will be just zero or one empty data blocks mixed into the file tail as we commit the tail block over and over with the same allocation goal. Sometimes there will be a block or two of metadata as well, which will eventually bake themselves into the middle of contiguous data and stop moving around. Putting this together, we have:

* At delta flush, break out all the log type files
* Dedicate some block groups to append type files
* Leave lots of space between files in those block groups
* Peek at the last block of the file to set the allocation goal

Something like that. What we don't want is to throw those files into the middle of a lot of rewrite-all files, messing up both kinds of file. We don't care much about keeping these files near the parent directory because one big seek per log file in a grep is acceptable, we just need to avoid thousands of big seeks within the file, and not dribble single blocks all over the disk. It would also be nice to merge together extents somehow as the final block is rewritten. 
One idea is to retain the final block dirty until the next delta, and write it again into a contiguous position, so the final block is always flushed twice. We already have the opportunistic merge logic, but the redirty behavior and making sure it only happens to log files would be a bit fiddly. We will also play the incremental defragmentation card at some point, but first we should try hard to control fragmentation in the first place. Tux3 is well suited to online defragmentation because the delta commit model makes it easy to move things around efficiently and safely, but it does generate extra IO, so as a basic mechanism it is not ideal. When we get to piling on features, that will be high on the list, because it is relatively easy, and having that fallback gives a certain sense of security. So we are again at some more features of SASOS4Fun. Said this, I can see, as an alleged troll expert, the agenda and strategy behind this and related threads, but still no usable code/filesystem at all, and hence nothing that even might be ready for merging, as I understand the statements of the filesystem gurus. So it is time for the developer(s) to make decisions about what should eventually be implemented and manifested in code, and then show the complete result, so that others can run the tests and the benchmarks. Thanks Best Regards Do not feed the trolls. C.S.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Daniel Phillips wrote: On 05/12/2015 02:03 AM, Pavel Machek wrote: I'd call a system with 65 tasks doing heavy fsync load at the same time "embarrassingly misconfigured" :-). It is nice if your filesystem can stay fast in that case, but... Well, Tux3 wins the fsync race now whether it is 1 task, 64 tasks or 10,000 tasks. At the high end, maybe it is just a curiosity, or maybe it tells us something about how Tux3 will scale on the big machines that XFS currently lays claim to. And Java programmers are busy doing all kinds of wild and crazy things with lots of tasks. Java almost makes them do it. If they need their data durable then they can easily create loads like my test case. Suppose you have a web server meant to serve 10,000 transactions simultaneously and it needs to survive crashes without losing client state. How will you do it? You could install an expensive, finicky database, or you could write some Java code that happens to work well because Linux has a scheduler and a filesystem that can handle it. Oh wait, we don't have the second one yet, but maybe we soon will. I will not claim that stupidly fast and scalable fsync is the main reason that somebody should want Tux3, however, the lack of a high performance fsync was in fact used as a means of spreading FUD about Tux3, so I had some fun going way beyond the call of duty to answer that. By the way, I am still waiting for the original source of the FUD to concede the point politely, but maybe he is waiting for the code to land, which it still has not as of today, so I guess that is fair. Note that it would have landed quite some time ago if Tux3 was already merged. Well, stupidly fast and scalable fsync sounds wonderful to me; it's the primary pain point in LMDB write performance now. http://symas.com/mdb/ondisk/ I look forward to testing Tux3 when usable code shows up in a public repo. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/12/2015 02:03 AM, Pavel Machek wrote: > On Mon 2015-05-11 19:34:34, Daniel Phillips wrote: >> On 05/11/2015 04:17 PM, Theodore Ts'o wrote: >>> and another way that people >>> doing competitive benchmarking can screw up and produce misleading >>> numbers. >> >> If you think we screwed up or produced misleading numbers, could you >> please be up front about it instead of making insinuations and >> continuing your tirade against benchmarking and those who do it. > > Aren't you a little harsh with Ted? He was polite. Polite language does not include words like "screw up" and "misleading numbers", those are combative words intended to undermine and disparage. It is not clear how repeating the same words can be construed as less polite than the original utterance. >> The ram disk removes seek overhead and greatly reduces media transfer >> overhead. This does not change things much: it confirms that Tux3 is >> significantly faster than the others at synchronous loads. This is >> apparently true independently of media type, though to be sure SSD >> remains to be tested. >> >> The really interesting result is how much difference there is between >> filesystems, even on a ram disk. Is it just CPU or is it synchronization >> strategy and lock contention? Does our asynchronous front/back design >> actually help a lot, instead of being a disadvantage as you predicted? >> >> It is too bad that fs_mark caps number of tasks at 64, because I am >> sure that some embarrassing behavior would emerge at high task counts, >> as with my tests on spinning disk. > > I'd call a system with 65 tasks doing heavy fsync load at the same time > "embarrassingly misconfigured" :-). It is nice if your filesystem can > stay fast in that case, but... Well, Tux3 wins the fsync race now whether it is 1 task, 64 tasks or 10,000 tasks. At the high end, maybe it is just a curiosity, or maybe it tells us something about how Tux3 will scale on the big machines that XFS currently lays claim to. And Java programmers are busy doing all kinds of wild and crazy things with lots of tasks. Java almost makes them do it. If they need their data durable then they can easily create loads like my test case. Suppose you have a web server meant to serve 10,000 transactions simultaneously and it needs to survive crashes without losing client state. How will you do it? You could install an expensive, finicky database, or you could write some Java code that happens to work well because Linux has a scheduler and a filesystem that can handle it. Oh wait, we don't have the second one yet, but maybe we soon will. I will not claim that stupidly fast and scalable fsync is the main reason that somebody should want Tux3, however, the lack of a high performance fsync was in fact used as a means of spreading FUD about Tux3, so I had some fun going way beyond the call of duty to answer that. By the way, I am still waiting for the original source of the FUD to concede the point politely, but maybe he is waiting for the code to land, which it still has not as of today, so I guess that is fair. Note that it would have landed quite some time ago if Tux3 was already merged. Historical note: didn't Java motivate the O(1) scheduler? Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
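For concreteness, the kind of load described above, many independent tasks each making small durable writes, can be approximated in a few lines of C. This is a minimal illustrative sketch, not the actual test used in this thread; the file names, record size, and counts are invented:

    /* Minimal sketch of a many-task fsync load; parameters are invented. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define TASKS  64              /* try 1, 16, 64, ... */
    #define ROUNDS 1000            /* durable writes per task */

    int main(void)
    {
        for (int t = 0; t < TASKS; t++) {
            if (fork() == 0) {
                char name[64], buf[256];
                snprintf(name, sizeof(name), "task%d.log", t);
                int fd = open(name, O_CREAT | O_WRONLY | O_APPEND, 0644);
                if (fd < 0) {
                    perror("open");
                    _exit(1);
                }
                memset(buf, 'x', sizeof(buf));
                for (int r = 0; r < ROUNDS; r++) {
                    /* append one record, then force it to stable storage */
                    if (write(fd, buf, sizeof(buf)) != sizeof(buf) ||
                        fsync(fd) != 0) {
                        perror("write/fsync");
                        _exit(1);
                    }
                }
                close(fd);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)     /* reap all children */
            ;
        return 0;
    }

Each task fsyncs independently, so a filesystem that can batch concurrent commits, as Tux3's delta commit is claimed to do, should scale far better on this kind of load than one that serializes them.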
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon 2015-05-11 19:34:34, Daniel Phillips wrote: > > > On 05/11/2015 04:17 PM, Theodore Ts'o wrote: > > On Tue, May 12, 2015 at 12:12:23AM +0200, Pavel Machek wrote: > >> Umm, are you sure. If "some areas of disk are faster than others" is > >> still true on todays harddrives, the gaps will decrease the > >> performance (as you'll "use up" the fast areas more quickly). > > > > It's still true. The difference between O.D. and I.D. (outer diameter > > vs inner diameter) LBA's is typically a factor of 2. This is why > > "short-stroking" works as a technique, > > That is true, and the effect is not dominant compared to introducing > a lot of extra seeks. > > > and another way that people > > doing competitive benchmarking can screw up and produce misleading > > numbers. > > If you think we screwed up or produced misleading numbers, could you > please be up front about it instead of making insinuations and > continuing your tirade against benchmarking and those who do it. Aren't you a little harsh with Ted? He was polite. > The ram disk removes seek overhead and greatly reduces media transfer > overhead. This does not change things much: it confirms that Tux3 is > significantly faster than the others at synchronous loads. This is > apparently true independently of media type, though to be sure SSD > remains to be tested. > > The really interesting result is how much difference there is between > filesystems, even on a ram disk. Is it just CPU or is it synchronization > strategy and lock contention? Does our asynchronous front/back design > actually help a lot, instead of being a disadvantage as you predicted? > > It is too bad that fs_mark caps number of tasks at 64, because I am > sure that some embarrassing behavior would emerge at high task counts, > as with my tests on spinning disk. I'd call a system with 65 tasks doing heavy fsync load at the same time "embarrassingly misconfigured" :-). It is nice if your filesystem can stay fast in that case, but... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Monday, May 11, 2015 10:38:42 PM PDT, Dave Chinner wrote: > I think Ted and I are on the same page here. "Competitive > benchmarks" only matter to the people who are trying to sell > something. You're trying to sell Tux3, but By "same page", do you mean "transparently obvious about obstructing other projects"? > The "except page forking design" statement is your biggest hurdle > for getting tux3 merged, not performance. No, the "except page forking design" is because the design is already good and effective. The small adjustments needed in core are well worth merging because the benefits are proved by benchmarks. So benchmarks are key and will not stop just because you don't like the attention they bring to XFS issues. > Without page forking, tux3 > cannot be merged at all. But it's not filesystem developers you need > to convince about the merits of the page forking design and > implementation - it's the mm and core kernel developers that need to > review and accept that code *before* we can consider merging tux3. Please do not say "we" when you know that I am just as much a "we" as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM "XFS Developer Takes Shots At Btrfs, EXT4" The real question is, has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support. You are the poster child for that. > IOWs, you need to focus on the important things needed to achieve > your stated goal of getting tux3 merged. New filesystems should be > faster than those based on 20-25 year old designs, so you don't need > to waste time trying to convince people that tux3, when complete, > will be fast. You know that Tux3 is already fast. Not just that of course. It has a higher standard of data integrity than your metadata-only journalling filesystem and a small enough code base that it can be reasonably expected to reach the quality expected of an enterprise class filesystem, quite possibly before XFS gets there. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, May 11, 2015 at 07:34:34PM -0700, Daniel Phillips wrote: > Anyway, everybody but you loves competitive benchmarks, that is why I I think Ted and I are on the same page here. "Competitive benchmarks" only matter to the people who are trying to sell something. You're trying to sell Tux3, but > post them. They are not only useful for tracking down performance bugs, > but as you point out, they help us advertise the reasons why Tux3 is > interesting and ought to be merged. benchmarks won't get tux3 merged. Addressing the significant issues that have been raised during previous code reviews is what will get it merged. I posted that list elsewhere in this thread, to which you replied that they were all "on the list of things to do except for the page forking design". The "except page forking design" statement is your biggest hurdle for getting tux3 merged, not performance. Without page forking, tux3 cannot be merged at all. But it's not filesystem developers you need to convince about the merits of the page forking design and implementation - it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3. IOWs, you need to focus on the important things needed to achieve your stated goal of getting tux3 merged. New filesystems should be faster than those based on 20-25 year old designs, so you don't need to waste time trying to convince people that tux3, when complete, will be fast. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Hi David, On 05/11/2015 05:12 PM, David Lang wrote: > On Mon, 11 May 2015, Daniel Phillips wrote: > >> On 05/11/2015 03:12 PM, Pavel Machek wrote: > It is a fact of life that when you change one aspect of an intimately > interconnected system, > something else will change as well. You have naive/nonexistent free space > management now; when you > design something workable there it is going to impact everything else > you've already done. It's an > easy bet that the impact will be negative, the only question is to what > degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. >>> >>> Umm, are you sure. If "some areas of disk are faster than others" is >>> still true on todays harddrives, the gaps will decrease the >>> performance (as you'll "use up" the fast areas more quickly). >> >> That's why I hedged my claim with "similar or identical". The >> difference in media speed seems to be a relatively small effect >> compared to extra seeks. It seems that XFS puts big spaces between >> new directories, and suffers a lot of extra seeks because of it. >> I propose to batch new directories together initially, then change >> the allocation goal to a new, relatively empty area if a big batch >> of files lands on a directory in a crowded region. The "big" gaps >> would be on the order of delta size, so not really very big. > > This is an interesting idea, but what happens if the files don't arrive as a > big batch, but rather > trickle in over time (think of a logserver that is putting files into a bunch > of directories at a > fairly modest rate per directory) If files are trickling in then we can afford to spend a lot more time finding nice places to tuck them in. Log server files are an especially irksome problem for a redirect-on-write filesystem because the final block tends to be rewritten many times and we must move it to a new location each time, so every extent ends up as one block. Oh well. If we just make sure to have some free space at the end of the file that only that file can use (until everywhere else is full) then the long term result will be slightly ravelled blocks that nonetheless tend to be on the same track or flash block as their logically contiguous neighbours. There will be just zero or one empty data blocks mixed into the file tail as we commit the tail block over and over with the same allocation goal. Sometimes there will be a block or two of metadata as well, which will eventually bake themselves into the middle of contiguous data and stop moving around. Putting this together, we have:

* At delta flush, break out all the log type files
* Dedicate some block groups to append type files
* Leave lots of space between files in those block groups
* Peek at the last block of the file to set the allocation goal

Something like that. What we don't want is to throw those files into the middle of a lot of rewrite-all files, messing up both kinds of file. We don't care much about keeping these files near the parent directory because one big seek per log file in a grep is acceptable, we just need to avoid thousands of big seeks within the file, and not dribble single blocks all over the disk. 
It would also be nice to merge together extents somehow as the final block is rewritten. One idea is to retain the final block dirty until the next delta, and write it again into a contiguous position, so the final block is always flushed twice. We already have the opportunistic merge logic, but the redirty behavior and making sure it only happens to log files would be a bit fiddly. We will also play the incremental defragmentation card at some point, but first we should try hard to control fragmentation in the first place. Tux3 is well suited to online defragmentation because the delta commit model makes it easy to move things around efficiently and safely, but it does generate extra IO, so as a basic mechanism it is not ideal. When we get to piling on features, that will be high on the list, because it is relatively easy, and having that fallback gives a certain sense of security. > And when you then decide that you have to move the directory/file info, > doesn't that create a > potentially large amount of unexpected IO that could end up interfering with > what the user is trying > to do? Right, we don't like that and don't plan to rely on it. What we hope for is behavior that, when you slowly stir the pot, tends to improve the layout just as often as it degrades it. It may indeed become harder to find ideal places to put things as time goes by, but we also gain more information to base decisions on.
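To make the tail-slack idea above concrete, here is a toy sketch of the allocation-goal choice for append-type files. The structure and helper names are invented for illustration and are not Tux3 code:

    /* Hypothetical allocation-goal chooser for the scheme described above. */
    typedef unsigned long block_t;

    struct alloc_hint {
        int is_append_type;     /* file detected as log/append style */
        block_t tail_block;     /* physical address of current tail block, 0 if none */
        block_t batch_goal;     /* linear goal shared by this file's batch */
    };

    static block_t choose_goal(const struct alloc_hint *hint)
    {
        /* Append-type files aim just past their own tail, into slack
         * reserved for them, so the rewritten tail block stays on the
         * same track or flash block as its contiguous neighbours. */
        if (hint->is_append_type && hint->tail_block)
            return hint->tail_block + 1;

        /* Everything else streams out linearly with its batch. */
        return hint->batch_goal;
    }

The point of the sketch is only that the goal is a per-file decision made at delta flush time, which is what keeps rewritten tail blocks from dribbling all over the disk.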
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 05/11/2015 04:17 PM, Theodore Ts'o wrote: > On Tue, May 12, 2015 at 12:12:23AM +0200, Pavel Machek wrote: >> Umm, are you sure. If "some areas of disk are faster than others" is >> still true on todays harddrives, the gaps will decrease the >> performance (as you'll "use up" the fast areas more quickly). > > It's still true. The difference between O.D. and I.D. (outer diameter > vs inner diameter) LBA's is typically a factor of 2. This is why > "short-stroking" works as a technique, That is true, and the effect is not dominant compared to introducing a lot of extra seeks. > and another way that people > doing competitive benchmarking can screw up and produce misleading > numbers. If you think we screwed up or produced misleading numbers, could you please be up front about it instead of making insinuations and continuing your tirade against benchmarking and those who do it. > (If you use partitions instead of the whole disk, you have > to use the same partition in order to make sure you aren't comparing > apples with oranges.) You can rest assured I did exactly that. Somebody complained that things would look much different with seeks factored out, so here are some new "competitive benchmarks" using fs_mark on a ram disk:

    tasks      1     16     64
    ext4:    231   2154   5439
    btrfs:   152    962   2230
    xfs:     268   2729   6466
    tux3:    315   5529  20301

(Files per second, more is better)

The shell commands are:

    fs_mark -dtest -D5 -N100 -L1 -p5 -r5 -s1048576 -w4096 -n1000 -t1
    fs_mark -dtest -D5 -N100 -L1 -p5 -r5 -s65536 -w4096 -n1000 -t16
    fs_mark -dtest -D5 -N100 -L1 -p5 -r5 -s4096 -w4096 -n1000 -t64

The ram disk removes seek overhead and greatly reduces media transfer overhead. This does not change things much: it confirms that Tux3 is significantly faster than the others at synchronous loads. This is apparently true independently of media type, though to be sure SSD remains to be tested. The really interesting result is how much difference there is between filesystems, even on a ram disk. Is it just CPU or is it synchronization strategy and lock contention? Does our asynchronous front/back design actually help a lot, instead of being a disadvantage as you predicted? It is too bad that fs_mark caps number of tasks at 64, because I am sure that some embarrassing behavior would emerge at high task counts, as with my tests on spinning disk. Anyway, everybody but you loves competitive benchmarks, that is why I post them. They are not only useful for tracking down performance bugs, but as you point out, they help us advertise the reasons why Tux3 is interesting and ought to be merged. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
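For readers unfamiliar with fs_mark: as I understand it, each task above does roughly the following per file - create, write up to -s bytes in -w sized chunks, fsync, close - which is what makes it a synchronous-load test. A simplified sketch of that per-file work (not fs_mark source; error handling trimmed):

    /* Approximation of one fs_mark file operation: -s bytes, -w4096 chunks. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    static int write_one_file(const char *name, size_t size)
    {
        char buf[4096];          /* matches the -w4096 write size above */
        int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        memset(buf, 0, sizeof(buf));
        for (size_t done = 0; done < size; done += sizeof(buf))
            if (write(fd, buf, sizeof(buf)) < 0)
                break;
        fsync(fd);               /* the synchronous step being measured */
        return close(fd);
    }

The three command lines above keep total bytes roughly comparable while varying task count, which is why the per-task file size shrinks as -t grows.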
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Mon, 11 May 2015, Daniel Phillips wrote: On 05/11/2015 03:12 PM, Pavel Machek wrote: It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Umm, are you sure. If "some areas of disk are faster than others" is still true on todays harddrives, the gaps will decrease the performance (as you'll "use up" the fast areas more quickly). That's why I hedged my claim with "similar or identical". The difference in media speed seems to be a relatively small effect compared to extra seeks. It seems that XFS puts big spaces between new directories, and suffers a lot of extra seeks because of it. I propose to batch new directories together initially, then change the allocation goal to a new, relatively empty area if a big batch of files lands on a directory in a crowded region. The "big" gaps would be on the order of delta size, so not really very big. This is an interesting idea, but what happens if the files don't arrive as a big batch, but rather trickle in over time (think of a logserver that is putting files into a bunch of directories at a fairly modest rate per directory) And when you then decide that you have to move the directory/file info, doesn't that create a potentially large amount of unexpected IO that could end up interfering with what the user is trying to do? David Lang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Hi Pavel, On 05/11/2015 03:12 PM, Pavel Machek wrote: >>> It is a fact of life that when you change one aspect of an intimately >>> interconnected system, >>> something else will change as well. You have naive/nonexistent free space >>> management now; when you >>> design something workable there it is going to impact everything else >>> you've already done. It's an >>> easy bet that the impact will be negative, the only question is to what >>> degree. >> >> You might lose that bet. For example, suppose we do strictly linear >> allocation >> each delta, and just leave nice big gaps between the deltas for future >> expansion. Clearly, we run at similar or identical speed to the current naive >> strategy until we must start filling in the gaps, and at that point our >> layout >> is not any worse than XFS, which started bad and stayed that way. > > Umm, are you sure. If "some areas of disk are faster than others" is > still true on todays harddrives, the gaps will decrease the > performance (as you'll "use up" the fast areas more quickly). That's why I hedged my claim with "similar or identical". The difference in media speed seems to be a relatively small effect compared to extra seeks. It seems that XFS puts big spaces between new directories, and suffers a lot of extra seeks because of it. I propose to batch new directories together initially, then change the allocation goal to a new, relatively empty area if a big batch of files lands on a directory in a crowded region. The "big" gaps would be on the order of delta size, so not really very big. Anyway, some people seem to have pounced on the words "naive" and "linear allocation" and jumped to the conclusion that our whole strategy is naive. Far from it. We don't just throw files randomly at the disk. We sort and partition files and metadata, and we carefully arrange the order of our allocation operations so that linear allocation produces a nice layout for both read and write. This turned out to be so much better than fiddling with the goal of individual allocations that we concluded we would get best results by sticking with linear allocation, but improve our sort step. The new plan is to partition updates into batches according to some affinity metrics, and set the linear allocation goal per batch. So for example, big files and append-type files can get special treatment in separate batches, while files that seem to be related because of having the same directory parent and being written in the same delta will continue to be streamed out using "naive" linear allocation, which is not necessarily as naive as one might think. It will take time and a lot of performance testing to get this right, but nobody should get the idea that it is any inherent design limitation. The opposite is true: we have no restrictions at all in media layout. Compared to Ext4, we do need to address the issue that data moves around when updated. This can cause rapid fragmentation. Btrfs has shown issues with that for big, randomly updated files. We want to fix it without falling back on update-in-place as Btrfs does. Actually, Tux3 already has update-in-place, and unlike Btrfs, we can switch to it for non-empty files. But we think that perfect data isolation per delta is something worth fighting for, and we would rather not force users to fiddle around with mode settings just to make something work as well as it already does on Ext4. 
We will tackle this issue by partitioning as above, and use a dedicated allocation strategy for such files, which are easy to detect. Metadata moving around per update does not seem to be a problem because it is all single blocks that need very little slack space to stay close to home. > Anyway... you have brand new filesystem. Of course it should be > faster/better/nicer than the existing filesystems. So don't be too > harsh with XFS people. They have done a lot of good work, but they still have a long way to go. I don't see any shame in that. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
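As one concrete way to picture the partitioning described above: classify each dirty inode in a delta, then give each class its own linear allocation goal. The names and the size threshold below are invented assumptions for illustration, not Tux3 code:

    /* Hypothetical batch classifier for per-delta allocation. */
    enum batch_kind { BATCH_NORMAL, BATCH_APPEND, BATCH_BIG };

    struct dirty_inode {
        unsigned long parent;    /* parent directory, for affinity batching */
        unsigned long bytes;     /* current file size */
        int is_append_type;      /* e.g. tail rewritten repeatedly */
    };

    static enum batch_kind classify(const struct dirty_inode *inode)
    {
        if (inode->is_append_type)
            return BATCH_APPEND;          /* dedicated block groups, extra slack */
        if (inode->bytes > (8UL << 20))   /* invented 8 MB threshold */
            return BATCH_BIG;             /* big files get their own region */
        return BATCH_NORMAL;              /* streamed out with directory siblings */
    }

Files in the same normal batch then allocate linearly in sorted order, which is the "not necessarily as naive as one might think" linear allocation the text describes.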
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Tue, May 12, 2015 at 12:12:23AM +0200, Pavel Machek wrote: > Umm, are you sure. If "some areas of disk are faster than others" is > still true on todays harddrives, the gaps will decrease the > performance (as you'll "use up" the fast areas more quickly). It's still true. The difference between O.D. and I.D. (outer diameter vs inner diameter) LBA's is typically a factor of 2. This is why "short-stroking" works as a technique, and another way that people doing competitive benchmarking can screw up and produce misleading numbers. (If you use partitions instead of the whole disk, you have to use the same partition in order to make sure you aren't comparing apples with oranges.) Cheers, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Hi! > > It is a fact of life that when you change one aspect of an intimately > > interconnected system, > > something else will change as well. You have naive/nonexistent free space > > management now; when you > > design something workable there it is going to impact everything else > > you've already done. It's an > > easy bet that the impact will be negative, the only question is to what > > degree. > > You might lose that bet. For example, suppose we do strictly linear allocation > each delta, and just leave nice big gaps between the deltas for future > expansion. Clearly, we run at similar or identical speed to the current naive > strategy until we must start filling in the gaps, and at that point our layout > is not any worse than XFS, which started bad and stayed that way. Umm, are you sure. If "some areas of disk are faster than others" is still true on todays harddrives, the gaps will decrease the performance (as you'll "use up" the fast areas more quickly). Anyway... you have brand new filesystem. Of course it should be faster/better/nicer than the existing filesystems. So don't be too harsh with XFS people. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On the 30th of April 2015 17:14, Daniel Phillips wrote: Hello hardcore coders On 04/30/2015 07:28 AM, Howard Chu wrote: Daniel Phillips wrote: On 04/30/2015 06:48 AM, Mike Galbraith wrote: On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote: On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote: On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote: Lovely sounding argument, but it is wrong because Tux3 still beats XFS even with seek time factored out of the equation. Hm. Do you have big-storage comparison numbers to back that? I'm no storage guy (waiting for holographic crystal arrays to obsolete all this crap;), but Dave's big-storage guy words made sense to me. This has nothing to do with big storage. The proposition was that seek time is the reason for Tux3's fsync performance. That claim was easily falsified by removing the seek time. Dave's big storage words are there to draw attention away from the fact that XFS ran the Git tests four times slower than Tux3 and three times slower than Ext4. Whatever the big storage excuse is for that, the fact is, XFS obviously sucks at little storage. If you allocate spanning the disk from start of life, you're going to eat seeks that others don't until later. That seemed rather obvious and straightforward. It is a logical fallacy. It mixes a grain of truth (spreading all over the disk causes extra seeks) with an obvious falsehood (it is not necessarily the only possible way to avoid long term fragmentation). You're reading into it what isn't there. Spreading over the disk isn't (just) about avoiding fragmentation - it's about delivering consistent and predictable latency. It is undeniable that if you start by only allocating from the fastest portion of the platter, you are going to see performance slow down over time. If you start by spreading allocations across the entire platter, you make the worst-case and average-case latency equal, which is exactly what a lot of folks are looking for. Another fallacy: intentionally running slower than necessary is not necessarily the only way to deliver consistent and predictable latency. Not only that, but intentionally running slower than necessary does not necessarily guarantee performing better than some alternate strategy later. Anyway, let's not be silly. Everybody in the room who wants Git to run 4 times slower with no guarantee of any benefit in the future, please raise your hand. He flat stated that xfs has passable performance on single bit of rust, and openly explained why. I see no misdirection, only some evidence of bad blood between you two. Raising the spectre of theoretical fragmentation issues when we have not even begun that work is a straw man and intellectually dishonest. You have to wonder why he does it. It is destructive to our community image and harmful to progress. It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. 
Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Now here is where you lose the bet: we already know that linear allocation with wrap ends horribly right? However, as above, we start linear, without compromise, but because of the gaps we leave, we are able to switch to a slower strategy, but not nearly as slow as the ugly tangle we get with simple wrap. So impact over the lifetime of the filesystem is positive, not negative, and what seemed to be self evident to you turns out to be wrong. In short, we would rather deliver as much performance as possible, all the time. I really don't need to think about it very hard to know that is what I want, and what most users want. I will make you a bet in return: when we get to doing that part properly, the quality of the work will be just as high as everything else we have completed so far. Why would we suddenly get lazy? Regards, Daniel -- How? Maybe this is explained and discussed in a new thread about allocation or so. Thanks Best Regards Have fun C.S. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thursday, 30 April 2015, 10:57:10, Theodore Ts'o wrote: > One of the problems is that it's *hard* to get good benchmarking > numbers that take into account file system aging and measure how well > the free space has been fragmented over time. Most of the benchmark > results that I've seen do a really lousy job at this, and the vast > majority don't even try. > > This is one of the reasons why I find head-to-head "competitions" > between file systems to be not very helpful for anything other than > benchmarketing. It's almost certain that the benchmark won't be > "fair" in some way, and it doesn't really matter whether the person > doing the benchmark was doing it with malice aforethought, or was just > incompetent and didn't understand the issues --- or did understand the > issues and didn't really care, because what they _really_ wanted to do > was to market their file system. I agree with that. One benchmark measures one thing, and if it's with a fresh filesystem, it does so with a fresh filesystem. Benchmarks aiming at testing an aged filesystem are much more expensive in the time and resources needed, unless one reuses an aged filesystem image again and again. Thanks for your explanations, Ted, Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Daniel Phillips wrote: On 04/30/2015 07:28 AM, Howard Chu wrote: You're reading into it what isn't there. Spreading over the disk isn't (just) about avoiding fragmentation - it's about delivering consistent and predictable latency. It is undeniable that if you start by only allocating from the fastest portion of the platter, you are going to see performance slow down over time. If you start by spreading allocations across the entire platter, you make the worst-case and average-case latency equal, which is exactly what a lot of folks are looking for. Another fallacy: intentionally running slower than necessary is not necessarily the only way to deliver consistent and predictable latency. Totally agree with you there. Not only that, but intentionally running slower than necessary does not necessarily guarantee performing better than some alternate strategy later. True, it's a question of algorithmic efficiency - does the performance decay linearly or logarithmically. Anyway, let's not be silly. Everybody in the room who wants Git to run 4 times slower with no guarantee of any benefit in the future, please raise your hand. git is an important workload for us as developers, but I don't think that's the only workload that's important for us. He flat stated that xfs has passable performance on single bit of rust, and openly explained why. I see no misdirection, only some evidence of bad blood between you two. Raising the spectre of theoretical fragmentation issues when we have not even begun that work is a straw man and intellectually dishonest. You have to wonder why he does it. It is destructive to our community image and harmful to progress. It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Now here is where you lose the bet: we already know that linear allocation with wrap ends horribly right? However, as above, we start linear, without compromise, but because of the gaps we leave, we are able to switch to a slower strategy, but not nearly as slow as the ugly tangle we get with simple wrap. So impact over the lifetime of the filesystem is positive, not negative, and what seemed to be self evident to you turns out to be wrong. In short, we would rather deliver as much performance as possible, all the time. I really don't need to think about it very hard to know that is what I want, and what most users want. I will make you a bet in return: when we get to doing that part properly, the quality of the work will be just as high as everything else we have completed so far. Why would we suddenly get lazy? I never said anything about getting lazy. You're working in a closed system though. If you run today's version on a system, and then you run your future version on that same hardware, you're doing more CPU work and probably more I/O work to do the more complex space management. 
It's not quite zero-sum but close enough, when you're talking about highly optimized designs. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Hi Ted, On 04/30/2015 07:57 AM, Theodore Ts'o wrote: > This is one of the reasons why I find head-to-head "competitions" > between file systems to be not very helpful for anything other than > benchmarketing. It's almost certain that the benchmark won't be > "fair" in some way, and it doesn't really matter whether the person > doing the benchmark was doing it with malice aforethought, or was just > incompetent and didn't understand the issues --- or did understand the > issues and didn't really care, because what they _really_ wanted to do > was to market their file system. Your proposition, as I understand it, is that nobody should ever do benchmarks because any benchmark must be one of: 1) malicious; 2) incompetent; or 3) careless. When in fact, a benchmark may be perfectly honest, competently done, and informative. > And even if the benchmark is fair, it might not match up with the end > user's hardware, or their use case. There will always be some use > case where file system A is better than file system B, for pretty much > any file system. Don't get me wrong --- I will do comparisons between > file systems, but only so I can figure out ways of making _my_ file > system better. And more often than not, it's comparisons of the same > file system before and after adding some new feature which is the most > interesting. I cordially invite you to replicate our fsync benchmarks, or invent your own. I am confident that you will find that the numbers are accurate, that the test cases were well chosen, that the results are informative, and that there is no sleight of hand. As for whether or not people should "market" their filesystems as you put it, that is easy for you to disparage when you are the incumbent. If we don't tell people what is great about Tux3 then how will they ever find out? Sure, it might be "advertising", but the important question is, is it _truthful_ advertising? Surely you remember how Linus got started... that was really blatant, and I am glad he did it. >> Those are the allocation groups. I always wondered how it can be beneficial >> to spread the allocations onto 4 areas of one partition on expensive seek >> media. Now that makes better sense for me. I always had the gut impression >> that XFS may not be the fastest in all cases, but it is one of the >> filesystems with the most consistent performance over time, but never was >> able to fully explain why that is. > > Yep, pretty much all of the traditional update-in-place file systems > since the BSD FFS have done this, and for the same reason. For COW > file systems which are constantly moving data and metadata blocks > around, they will need different strategies for trying to avoid the > free space fragmentation problem as the file system ages. Right, different problems, but I have a pretty good idea how to go about it now. I made a failed attempt a while back and learned a lot, my mistake was to try to give every object a fixed home position based on where it was first written and the result was worse for both read and write. Now the interesting thing is, naive linear allocation is great for both read and write, so my effort now is directed towards ways of doing naive linear allocation but choosing carefully which order we do the allocation in. I will keep you posted on how that progresses of course. Anyway, how did we get onto allocation? I thought my post was about fsync, and after all, you are the guest of honor. 
Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 04/30/2015 07:33 AM, Mike Galbraith wrote: > Well ok, let's forget bad blood, straw men... and answering my question > too I suppose. Not having any sexy IO gizmos in my little desktop box, > I don't care deeply which stomps the other flat on beastly boxen. I'm with you, especially the forget bad blood part. I did my time in big storage and I will no doubt do it again, but right now, what I care about is bringing truth and beauty to small storage, which includes that spinning rust of yours and also the cheap SSD you are about to run out and buy. I hope you caught the bit about how Tux3 is doing really well running in tmpfs? According to my calculations, that means good things for SSD performance. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 04/30/2015 07:28 AM, Howard Chu wrote:
> Daniel Phillips wrote:
>> On 04/30/2015 06:48 AM, Mike Galbraith wrote:
>>> On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
>>>> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>>>>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>>>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>>>>> even with seek time factored out of the equation.
>>>>>
>>>>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>>>>> storage guy (waiting for holographic crystal arrays to obsolete all this
>>>>> crap;), but Dave's big-storage guy words made sense to me.
>>>>
>>>> This has nothing to do with big storage. The proposition was that seek
>>>> time is the reason for Tux3's fsync performance. That claim was easily
>>>> falsified by removing the seek time.
>>>>
>>>> Dave's big storage words are there to draw attention away from the fact
>>>> that XFS ran the Git tests four times slower than Tux3 and three times
>>>> slower than Ext4. Whatever the big storage excuse is for that, the fact
>>>> is, XFS obviously sucks at little storage.
>>>
>>> If you allocate spanning the disk from start of life, you're going to
>>> eat seeks that others don't until later. That seemed rather obvious and
>>> straightforward.
>>
>> It is a logical fallacy. It mixes a grain of truth (spreading all over the
>> disk causes extra seeks) with an obvious falsehood (it is not necessarily
>> the only possible way to avoid long term fragmentation).
>
> You're reading into it what isn't there. Spreading over the disk isn't
> (just) about avoiding fragmentation - it's about delivering consistent
> and predictable latency. It is undeniable that if you start by only
> allocating from the fastest portion of the platter, you are going to
> see performance slow down over time. If you start by spreading
> allocations across the entire platter, you make the worst-case and
> average-case latency equal, which is exactly what a lot of folks are
> looking for.

Another fallacy: intentionally running slower than necessary is not necessarily the only way to deliver consistent and predictable latency. Not only that, but intentionally running slower than necessary does not necessarily guarantee performing better than some alternate strategy later.

Anyway, let's not be silly. Everybody in the room who wants Git to run 4 times slower with no guarantee of any benefit in the future, please raise your hand.

>>> He flat stated that xfs has passable performance on
>>> single bit of rust, and openly explained why. I see no misdirection,
>>> only some evidence of bad blood between you two.
>>
>> Raising the spectre of theoretical fragmentation issues when we have not
>> even begun that work is a straw man and intellectually dishonest. You have
>> to wonder why he does it. It is destructive to our community image and
>> harmful to progress.
>
> It is a fact of life that when you change one aspect of an intimately
> interconnected system, something else will change as well. You have
> naive/nonexistent free space management now; when you design something
> workable there it is going to impact everything else you've already
> done. It's an easy bet that the impact will be negative, the only
> question is to what degree.

You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion.
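As a toy illustration of that strategy (made-up numbers, and nothing like the eventual Tux3 allocator; just the shape of the idea):

/* Toy model: lay each delta down linearly, then skip a gap so later
 * deltas have room to grow near their neighbors. Illustration only. */
#include <stdio.h>

#define GAP_BLOCKS 1024		/* hypothetical slack left after each delta */

static unsigned long cursor;	/* next free block in a toy linear space */

/* Allocate 'count' contiguous blocks for one delta, then leave a gap. */
static unsigned long delta_alloc(unsigned long count)
{
	unsigned long start = cursor;
	cursor += count + GAP_BLOCKS;
	return start;
}

int main(void)
{
	/* Three deltas land one after another, each with slack behind it. */
	for (unsigned long blocks = 100; blocks <= 300; blocks += 100)
		printf("delta of %lu blocks at block %lu\n",
		       blocks, delta_alloc(blocks));
	return 0;
}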
Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way.

Now here is where you lose the bet: we already know that linear allocation with wrap ends horribly, right? However, as above, we start linear, without compromise, but because of the gaps we leave, we are able to switch to a slower strategy, though not nearly as slow as the ugly tangle we get with simple wrap. So the impact over the lifetime of the filesystem is positive, not negative, and what seemed to be self-evident to you turns out to be wrong.

In short, we would rather deliver as much performance as possible, all the time. I really don't need to think about it very hard to know that is what I want, and what most users want.

I will make you a bet in return: when we get to doing that part properly, the quality of the work will be just as high as everything else we have completed so far. Why would we suddenly get lazy?

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, Apr 30, 2015 at 11:00:05AM +0200, Martin Steigerwald wrote:
> > IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> > the problem goes away. :)
>
> I am quite surprised that a traditional filesystem that was created in the
> age of rotating media does not like this kind of media and even seems to
> excel over BTRFS on the new non-rotating media available.

You shouldn't be surprised; XFS was designed in an era where RAID was extremely important. To this day, on very large RAID arrays, I'm pretty sure none of the other file systems will come close to touching XFS, because it was optimized by some really, really good file system engineers for that hardware. And while RAID systems are certainly not identical to SSDs, the fact that you have multiple disk heads means that a good file system will optimize for that parallelism, and that's how SSDs get their speed (individual SSD channels aren't really all that fast; it's the fact that you can be reading or writing large numbers of them in parallel that gives high-end flash its really great performance numbers).

> > Thing is, once you've abused those filesystems for a couple of
> > months, the files in ext4, btrfs and tux3 are not going to be laid
> > out perfectly on the outer edge of the disk. They'll be spread all
> > over the place and so all the filesystems will be seeing large seeks
> > on read. The thing is, XFS will have roughly the same performance as
> > when the filesystem is empty because the spreading of the allocation
> > allows it to maintain better locality and separation and hence
> > doesn't fragment free space nearly as badly as the other filesystems.
> > Free space fragmentation is what leads to performance degradation in
> > filesystems, and all the other filesystems will have degraded to be
> > *much worse* than XFS.

In fact, ext4 doesn't actually lay out things perfectly on the outer edge of the disk either, because we try to do spreading as well. Worse, we use a random algorithm to try to do the spreading, so that means that results from run to run on an empty file system will show a lot more variation. I won't claim that we're best in class with either our spreading techniques or our ability to manage free space fragmentation, although we do a lot of work to manage free space fragmentation as well.

One of the problems is that it's *hard* to get good benchmarking numbers that take into account file system aging and measure how well the free space has been fragmented over time. Most of the benchmark results that I've seen do a really lousy job at this, and the vast majority don't even try.

This is one of the reasons why I find head-to-head "competitions" between file systems to be not very helpful for anything other than benchmarketing. It's almost certain that the benchmark won't be "fair" in some way, and it doesn't really matter whether the person doing the benchmark was doing it with malice aforethought, or was just incompetent and didn't understand the issues --- or did understand the issues and didn't really care, because what they _really_ wanted to do was to market their file system.

And even if the benchmark is fair, it might not match up with the end user's hardware, or their use case. There will always be some use case where file system A is better than file system B, for pretty much any file system. Don't get me wrong --- I will do comparisons between file systems, but only so I can figure out ways of making _my_ file system better.
And more often than not, it's comparisons of the same file system before and after adding some new feature that are the most interesting.

> Those are the allocation groups. I always wondered how it can be beneficial
> to spread the allocations onto 4 areas of one partition on expensive seek
> media. Now that makes better sense to me. I always had the gut impression
> that XFS may not be the fastest in all cases, but that it is one of the
> filesystems with the most consistent performance over time, but never was
> able to fully explain why that is.

Yep, pretty much all of the traditional update-in-place file systems since the BSD FFS have done this, and for the same reason. For COW file systems, which are constantly moving data and metadata blocks around, they will need different strategies for trying to avoid the free space fragmentation problem as the file system ages.

Cheers,

- Ted
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Daniel Phillips wrote:
> On 04/30/2015 06:48 AM, Mike Galbraith wrote:
>> On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
>>> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>>>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>>>> even with seek time factored out of the equation.
>>>>
>>>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>>>> storage guy (waiting for holographic crystal arrays to obsolete all this
>>>> crap;), but Dave's big-storage guy words made sense to me.
>>>
>>> This has nothing to do with big storage. The proposition was that seek
>>> time is the reason for Tux3's fsync performance. That claim was easily
>>> falsified by removing the seek time.
>>>
>>> Dave's big storage words are there to draw attention away from the fact
>>> that XFS ran the Git tests four times slower than Tux3 and three times
>>> slower than Ext4. Whatever the big storage excuse is for that, the fact
>>> is, XFS obviously sucks at little storage.
>>
>> If you allocate spanning the disk from start of life, you're going to
>> eat seeks that others don't until later. That seemed rather obvious and
>> straightforward.
>
> It is a logical fallacy. It mixes a grain of truth (spreading all over the
> disk causes extra seeks) with an obvious falsehood (it is not necessarily
> the only possible way to avoid long term fragmentation).

You're reading into it what isn't there. Spreading over the disk isn't (just) about avoiding fragmentation - it's about delivering consistent and predictable latency. It is undeniable that if you start by only allocating from the fastest portion of the platter, you are going to see performance slow down over time. If you start by spreading allocations across the entire platter, you make the worst-case and average-case latency equal, which is exactly what a lot of folks are looking for.

>> He flat stated that xfs has passable performance on
>> single bit of rust, and openly explained why. I see no misdirection,
>> only some evidence of bad blood between you two.
>
> Raising the spectre of theoretical fragmentation issues when we have not
> even begun that work is a straw man and intellectually dishonest. You have
> to wonder why he does it. It is destructive to our community image and
> harmful to progress.

It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree.

-- 
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, 2015-04-30 at 07:07 -0700, Daniel Phillips wrote:
> On 04/30/2015 06:48 AM, Mike Galbraith wrote:
>> On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
>>> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>>>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>>>> even with seek time factored out of the equation.
>>>>
>>>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>>>> storage guy (waiting for holographic crystal arrays to obsolete all this
>>>> crap;), but Dave's big-storage guy words made sense to me.
>>>
>>> This has nothing to do with big storage. The proposition was that seek
>>> time is the reason for Tux3's fsync performance. That claim was easily
>>> falsified by removing the seek time.
>>>
>>> Dave's big storage words are there to draw attention away from the fact
>>> that XFS ran the Git tests four times slower than Tux3 and three times
>>> slower than Ext4. Whatever the big storage excuse is for that, the fact
>>> is, XFS obviously sucks at little storage.
>>
>> If you allocate spanning the disk from start of life, you're going to
>> eat seeks that others don't until later. That seemed rather obvious and
>> straightforward.
>
> It is a logical fallacy. It mixes a grain of truth (spreading all over the
> disk causes extra seeks) with an obvious falsehood (it is not necessarily
> the only possible way to avoid long term fragmentation).

Shrug, but it seems it is a solution, and more importantly, an implemented solution. What I gleaned as a layman reader is that xfs has no fragmentation issue, but tux3 still does. It doesn't seem right to slam xfs for a conscious design decision unless tux3 can proudly display its superior solution, which I gathered doesn't yet exist.

>> He flat stated that xfs has passable performance on
>> single bit of rust, and openly explained why. I see no misdirection,
>> only some evidence of bad blood between you two.
>
> Raising the spectre of theoretical fragmentation issues when we have not
> even begun that work is a straw man and intellectually dishonest. You have
> to wonder why he does it. It is destructive to our community image and
> harmful to progress.

Well ok, let's forget bad blood, straw men... and answering my question too I suppose. Not having any sexy IO gizmos in my little desktop box, I don't care deeply which stomps the other flat on beastly boxen.

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 04/30/2015 06:48 AM, Mike Galbraith wrote:
> On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
>> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>>> even with seek time factored out of the equation.
>>>
>>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>>> storage guy (waiting for holographic crystal arrays to obsolete all this
>>> crap;), but Dave's big-storage guy words made sense to me.
>>
>> This has nothing to do with big storage. The proposition was that seek
>> time is the reason for Tux3's fsync performance. That claim was easily
>> falsified by removing the seek time.
>>
>> Dave's big storage words are there to draw attention away from the fact
>> that XFS ran the Git tests four times slower than Tux3 and three times
>> slower than Ext4. Whatever the big storage excuse is for that, the fact
>> is, XFS obviously sucks at little storage.
>
> If you allocate spanning the disk from start of life, you're going to
> eat seeks that others don't until later. That seemed rather obvious and
> straightforward.

It is a logical fallacy. It mixes a grain of truth (spreading all over the disk causes extra seeks) with an obvious falsehood (it is not necessarily the only possible way to avoid long term fragmentation).

> He flat stated that xfs has passable performance on
> single bit of rust, and openly explained why. I see no misdirection,
> only some evidence of bad blood between you two.

Raising the spectre of theoretical fragmentation issues when we have not even begun that work is a straw man and intellectually dishonest. You have to wonder why he does it. It is destructive to our community image and harmful to progress.

> No, I won't be switching to xfs any time soon, but then it would take a
> hell of a lot of evidence to get me to move away from ext4. I trust
> ext[n] deeply because it has proven many times over the years that it
> can take one hell of a lot (of self inflicted wounds;).

Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote:
> On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
>> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>>> even with seek time factored out of the equation.
>>
>> Hm. Do you have big-storage comparison numbers to back that? I'm no
>> storage guy (waiting for holographic crystal arrays to obsolete all this
>> crap;), but Dave's big-storage guy words made sense to me.
>
> This has nothing to do with big storage. The proposition was that seek
> time is the reason for Tux3's fsync performance. That claim was easily
> falsified by removing the seek time.
>
> Dave's big storage words are there to draw attention away from the fact
> that XFS ran the Git tests four times slower than Tux3 and three times
> slower than Ext4. Whatever the big storage excuse is for that, the fact
> is, XFS obviously sucks at little storage.

If you allocate spanning the disk from start of life, you're going to eat seeks that others don't until later. That seemed rather obvious and straightforward. He flat stated that xfs has passable performance on single bit of rust, and openly explained why. I see no misdirection, only some evidence of bad blood between you two.

No, I won't be switching to xfs any time soon, but then it would take a hell of a lot of evidence to get me to move away from ext4. I trust ext[n] deeply because it has proven many times over the years that it can take one hell of a lot (of self inflicted wounds;).

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote:
> On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
>> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
>> even with seek time factored out of the equation.
>
> Hm. Do you have big-storage comparison numbers to back that? I'm no
> storage guy (waiting for holographic crystal arrays to obsolete all this
> crap;), but Dave's big-storage guy words made sense to me.

This has nothing to do with big storage. The proposition was that seek time is the reason for Tux3's fsync performance. That claim was easily falsified by removing the seek time.

Dave's big storage words are there to draw attention away from the fact that XFS ran the Git tests four times slower than Tux3 and three times slower than Ext4. Whatever the big storage excuse is for that, the fact is, XFS obviously sucks at little storage.

He also posted nonsense: "XFS, however, will spread the load across many (if not all) of the disks, and so effectively reduce the average seek time by the number of disks doing concurrent IO." False. No matter how big an array of spinning disks you have, seek latency and synchronous write latency stay the same. It is just an attempt to bamboozle you. If instead he had talked about throughput, he would have a point. But he didn't, because he knows that does not help his argument. If fsync sucks on one disk, it will suck just as much on a thousand disks.

The talk about filling up from the outside of the disk is disingenuous. Dave should know that Ext4 does not do that; it spreads out allocations exactly to give good aging, and it does deliver that - Ext4's aging performance is second to none. What XFS does is just stupid, and instead of admitting that and fixing it, Dave claims it would be great if the disk was an array or an SSD instead of what it actually is.

Regards,

Daniel
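A toy model makes the latency-versus-throughput distinction above concrete (made-up seek and rotation figures; any resemblance to a real drive is accidental). Adding spindles multiplies aggregate IOPS, but a single synchronous write still pays one full access time:

/* More disks raise aggregate random IOPS, but one synchronous fsync
 * still waits a full seek plus rotation on a single disk. */
#include <stdio.h>

int main(void)
{
	const double access_ms = 8.0 + 4.2;	/* hypothetical seek + half rotation */
	const double iops_per_disk = 1000.0 / access_ms;

	for (int disks = 1; disks <= 1000; disks *= 10)
		printf("%4d disk(s): ~%6.0f aggregate random IOPS, "
		       "but one fsync still waits %.1f ms\n",
		       disks, disks * iops_per_disk, access_ms);
	return 0;
}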
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote:
> Lovely sounding argument, but it is wrong because Tux3 still beats XFS
> even with seek time factored out of the equation.

Hm. Do you have big-storage comparison numbers to back that? I'm no storage guy (waiting for holographic crystal arrays to obsolete all this crap;), but Dave's big-storage guy words made sense to me.

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, April 29, 2015 5:20:08 PM PDT, Dave Chinner wrote:
> It's easy to be fast on empty filesystems. XFS does not aim to be
> fast in such situations - it aims to have consistent performance
> across the life of the filesystem.
>
> In this case, ext4, btrfs and tux3 have optimal allocation filling
> from the outside of the disk, while XFS is spreading the files across
> (at least) 4 separate regions of the whole disk. Hence XFS is seeing
> seek times on read that are much larger than the other filesystems
> when the filesystem is empty, as it is doing full disk seeks rather
> than being confined to the outer edges of the spindle.
>
> Thing is, once you've abused those filesystems for a couple of
> months, the files in ext4, btrfs and tux3 are not going to be laid
> out perfectly on the outer edge of the disk. They'll be spread all
> over the place and so all the filesystems will be seeing large seeks
> on read. The thing is, XFS will have roughly the same performance as
> when the filesystem is empty because the spreading of the allocation
> allows it to maintain better locality and separation and hence
> doesn't fragment free space nearly as badly as the other filesystems.
> Free space fragmentation is what leads to performance degradation in
> filesystems, and all the other filesystems will have degraded to be
> *much worse* than XFS.
>
> Put simply: empty filesystem benchmarking does not show the real
> performance of the filesystem under sustained production workloads.
> Hence benchmarks like this - while interesting from a theoretical
> point of view and widely used for bragging about who's got the
> fastest - are mostly irrelevant to determining how the filesystem
> will perform in production environments.
>
> We can also look at this algorithm in a different way: take a large
> filesystem (say a few hundred TB) across a few tens of disks in a
> linear concat. ext4, btrfs and tux3 will only hit the first disk in
> the concat, and so go no faster because they are still bound by
> physical seek times. XFS, however, will spread the load across many
> (if not all) of the disks, and so effectively reduce the average
> seek time by the number of disks doing concurrent IO. Then you'll
> see that application level IO concurrency becomes the performance
> limitation, not the physical seek time of the hardware.
>
> IOWs, what you don't see here is that the XFS algorithms that make
> your test slow will keep *lots* of disks busy. i.e. testing empty
> filesystem performance on a single, slow disk demonstrates that an
> algorithm designed for scalability isn't designed to achieve
> physical seek distance minimisation. Hence your storage makes XFS
> look particularly poor in comparison to filesystems that are being
> designed and optimised for the limitations of single slow
> spindles...
>
> To further demonstrate that it is physical seek distance that is the
> issue here, let's take the seek time out of the equation (e.g. use an
> SSD). Doing that will result in basically no difference in
> performance between all 4 filesystems, as performance will now be
> determined by application level concurrency and that is the same for
> all tests.

Lovely sounding argument, but it is wrong because Tux3 still beats XFS even with seek time factored out of the equation.

Even with an SSD, if you just go splattering files all over the disk, you will pay for it in latency and lifetime when the disk goes into continuous erase and your messy layout causes write multiplication. But of course you can design your filesystem any way you want. Tux3 is designed to be fast on the hardware that people actually have.
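To put a rough number on "write multiplication": under a simple greedy garbage collection model (a back-of-envelope sketch with assumed parameters, not any particular drive's firmware), amplification grows sharply with the fraction of each erase block that is still live when it is collected, and a scattered layout is exactly what pushes that fraction up:

/* Back-of-envelope SSD write amplification: reclaiming an erase block
 * means copying out whatever fraction of it is still live, so bytes
 * physically written per byte of new data is roughly 1 / (1 - live).
 * Simple greedy-collector model; real firmware is more complicated. */
#include <stdio.h>

int main(void)
{
	for (double live = 0.0; live < 0.95; live += 0.1)
		printf("live fraction %.1f -> write amplification %4.1fx\n",
		       live, 1.0 / (1.0 - live));
	return 0;
}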
Regards,

Daniel
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Am Donnerstag, 30. April 2015, 10:20:08 schrieb Dave Chinner:
> On Wed, Apr 29, 2015 at 09:05:26PM +0200, Mike Galbraith wrote:
> > Here's something that _might_ interest xfs folks.
> >
> > cd git (source repository of git itself)
> > make clean
> > echo 3 > /proc/sys/vm/drop_caches
> > time make -j8 test
> >
> > ext4   2m20.721s
> > xfs    6m41.887s <-- ick
> > btrfs  1m32.038s
> > tux3   1m30.262s
> >
> > Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.
>
> TL;DR: Results are *very different* on a 256GB Samsung 840 EVO SSD
> with slightly slower CPUs (E5-4620 @ 2.20GHz), all filesystems
> using defaults:
>
>         real       user       sys
> xfs     3m16.138s  7m8.341s   14m32.462s
> ext4    3m18.045s  7m7.840s   14m32.994s
> btrfs   3m45.149s  7m10.184s  16m30.498s
>
> What you are seeing is physical seek distances impacting read
> performance. XFS does not optimise for minimal physical seek
> distance, and hence is slower than filesystems that do optimise for
> minimal seek distance. This shows up especially well on slow single
> spindles.
>
> XFS is *adequate* for use on slow single drives, but it is
> really designed for best performance on storage hardware that is not
> seek distance sensitive.
>
> IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> the problem goes away. :)

I am quite surprised that a traditional filesystem that was created in the age of rotating media does not like this kind of media and even seems to excel over BTRFS on the new non-rotating media available. But…

> > And now in more detail.
>
> It's easy to be fast on empty filesystems. XFS does not aim to be
> fast in such situations - it aims to have consistent performance
> across the life of the filesystem.

… this is a quite important addition.

> Thing is, once you've abused those filesystems for a couple of
> months, the files in ext4, btrfs and tux3 are not going to be laid
> out perfectly on the outer edge of the disk. They'll be spread all
> over the place and so all the filesystems will be seeing large seeks
> on read. The thing is, XFS will have roughly the same performance as
> when the filesystem is empty because the spreading of the allocation
> allows it to maintain better locality and separation and hence
> doesn't fragment free space nearly as badly as the other filesystems.
> Free space fragmentation is what leads to performance degradation in
> filesystems, and all the other filesystems will have degraded to be
> *much worse* than XFS.

I still see hangs from what I take to be free space fragmentation in BTRFS. My /home on a dual (!) BTRFS SSD setup can basically grind to a halt when it has reserved all space on the device for chunks. So this

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 129.48GiB
        devid 1 size 170.00GiB used 146.03GiB path /dev/mapper/msata-home
        devid 2 size 170.00GiB used 146.03GiB path /dev/mapper/sata-home

Btrfs v3.18

merkaba:~> btrfs fi df /home
Data, RAID1: total=142.00GiB, used=126.72GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=4.00GiB, used=2.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

is safe, but once I have size 170 GiB, used 170 GiB, even if inside the chunks there is enough free space to allocate from, as much as 30-40 GiB, it can happen that writes are stalled up to the point that applications on the desktop freeze and I see hung task messages in the kernel log. This is the case up to kernel 4.0.
I have seen Chris Mason fixing some write stalls for big Facebook setups; maybe that will help here. But unless this issue is fixed, I think BTRFS is not yet fully production ready, unless you leave a *huge* amount of free space, as in: for 200 GiB of data you want to write, make a 400 GiB volume.

> Put simply: empty filesystem benchmarking does not show the real
> performance of the filesystem under sustained production workloads.
> Hence benchmarks like this - while interesting from a theoretical
> point of view and widely used for bragging about who's got the
> fastest - are mostly irrelevant to determining how the filesystem
> will perform in production environments.
>
> We can also look at this algorithm in a different way: take a large
> filesystem (say a few hundred TB) across a few tens of disks in a
> linear concat. ext4, btrfs and tux3 will only hit the first disk in
> the concat, and so go no faster because they are still bound by
> physical seek times. XFS, however, will spread the load across many
> (if not all) of the disks, and so effectively reduce the average
> seek time by the number of disks doing concurrent IO. Then you'll
> see that application level IO concurrency becomes the performance
> limitation, not the physical seek time of the hardware.

Those are the allocation groups. I always wondered how it can be beneficial to spread the allocations onto 4 areas of one partition on expensive seek media. Now that makes better sense to me. I always had the gut impression that XFS may not be the fastest in all cases, but that it is one of the filesystems with the most consistent performance over time, but never was able to fully explain why that is.
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wed, 2015-04-29 at 14:12 -0700, Daniel Phillips wrote:
> Btrfs appears to optimize tiny files by storing them in its big btree,
> the equivalent of our itree, and Tux3 doesn't do that yet, so we are a
> bit hobbled for a make load.

That's not a build load, it's a git load. btrfs beat all others at the various git/quilt things I tried (since that's what I do lots of in real life), but tux3 looked quite good too. As Dave noted though, an orchard produces oodles of apples over its lifetime; these shiny new apples may lose their luster over time ;-)

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Thu, 2015-04-30 at 10:20 +1000, Dave Chinner wrote:
> IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and
> the problem goes away. :)

I'd love to. Too bad sorry sack of sh*t MB manufacturer only applied _connectors_ to 4 of 6 available ports, and they're all in use :)

> And now in more detail.

Thanks for those details, made perfect sense.

-Mike
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wed, Apr 29, 2015 at 09:05:26PM +0200, Mike Galbraith wrote:
> Here's something that _might_ interest xfs folks.
>
> cd git (source repository of git itself)
> make clean
> echo 3 > /proc/sys/vm/drop_caches
> time make -j8 test
>
> ext4   2m20.721s
> xfs    6m41.887s <-- ick
> btrfs  1m32.038s
> tux3   1m30.262s
>
> Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.

TL;DR: Results are *very different* on a 256GB Samsung 840 EVO SSD with slightly slower CPUs (E5-4620 @ 2.20GHz), all filesystems using defaults:

        real       user       sys
xfs     3m16.138s  7m8.341s   14m32.462s
ext4    3m18.045s  7m7.840s   14m32.994s
btrfs   3m45.149s  7m10.184s  16m30.498s

What you are seeing is physical seek distances impacting read performance. XFS does not optimise for minimal physical seek distance, and hence is slower than filesystems that do optimise for minimal seek distance. This shows up especially well on slow single spindles.

XFS is *adequate* for use on slow single drives, but it is really designed for best performance on storage hardware that is not seek distance sensitive.

IOWS, XFS just hates your disk. Spend $50 and buy a cheap SSD and the problem goes away. :)

And now in more detail.

It's easy to be fast on empty filesystems. XFS does not aim to be fast in such situations - it aims to have consistent performance across the life of the filesystem.

In this case, ext4, btrfs and tux3 have optimal allocation filling from the outside of the disk, while XFS is spreading the files across (at least) 4 separate regions of the whole disk. Hence XFS is seeing seek times on read that are much larger than the other filesystems when the filesystem is empty, as it is doing full disk seeks rather than being confined to the outer edges of the spindle.

Thing is, once you've abused those filesystems for a couple of months, the files in ext4, btrfs and tux3 are not going to be laid out perfectly on the outer edge of the disk. They'll be spread all over the place and so all the filesystems will be seeing large seeks on read. The thing is, XFS will have roughly the same performance as when the filesystem is empty because the spreading of the allocation allows it to maintain better locality and separation and hence doesn't fragment free space nearly as badly as the other filesystems. Free space fragmentation is what leads to performance degradation in filesystems, and all the other filesystems will have degraded to be *much worse* than XFS.

Put simply: empty filesystem benchmarking does not show the real performance of the filesystem under sustained production workloads. Hence benchmarks like this - while interesting from a theoretical point of view and widely used for bragging about who's got the fastest - are mostly irrelevant to determining how the filesystem will perform in production environments.

We can also look at this algorithm in a different way: take a large filesystem (say a few hundred TB) across a few tens of disks in a linear concat. ext4, btrfs and tux3 will only hit the first disk in the concat, and so go no faster because they are still bound by physical seek times. XFS, however, will spread the load across many (if not all) of the disks, and so effectively reduce the average seek time by the number of disks doing concurrent IO. Then you'll see that application level IO concurrency becomes the performance limitation, not the physical seek time of the hardware.

IOWs, what you don't see here is that the XFS algorithms that make your test slow will keep *lots* of disks busy. i.e.
testing empty filesystem performance on a single, slow disk demonstrates that an algorithm designed for scalability isn't designed to achieve physical seek distance minimisation. Hence your storage makes XFS look particularly poor in comparison to filesystems that are being designed and optimised for the limitations of single slow spindles...

To further demonstrate that it is physical seek distance that is the issue here, let's take the seek time out of the equation (e.g. use an SSD). Doing that will result in basically no difference in performance between all 4 filesystems, as performance will now be determined by application level concurrency and that is the same for all tests. e.g. on a 16p, 16GB RAM VM with storage on SSDs, a "make -j 8" compile test on a kernel source tree (using my normal test machine .config) gives:

        real       user        sys
xfs:    4m6.723s   26m21.087s  2m49.426s
ext4:   4m11.415s  26m21.122s  2m49.786s
btrfs:  4m8.118s   26m26.440s  2m50.357s

i.e. take seek times out of the picture, and XFS is just as fast as any of the other filesystems. Just about everyone I know uses SSDs in their laptops and machines that build kernels these days, and spinning disks are rapidly disappearing from enterprise and HPC environments, which also happen to be the target markets for XFS. Hence
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On Wednesday, April 29, 2015 12:05:26 PM PDT, Mike Galbraith wrote:
> Here's something that _might_ interest xfs folks.
>
> cd git (source repository of git itself)
> make clean
> echo 3 > /proc/sys/vm/drop_caches
> time make -j8 test
>
> ext4   2m20.721s
> xfs    6m41.887s <-- ick
> btrfs  1m32.038s
> tux3   1m30.262s
>
> Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.
>
> Are defaults for mkfs.xfs such that nobody sane uses them, or does xfs
> really hate whatever git selftests are doing this much?

I'm more interested in the fact that we eked out a win :)

Btrfs appears to optimize tiny files by storing them in its big btree, the equivalent of our itree, and Tux3 doesn't do that yet, so we are a bit hobbled for a make load. Eventually, that gap should widen.

The pattern I noticed where the write-anywhere designs are beating the journal designs seems to continue here. I am sure there are exceptions, but maybe it is a real thing.

Regards,

Daniel
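The tiny-file optimization mentioned above comes down to a threshold test at write time: below some cutoff, keep the file's bytes inside its btree item instead of allocating a separate data block, saving a seek on both write and read. A generic sketch of the idea (illustration only; not the actual btrfs or Tux3 on-disk structures, and INLINE_MAX is a made-up cutoff):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define INLINE_MAX 128	/* hypothetical inlining cutoff */

struct item {
	uint32_t size;
	uint8_t  inline_data[INLINE_MAX];	/* used when size <= INLINE_MAX */
	uint64_t block;				/* used otherwise */
};

/* Stand-in for a real block allocator. */
static uint64_t fake_alloc_block(const uint8_t *data, uint32_t size)
{
	(void)data; (void)size;
	return 4096;	/* pretend disk address */
}

/* Store a file's data, inlining it in the tree leaf when small enough. */
static void store(struct item *it, const uint8_t *data, uint32_t size)
{
	it->size = size;
	if (size <= INLINE_MAX)
		memcpy(it->inline_data, data, size);	/* no extra block, no extra seek */
	else
		it->block = fake_alloc_block(data, size);
}

int main(void)
{
	struct item it;
	store(&it, (const uint8_t *)"hello", 5);
	printf("5-byte file stored inline, size=%u\n", it.size);
	return 0;
}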
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 2015-04-29 15:05, Mike Galbraith wrote:
> Here's something that _might_ interest xfs folks.
>
> cd git (source repository of git itself)
> make clean
> echo 3 > /proc/sys/vm/drop_caches
> time make -j8 test
>
> ext4   2m20.721s
> xfs    6m41.887s <-- ick
> btrfs  1m32.038s
> tux3   1m30.262s
>
> Testing by Aunt Tilly: mkfs, no fancy switches, mount the thing, test.
>
> Are defaults for mkfs.xfs such that nobody sane uses them, or does xfs
> really hate whatever git selftests are doing this much?
>
> -Mike

I've been using the defaults for it and have been perfectly happy, although I do use a few non-default mount options (like noatime and noquota). It may just be a factor of what exactly the tests are doing. Based on my experience, xfs _is_ better performance-wise with a few large files instead of a lot of small ones when used with the default mkfs options. Of course, my uses for it are more focused on stability and reliability than performance (my primary use for XFS is /boot, and I use BTRFS for pretty much everything else).