Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 12.05.2015 22:54, Daniel Phillips wrote: On 05/12/2015 11:39 AM, David Lang wrote: On Mon, 11 May 2015, Daniel Phillips wrote: ...it's the mm and core kernel developers that need to review and accept that code *before* we can consider merging tux3. Please do not say "we" when you know that I am just as much a "we" as you are. Merging Tux3 is not your decision. The people whose decision it actually is are perfectly capable of recognizing your agenda for what it is. http://www.phoronix.com/scan.php?page=news_item&px=MTA0NzM "XFS Developer Takes Shots At Btrfs, EXT4" umm, Phoronix has no input on what gets merged into the kernel. they also have a reputation for trying to turn anything into click-bait by making it sound like a fight when it isn't. Perhaps you misunderstood. Linus decides what gets merged. Andrew decides. Greg decides. Dave Chinner does not decide, he just does his level best to create the impression that our project is unfit to merge. Any chance there might be an agenda? Phoronix published a headline that identifies Dave Chinner as someone who takes shots at other projects. Seems pretty much on the money to me, and it ought to be obvious why he does it. Maybe Dave has convincing arguments that have been misinterpreted by that website, which is an interesting but also highly manipulative publication. The real question is: has the Linux development process become so political and toxic that worthwhile projects fail to benefit from supposed grassroots community support? You are the poster child for that. The Linux development process is making code available, responding to concerns from the experts in the community, and letting the code talk for itself. Nice idea, but it isn't working. Did you let the code talk to you? Right, you let the code talk to Dave Chinner, then you listen to what Dave Chinner has to say about it. Any chance that there might be some creative licence acting somewhere in that chain? We are missing the complete usable thing. There have been many people pushing code for inclusion that has not gotten into the kernel, or has not been used by any distros after it made it into the kernel, in spite of benchmarks being posted that seem to show how wonderful the new code is. ReiserFS was one of the first, and part of what tarnished its reputation with many people was how much they were pushing benchmarks that were shown to be faulty (the one I remember most vividly was that the entire benchmark completed in <30 seconds, and they had the FS tuned to not start flushing data to disk for 30 seconds, so the entire 'benchmark' ran out of RAM without ever touching the disk). You know what to do about checking for faulty benchmarks. So when Ted and Dave point out problems with the benchmark (the difference in behavior between a single spinning disk, different partitions on the same disk, SSDs, and ramdisks), you would be better off acknowledging them, and if you can't adjust and re-run the benchmarks, don't start attacking them as a result. Ted and Dave failed to point out any actual problem with any benchmark. They invented issues with benchmarks and promoted those as FUD. In general, benchmarks are a critical issue. In this connection, let me quote Churchill in a derived way: do not trust a benchmark that you have not forged yourself. As Dave says above, it's not the other filesystem people you have to convince, it's the core VFS and Memory Management folks you have to convince.
You may need a little benchmarking to show that there is a real advantage to be gained, but the real discussion is going to be on the impact that page forking is going to have on everything else (both in complexity and in performance impact to other things). Yet he clearly wrote "we" as if he believes he is part of it. Now that ENOSPC is done to a standard way beyond what Btrfs had when it was merged, the next item on the agenda is writeback. That involves us and VFS people as you say, and not Dave Chinner, who only intends to obstruct the process as much as he possibly can. He should get back to work on his own project. Nobody will miss his posts if he doesn't make them. They contribute nothing of value, create a lot of bad blood, and just serve to further besmirch the famously tarnished reputation of LKML. At least, I would miss his contributions, specifically his technical explanations but also his opinions. You know that Tux3 is already fast. Not just that, of course. It has a higher standard of data integrity than your metadata-only journalling filesystem and a small enough code base that it can be reasonably expected to reach the quality expected of an enterprise class filesystem, quite possibly before XFS gets there. We wouldn't expect anyone developing a new filesystem to believe any differently. It is not a matter of belief, it is a matter of testable fact. For example, you can count the lines. You can run the same benchmarks.
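[Editor's note: for readers who want to "run the same benchmarks", here is a minimal sketch of the kind of fsync timing loop under discussion. It is my own illustration, not code from Tux3 or from the benchmarks in this thread; the mount point, file name, block size, and iteration count are arbitrary assumptions. Because every write is followed by fsync(), delayed-writeback tuning of the kind described for the ReiserFS benchmark above cannot hide the cost of reaching the disk:]

    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
            /* Hypothetical test file on the filesystem under test. */
            int fd = open("/mnt/test/fsync-test",
                          O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }

            char buf[4096] = { 0 };
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);

            for (int i = 0; i < 10000; i++) {
                    if (pwrite(fd, buf, sizeof(buf), 0) != sizeof(buf)) {
                            perror("pwrite"); return 1;
                    }
                    if (fsync(fd) < 0) {    /* force each write to media */
                            perror("fsync"); return 1;
                    }
            }

            clock_gettime(CLOCK_MONOTONIC, &t1);
            double secs = (t1.tv_sec - t0.tv_sec)
                        + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("10000 fsyncs in %.2f s (%.0f/s)\n", secs, 10000 / secs);
            close(fd);
            return 0;
    }

As the thread itself points out, the same loop gives very different answers on a single spinning disk, an SSD, and a ramdisk, which is exactly why the hardware must be reported alongside the numbers.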
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On 12.05.2015 06:36, Daniel Phillips wrote: Hi David, On 05/11/2015 05:12 PM, David Lang wrote: On Mon, 11 May 2015, Daniel Phillips wrote: On 05/11/2015 03:12 PM, Pavel Machek wrote: It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there, it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion. Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Umm, are you sure? If "some areas of disk are faster than others" is still true on today's hard drives, the gaps will decrease the performance (as you'll "use up" the fast areas more quickly). That's why I hedged my claim with "similar or identical". The difference in media speed seems to be a relatively small effect compared to extra seeks. It seems that XFS puts big spaces between new directories, and suffers a lot of extra seeks because of it. I propose to batch new directories together initially, then change the allocation goal to a new, relatively empty area if a big batch of files lands on a directory in a crowded region. The "big" gaps would be on the order of delta size, so not really very big. This is an interesting idea, but what happens if the files don't arrive as a big batch, but rather trickle in over time (think a log server that is putting files into a bunch of directories at a fairly modest rate per directory)? If files are trickling in then we can afford to spend a lot more time finding nice places to tuck them in. Log server files are an especially irksome problem for a redirect-on-write filesystem because the final block tends to be rewritten many times and we must move it to a new location each time, so every extent ends up as one block. Oh well. If we just make sure to have some free space at the end of the file that only that file can use (until everywhere else is full) then the long term result will be slightly ravelled blocks that nonetheless tend to be on the same track or flash block as their logically contiguous neighbours. There will be just zero or one empty data blocks mixed into the file tail as we commit the tail block over and over with the same allocation goal. Sometimes there will be a block or two of metadata as well, which will eventually bake themselves into the middle of contiguous data and stop moving around. Putting this together, we have: * At delta flush, break out all the log type files * Dedicate some block groups to append type files * Leave lots of space between files in those block groups * Peek at the last block of the file to set the allocation goal Something like that. What we don't want is to throw those files into the middle of a lot of rewrite-all files, messing up both kinds of file. We don't care much about keeping these files near the parent directory because one big seek per log file in a grep is acceptable; we just need to avoid thousands of big seeks within the file, and not dribble single blocks all over the disk. It would also be nice to merge together extents somehow as the final block is rewritten.
One idea is to retain the final block dirty until the next delta, and write it again into a contiguous position, so the final block is always flushed twice. We already have the opportunistic merge logic, but the redirty behavior and making sure it only happens to log files would be a bit fiddly. We will also play the incremental defragmentation card at some point, but first we should try hard to control fragmentation in the first place. Tux3 is well suited to online defragmentation because the delta commit model makes it easy to move things around efficiently and safely, but it does generate extra IO, so as a basic mechanism it is not ideal. When we get to piling on features, that will be high on the list, because it is relatively easy, and having that fallback gives a certain sense of security. So we are again at some more features of SASOS4Fun. That said, as an alleged troll expert I can see the agenda and strategy behind this and related threads, but still no usable code/file system at all, and hence nothing that even might be ready for merging, as I understand the statements of the file system gurus. So it is time for the developer(s) to decide what should eventually be implemented and manifested in code, and then show the complete result, so that others can run the tests and the benchmarks. Thanks Best Regards Do not feed the trolls. C.S.
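[Editor's note: to make the four-point append-file policy above concrete, here is a minimal sketch of how an allocator might pick a goal block under that policy. This is my own illustration of the idea, not Tux3 code; the structures, the dedicated-group cursor, and FILE_GAP are invented for the example:]

    /* Toy model of the append-file allocation policy described above.
     * All structures are invented for illustration. */

    typedef unsigned long long block_t;

    struct toy_inode {
            int is_append_type;     /* flagged as a log/append-mostly file */
            int has_blocks;
            block_t last_block;     /* physical address of current tail */
    };

    #define FILE_GAP 256    /* assumed private slack left after each file */

    static block_t alloc_goal(const struct toy_inode *inode,
                              block_t append_cursor, block_t linear_next)
    {
            if (!inode->is_append_type)
                    return linear_next;           /* strictly linear this delta */
            if (inode->has_blocks)
                    return inode->last_block + 1; /* peek at tail, stay contiguous */
            /* New append file: place it in the dedicated block groups,
             * leaving FILE_GAP blocks of slack after the previous file. */
            return append_cursor + FILE_GAP;
    }

The point of the gap is that repeated commits of the same tail block keep landing in the file's own slack, so the "ravelled" blocks stay on the same track or flash block instead of dribbling across the disk.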
Re: Tux3 Report: How fast can we fsync?
On 2nd of May 2015 18:30, Richard Weinberger wrote: On Sat, May 2, 2015 at 6:00 PM, Christian Stroetmann wrote: On the 2nd of May 2015 12:26, Daniel Phillips wrote: Aloha everybody On Friday, May 1, 2015 6:07:48 PM PDT, David Lang wrote: On Fri, 1 May 2015, Daniel Phillips wrote: On Friday, May 1, 2015 8:38:55 AM PDT, Dave Chinner wrote: Well, yes - I never claimed XFS is a general purpose filesystem. It is a high performance filesystem. It is also becoming more relevant to general purpose systems as low cost storage gains capabilities that used to be considered the domain of high performance storage... OK. Well, Tux3 is general purpose and that means we care about single spinning disk and small systems. keep in mind that if you optimize only for the small systems you may not scale as well to the larger ones. Tux3 is designed to scale, and it will when the time comes. I look forward to putting Shardmap through its billion file test in due course. However, right now it would be wise to stay focused on basic functionality suited to a workstation because volunteer devs tend to have those. After that, phones are a natural direction, where hard core ACID commit and really smooth file ops are particularly attractive. Does anybody else have a deja vu? Yes, the onto-troll strikes again... Everybody has her/his own interpretation of what open source means. I really thought there could be some kind of constructive discussion about such a file system, or at least about interesting technical features that can be used for other file systems like e.g. a potential EXT5, when I relaxed my position some days ago and proposed that ideas, too, are referenced correctly in relation with open source projects, specifically in relation with Tux3. Now, I have the impression that this is not possible, and due to this any progress is hard to achieve. Thanks Best Regards Do not feed the trolls. C.S.
Re: Tux3 Report: How fast can we fsync?
On the 2nd of May 2015 12:26, Daniel Phillips wrote: Aloha everybody On Friday, May 1, 2015 6:07:48 PM PDT, David Lang wrote: On Fri, 1 May 2015, Daniel Phillips wrote: On Friday, May 1, 2015 8:38:55 AM PDT, Dave Chinner wrote: Well, yes - I never claimed XFS is a general purpose filesystem. It is a high performance filesystem. It is also becoming more relevant to general purpose systems as low cost storage gains capabilities that used to be considered the domain of high performance storage... OK. Well, Tux3 is general purpose and that means we care about single spinning disk and small systems. keep in mind that if you optimize only for the small systems you may not scale as well to the larger ones. Tux3 is designed to scale, and it will when the time comes. I look forward to putting Shardmap through its billion file test in due course. However, right now it would be wise to stay focused on basic functionality suited to a workstation because volunteer devs tend to have those. After that, phones are a natural direction, where hard core ACID commit and really smooth file ops are particularly attractive. Does anybody else have a deja vu? Nevertheless, why don't you just put your fsync, your interpretations (ACID, shardmap, etc.) of my things (OntoFS, file system of SASOS4Fun, and OntoBase), and whatever gimmicks you have in mind together into a prototypical file system, test it, and send a message with a short description focused solely on others' and your foundational ideas and the technical features, a WWW address where the code can be found, and your test results to this mailing list, without your marketing and self-promotion, BEFORE and NOT IN DUE COURSE, respectively NOT AFTER, anybody else does similar work or I am so annoyed that I implement it? Also, if it is so general that XFS and EXT4 should adopt it, then why don't you help to improve these file systems? Btw.: I have retracted the claims I made in that email, but definitely not given up my copyright, if it is valid. ...per the ramdisk, but possibly not as relevant as you may think. This is why it's good to test on as many different systems as you can. As you run into different types of performance you can then pick ones to keep and test all the time. I keep being surprised how well it works for things we never tested before. Single spinning disk is interesting now, but will be less interesting later. Multiple spinning disks in an array of some sort are going to remain very interesting for quite a while. The way to do md well is to integrate it into the block layer like FreeBSD does (GEOM) and expose a richer interface for the filesystem. That is how I think Tux3 should work with big iron RAID. I hope to be able to tackle that sometime before the stars start winking out. Now, some things take a lot more work to test than others. Getting time on a system with a high performance, high capacity RAID is hard, but getting hold of an SSD from Fry's is much easier. If it's a budget item, ping me directly and I can donate one for testing (the cost of a drive is within my unallocated budget and using that to improve Linux is worthwhile). Thanks. As I'm reading Dave's comments, he isn't attacking you the way you seem to think he is. He is pointing out that there are problems with your data, but he's also taking a lot of time to explain what's happening (and yes, some of this is probably because your simple tests with XFS made it look so bad). I hope the lightening-up trend is a trend.
the other filesystems don't use naive algorithms, they use something more complex, and while your current numbers are interesting, they are only preliminary until you add something to handle fragmentation. That can cause very significant problems. Fsync is pretty much agnostic to fragmentation, so those results are unlikely to change substantially even if we happen to do a lousy job on allocation policy, which I naturally consider unlikely. In fact, Tux3 fsync is going to get faster over time for a couple of reasons: the minimum blocks per commit will be reduced, and we will get rid of most of the seeks to the beginning of the volume that we currently suffer per commit. Remember how fabulous btrfs looked in the initial reports? And then corner cases were found that caused real problems, and as the algorithms have been changed to prevent those corner cases from being so easy to hit, the common case has suffered somewhat. This isn't an attack on Tux3 or btrfs, it's just a reality of programming. If you are not accounting for all the corner cases, everything is easier, and faster. Mine is a lame i5 minitower with 4GB from Fry's. Yours is clearly way more substantial, so I can't compare my numbers directly to yours. If you are doing tests with a 4G ramdisk on a machine with only 4G of RAM, it seems like you end up testing a lot more than just the filesystem. Testing in such low memory situations can identify
Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
On the 30th of April 2015 17:14, Daniel Phillips wrote: Hello hardcore coders On 04/30/2015 07:28 AM, Howard Chu wrote: Daniel Phillips wrote: On 04/30/2015 06:48 AM, Mike Galbraith wrote: On Thu, 2015-04-30 at 05:58 -0700, Daniel Phillips wrote: On Thursday, April 30, 2015 5:07:21 AM PDT, Mike Galbraith wrote: On Thu, 2015-04-30 at 04:14 -0700, Daniel Phillips wrote: Lovely sounding argument, but it is wrong because Tux3 still beats XFS even with seek time factored out of the equation. Hm. Do you have big-storage comparison numbers to back that? I'm no storage guy (waiting for holographic crystal arrays to obsolete all this crap;), but Dave's big-storage guy words made sense to me. This has nothing to do with big storage. The proposition was that seek time is the reason for Tux3's fsync performance. That claim was easily falsified by removing the seek time. Dave's big storage words are there to draw attention away from the fact that XFS ran the Git tests four times slower than Tux3 and three times slower than Ext4. Whatever the big storage excuse is for that, the fact is, XFS obviously sucks at little storage. If you allocate spanning the disk from start of life, you're going to eat seeks that others don't until later. That seemed rather obvious and straightforward. It is a logical fallacy. It mixes a grain of truth (spreading all over the disk causes extra seeks) with an obvious falsehood (it is not necessarily the only possible way to avoid long term fragmentation). You're reading into it what isn't there. Spreading over the disk isn't (just) about avoiding fragmentation - it's about delivering consistent and predictable latency. It is undeniable that if you start by only allocating from the fastest portion of the platter, you are going to see performance slow down over time. If you start by spreading allocations across the entire platter, you make the worst-case and average-case latency equal, which is exactly what a lot of folks are looking for. Another fallacy: intentionally running slower than necessary is not necessarily the only way to deliver consistent and predictable latency. Not only that, but intentionally running slower than necessary does not necessarily guarantee performing better than some alternate strategy later. Anyway, let's not be silly. Everybody in the room who wants Git to run 4 times slower with no guarantee of any benefit in the future, please raise your hand. He flat stated that xfs has passable performance on a single bit of rust, and openly explained why. I see no misdirection, only some evidence of bad blood between you two. Raising the spectre of theoretical fragmentation issues when we have not even begun that work is a straw man and intellectually dishonest. You have to wonder why he does it. It is destructive to our community image and harmful to progress. It is a fact of life that when you change one aspect of an intimately interconnected system, something else will change as well. You have naive/nonexistent free space management now; when you design something workable there, it is going to impact everything else you've already done. It's an easy bet that the impact will be negative, the only question is to what degree. You might lose that bet. For example, suppose we do strictly linear allocation each delta, and just leave nice big gaps between the deltas for future expansion.
Clearly, we run at similar or identical speed to the current naive strategy until we must start filling in the gaps, and at that point our layout is not any worse than XFS, which started bad and stayed that way. Now here is where you lose the bet: we already know that linear allocation with wrap ends horribly, right? However, as above, we start linear, without compromise, but because of the gaps we leave, we are able to switch to a slower strategy, but not nearly as slow as the ugly tangle we get with simple wrap. So the impact over the lifetime of the filesystem is positive, not negative, and what seemed to be self evident to you turns out to be wrong. In short, we would rather deliver as much performance as possible, all the time. I really don't need to think about it very hard to know that is what I want, and what most users want. I will make you a bet in return: when we get to doing that part properly, the quality of the work will be just as high as everything else we have completed so far. Why would we suddenly get lazy? Regards, Daniel -- How? Maybe this is explained and discussed in a new thread about allocation or so. Thanks Best Regards Have fun C.S.
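[Editor's note: a minimal sketch of the "strictly linear allocation with gaps between deltas" idea being argued here, written to make the two phases explicit. Nothing below is Tux3 code; GAP_FACTOR and the phase-2 helper are invented parameters, and phase 2 is exactly the part the thread agrees is still undesigned:]

    /* Toy model of per-delta linear allocation with expansion gaps. */

    typedef unsigned long long block_t;

    #define GAP_FACTOR 2    /* assumed: gap comparable to delta size */

    struct toy_volume {
            block_t cursor;         /* next linear allocation position */
            block_t size;           /* total blocks in volume */
            int filling_gaps;       /* phase 2: volume has wrapped */
    };

    /* Hypothetical phase-2 search, stubbed: the real policy would look
     * for the free gap nearest the allocation goal. */
    static block_t find_nearest_gap(struct toy_volume *v, block_t nblocks)
    {
            (void)nblocks;
            return v->cursor;       /* placeholder only */
    }

    /* Returns the starting block for a delta of delta_blocks blocks. */
    static block_t place_delta(struct toy_volume *v, block_t delta_blocks)
    {
            if (!v->filling_gaps) {
                    block_t start = v->cursor;
                    /* Advance past the delta plus its expansion gap. */
                    v->cursor += delta_blocks * GAP_FACTOR;
                    if (v->cursor >= v->size) {
                            v->filling_gaps = 1;    /* switch strategies */
                            v->cursor = 0;
                    }
                    return start;
            }
            return find_nearest_gap(v, delta_blocks);
    }

The bet in the thread is about what happens once filling_gaps flips: Daniel's claim is that falling back to gap-filling degrades gracefully, while simple linear-with-wrap degrades badly.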
Re: Tux3 Report: Meet Shardmap, the designated successor of HTree
On the 20th of June 2013 22:27, Daniel Phillips wrote: On 06/20/2013 12:12 PM, Christian Stroetmann wrote: 1. Stop copying my intellectual properties related with file systems and implementing them. You always came several months too late and I am not interested to let it become a running gag, definitely. 2. Stop marketing my ideas, especially in a way that confuses the public about the true origin even further. I am already marketing them on my own. 3. Give credit to my intellectual properties in any case, even if you make a derivation, and take care about the correct licensing. Could you please direct us to details of your design so that we may properly appreciate it? Note that the key idea in Shardmap is not simply logging a hash table, but sharding it and logging it as a forest of fifos. Regards, Daniel Around 2 years ago, I looked at some details of the FS design and discussed the copyright issue with one of my attorneys. Today, I would like to make the following (maybe closing) remarks before somebody says I would block a development: 1. In general, there is a copyright for every protectable work done by a person in the moment of its publication, but in practice it is hard to prove, specifically in such a technical case. I will not go into the legal details. That said, at least I withdraw my claims, but still think that generally it would be a constructive measure if even ideas were referenced in relation with open source hard- and software. In my case it has led to the situation that I have stopped publishing ideas, with some very few exceptions. 2. In particular, from the point of view of the software design, the implementation is virtually what I have proposed (as well). Indeed, there are some interesting details and elegant paraphrases. 3. I think it would be interesting to analyze how well this FS works, respectively to compare this FS with databases that implement something surprisingly similar. C.S.
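[Editor's note: Daniel's one-line description above — shard a hash table and log each shard as a fifo — is the only technical detail given in this exchange, so the toy sketch below is purely one reading of that sentence, not Shardmap code. Each directory entry is hashed, the top hash bits select a shard, and the entry is appended to that shard's in-memory fifo; on media each fifo would be an append-only log region. All sizes and the hash function are arbitrary:]

    #include <stdint.h>
    #include <stdio.h>

    #define SHARD_BITS  4
    #define NSHARDS     (1 << SHARD_BITS)
    #define SHARD_CAP   1024

    struct entry { uint32_t hash; uint32_t inum; char name[52]; };

    /* Each shard is an append-only fifo of entries. */
    struct shard { struct entry fifo[SHARD_CAP]; unsigned count; };
    static struct shard shards[NSHARDS];

    static uint32_t hash_name(const char *name)   /* toy FNV-1a hash */
    {
            uint32_t h = 2166136261u;
            while (*name) { h ^= (uint8_t)*name++; h *= 16777619u; }
            return h;
    }

    static int shard_insert(const char *name, uint32_t inum)
    {
            uint32_t h = hash_name(name);
            struct shard *s = &shards[h >> (32 - SHARD_BITS)]; /* top bits */
            if (s->count == SHARD_CAP)
                    return -1;   /* a real design would grow/split; elided */
            struct entry *e = &s->fifo[s->count++];            /* append */
            e->hash = h; e->inum = inum;
            snprintf(e->name, sizeof(e->name), "%s", name);
            return 0;
    }

    int main(void)
    {
            shard_insert("hello.txt", 42);
            shard_insert("world.txt", 43);
            printf("inserted into a forest of %d fifos\n", NSHARDS);
            return 0;
    }

The appeal of the append-only shard logs is that directory updates become sequential writes, which is also why the comparison to write-optimized database indexes in point 3 above is apt.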
Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs
be highly concurrent to handle machines with hundreds of CPU cores. Funnily enough, we already have a couple of persistent storage implementations that solve these problems to varying degrees. ext4 is one of them, if you ignore the scalability and concurrency requirements. XFS is the other. And both will run unmodified on a persistent ram block device, which we *already have*. Yeah! :D And so back to DAX. What users actually want from their high speed persistent RAM storage is direct, cpu addressable access to that persistent storage. They don't want to have to care about how to find an object in the persistent storage - that's what filesystems are for - they just want to be able to read and write to it directly. That's what DAX does - it provides existing filesystems a method for exposing direct access to the persistent RAM to applications in a manner that application developers are already familiar with. It's a win-win situation all round. IOWs, ext4/XFS + DAX gets us to a place that is good enough for most users and the hardware capabilities we expect to see in the next 5 years. And hopefully that will be long enough to bring a purpose built, next generation persistent memory filesystem to production quality that can take full advantage of the technology... Please, if possible, could you be so kind and give such a very good summarization or a sketch of the future development path and system architecture? What does this mentioned purpose built, next generation persistent memory filesystem look like? How does it differ from the DAX + FS approach and which advantages will it offer? Would it be some kind of an object storage system that possibly uses the said structures used by the kernel (see the two little questions above again)? Do we have to keep the term file for everything? Cheers, Dave. With all the best Christian Stroetmann
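[Editor's note: since the point of DAX made above is that applications keep using the file API they already know, here is a minimal sketch of that usage: an ordinary open/mmap/store/msync sequence which, on a DAX mount, becomes direct loads and stores to persistent memory with no page cache copy. The mount point and file name are assumptions for the example:]

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            /* Assumed: /mnt/pmem is ext4 or XFS mounted with -o dax. */
            int fd = open("/mnt/pmem/data", O_CREAT | O_RDWR, 0644);
            if (fd < 0) { perror("open"); return 1; }
            if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

            /* On a DAX mount this maps the persistent memory itself:
             * the stores below go straight to media-backed pages,
             * with no intervening page cache copy. */
            char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }

            strcpy(p, "hello, persistent world");

            /* Still needed for durability: flushes CPU caches and
             * commits any metadata the write may have dirtied. */
            if (msync(p, 4096, MS_SYNC) < 0) { perror("msync"); return 1; }

            munmap(p, 4096);
            close(fd);
            return 0;
    }

Note that the application code is identical with or without DAX, which is exactly the "manner developers are already familiar with" argument above.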
Re: [RFC][PATCH 1/2] Add a super operation for writeback
On the 3rd of June 2014 16:57, Theodore Ts'o wrote: On Tue, Jun 03, 2014 at 07:30:32AM +0200, Christian Stroetmann wrote: In general, I do not believe that the complexity problems of soft updates, atomic writes, and related techniques can be solved by hand/manually. So my suggestion is to automatically handle the complexity problem of e.g. dependencies in a way that is comparable to a(n on-the-fly) file-system compiler, so to say, that works on a very large dependency graph (having several billions of graph vertices actually). And at this point an abstraction like the one given with Featherstitch helps to feed and control this special FS compiler. Well, if you want to try to implement something like this, go for it! I have already been active on this for some weeks. I'd be very curious to see (a) how much CPU overhead it takes to crunch on a dependency graph with billions of vertices, and (b) how easy it would be to express these dependencies and how maintainable such a dependency language would be. Sounds like a great research topic, and To (a): A run is expected to take some few hours on a single computing node. Also, such graph processing must not be done all the time, but only if a new application demands a specific handling of the data in respect to e.g. one of the ACID criteria. That is the reason why I put "on-the-fly" in parentheses. To (b): I hoped that file system developers could make some suggestions or point to some no-gos. I am also thinking about Petri nets in this relation, though it is just an idea. I would also like to mention that it could be used in conjunction with Non-Volatile Memory (NVM) as well. I'll note the Call For Papers for FAST 2015 is out, and if you can solve these problems, it would make a great FAST 2015 submission: https://www.usenix.org/conference/fast15/call-for-papers Are you serious or have I missed the 1st of April once again? Actually, I could only write a general overview of the approach, comparable to a white paper, but nothing more. Cheers, - Ted Best regards Christian
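[Editor's note: for readers unfamiliar with what "crunching a dependency graph" means here: write-ordering constraints ("block A must reach disk before block B") form a directed graph, and any flush order that respects them is a topological sort of that graph. The sketch below is a generic illustration of that step using Kahn's algorithm; it is not Featherstitch code and says nothing about the billions-of-vertices scale being debated — its only purpose is to show the computation Ted's question (a) is asking about:]

    #include <stdio.h>

    /* Kahn's algorithm over a tiny write-dependency graph: edge A -> B
     * means "write A must hit disk before write B".  Graph is invented. */

    #define NWRITES 4
    static const int edge[NWRITES][NWRITES] = {
            /* journal block (0) before data (1) and bitmap (2),
             * data (1) and bitmap (2) before inode (3). */
            [0][1] = 1, [0][2] = 1, [1][3] = 1, [2][3] = 1,
    };

    int main(void)
    {
            int indegree[NWRITES] = { 0 };
            int queue[NWRITES], head = 0, tail = 0;

            for (int a = 0; a < NWRITES; a++)
                    for (int b = 0; b < NWRITES; b++)
                            indegree[b] += edge[a][b];
            for (int v = 0; v < NWRITES; v++)
                    if (indegree[v] == 0)
                            queue[tail++] = v;  /* unconstrained writes */

            printf("safe flush order:");
            while (head < tail) {
                    int v = queue[head++];
                    printf(" write%d", v);
                    for (int b = 0; b < NWRITES; b++)
                            if (edge[v][b] && --indegree[b] == 0)
                                    queue[tail++] = b;
            }
            printf("\n");       /* a cycle would leave writes unprinted */
            return 0;
    }

Kahn's algorithm is linear in vertices plus edges, so the open question in the thread is less the asymptotics than the constant factors and memory footprint at billions of vertices.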
Re: [RFC][PATCH 1/2] Add a super operation for writeback
On the 3rd of June 2014 05:39, Dave Chinner wrote: On Mon, Jun 02, 2014 at 10:30:07AM +0200, Christian Stroetmann wrote: When I followed the advice of Dave Chinner: "We're not going to merge that page forking stuff (like you were told at LSF 2013 more than a year ago: http://lwn.net/Articles/548091/) without rigorous design review and a demonstration of the solutions to all the hard corner cases it has", given in his e-mail related to the presentation of the latest version of the Tux3 file system (see [1]), and read the linked article, I found in the second comment: "Parts of this almost sound like it either a.) overlaps with or b.) would benefit greatly from something similar to Featherstitch [[2]]." Could it be that with Featherstitch we already have a general solution, one that is even said to be "file system agnostic"? Honestly, I thought that something like this would make its way into the Linux code base.

Here's what I said about the last proposal (a few months ago) for integrating featherstitch into the kernel: http://www.spinics.net/lists/linux-fsdevel/msg72799.html It's not a viable solution. Cheers, Dave.

How annoying; I did not remember your e-mail in the referenced thread "[Lsf-pc] [LSF/MM TOPIC] atomic block device", even though I had saved it on local disk. Thanks a lot for the reminder. I also directly saw the problem with the research prototype Featherstitch, specifically the point "All the filesystem modules it has are built into the featherstitch kernel module, and called through a VFS shim layer". But it is just a prototype, and its concept of abstraction does not have to be copied 1:1 into the Linux code base.

In general, I do not believe that the complexity problems of soft updates, atomic writes, and related techniques can be solved by hand/manually. So my suggestion is to handle the complexity problem of e.g. dependencies automatically, in a way that is comparable to an (on-the-fly) file-system compiler, so to say, that works on a very large dependency graph (having several billions of graph vertices, actually). And at this point an abstraction like the one given with Featherstitch helps to feed and control this special FS compiler.

Actually, I have to follow the discussion further on the one hand, and go deeper into the highly complex problem space on the other hand.

With all the best Christian
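For readers who have not seen Featherstitch: its application-visible abstraction is the "patchgroup". The sketch below is a from-memory reconstruction after the SOSP 2007 paper (Frost et al.); the pg_* names follow the paper, but the exact signatures and the stub behaviour here are assumptions made so the example compiles standalone - this is not a real, installable API. The point is only the shape of the abstraction: the application states one ordering rule ("the rename that publishes the file must reach disk after the data writes") and leaves everything else free to be reordered.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

typedef int pg_t;
static int next_pg = 1;

/* Stubs standing in for the patchgroup calls named in the paper; a real
 * Featherstitch kernel would record and enforce these at the block layer. */
static pg_t pg_create(void) { return next_pg++; }
static int pg_depend(pg_t later, pg_t earlier)
{
    printf("constraint: pg %d reaches disk only after pg %d\n", later, earlier);
    return 0;
}
static int pg_engage(pg_t pg)    { printf("engage pg %d\n", pg);    return 0; }
static int pg_disengage(pg_t pg) { printf("disengage pg %d\n", pg); return 0; }

int main(void)
{
    pg_t write_pg = pg_create();    /* the data writes */
    pg_t publish_pg = pg_create();  /* the rename that publishes them */

    pg_engage(write_pg);            /* subsequent FS ops join write_pg */
    int fd = open("file.tmp", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd >= 0) {
        write(fd, "new contents\n", 13);
        close(fd);
    }
    pg_disengage(write_pg);

    /* the one ordering rule the application cares about */
    pg_depend(publish_pg, write_pg);

    pg_engage(publish_pg);
    rename("file.tmp", "file");
    pg_disengage(publish_pg);
    return 0;
}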
Re: [RFC][PATCH 1/2] Add a super operation for writeback
On the 1st of June 2014 23:41, Daniel Phillips wrote: Hi, This is the first of four core changes we would like for Tux3. We start with a hard one and suggest a simple solution.

The first patch in this series adds a new super operation to write back multiple inodes in a single call. The second patch applies to our linux-tux3 repository at git.kernel.org to demonstrate how this interface is used, and removes about 450 lines of workaround code.

Traditionally, core kernel tells each mounted filesystem which dirty pages of which inodes should be flushed to disk, but unfortunately it is blissfully ignorant of filesystem-specific ordering constraints. This scheme was really invented for Ext2 and has not evolved much recently. Tux3, with its strong ordering and optimized delta commit, cannot tolerate random flushing and therefore takes responsibility for flush ordering itself. On the other hand, Tux3 has no good way to know when the right time to flush is, but core is good at that. This proposed API harmonizes those two competencies so that Tux3 and core each take care of what they are good at, and not more.

The API extension consists of a new writeback hook and two helpers to be used within the hook. The hook sits about halfway down the fs-writeback stack, just after core has determined that some dirty inodes should be flushed to disk and just before it starts thinking about which inodes those should be. At that point, core calls Tux3 instead of continuing on down the usual do_writepages path. Tux3 responds by staging a new delta commit, using the new helpers to tell core which inodes were flushed versus being left dirty in cache. This is pretty much the same behavior as the traditional writeout path, but less convoluted, probably more efficient, and certainly easier to analyze. The new writeback hook looks like:

   progress = sb->s_op->writeback(sb, &writeback_control, &nr_pages);

This should be self-explanatory: nr_pages and progress have the semantics of existing usage in fs-writeback; writeback_control is ignored by Tux3, but that is only because Tux3 always flushes everything and does not require hints for now. We can safely assume that &wbc or equivalent is wanted here. An obvious wart is the overlap between "progress" and "nr_pages", but fs-writeback thinks that way, so it would not make much sense to improve one without improving the other.

Tux3 necessarily keeps its own dirty inode list, which is an area of overlap with fs-writeback. In a perfect world, there would be just one dirty inode list per superblock, on which both fs-writeback and Tux3 would operate. That would be a deeper core change than seems appropriate right now.

Potential races are a big issue with this API, which is no surprise. The fs-writeback scheme requires keeping several kinds of object in sync: tux3 dirty inode lists, fs-writeback dirty inode lists, and inode dirty state. The new helpers inode_writeback_done(inode) and inode_writeback_touch(inode) take care of that while hiding internal details of the fs-writeback implementation. Tux3 calls inode_writeback_done when it has flushed an inode and marked it clean, or calls inode_writeback_touch if it intends to retain a dirty inode in cache. These have simple implementations. The former just removes a clean inode from any fs-writeback list. The latter updates the inode's dirty timestamp so that fs-writeback does not keep trying to flush it. Both these things could be done more efficiently by re-engineering fs-writeback, but we prefer to work with the existing scheme for now.

Hirofumi's masterful hack nicely avoided racy removal of inodes from the writeback list by taking an internal fs-writeback lock inside filesystem code. The new helper requires dropping i_lock inside filesystem code and retaking it in the helper, so an inode redirty can race with writeback list removal. This does not seem to present a problem, because filesystem code is able to enforce strict alternation of cleaning and calling the helper. As an offsetting advantage, writeback lock contention is reduced. Compared to Hirofumi's hack, the cost of this interface is one additional spinlock per inode_writeback_done, which is insignificant compared to the convoluted code path that is avoided.

Regards, Daniel
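For concreteness, here is a rough sketch of what a filesystem-side implementation of the hook might look like, inferred purely from the description above rather than from the posted patch. All myfs_* names, the private dirty-list layout, and the exact prototype and nr_pages accounting are assumptions for illustration; inode_writeback_done() and inode_writeback_touch() are the helpers the mail describes.

struct myfs_inode {
	struct inode vfs_inode;
	struct list_head dirty_link;	/* on the fs's own dirty list */
};

struct myfs_sb_info {
	struct list_head dirty_inodes;	/* fs-private dirty inode list */
};

static int myfs_flush_inode(struct inode *inode);	/* invented name */

static long myfs_writeback(struct super_block *sb,
			   struct writeback_control *wbc,
			   long *nr_pages)
{
	struct myfs_sb_info *sbi = sb->s_fs_info;
	struct myfs_inode *mi, *next;
	long progress = 0;

	/* Stage one delta commit: flush in filesystem-chosen order,
	 * ignoring wbc because this fs always flushes everything. */
	list_for_each_entry_safe(mi, next, &sbi->dirty_inodes, dirty_link) {
		struct inode *inode = &mi->vfs_inode;

		if (myfs_flush_inode(inode) == 0) {
			list_del_init(&mi->dirty_link);
			/* clean now: drop it from fs-writeback's lists */
			inode_writeback_done(inode);
			progress++;
		} else {
			/* staying dirty in cache: refresh the dirty
			 * timestamp so fs-writeback does not keep
			 * retrying it immediately */
			inode_writeback_touch(inode);
		}
	}

	*nr_pages -= progress;	/* report how much was accomplished */
	return progress;
}

static const struct super_operations myfs_super_ops = {
	/* .writeback is the new super operation this series proposes */
	.writeback	= myfs_writeback,
};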
Re: [ANNOUNCE] New Linux Patch Review Group
Am 02.04.2014 00:20, schrieb Christian Stroetmann: Am 02.04.2014 00:02, schrieb Richard Weinberger: On Tue, Apr 1, 2014 at 11:33 PM, Christian Stroetmann wrote: On Tue, 01.April.2014 17:55, Felipe Balbi wrote: On Tue, Apr 01, 2014 at 11:40:16AM -0400, Chris Mason wrote: On 04/01/2014 11:16 AM, Boaz Harrosh wrote: On 04/01/2014 05:41 PM, Chris Mason wrote:

Hello everyone, During last week's Collab summit, Jon Corbet suggested we use the power of social media to improve the Linux kernel patch review process. We thought this was a great idea, and have been experimenting with a new Facebook group dedicated to patch discussion and review. The new group provides a dramatically improved development workflow, including:

* One-click patch or comment approval
* Comments enhanced with pictures and video
* Who has seen your patches and comments
* Searchable index of past submissions
* A strong community without anonymous flames

To help capture the group discussion in the final patch submission, we suggest adding a Liked-by: tag to commits that have been through group review. To use the new group, please join: https://www.facebook.com/groups/linuxpatches/ Once you've joined, you can post patches in the group, or email patches to linuxpatc...@groups.facebook.com -chris

NACK! I do not have Facebook and I do not like patches to be discussed behind my back. On the mailing list we don't even want HTML with bold-lettered words, so no thanks, Facebook adds nothing. Please obliterate this bad idea. (And I do not have Facebook shares or care to.)

It's always hard to move on to new technologies. But at some point we have to recognize that the internet has developed a rich culture that the kernel community isn't taking full advantage of. I certainly don't expect everyone to convert right away, but there's a whole world out there beyond port 25.

Agreed. Acked-by: Felipe Balbi We might even be able to "recruit" a much more diverse group of reviewers who are undiscovered as of now ;-)

Sorry, but definitely: Nack!!!

We (the majority of the Linux maintainers) voted already on fb.com to make it our primary development eco system and will abandon LKML starting with April 1st.

Please allow me to ask some questions: 1. It was proposed "during last week's Collab summit" and now it is decided already? 2. Do you have a link to the discussion on LKML or somewhere else? 3. Did you vote on fb.com or on LKML? 4. Is there a link to the vote or any related information? 5. By "April 1st" you mean that, as of today respectively yesterday, LKML will stop functioning? 6. And by "eco system" you mean that a majority of the Linux maintainers want to use fb.com for the development of Linux? Honestly, I do not find this funny somehow, but a little curious. Best regards C. Stroetmann

If it was an April Fools hoax, then I have to applaud Chris Mason and the others a lot, get the postal address for the beer, thank everybody for telling me that there is a world besides the monitor, and beg your pardon for my seriousness. Now it seems that I have to learn how to get my fb.com account by April 2015. Hopefully, I can find a helping hand for this modern technology. Nevertheless, my suggestion to set up such a platform with an open source framework was meant seriously.

Good luck Christian Stroetmann
Re: [ANNOUNCE] New Linux Patch Review Group
Am 02.04.2014 00:02, schrieb Richard Weinberger: On Tue, Apr 1, 2014 at 11:33 PM, Christian Stroetmann wrote: On Tue, 01.April.2014 17:55, Felipe Balbi wrote: On Tue, Apr 01, 2014 at 11:40:16AM -0400, Chris Mason wrote: On 04/01/2014 11:16 AM, Boaz Harrosh wrote: On 04/01/2014 05:41 PM, Chris Mason wrote:

Hello everyone, During last week's Collab summit, Jon Corbet suggested we use the power of social media to improve the Linux kernel patch review process. We thought this was a great idea, and have been experimenting with a new Facebook group dedicated to patch discussion and review. The new group provides a dramatically improved development workflow, including:

* One-click patch or comment approval
* Comments enhanced with pictures and video
* Who has seen your patches and comments
* Searchable index of past submissions
* A strong community without anonymous flames

To help capture the group discussion in the final patch submission, we suggest adding a Liked-by: tag to commits that have been through group review. To use the new group, please join: https://www.facebook.com/groups/linuxpatches/ Once you've joined, you can post patches in the group, or email patches to linuxpatc...@groups.facebook.com -chris

NACK! I do not have Facebook and I do not like patches to be discussed behind my back. On the mailing list we don't even want HTML with bold-lettered words, so no thanks, Facebook adds nothing. Please obliterate this bad idea. (And I do not have Facebook shares or care to.)

It's always hard to move on to new technologies. But at some point we have to recognize that the internet has developed a rich culture that the kernel community isn't taking full advantage of. I certainly don't expect everyone to convert right away, but there's a whole world out there beyond port 25.

Agreed. Acked-by: Felipe Balbi We might even be able to "recruit" a much more diverse group of reviewers who are undiscovered as of now ;-)

Sorry, but definitely: Nack!!!

We (the majority of the Linux maintainers) voted already on fb.com to make it our primary development eco system and will abandon LKML starting with April 1st.

Please allow me to ask some questions: 1. It was proposed "during last week's Collab summit" and now it is decided already? 2. Do you have a link to the discussion on LKML or somewhere else? 3. Did you vote on fb.com or on LKML? 4. Is there a link to the vote or any related information? 5. By "April 1st" you mean that, as of today respectively yesterday, LKML will stop functioning? 6. And by "eco system" you mean that a majority of the Linux maintainers want to use fb.com for the development of Linux? Honestly, I do not find this funny somehow, but a little curious. Best regards C. Stroetmann
Re: [ANNOUNCE] New Linux Patch Review Group
On Tue, 01.April.2014 17:55, Felipe Balbi wrote: On Tue, Apr 01, 2014 at 11:40:16AM -0400, Chris Mason wrote: On 04/01/2014 11:16 AM, Boaz Harrosh wrote: On 04/01/2014 05:41 PM, Chris Mason wrote:

Hello everyone, During last week's Collab summit, Jon Corbet suggested we use the power of social media to improve the Linux kernel patch review process. We thought this was a great idea, and have been experimenting with a new Facebook group dedicated to patch discussion and review. The new group provides a dramatically improved development workflow, including:

* One-click patch or comment approval
* Comments enhanced with pictures and video
* Who has seen your patches and comments
* Searchable index of past submissions
* A strong community without anonymous flames

To help capture the group discussion in the final patch submission, we suggest adding a Liked-by: tag to commits that have been through group review. To use the new group, please join: https://www.facebook.com/groups/linuxpatches/ Once you've joined, you can post patches in the group, or email patches to linuxpatc...@groups.facebook.com -chris

NACK! I do not have Facebook and I do not like patches to be discussed behind my back. On the mailing list we don't even want HTML with bold-lettered words, so no thanks, Facebook adds nothing. Please obliterate this bad idea. (And I do not have Facebook shares or care to.)

It's always hard to move on to new technologies. But at some point we have to recognize that the internet has developed a rich culture that the kernel community isn't taking full advantage of. I certainly don't expect everyone to convert right away, but there's a whole world out there beyond port 25.

Agreed. Acked-by: Felipe Balbi We might even be able to "recruit" a much more diverse group of reviewers who are undiscovered as of now ;-)

Sorry, but definitely: Nack!!! First of all, I do not think that it will truly support the whole (business) process of code development and review. Also, I do not think that it will attract more developers and reviewers. But if you really think that your arguments are right, then why not use a solution that is based on the same social concepts as, for example, Linux itself, and set up a dedicated social network or channel for the whole Linux "eco-system", respectively community, by using for example Friendica [1], Diaspora [2], StatusNet [3], pump.io [4], or whatever else fits.

Have fun Christian Stroetmann

[1] Friendica www.friendica.com
[2] Diaspora diasporafoundation.org
[3] StatusNet www.status.net
[4] pump.io www.pump.io
Re: Tux3 Report: Meet Shardmap, the designated successor of HTree
Dear Mr. Richard Weinberger: Thank you very much for the reminder and for proving again that a profound discussion seems not to be possible. Even more important is the point that the discussion related to ReiserFS was different from this discussion, because this time I have not presented the LogHashFS to the open source community; instead, another person has taken copyrighted descriptions from my websites and wanted to make them an open source project, and this even with the support of another company, which by the way has its very own business strategy.

Now, others and I have heard what we wanted: the moment you have no more arguments, you become offensive and begin to mob and intrigue. Besides this, ReiserFS is virtually dead. Furthermore, that journalist from the Linux Magazin said it due to other political and economical reasons in the B.R.D. as well, and most probably never did anything that is important for the open source community. Sooner or later he will get a letter from my attorney for this offense, with the demand to beg for pardon publicly in the ReiserFS and Tux3 mailing lists. Said this, I will not send any e-mails to this Chuck Nonsense thread anymore. It was a mistake at all to try it again. Sincerely Christian Stroetmann

Let's do the same as in 2009[1] and finish this thread. [1] http://www.spinics.net/lists/reiserfs-devel/msg01543.html -- Thanks, //richard
Re: Tux3 Report: Meet Shardmap, the designated successor of HTree
Dear Mr. Andreas Karlsson: Thank you for the quote. If we really want to bring this down to the lowest level, then an idea is not copyrighted per se, for sure. But what is the problem? That I have used the wrong word? The context was clear, and hence the meaning of the word "ideas". So simply add the word "described", "presented", or "published" before the word "ideas", or substitute the word "ideas" with the term "technical descriptions given in textual form".

Or is the problem that the idea has been published and everybody is now free to take it? The answer is: yes and no. Yes: Everybody can implement an idea, concept, system, or method of operation, even if it is given with a copyrighted representation, and give it away or even sell it, because in the latter case it is not patented. But how should somebody give something away or sell something without giving a description of what it is? This leads to the other case. No: What you and many other persons might misunderstand is the fact that if an idea is copyrighted by being represented in a textual, visual, or other form, then it does not matter at all if somebody else takes another text or image, as long as the sense/the idea described in this other way is still the same as described in the original text or image, because it does not matter in which way the copyrighted thing is communicated. As a good example, take a melody published with written notes. It does not matter on which instrument you play it or in which style you sing it. It is still the same melody. Or take a written script of a movie that is narrated from the view of the main protagonist in the original plot and from the view of another actor or a voice in the back in a copied plot. In all cases it is still the same copyrighted idea/story.

Said this, every other description or documentation that uses different terms, a source code, even a visual graphic, or whatever representation of a file system that has the characteristic features of my log-structured hashing based data storage system, with one or more of the optional features like for example consistent hashing, different data structures on the physical storage system and in the (virtual) memory, finger table, logged finger or hash tables in memory, and ACID properties (see again the given links), as is the case with the latest description of the Tux3 FS with Shardmap, transports/communicates the copyrighted description of the same idea/concept/system in large parts or even as a whole somehow. And simply coding and compiling it does not help either, due to the many possibilities of re-engineering. Regards Christian Stroetmann

Hi, I assume it is serious, since ideas cannot be copyrighted in most (or maybe even all) countries. From the FAQ of the U.S. Copyright Office [1]: "Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed." "How do I protect my idea? Copyright does not protect ideas, concepts, systems, or methods of doing something. You may express your ideas in writing or drawings and claim copyright in your description, but be aware that copyright will not protect the idea itself as revealed in your written or artistic work." Andreas [1] http://www.copyright.gov/help/faq/faq-protect.html#what_protect

On 06/24/2013 05:16 PM, Christian Stroetmann wrote: Dear Mr. Pavel Machek: Is this a serious comment? Nevertheless, this is a copyrighted idea [1]. Sincerely Christian Stroetmann [1] Log-Structured Hash-based File System (LogHashFS or LHFS; www.ontonics.com/innovation/pipeline.htm#loghashfs)

Hi! At first you came up with a file system that can handle a great many/billions of files and has ACID features, which are both features of my Ontologic File System (OntoFS; see [1]). Both were said to be a no-go at that time (around 2007 and 2008). Then you came up with my concept of a log-structured hashing based file system [2] and [3], presented it as your invention yesterday [4], and even integrated it with your Tux3 file system, which already has or should have the said features of my OntoFS. Since October 2012 I have only been waiting for this step by somebody strongly connected with the company Samsung. Also, I do think that both steps are very clear signs that show what is going on behind the curtain. And now you are so bold as to ask me to credit these ideas in the sense of crediting your ideas. For sure, I always do claim copyright of my ideas, and the true question is whether you are allowed to implement them at all. In this conjunction, I would give

Fortunately, you can't copyright ideas. Chuck Norris managed to do it once, but you can't. Pavel
Re: Tux3 Report: Meet Shardmap, the designated successor of HTree
Dear Mr. Pavel Machek: Is this a serious comment? Nevertheless, this is a copyrighted idea [1]. Sincerely Christian Stroetmann [1] Log-Structured Hash-based File System (LogHashFS or LHFS; www.ontonics.com/innovation/pipeline.htm#loghashfs)

Hi! At first you came up with a file system that can handle a great many/billions of files and has ACID features, which are both features of my Ontologic File System (OntoFS; see [1]). Both were said to be a no-go at that time (around 2007 and 2008). Then you came up with my concept of a log-structured hashing based file system [2] and [3], presented it as your invention yesterday [4], and even integrated it with your Tux3 file system, which already has or should have the said features of my OntoFS. Since October 2012 I have only been waiting for this step by somebody strongly connected with the company Samsung. Also, I do think that both steps are very clear signs that show what is going on behind the curtain. And now you are so bold as to ask me to credit these ideas in the sense of crediting your ideas. For sure, I always do claim copyright of my ideas, and the true question is whether you are allowed to implement them at all. In this conjunction, I would give

Fortunately, you can't copyright ideas. Chuck Norris managed to do it once, but you can't. Pavel
Re: Tux3 Report: Meet Shardmap, the designated successor of HTree
Hello Mr. Daniel Phillips, I'm sorry to say so, but you are really a funny person. At first you came up with a file system that can handle a great many/billions of files and has ACID features, which are both features of my Ontologic File System (OntoFS; see [1]). Both were said to be a no-go at that time (around 2007 and 2008). Then you came up with my concept of a log-structured hashing based file system [2] and [3], presented it as your invention yesterday [4], and even integrated it with your Tux3 file system, which already has or should have the said features of my OntoFS. Since October 2012 I have only been waiting for this step by somebody strongly connected with the company Samsung. Also, I do think that both steps are very clear signs that show what is going on behind the curtain.

And now you are so bold as to ask me to credit these ideas in the sense of crediting your ideas. For sure, I always do claim copyright of my ideas, and the true question is whether you are allowed to implement them at all. In this conjunction, I would also give the other mailing list members the information that I do not need anything technical from you at all that would have to be credited. I only meant it as part of a broad hint and for lowering the noise on the mailing list. Besides this, your permanent marketing by using speech acts from my websites is annoying, as is playing the unknowing here now. Furthermore, you were already given the count with your last screwed test, and you have nothing better to do than to come up with my log-structured hashing based file system and again a marketing story. I really have to ask: Whom do you want to kid? Whom do you want to provoke? Whom do you want to mislead?

Also, I truly thought that the broad hints given some weeks ago and yesterday again would be clear enough, and I still think so, respectively that you really got the issue. But if, as a matter of fact, this is not the case, I simply say it directly, without any decorating flowers: 1. Stop copying my intellectual properties related to file systems and implementing them. You always came several months too late, and I am definitely not interested in letting this become a running gag. 2. Stop marketing my ideas, especially in a way that confuses the public about the true origin even further. I am already marketing them on my own. 3. Give credit to my intellectual properties in any case, even if you make a derivation, and take care about the correct licensing.

[1] OntoFS (www.ontolinux.com/technology/ontofs.htm)
[2] SASOS4Fun (www.ontonics.com/innovation/pipeline.htm#sasos4fun) Do not confuse SIP with SipHash, but put SipHash in relation to "the size of [a] hash table [is determined] by sampling the input", as we understood the description.
[3] Ontonics, OntoLab, and OntoLinux Further steps (www.ontomax.com/newsarchive/2012/october.htm#08.October.2012)
[4] Meet Shardmap, the designated successor of HTree (lkml.org/lkml/2013/6/18/869)

Btw. 1: Firstly, Daniel Phillips had no CC list at all with his initial e-mail. Secondly, the issue with the CC list on my side was a mistake on the one hand (I forgot to push the CC button) and a part of the last broad hint on the other hand. Btw. 2: Are you Google and now Samsung, or both? Sincerely Christian Stroetmann

Hi Christian, You are welcome, and I hope that your project can make good use of this technology. Please do credit your sources if you use these ideas, and please keep the CC list intact in further replies. What is the scale of your application, that is, how many index entries do you expect? Regards, Daniel
Re: Tux3 Report: Meet Shardmap, the designated successor of HTree
Hello Mr. Daniel Philips, I'm sorry to say so, but your are really a funny person. At first you came up with a file system that can handle a great many/billions files and has ACID feature, which are both features of my Ontologic File System (OntoFS; see [1]). Both were said to be a no-go at that time (around 2007 and 2008). Then you came up, with my concept of a log-structured hashing based file system [2] and [3], presented it as your invention yesterday [4], and even integrated it with your Tux3 file system that already has or should have the said features of my OntoFS. I only waited for this step by somebody strongly connected with the company Samsung since the October 2012. AIso, I do think that both steps are very clear signs that shows what is going on behind the curtain. And now your are so bold and please me that I should credit these ideas in the sense of crediting your ideas. For sure, I always do claim for copyright of my ideas, and the true question is if you are allowed to implement them at all. In this conjunction, I would give the other mailing list members the information as well, that I do not need something technical from you at all, that has to be credited. I only meant it as part of a broad hint and for lowering the noise on the mailing list. Besides this, your permanent marketing by using speech acts from my websites is annoying, as it is the case with playing here the unknown now. Furthermore, you already were given the count with your last screwed test and you have nothing better to do than to come up with my log-structured hashing based file system and again a marketing story. I really have to ask the question: Who do you want to kid? Who do you want to provoke? Who do you want to mislead? Also, I truely thought that the broad hints given some weeks ago and yesterday again would be clear enough, and I still think so respectively that you really got the issue. But if as a matter of fact this might be not the case, I simply say it directly without any decorating flowers: 1. Stop copying my intellectual properties related with file systems and implementing them. You always came several months too late and I am not interested to let it become a running gag, definitely. 2. Stop marketing my ideas, especially in a way that confuses the public about the true origin even further. I am already marketing them on my own. 3. Give credits to my intellectual properties in any case, even if you make a derivation, and take care about the correct licensing. [1] OntoFS (www.ontolinux.com/technology/ontofs.htm) [2] SASOS4Fun (www.ontonics.com/innovation/pipeline.htm#sasos4fun) Do not confuse SIP with SipHash, but put SipHash in relation with the size of [a] hash table [is determined] by sampling the input, as we understood the description. [3] Ontonics, OntoLab, and OntoLinux Further steps (www.ontomax.com/newsarchive/2012/october.htm#08.October.2012) [4] Meet Shardmap, the designated successor of HTree (lkml.org/lkml/2013/6/18/869) Btw. 1: Firstly, Daniel Philips had no CC list at all with his initial e-mail. Secondly, the issue with the CC list on my side was a mistake on the one hand (forgot to push the CC button) and a part of the last broad hint on the other hand. Btw. 2: Are you Google and now Samsung or both? Sincerely Christian Stroetmann Hi Christian, You are welcome, and I hope that your project can make good use of this technology. Please do credit your sources if you use these ideas, and please keep the CC list intact in further replies. 
What is the scale of your application, that is, how many index entries do you expect?

Regards,
Daniel
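For readers puzzling over the "size of [a] hash table [is determined] by sampling the input" phrasing cited above, here is one crude reading, offered purely as a hypothetical sketch and not taken from any of the cited pages: estimate the number of distinct keys from a prefix sample of the input, then round the estimate up to a power of two. The file name keys.txt, the 10000-line sample, and the naive linear extrapolation are all illustrative assumptions.

    # Hypothetical sketch: size a hash table by sampling its input.
    # Assumes one key per line in a non-empty keys.txt (illustrative only).
    total=$(wc -l < keys.txt)                 # total number of input keys
    sample=$(head -n 10000 keys.txt | wc -l)  # actual sample size (file may be short)
    distinct=$(head -n 10000 keys.txt | sort -u | wc -l)  # distinct keys in the sample
    est=$(( distinct * total / sample ))      # crude linear extrapolation
    size=1
    while [ "$size" -lt "$est" ]; do size=$(( size * 2 )); done  # round up to a power of two
    echo "estimated $est distinct keys -> table size $size"

A real index would want a proper cardinality estimator rather than this linear extrapolation, but the sketch shows the general shape: sample, estimate, then pick a table size.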
Re: Tux3 Report: Meet Shardmap, the designated successor of HTree
Aloha everybody,

We would like to thank the developers very much for giving technical details about how we could implement our file system indexing (see [1] and [2]).

[1] SASOS4Fun (www.ontonics.com/innovation/pipeline.htm#sasos4fun) Do not confuse SIP with SipHash, but put SipHash in relation with "the size of [a] hash table [is determined] by sampling the input", as we understood the description.
[2] Ontonics, OntoLab, and OntoLinux Further steps (www.ontomax.com/newsarchive/2012/october.htm#08.October.2012)

Sincerely
Christian Stroetmann
Re: Tux3 Report: Faster than tmpfs, what?
Aloha, hardcore coders,

Thank you very much for working out the facts, Dave. You proved why, all these years, I had such a suspicious feeling when reading between the lines of the Tux3 e-mails sent to the mailing list, which should not mean that I do not like the work around the Tux3 file system in general. Quite the contrary: it is highly interesting to watch whether there are possibilities to bring the whole field further. But this kind of marketing, as seen in the past, is truly not constructive, however contemporary it may be.

Have fun in the sun
Christian Stroetmann

On Tue, May 07, 2013 at 04:24:05PM -0700, Daniel Phillips wrote:

When something sounds too good to be true, it usually is. But not always. Today Hirofumi posted some nigh-on-unbelievable dbench results that show Tux3 beating tmpfs. To put this in perspective, we normally regard tmpfs as unbeatable because it is just a thin shim between the standard VFS mechanisms that every filesystem must use, and the swap device. Our usual definition of successful optimization is that we end up somewhere between Ext4 and tmpfs, or in other words, faster than Ext4. This time we got an excellent surprise. The benchmark:

    dbench -t 30 -c client2.txt 1 &
    (while true; do sync; sleep 4; done)

I'm deeply suspicious of what is in that client2.txt file. dbench on ext4 on a 4-SSD RAID0 array with a single process gets 130 MB/s (kernel is 3.9.0). Your workload gives you over 1 GB/s on ext4.

tux3:

    Operation      Count    AvgLat    MaxLat
    NTCreateX    1477980     0.003    12.944
    ReadX        2316653     0.002     0.499
    LockX           4812     0.002     0.207
    UnlockX         4812     0.001     0.221
    Throughput 1546.81 MB/sec  1 clients  1 procs  max_latency=12.950 ms

Hmmm... No "Flush" operations. Gotcha - you've removed the data integrity operations from the benchmark. Ah, I get it now - you've done that so the front end of tux3 won't encounter any blocking operations and so can offload 100% of operations. It also explains the sync call every 4 seconds, to keep the tux3 back end writing out to disk, so that a) all the offloaded work is done by the sync process and not measured by the benchmark, and b) the front end doesn't overrun queues and throttle or run out of memory.

Oh, so nicely contrived. But terribly obvious now that I've found it. You've carefully crafted the benchmark to demonstrate a best-case workload for the tux3 architecture, then carefully not measured the overhead of the work tux3 has offloaded, and then not disclosed any of this in the hope that the headline is all people will look at. This would make a great case study for a "BenchMarketing For Dummies" book.

Shame for you that you sent it to a list where people see the dbench numbers for ext4, immediately think "that's not right", and then look deeper. Phoronix might swallow your sensationalist headline grab without analysis, but I don't think I'm alone in my suspicion that there was something stinky about your numbers. Perhaps in future you'll disclose such information with your results; otherwise nobody is ever going to trust anything you say about tux3.

Cheers,
Dave.
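Dave's "no Flush operations" observation is easy to check, since dbench loadfiles are plain text. A minimal sketch, assuming dbench is installed and that client2.txt sits in the working directory; the stock loadfile path below is the usual Linux install location, which may differ by distribution:

    # Count data-integrity operations in the suspect loadfile vs. the stock one.
    grep -c Flush client2.txt
    grep -c Flush /usr/share/dbench/client.txt
    # Re-run against the stock loadfile, with no background sync loop,
    # to see throughput with Flush operations included.
    dbench -t 30 -c /usr/share/dbench/client.txt 1

A zero count from the first grep, together with a large drop in throughput on the re-run, would confirm the analysis above.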