Update: Ext3 vs NTFS performance
> Subject: Ext3 vs NTFS performance
>
> Hello all,
>
> I've been testing the NAS performance of ext3/Openfiler 2.2 against
> NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> video workloads. The Windows CIFS client will attempt a poor-man's
> pre-allocation of the file on the server by sending 1-byte writes at
> 128K-byte strides, breaking block allocation on ext3 and leading to
> fragmentation and poor performance. This will happen for many
> applications (including iTunes) as the CIFS client issues these
> pre-allocates under the application layer.
>
> I've posted a brief paper on Intel's OSS website
> (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> it a read and let me know what you think. In particular, I'd like to
> arrive at the right place to fix this problem: is it in the filesystem,
> VFS, or Samba?
>
> thanks,
> Mason
>
> (please CC responses to mason dot b dot cabot at intel dot com)

Folks: thanks for the comments from the initial posting of this note.
We've looked further into the problem and found that Samba 3.0.20 or
greater fills the performance gap for ext3: the "strict allocate" flag
now zero fills the file, forcing allocation in the underlying filesystem
and avoiding fragmentation.

An update to the original whitepaper will be posted soon to the same
location on Intel's OSS website.

thanks,
Mason

(please CC responses to mason dot b dot cabot at intel dot com)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Ext3 vs NTFS performance
Cabot, Mason B wrote:
> Philip: the best response I can offer is that we have traced the
> application's file system accesses and seen no such one-byte writes
> occurring at that level. They are generated somewhere below the
> application. Additionally, while we have observed iTunes on Windows
> issuing these one-byte writes, ethereal traces for iTunes on Mac OSX
> show no such behavior. Because of these observations I think it is
> reasonable to conclude that the Windows CIFS client is generating the
> one-byte writes.

Can you duplicate this behavior with a very simple test program, rather
than iTunes? Will something as simple as open() and write() with a 32 KB
buffer of random data in a loop cause this behavior?
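A test program of the kind asked about above could look roughly like the following. This is only a minimal sketch in Python: the function name sequential_write_test and its parameters are made up for illustration, and the 32 KB chunk size is the one suggested in the message. The idea would be to run it on a Windows client against a file on the CIFS share while capturing a network trace, and then check whether one-byte writes show up on the wire even though the program never issues any.

```python
import os

def sequential_write_test(path, total_bytes, chunk_size=32 * 1024):
    """Write total_bytes of random data to path in chunk_size pieces.

    Only plain sequential write() calls are issued; no seeks and no
    one-byte writes. Returns the number of bytes written.
    """
    written = 0
    # O_TRUNC so that repeated runs always start from an empty file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            view = memoryview(os.urandom(n))
            # os.write may accept fewer bytes than offered; loop until done.
            while view:
                k = os.write(fd, view)
                view = view[k:]
            written += n
    finally:
        os.close(fd)
    return written
```

Run locally, this only produces a file of the requested size; whether it triggers the pre-allocation writes depends on the client's CIFS redirector, which is exactly what the trace would have to show.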
Re: Ext3 vs NTFS performance
Jörn Engel <[EMAIL PROTECTED]> wrote:
> On Fri, 4 May 2007 10:46:10 +0100, Christoph Hellwig wrote:

>> Which means the right place to fix this is samba. Samba just needs
>> to intercept lseek and pread/pwrite to never allocate sparse files
>> but do the right thing instead. Now the right thing would probably
>> be a preallocate instead of writing zeroes, and we need to provide the
>> infrastructure for them to do it, which is in progress currently.
>
> Why do preallocate and not just truncate the file?

If it's done by samba, it's racy. Only the kernel can reliably tell a
write-beyond-eof from a write-before-eof. Either it should unconditionally
turn these preallocation-writes into truncates, or have a flag which will
turn this feature on and which can be used to turn the lseek into a real
preallocation call.

I don't think unconditionally turning these writes into truncates would be
good: it would change the behaviour of
dd bs=1 count=$(($n*$BLOCKSIZE+1)).
--
Top 100 things you don't want the sysadmin to say:
17. dd if=/dev/null of=/vmunix
Re: Ext3 vs NTFS performance
On Fri, 4 May 2007 10:46:10 +0100, Christoph Hellwig wrote:
>
> Which means the right place to fix this is samba. Samba just needs
> to intercept lseek and pread/pwrite to never allocate sparse files
> but do the right thing instead. Now the right thing would probably
> be a preallocate instead of writing zeroes, and we need to provide the
> infrastructure for them to do it, which is in progress currently.

Why do preallocate and not just truncate the file? If the write is a
single 0x00 somewhere beyond EOF, as appears to be the pattern, truncate
will do just as well if not better. And it is available now.

Jörn

--
Joern's library part 6:
http://www.gzip.org/zlib/feldspar.html
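To make the comparison above concrete, here is a small sketch in Python that extends a file both ways and reports what stat() says about each. The function names are hypothetical and chosen only for this illustration. It assumes a Unix-like system where st_blocks is available.

```python
import os

def extend_by_truncate(path, size):
    """Set the file length to size without writing any data."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, size)
    finally:
        os.close(fd)

def extend_by_zero_byte(path, size):
    """Write a single 0x00 at offset size - 1, leaving a hole before it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.pwrite(fd, b"\x00", size - 1)
    finally:
        os.close(fd)

def size_and_blocks(path):
    """Return (st_size, st_blocks); st_blocks counts 512-byte units."""
    st = os.stat(path)
    return st.st_size, st.st_blocks
```

Both calls leave a file of the same length. On a filesystem that supports sparse files, the truncated file would normally report no more allocated blocks than the one extended by a write, since the write has to place at least one data block at the tail. The exact st_blocks values depend on the filesystem and are not asserted here.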
Re: Ext3 vs NTFS performance
2007/5/6, Bodo Eggert <[EMAIL PROTECTED]>:
> Theodore Tso <[EMAIL PROTECTED]> wrote:
>
> > But as has already been discussed on this thread, in situations where
> > the fileserver is under high memory pressure, any filesystem (XFS or
> > ext4) would still end up allocating blocks out of order, resulting in
> > fragmentation. Explicit preallocation, as opposed to delayed
> > allocation, is really the best long-term solution; and in order to do
> > that, Samba needs to detect this scenario --- which as has been noted,
> > there appears to be no good reason for the Windows CIFS client (or any
> > other application) to be doing this, other than perhaps to deliberately
> > trigger a worst case allocation pattern in ext3 --- and translate it
> > into an explicit preallocation request.
>
> There is an interface to tell the kernel about the way the file will be
> accessed. IMO this interface should be used to do the preallocation, too.
>
> The other question is: How to tell the poor-bill's preallocation from a
> very clever application that communicates with another application and
> which is supposed to zero out that exact byte from the data the other
> application sent. I was tempted to say "just let samba cache these
> calls", but it would be wrong. You'll need magic in the kernel to DTRT.
>
> There are four correct ways of handling these one-zerobyte-writes after
> EOF:
> 1) Extend the file like truncate
> 2) Extend the file like write() (current behaviour)
> 3) Preallocate these blocks (to be implemented)
> 4) Write all zeroes (current behaviour for FAT)
>
> (2) will cause bad allocations, it's obviously worse than (1).
>
> (3) would be better than (1) and (2), but only xfs(?) and ext4 will
> support this in the near future.
>
> (4) should double the write time, but give the best possible read speed.
> According to [1], the expected read speed is about as high as (1) gives,
> "playback performance improves to expected levels". If preallocation
> does not seem to make a big difference, I don't think we should do (4)
> as a replacement until the filesystem does support real preallocations.
>
> I suggest:
> 1) Make samba use fadvise(MIGHT_PREALLOCATE)
> 2) Make the kernel turn these 1-byte-writes-after-EOF into truncates on
>    MIGHT_PREALLOCATE, and possibly turn off MIGHT_PREALLOCATE on other
>    read/writes
> 3) Make the kernel fadvise(PREALLOCATE, $filesize) on MIGHT_PREALLOCATE
>    + lseek(0), turning off the MIGHT_PREALLOCATE
>    Possibly it might also turn on FADV_SEQUENTIAL.
> 4) Make the filesystems optionally preallocate the desired area, or
>    ignore fadvise(PREALLOCATE, $filesize) instead.
>
> [1] http://softwarecommunity.intel.com/articles/eng/1259.htm
> --
> It is still called paranoia when they really are out to get you.

So it would be possible that "Explicit Preallocation" + "Delayed
Allocation" + (some other technology) would minimize file-system
fragmentation. Furthermore, the massive fragmentation of large downloads
could be solved by "Explicit Preallocation" too.
Re: Ext3 vs NTFS performance
Andrew Morton writes:
> "Cabot, Mason B" <[EMAIL PROTECTED]> wrote:
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
>
> Oh my gawd, what a stupid hack. Now we know what the MS interoperability
> lab has been working on.

Stupid or not, this is their protocol. The cifs filesystem driver needs
a patch to do this. Probably that'll help get better performance when
Linux is writing to a Windows server.
Re: Ext3 vs NTFS performance
Theodore Tso <[EMAIL PROTECTED]> wrote:

> But as has already been discussed on this thread, in situations where
> the fileserver is under high memory pressure, any filesystem (XFS or
> ext4) would still end up allocating blocks out of order, resulting in
> fragmentation. Explicit preallocation, as opposed to delayed
> allocation, is really the best long-term solution; and in order to do
> that, Samba needs to detect this scenario --- which as has been noted,
> there appears to be no good reason for the Windows CIFS client (or any
> other application) to be doing this, other than perhaps to deliberately
> trigger a worst case allocation pattern in ext3 --- and translate it
> into an explicit preallocation request.

There is an interface to tell the kernel about the way the file will be
accessed. IMO this interface should be used to do the preallocation, too.

The other question is: How to tell the poor-bill's preallocation from a
very clever application that communicates with another application and
which is supposed to zero out that exact byte from the data the other
application sent. I was tempted to say "just let samba cache these calls",
but it would be wrong. You'll need magic in the kernel to DTRT.

There are four correct ways of handling these one-zerobyte-writes after
EOF:
1) Extend the file like truncate
2) Extend the file like write() (current behaviour)
3) Preallocate these blocks (to be implemented)
4) Write all zeroes (current behaviour for FAT)

(2) will cause bad allocations, it's obviously worse than (1).

(3) would be better than (1) and (2), but only xfs(?) and ext4 will
support this in the near future.

(4) should double the write time, but give the best possible read speed.
According to [1], the expected read speed is about as high as (1) gives,
"playback performance improves to expected levels". If preallocation
does not seem to make a big difference, I don't think we should do (4) as
a replacement until the filesystem does support real preallocations.

I suggest:
1) Make samba use fadvise(MIGHT_PREALLOCATE)
2) Make the kernel turn these 1-byte-writes-after-EOF into truncates on
   MIGHT_PREALLOCATE, and possibly turn off MIGHT_PREALLOCATE on other
   read/writes
3) Make the kernel fadvise(PREALLOCATE, $filesize) on MIGHT_PREALLOCATE +
   lseek(0), turning off the MIGHT_PREALLOCATE
   Possibly it might also turn on FADV_SEQUENTIAL.
4) Make the filesystems optionally preallocate the desired area, or ignore
   fadvise(PREALLOCATE, $filesize) instead.

[1] http://softwarecommunity.intel.com/articles/eng/1259.htm
--
It is still called paranoia when they really are out to get you.
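For reference, the existing access-hint interface mentioned above is presumably posix_fadvise(); that reading is an assumption, since the message does not name the call. The sketch below, in Python, shows only the hint that exists today (POSIX_FADV_SEQUENTIAL). MIGHT_PREALLOCATE and PREALLOCATE are values proposed in the message, not existing flags, so they are deliberately not used. The function name open_for_sequential_io is hypothetical.

```python
import os

def open_for_sequential_io(path):
    """Open path read/write and hint that it will be accessed sequentially.

    Returns the open file descriptor; the caller is responsible for
    closing it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # offset=0 and len=0 apply the advice to the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    except OSError:
        os.close(fd)
        raise
    return fd
```

The advice is only a hint about the access pattern; with the interface as it exists, it does not reserve or allocate any blocks, which is why the proposal needs new values.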
Re: Ext3 vs NTFS performance
On Fri, May 04, 2007 at 07:49:13PM +0400, Michael Tokarev wrote:
> How about providing a way to stop the kernel (or filesystem) from making
> gaps in files instead? Like some ioctl(fd, FS_NOGAPS, 1) -- pretty much
> like 'doze has, just the opposite (on windows, this flag is "on" by
> default).

Giving filesystems non-hole semantics is non-trivial. Not allowing for
holes creates a lot of complications in unix-like filesystems.

> But the main point is that samba has to keep track of things which it
> doesn't do now, and those things become... interesting (difficult if
> at all possible to track) in a multi-user/concurrent-writes environment.

Samba is there to deal with a braindead protocol and braindead clients,
so let it continue to do that. No need to push this into the kernel.
Re: Ext3 vs NTFS performance
On Sat, May 05, 2007 at 11:13:36AM +0800, Xu CanHao wrote:
> On 5 May, 10:20, Theodore Tso <[EMAIL PROTECTED]> wrote:
> >
> > This is being worked on already. XFS has a per-filesystem ioctl, but
> > we want to create a filesystem-independent system call,
> > sys_fallocate(), that would be wired into the already existing
> > posix_fallocate() function exported by glibc.
>
> The story told us: an application must look to the file-systems, ext3
> is good at aaa, is not good at bbb; XFS is good at ccc, is not good at
> ddd; reiserfs is good at eee, is not good at fff
>
> For this scenario, XFS is good at dealing with fragmentation while ext3
> is not.

That's true. XFS has the ability to do delayed allocations, so that the
blocks don't get allocated until they are written out. Hence, a workload
that writes a pattern which uses random access writes in strides of
128k, and then goes back to fill them in, will result in fragmentation
given ext3's current block reservation allocation algorithm --- but, as
long as the system isn't under high memory pressure, XFS will do better
in this particular scenario.

Actually, ext3 does have a block reservation system, which will prevent
this scenario if the random access writes are within a range of 32k or
so --- which is enough to protect against the bad effects of more common
random access write patterns, such as those used when writing out ELF
object files, for example. Increasing EXT3_DEFAULT_RESERVE_BLOCKS by a
factor of 4 would adapt the ext3 block reservation system to this
pathological workload, and we could easily add a tunable mount option to
change the reservation size used by ext3. Unfortunately, this could make
fragmentation worse for other workloads.

So adding delayed allocation to ext4 is a better solution. But as has
already been discussed on this thread, in situations where the
fileserver is under high memory pressure, any filesystem (XFS or ext4)
would still end up allocating blocks out of order, resulting in
fragmentation.

Explicit preallocation, as opposed to delayed allocation, is really the
best long-term solution; and in order to do that, Samba needs to detect
this scenario --- which as has been noted, there appears to be no good
reason for the Windows CIFS client (or any other application) to be doing
this, other than perhaps to deliberately trigger a worst case allocation
pattern in ext3 --- and translate it into an explicit preallocation
request.

Regards,

					- Ted
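The "factor of 4" above can be checked with a little arithmetic. The sketch below is in Python and rests on two assumptions that are not stated in the message: a 4 KiB filesystem block size, and a default of 8 for EXT3_DEFAULT_RESERVE_BLOCKS, which together give the "32k or so" window mentioned. The comparison of stride against window size is a deliberately crude stand-in for the real allocator behaviour.

```python
BLOCK_SIZE = 4096      # assumption: 4 KiB filesystem blocks
RESERVE_BLOCKS = 8     # assumption: default EXT3_DEFAULT_RESERVE_BLOCKS
STRIDE = 128 * 1024    # stride of the one-byte writes reported in the thread

def reservation_window_bytes(reserve_blocks, block_size=BLOCK_SIZE):
    """Size in bytes of a reservation window of reserve_blocks blocks."""
    return reserve_blocks * block_size

def stride_fits(reserve_blocks, stride=STRIDE, block_size=BLOCK_SIZE):
    """True if one stride is no larger than the reservation window."""
    return stride <= reservation_window_bytes(reserve_blocks, block_size)
```

Under these assumptions the default window is 8 * 4096 = 32768 bytes, which is smaller than the 128K stride, while four times the default gives 32 * 4096 = 131072 bytes, which just covers it.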
Re: Ext3 vs NTFS performance
On Tue, 1 May 2007 13:43:18 -0700 "Cabot, Mason B" <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I've been testing the NAS performance of ext3/Openfiler 2.2 against
> NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> video workloads. The Windows CIFS client will attempt a poor-man's
> pre-allocation of the file on the server by sending 1-byte writes at
> 128K-byte strides, breaking block allocation on ext3 and leading to
> fragmentation and poor performance. This will happen for many
> applications (including iTunes) as the CIFS client issues these
> pre-allocates under the application layer.

On 5 May, 10:20, Theodore Tso <[EMAIL PROTECTED]> wrote:
> This is being worked on already. XFS has a per-filesystem ioctl, but
> we want to create a filesystem-independent system call,
> sys_fallocate(), that would be wired into the already existing
> posix_fallocate() function exported by glibc.

The story told us: an application must look to the file-systems, ext3
is good at aaa, is not good at bbb; XFS is good at ccc, is not good at
ddd; reiserfs is good at eee, is not good at fff

For this scenario, XFS is good at dealing with fragmentation while ext3
is not.
Re: Ext3 vs NTFS performance
On Fri, May 04, 2007 at 07:49:13PM +0400, Michael Tokarev wrote:
> How about providing a way to stop kernel (or filesystem) to make gaps in files instead? Like some ioctl(fd, FS_NOGAPS, 1) -- pretty much like 'doze has, just the opposite (on windows, this flag is "on" by default).

This is being worked on already. XFS has a per-filesystem ioctl, but we want to create a filesystem-independent system call, sys_fallocate(), that would be wired into the already existing posix_fallocate() function exported by glibc.

> It's even worse: imagine samba transforms this into write(zeros) (as preallocate isn't available yet), and at the same time, another process is writing there... Which will be perfectly valid in current case, but will go wrong way (overwriting just-written data with zeros) in this new scenario.

Samba can just use the posix_fallocate() system call. Note that if you have two processes writing to the same file without proper locking, you're probably going to run into potential problems anyway. What if one process is writing whole blockfuls of data, while some brain-damaged Windows client is writing a byte of zero every 128k, and thus subtly corrupting the data written by the first process? We can't fix brain-damaged applications that aren't doing proper application-level locking. (Aside, of course, from convincing people to switch away from Vista to Linux. :-)

- Ted
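As a rough illustration of the posix_fallocate() route Ted mentions, here is a minimal sketch using Python's os.posix_fallocate, which wraps the C library function. The helper name preallocate and the 1 MiB size are assumptions made for the example; this is not how Samba implements it.

```python
import os
import tempfile


def preallocate(fd, size):
    """Reserve `size` bytes for the file behind `fd`, starting at offset 0.

    os.posix_fallocate wraps posix_fallocate(3) and raises OSError on
    failure, for example ENOSPC when the space cannot be reserved.
    """
    os.posix_fallocate(fd, 0, size)
    return os.fstat(fd).st_size


def demo(size=1024 * 1024):
    """Preallocate a temporary file and return its resulting size."""
    with tempfile.TemporaryFile() as f:
        return preallocate(f.fileno(), size)
```

Unlike a 1-byte write at each stride, the call states the whole range up front, so the filesystem can choose the layout in one step. Where the filesystem has no native support, glibc may fall back to writing into the range, so the cost of the call can vary.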
RE: Ext3 vs NTFS performance
> Cabot, Mason B wrote:
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against NTFS/WinXP and have found that NTFS significantly outperforms ext3 for video workloads. The Windows CIFS client will attempt a poor-man's pre-allocation of the file on the server by sending 1-byte writes at 128K-byte strides, breaking block allocation on ext3 and leading to fragmentation and poor performance. This will happen for many applications (including iTunes) as the CIFS client issues these pre-allocates under the application layer.
>
> This is rather hard to believe so I think some more information is in order. Specifically, how do you know that it is the windows kernel that is issuing these writes and not the application? Under what application access patterns does it do this?
>
> This is just rather hard to believe seeing as how, iirc, the CIFS protocol has commands to extend the file size properly rather than with this hack, and unless it is asked to by the application, the cifs client should not be trying to extend files.

Philip: the best response I can offer is that we have traced the application's file system accesses and seen no such one-byte writes occurring at that level. They are generated somewhere below the application. Additionally, while we have observed iTunes on Windows issuing these one-byte writes, ethereal traces for iTunes on Mac OS X show no such behavior. Because of these observations I think it is reasonable to conclude that the Windows CIFS client is generating the one-byte writes.

thanks,
Mason
Re: Ext3 vs NTFS performance
On Fri, May 04, 2007 at 08:23:08AM -0400, Theodore Tso wrote:
> On Thu, May 03, 2007 at 02:14:52PM -0700, Valerie Henson wrote:
> > I'd really like to see a generic VFS-level detection of read()/write()/creat()/mkdir()/etc. patterns which could detect things like "Oh, this file is likely to be deleted immediately, wait and see if it goes away and don't bother sending it on to the FS immediately" or "Looks like this file will grow pretty big, let's go pre-allocate some space for it." This is probably best done as a set of helper functions in the usual way.
>
> What patterns do you think means things like "this file is likely to be deleted immediately", or "this file will grow pretty big"? I don't think there are any that would be generally valid.

I wouldn't have guessed that either, but it turns out there are:

http://www.eecs.harvard.edu/~ellard/pubs/able-usenix04.pdf

    We present evidence that attributes that are known to the file system when a file is created, such as its name, permission mode, and owner, are often strongly related to future properties of the file such as its ultimate size, lifespan, and access pattern. More importantly, we show that we can exploit these relationships to automatically generate predictive models for these properties, and that these predictions are sufficiently accurate to enable optimizations.

For example, lock files have predictable names and permissions, and live for a fraction of a second in most cases. Files which are appended a few hundred bytes at a time are probably log files and will continue to grow in this manner. Some of their predictions were 98% accurate!

In any case, any predictive algorithms we already do at the file system level can be done at the VFS level, and shared between file systems, instead of being reimplemented over and over again. Just food for thought.

-VAL
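To make the idea of predicting a file's future from its creation-time attributes concrete, here is a toy sketch. The rule table and the labels are invented for illustration; the paper linked above builds its models from traces, not from a hand-written list like this, and nothing here reflects an existing VFS interface.

```python
import fnmatch

# Hypothetical rules, invented for this example only.
RULES = [
    ("*.lock", "short-lived"),
    ("*.lck", "short-lived"),
    ("*.log", "append-growing"),
]


def predict(filename, default="unknown"):
    """Return a coarse prediction for a newly created file based on its name."""
    for pattern, label in RULES:
        if fnmatch.fnmatch(filename, pattern):
            return label
    return default
```

A shared helper of this shape could, in principle, hand a hint such as "short-lived" to whichever filesystem is underneath, which is the sharing argument made in the message.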
Re: Ext3 vs NTFS performance
Cabot, Mason B wrote:
> I've been testing the NAS performance of ext3/Openfiler 2.2 against NTFS/WinXP and have found that NTFS significantly outperforms ext3 for video workloads. The Windows CIFS client will attempt a poor-man's pre-allocation of the file on the server by sending 1-byte writes at 128K-byte strides, breaking block allocation on ext3 and leading to fragmentation and poor performance. This will happen for many applications (including iTunes) as the CIFS client issues these pre-allocates under the application layer.

This is rather hard to believe so I think some more information is in order. Specifically, how do you know that it is the windows kernel that is issuing these writes and not the application? Under what application access patterns does it do this?

This is just rather hard to believe seeing as how, iirc, the CIFS protocol has commands to extend the file size properly rather than with this hack, and unless it is asked to by the application, the cifs client should not be trying to extend files.
Re: Ext3 vs NTFS performance
Christoph Hellwig wrote:
> On Fri, May 04, 2007 at 09:12:31AM +0100, Anton Altaparmakov wrote:
>> Nothing to do with win32 functions. Windows does NOT create sparse files therefore it never can have an issue like ext3 does in this scenario. Windows will cause nice allocations to happen because of this and the 1-byte writes are perfectly sensible in this regard. (Although a little odd as Windows has a proper API for doing preallocation so I don't get why it is not using that instead...)
>
> Which means the right place to fix this is samba. Samba just needs to intercept lseek and pread/pwrite to never allocate sparse files but do the right thing instead. Now the right thing would probably be a preallocate instead of writing zeroes, and we need to provide the infrastructure for them to do it, which is in progress currently. (And in fact samba already does the right thing for XFS if you use the prealloc samba vfs module, which AFAIK is not the default)

Hmm. How about providing a way to stop the kernel (or filesystem) from making gaps in files instead? Like some ioctl(fd, FS_NOGAPS, 1) -- pretty much like 'doze has, just the opposite (on windows, this flag is "on" by default).

Fixing this issue in samba means that samba has to keep track of more state data than it currently does. Detecting such seek+write has some costs.

It's even worse: imagine samba transforms this into write(zeros) (as preallocate isn't available yet), and at the same time, another process is writing there... That is perfectly valid in the current case, but goes wrong (overwriting just-written data with zeros) in this new scenario.

But the main point is that samba has to keep track of things which it doesn't track now, and those things become.. interesting (difficult, if at all possible, to track) in a multi-user/concurrent-writes environment.

/mjt
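The extra state that detecting the seek+write pattern would require can be sketched as a small per-file tracker. Everything here is hypothetical: the class name, the threshold of three probes, and the assumption that each probe is exactly one byte at or past the current end of file. It deliberately ignores the concurrent-writer case raised above.

```python
STRIDE = 128 * 1024  # stride reported in the original posting


class StrideDetector:
    """Tracks writes to one file and flags the 1-byte stride pattern."""

    def __init__(self, threshold=3):
        self.file_size = 0      # logical size as seen so far
        self.last_probe = None  # offset of the previous 1-byte probe
        self.run = 0            # consecutive probes at STRIDE spacing
        self.threshold = threshold

    def on_write(self, offset, length):
        """Record a write; return True once the pattern is established."""
        is_probe = length == 1 and offset >= self.file_size
        if is_probe and self.last_probe is not None \
                and offset - self.last_probe == STRIDE:
            self.run += 1
        elif is_probe:
            self.run = 1   # first probe, or a probe at some other spacing
        else:
            self.run = 0   # an ordinary write breaks the run
        self.last_probe = offset if is_probe else None
        self.file_size = max(self.file_size, offset + length)
        return self.run >= self.threshold
```

Even this minimal version needs three fields per open file and a decision on every write, which gives a feel for the bookkeeping cost mentioned in the message.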
Re: Ext3 vs NTFS performance
On 4 May 2007, at 10:46, Christoph Hellwig wrote:
> On Fri, May 04, 2007 at 09:12:31AM +0100, Anton Altaparmakov wrote:
>> Nothing to do with win32 functions. Windows does NOT create sparse files therefore it never can have an issue like ext3 does in this scenario. Windows will cause nice allocations to happen because of this and the 1-byte writes are perfectly sensible in this regard. (Although a little odd as Windows has a proper API for doing preallocation so I don't get why it is not using that instead...)
>
> Which means the right place to fix this is samba.

Absolutely, agreed.

> Samba just needs to intercept lseek and pread/pwrite to never allocate sparse files but do the right thing instead. Now the right thing would probably be a preallocate instead of writing zeroes, and we need to provide the infrastructure for them to do it, which is in progress currently. (And in fact samba already does the right thing for XFS if you use the prealloc samba vfs module, which AFAIK is not the default)

Best regards,

Anton
--
Anton Altaparmakov (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
Re: Ext3 vs NTFS performance
On Thu, May 03, 2007 at 02:14:52PM -0700, Valerie Henson wrote:
> But in terms of what we should do to fix it, there is the possibility of some debate. In general, I think there is a lot of code stuck down in individual file systems - especially in XFS - that could be usefully hoisted up to a higher level as generic helper functions. For example, we've got at least two implementations of reservations, one in XFS and one in ext3/4. At least some of the code could be generic - both file systems want to reserve long contiguous extents - with the actual mechanics of looking up and reserving free blocks implemented in per-fs code.

I'm not so sure. Most of the block allocation (and pre-allocation) code is actually of necessity going to be filesystem specific. There are patches currently in the ext4 patch queue which would provide a filesystem-generic preallocate system call, and that makes sense. And delayed allocation could be done more in the VM --- but the actual reservation code? It's not at all clear it makes sense to try to generalize it, since filesystems like XFS which look up free blocks via extents have fundamentally different abstractions which would be more efficient for them.

> I'd really like to see a generic VFS-level detection of read()/write()/creat()/mkdir()/etc. patterns which could detect things like "Oh, this file is likely to be deleted immediately, wait and see if it goes away and don't bother sending it on to the FS immediately" or "Looks like this file will grow pretty big, let's go pre-allocate some space for it." This is probably best done as a set of helper functions in the usual way.

What patterns do you think means things like "this file is likely to be deleted immediately", or "this file will grow pretty big"? I don't think there are any that would be generally valid.

The only things which I think make sense are delayed allocation, part of which, as I said, could be done in the VM/VFS layer, and an explicit API for large files that need persistent preallocation.

- Ted
Re: Ext3 vs NTFS performance
On Fri, May 04, 2007 at 09:12:31AM +0100, Anton Altaparmakov wrote:
> Nothing to do with win32 functions. Windows does NOT create sparse files therefore it never can have an issue like ext3 does in this scenario. Windows will cause nice allocations to happen because of this and the 1-byte writes are perfectly sensible in this regard. (Although a little odd as Windows has a proper API for doing preallocation so I don't get why it is not using that instead...)

Which means the right place to fix this is samba. Samba just needs to intercept lseek and pread/pwrite to never allocate sparse files but do the right thing instead. Now the right thing would probably be a preallocate instead of writing zeroes, and we need to provide the infrastructure for them to do it, which is in progress currently. (And in fact samba already does the right thing for XFS if you use the prealloc samba vfs module, which AFAIK is not the default)
Re: Ext3 vs NTFS performance
On 3 May 2007, at 23:40, Bernd Eckenfels wrote:
> In article <[EMAIL PROTECTED]> you wrote:
>> For this particular case, Ted is probably right and the only place we'll ever see this insane poor man's pre-allocate pattern is from the Windows CIFS client, in which case fixing this in Samba makes sense - although I'm a bit horrified by the idea of writing 128K of zeroes to pre-allocate... oh well, it's temporary, and what we care about here is the read performance, more than the write performance.
>
> What about an ioctl or advice to avoid holes? Which could be issued by samba? Is that related to SetFileValidData and SetEndOfFile win32 functions? What is the windows client calling, and what command is transmitted by smb?

Nothing to do with win32 functions. Windows does NOT create sparse files therefore it never can have an issue like ext3 does in this scenario. Windows will cause nice allocations to happen because of this and the 1-byte writes are perfectly sensible in this regard. (Although a little odd as Windows has a proper API for doing preallocation so I don't get why it is not using that instead...)

As far as I know the only time Windows will create sparse files is if you specifically mark a file as sparse using the FSCTL_SET_SPARSE ioctl and then create a sparse region using the FSCTL_SET_ZERO_DATA ioctl.

Best regards,

Anton
--
Anton Altaparmakov (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
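The contrast Anton draws can be observed on a Linux box by replaying the write pattern locally. This is a sketch under assumptions: the thread only says "1-byte writes at 128K-byte strides", so placing each byte at the last offset of its stride is a guess, and the function names are made up for the example.

```python
import os
import tempfile

STRIDE = 128 * 1024


def write_stride_probes(fd, count):
    """Write one zero byte at the end of each of `count` 128K strides."""
    for k in range(1, count + 1):
        os.pwrite(fd, b"\0", k * STRIDE - 1)
    st = os.fstat(fd)
    # st_blocks is counted in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512


def demo(count=8):
    """Return (logical size, bytes actually allocated) for a temp file."""
    with tempfile.TemporaryFile() as f:
        return write_stride_probes(f.fileno(), count)
```

On a filesystem that supports holes, the second number is typically far smaller than the first: only the blocks holding the probe bytes exist, and the real data that arrives later has to be allocated around them. A filesystem that never creates sparse files would instead back the whole range as the file grows.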
Re: Ext3 vs NTFS performance
In article <[EMAIL PROTECTED]> you wrote:
> For this particular case, Ted is probably right and the only place
> we'll ever see this insane poor man's pre-allocate pattern is from the
> Windows CIFS client, in which case fixing this in Samba makes sense -
> although I'm a bit horrified by the idea of writing 128K of zeroes to
> pre-allocate... oh well, it's temporary, and what we care about here
> is the read performance, more than the write performance.

What about an ioctl or advice to avoid holes? Which could be issued by samba? Is that related to SetFileValidData and SetEndOfFile win32 functions? What is the windows client calling, and what command is transmitted by smb?

Gruss
Bernd
Re: Ext3 vs NTFS performance
On Thu, May 03, 2007 at 01:44:14AM +1000, David Chinner wrote: > On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote: > > Hello all, > > > > I've been testing the NAS performance of ext3/Openfiler 2.2 against > > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for > > video workloads. The Windows CIFS client will attempt a poor-man's > > pre-allocation of the file on the server by sending 1-byte writes at > > 128K-byte strides, breaking block allocation on ext3 and leading to > > fragmentation and poor performance. This will happen for many > > applications (including iTunes) as the CIFS client issues these > > pre-allocates under the application layer. > > > > I've posted a brief paper on Intel's OSS website > > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give > > it a read and let me know what you think. In particular, I'd like to > > arrive at the right place to fix this problem: is it in the filesystem, > > VFS, or Samba? > > As I commented on IRC to Val Henson - the XFS performance indicates > that it is not a VFS or Samba problem. In terms of what piece of code we can swap out and get good performance, the problem is indeed in ext3 - it's clear that the cause of the bad performance is the 1-byte writes resulting in ext3 fragmenting the on-disk layout of the file, and replacing it with XFS results in nice, clean, unfragmented files. But in terms of what we should do to fix it, there is the possibility of some debate. In general, I think there is a lot of code stuck down in individual file systems - especially in XFS - that could be usefully hoisted up to a higher level as generic helper functions. For example, we've got at least two implementations of reservations, one in XFS and one in ext3/4. At least some of the code could be generic - both file systems want to reserve long contiguous extents - with the actual mechanics of looking up and reserving free blocks implemented in per-fs code. 
I'd really like to see a generic VFS-level detection of read()/write()/creat()/mkdir()/etc. patterns which could detect things like "Oh, this file is likely to be deleted immediately, wait and see if it goes away and don't bother sending it on to the FS immediately" or "Looks like this file will grow pretty big, let's go pre-allocate some space for it." This is probably best done as a set of helper functions in the usual way.

For this particular case, Ted is probably right and the only place we'll ever see this insane poor man's pre-allocate pattern is from the Windows CIFS client, in which case fixing this in Samba makes sense - although I'm a bit horrified by the idea of writing 128K of zeroes to pre-allocate... oh well, it's temporary, and what we care about here is the read performance, more than the write performance.

-VAL
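The "write zeroes to pre-allocate" workaround mentioned in this message can be sketched roughly as follows. This is a hypothetical illustration, not Samba's actual implementation: the function name write_with_zero_fill and the 64 KiB chunk size are assumptions made for the example. When a write lands beyond the current end of file, the gap is filled with real zero bytes first, so the filesystem allocates blocks in file order instead of leaving a hole.

```python
import os
import tempfile

CHUNK = 64 * 1024  # assumed fill granularity, chosen only for this sketch


def write_with_zero_fill(fd, data, offset):
    """Write `data` at `offset`, zero-filling any gap past EOF first."""
    end = os.fstat(fd).st_size
    zeros = bytes(CHUNK)
    # Fill the hole between the current EOF and the target offset with
    # explicit zeroes so that blocks are allocated sequentially.
    while end < offset:
        n = min(CHUNK, offset - end)
        end += os.pwrite(fd, zeros[:n], end)
    return os.pwrite(fd, data, offset)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fd = os.open(os.path.join(tmp, "demo.bin"), os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # A 1-byte write at the end of the first 128K stride.
            write_with_zero_fill(fd, b"\0", 128 * 1024 - 1)
            print("size:", os.fstat(fd).st_size)
        finally:
            os.close(fd)
```

The sketch ignores concurrent writers to the same file; a real server would need locking around the fill.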
Re: Ext3 vs NTFS performance
On Thu, May 03, 2007 at 10:15:11AM +1000, David Chinner wrote:

[ bad fragmentation from a funky write one byte every 128k system ]

> This only becomes a problem if the system has enough pages dirty to
> be triggering throttling so that the 1byte writes are converted before
> the data actually hits the server.
>
> Even then, if you are on an XFS filesystem with a sunit/swidth set,
> the allocation alignments and speculative allocations will go a long
> way to preventing fragmentation.
>
> If that doesn't work, then set the extent allocation size hint on the
> XFS inode to 128k or 256k to set the minimum allocation size for the
> file to span the distance between the 1 byte writes. This attribute
> can be inherited from the parent directory on create, so it's a
> set and forget type of thing...
>
> i.e. XFS has lots of ways to prevent performance from degrading
> on these sorts of issues.

I'm not surprised that XFS would fare the best in this workload, but this sounds like a lot of magic that shouldn't be required. The fact that it is good to have the allocation knobs and delalloc in general doesn't mean that samba shouldn't do the right thing and preallocate the space in a sensible fashion.

-chris
Re: Ext3 vs NTFS performance
David Chinner wrote:
> On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
> >
> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?

It's a Samba problem. Samba doesn't do async writes, which v3.0 should have fixed. Did you try that?

> As I commented on IRC to Val Henson - the XFS performance indicates
> that it is not a VFS or Samba problem.

XFS somewhat hides the Samba problem, by efficiently syncing to disk.

Thanks!
--
Al
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 03:46:21PM -0400, Chris Mason wrote:
> On Thu, May 03, 2007 at 01:44:14AM +1000, David Chinner wrote:
> > On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > > Hello all,
> > >
> > > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > > video workloads. The Windows CIFS client will attempt a poor-man's
> > > pre-allocation of the file on the server by sending 1-byte writes at
> > > 128K-byte strides, breaking block allocation on ext3 and leading to
> > > fragmentation and poor performance. This will happen for many
> > > applications (including iTunes) as the CIFS client issues these
> > > pre-allocates under the application layer.
> > >
> > > I've posted a brief paper on Intel's OSS website
> > > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > > it a read and let me know what you think. In particular, I'd like to
> > > arrive at the right place to fix this problem: is it in the filesystem,
> > > VFS, or Samba?
> >
> > As I commented on IRC to Val Henson - the XFS performance indicates
> > that it is not a VFS or Samba problem.
> >
> > I'd say it's probably delayed allocation that is making the
> > difference here - no allocation occurs on the single byte writes, it
> > occurs when the larger data writes are flushed to disk. Hence no
> > adverse fragmentation will occur and there will be no extra
> > allocations being done.
> >
> > Hence I think it's probably a filesystem problem - it would be
> > interesting to see how ext4 performs on this workload
>
> If we rely on delalloc for this, what happens if another proc on the
> same fs is doing synchronous writes to other files? (say for mail
> delivery). Will random FS commits force delayed allocations to become
> real?

Not on XFS.

> Also, I'd expect a sufficiently loaded server to break down eventually
> as load/users increase. The cost of a bad delalloc decision gets much
> higher if we're using it as a crutch for this kind of bad userland
> coding.

This only becomes a problem if the system has enough pages dirty to be triggering throttling so that the 1byte writes are converted before the data actually hits the server.

Even then, if you are on an XFS filesystem with a sunit/swidth set, the allocation alignments and speculative allocations will go a long way to preventing fragmentation.

If that doesn't work, then set the extent allocation size hint on the XFS inode to 128k or 256k to set the minimum allocation size for the file to span the distance between the 1 byte writes. This attribute can be inherited from the parent directory on create, so it's a set and forget type of thing...

i.e. XFS has lots of ways to prevent performance from degrading on these sorts of issues.

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 04:38:55PM -0400, Jeff Garzik wrote:
> > I think we mostly have consensus on a calling convention which
> > all of the architectures (s390, power, arm, ia64, etc.); of course
> > then we will need to get glibc to support the new system call.
>
> glibc has had support for a while, in emulated form:
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0004.1/1153.html
>
> So when kernel support arrives, it should be easy and (hopefully)
> seamless to plug in the new syscall.

Yep. Although unfortunately given where we are in distro release cycles (and I'm not sure where glibc is in its release cycle), it'll probably be a year or so before most users will see the benefits. So it would be nice if we can get samba using the fallocate() support now, in the hopes that we can get all of the pieces aligned in time for the next major enterprise distro releases.

- Ted
Re: Ext3 vs NTFS performance
Theodore Tso wrote:
> FYI, we are currently closing on a new system call so that glibc's
> fallocate() will be able to call into the appropriate per-filesystem
> routines in a portable way, since ext4 will have persistent
> preallocation support.

Yep.

> I think we mostly have consensus on a calling convention which
> all of the architectures (s390, power, arm, ia64, etc.); of course
> then we will need to get glibc to support the new system call.

glibc has had support for a while, in emulated form:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0004.1/1153.html

So when kernel support arrives, it should be easy and (hopefully) seamless to plug in the new syscall.

Jeff
Re: Ext3 vs NTFS performance
On Thu, May 03, 2007 at 01:44:14AM +1000, David Chinner wrote:
> On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > Hello all,
> >
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
> >
> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?
>
> As I commented on IRC to Val Henson - the XFS performance indicates
> that it is not a VFS or Samba problem.
>
> I'd say it's probably delayed allocation that is making the
> difference here - no allocation occurs on the single byte writes, it
> occurs when the larger data writes are flushed to disk. Hence no
> adverse fragmentation will occur and there will be no extra
> allocations being done.
>
> Hence I think it's probably a filesystem problem - it would be
> interesting to see how ext4 performs on this workload

If we rely on delalloc for this, what happens if another proc on the same fs is doing synchronous writes to other files? (say for mail delivery). Will random FS commits force delayed allocations to become real?

Also, I'd expect a sufficiently loaded server to break down eventually as load/users increase. The cost of a bad delalloc decision gets much higher if we're using it as a crutch for this kind of bad userland coding.

-chris
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 11:08:10AM -0700, Jeremy Allison wrote:
> > The right place is clearly Samba. I can't think of any other program
> > or filesystem protocol where writing a 1 byte write at 128k strides
> > would be used to signal a desire to do preallocation. In fact, it's
> > hard to think of a worse way of doing things.
>
> In fact they don't need to do this - there's an explicit CIFS
> set file allocation call to pre-allocate size they could use.
>
> There's a specific Samba VFS module that has XFS specific calls
> to do this - vfs_prealloc. - but this won't work on ext3.

Jeremy,

FYI, we are currently closing on a new system call so that glibc's fallocate() will be able to call into the appropriate per-filesystem routines in a portable way, since ext4 will have persistent preallocation support.

I think we mostly have consensus on a calling convention which all of the architectures (s390, power, arm, ia64, etc.); of course then we will need to get glibc to support the new system call.

- Ted
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 08:40:35PM +0200, Andi Kleen wrote:
> Theodore Tso <[EMAIL PROTECTED]> writes:
>
> > On Wed, May 02, 2007 at 02:21:40PM +0200, Andi Kleen wrote:
> > > Andrew Morton <[EMAIL PROTECTED]> writes:
> > > >
> > > > Conceivably we could address this in the filesystem without mucking other
> > > > things up. But I'd have thought the simplest damage-control would be to
> > > > detect this pattern in samba and to then use glibc's fallocate().
> > >
> > > The advantage of detecting it in kernel would be that it would handle
> > > Linux applications that do this (I suspect there are some) too.
> >
> > Um, which applications do you suspect? So we can hunt down those user
> > space applications programmers and slap them silly? Or rather,
> > unsilly, since that there's no good reason to ever suspect that
> > writing a byte every 128k would result in a good allocation layout on disk?
>
> Anything that uses glibc fallocate() ?

Glibc's fallocate currently writes all zeros, not 1 byte every 128kbytes. And once we wire up the new sys_fallocate() support, we'll have the right preallocation support in ext4.

- Ted
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 12:16:38PM -0400, Theodore Tso wrote: > On Tue, May 01, 2007 at 02:23:25PM -0700, Andrew Morton wrote: > > On Tue, 1 May 2007 13:43:18 -0700 > > "Cabot, Mason B" <[EMAIL PROTECTED]> wrote: > > > > > I've been testing the NAS performance of ext3/Openfiler 2.2 against > > > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for > > > video workloads. The Windows CIFS client will attempt a poor-man's > > > pre-allocation of the file on the server by sending 1-byte writes at > > > 128K-byte strides, breaking block allocation on ext3 and leading to > > > fragmentation and poor performance. This will happen for many > > > applications (including iTunes) as the CIFS client issues these > > > pre-allocates under the application layer. > > > > Oh my gawd, what a stupid hack. Now we know what the MS interoperability > > lab has been working on. > > I wonder if they patented this technique as well, as well as one of > their dozen or so patents they are filing every day? "A Method of > Screwing Over Samba's Performance So that Windows Longhorn Can Compete > On Performance" coming soon, to a patent database near you! :-) > > > > I've posted a brief paper on Intel's OSS website > > > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give > > > it a read and let me know what you think. In particular, I'd like to > > > arrive at the right place to fix this problem: is it in the filesystem, > > > VFS, or Samba? > > The right place is clearly Samba. I can't think of any other program > or filesystem protocol where writing a 1 byte write at 128k strides > would be used to signal a desire to do preallocation. In fact, it's > hard to think of a worse way of doing things. In fact they don't need to do this - there's an explicit CIFS set file allocation call to pre-allocate size they could use. There's a specific Samba VFS module that has XFS specific calls to do this - vfs_prealloc. - but this won't work on ext3. Jeremy. 
Re: Ext3 vs NTFS performance
Theodore Tso <[EMAIL PROTECTED]> writes:

> On Wed, May 02, 2007 at 02:21:40PM +0200, Andi Kleen wrote:
> > Andrew Morton <[EMAIL PROTECTED]> writes:
> > >
> > > Conceivably we could address this in the filesystem without mucking other
> > > things up. But I'd have thought the simplest damage-control would be to
> > > detect this pattern in samba and to then use glibc's fallocate().
> >
> > The advantage of detecting it in kernel would be that it would handle
> > Linux applications that do this (I suspect there are some) too.
>
> Um, which applications do you suspect? So we can hunt down those user
> space applications programmers and slap them silly? Or rather,
> unsilly, since that there's no good reason to ever suspect that
> writing a byte every 128k would result in a good allocation layout on disk?

Anything that uses glibc fallocate() ?

-Andi
Re: Ext3 vs NTFS performance
On Tue, May 01, 2007 at 02:23:25PM -0700, Andrew Morton wrote:
> On Tue, 1 May 2007 13:43:18 -0700
> "Cabot, Mason B" <[EMAIL PROTECTED]> wrote:
>
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
>
> Oh my gawd, what a stupid hack. Now we know what the MS interoperability
> lab has been working on.

I wonder if they patented this technique as well, as well as one of their dozen or so patents they are filing every day? "A Method of Screwing Over Samba's Performance So that Windows Longhorn Can Compete On Performance" coming soon, to a patent database near you! :-)

> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?

The right place is clearly Samba. I can't think of any other program or filesystem protocol where writing a 1 byte write at 128k strides would be used to signal a desire to do preallocation. In fact, it's hard to think of a worse way of doing things.

- Ted
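For readers who want to see the write pattern under discussion, the sketch below reproduces something like it locally: a single byte written at every 128 KiB stride, which leaves a sparse file whose real data arrives later. The exact offsets the Windows CIFS client uses are not given in the thread, so placing each byte at the end of its stride is an assumption, as are the function names.

```python
import os
import tempfile

STRIDE = 128 * 1024  # 128 KiB stride, as reported in the original post


def stride_probe_offsets(final_size, stride=STRIDE):
    """Offsets of the 1-byte writes for a file of `final_size` bytes.

    Assumption: each probe lands on the last byte of its stride.
    """
    return list(range(stride - 1, final_size, stride))


def write_probes(path, final_size, stride=STRIDE):
    """Issue the 1-byte writes; return (number of writes, resulting size)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offsets = stride_probe_offsets(final_size, stride)
        for off in offsets:
            os.pwrite(fd, b"\0", off)  # one byte, everything before it is a hole
        return len(offsets), os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        count, size = write_probes(os.path.join(tmp, "probe.bin"), 1024 * 1024)
        print(count, "one-byte writes produced a file of", size, "bytes")
```

On a filesystem that allocates a block at each write, every probe pins a block before the surrounding data exists, which is the fragmentation mechanism described in the original post.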
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 02:21:40PM +0200, Andi Kleen wrote:
> Andrew Morton <[EMAIL PROTECTED]> writes:
> >
> > Conceivably we could address this in the filesystem without mucking other
> > things up. But I'd have thought the simplest damage-control would be to
> > detect this pattern in samba and to then use glibc's fallocate().
>
> The advantage of detecting it in kernel would be that it would handle
> Linux applications that do this (I suspect there are some) too.

Um, which applications do you suspect? So we can hunt down those user space applications programmers and slap them silly? Or rather, unsilly, since that there's no good reason to ever suspect that writing a byte every 128k would result in a good allocation layout on disk?

- Ted
Re: Ext3 vs NTFS performance
On Tue, May 01, 2007 at 11:54:04PM -0400, Gerhard Mack wrote:
> On Tue, 1 May 2007, Cabot, Mason B wrote:
>
> > Hello all,
> >
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
> >
> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?
> >
> > thanks,
> > Mason
>
> Just out of curiosity do other filesystems (reiser, xfs) take the same
> performance hit?

XFS was also tested - it is as fast as the Windows NTFS based server.

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: Ext3 vs NTFS performance
On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> Hello all,
>
> I've been testing the NAS performance of ext3/Openfiler 2.2 against
> NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> video workloads. The Windows CIFS client will attempt a poor-man's
> pre-allocation of the file on the server by sending 1-byte writes at
> 128K-byte strides, breaking block allocation on ext3 and leading to
> fragmentation and poor performance. This will happen for many
> applications (including iTunes) as the CIFS client issues these
> pre-allocates under the application layer.
>
> I've posted a brief paper on Intel's OSS website
> (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> it a read and let me know what you think. In particular, I'd like to
> arrive at the right place to fix this problem: is it in the filesystem,
> VFS, or Samba?

As I commented on IRC to Val Henson - the XFS performance indicates that it is not a VFS or Samba problem.

I'd say it's probably delayed allocation that is making the difference here - no allocation occurs on the single byte writes, it occurs when the larger data writes are flushed to disk. Hence no adverse fragmentation will occur and there will be no extra allocations being done.

Hence I think it's probably a filesystem problem - it would be interesting to see how ext4 performs on this workload.

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: Ext3 vs NTFS performance
Andrew Morton <[EMAIL PROTECTED]> writes:
>
> Conceivably we could address this in the filesystem without mucking other
> things up. But I'd have thought the simplest damage-control would be to
> detect this pattern in samba and to then use glibc's fallocate().

The advantage of detecting it in kernel would be that it would handle
Linux applications that do this (I suspect there are some) too.

-Andi
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 02:21:40PM +0200, Andi Kleen wrote:
> Andrew Morton [EMAIL PROTECTED] writes:
> > Conceivably we could address this in the filesystem without mucking other
> > things up. But I'd have thought the simplest damage-control would be to
> > detect this pattern in samba and to then use glibc's fallocate().
>
> The advantage of detecting it in kernel would be that it would handle
> Linux applications that do this (I suspect there are some) too.

Um, which applications do you suspect? So we can hunt down those user
space application programmers and slap them silly? Or rather, unsilly,
since there's no good reason to ever suspect that writing a byte every
128k would result in a good allocation layout on disk?

- Ted
Re: Ext3 vs NTFS performance
On Tue, May 01, 2007 at 02:23:25PM -0700, Andrew Morton wrote:
> On Tue, 1 May 2007 13:43:18 -0700 Cabot, Mason B [EMAIL PROTECTED] wrote:
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
>
> Oh my gawd, what a stupid hack. Now we know what the MS interoperability
> lab has been working on.

I wonder if they patented this technique as well, as one of the dozen
or so patents they are filing every day? "A Method of Screwing Over
Samba's Performance So that Windows Longhorn Can Compete On
Performance", coming soon, to a patent database near you! :-)

> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?

The right place is clearly Samba. I can't think of any other program
or filesystem protocol where writing a 1 byte write at 128k strides
would be used to signal a desire to do preallocation. In fact, it's
hard to think of a worse way of doing things.

- Ted
Re: Ext3 vs NTFS performance
Theodore Tso [EMAIL PROTECTED] writes:
> On Wed, May 02, 2007 at 02:21:40PM +0200, Andi Kleen wrote:
> > Andrew Morton [EMAIL PROTECTED] writes:
> > > Conceivably we could address this in the filesystem without mucking other
> > > things up. But I'd have thought the simplest damage-control would be to
> > > detect this pattern in samba and to then use glibc's fallocate().
> >
> > The advantage of detecting it in kernel would be that it would handle
> > Linux applications that do this (I suspect there are some) too.
>
> Um, which applications do you suspect? So we can hunt down those user
> space application programmers and slap them silly? Or rather, unsilly,
> since there's no good reason to ever suspect that writing a byte every
> 128k would result in a good allocation layout on disk?

Anything that uses glibc fallocate() ?

-Andi
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 12:16:38PM -0400, Theodore Tso wrote:
> On Tue, May 01, 2007 at 02:23:25PM -0700, Andrew Morton wrote:
> > On Tue, 1 May 2007 13:43:18 -0700 Cabot, Mason B [EMAIL PROTECTED] wrote:
> > > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > > video workloads. The Windows CIFS client will attempt a poor-man's
> > > pre-allocation of the file on the server by sending 1-byte writes at
> > > 128K-byte strides, breaking block allocation on ext3 and leading to
> > > fragmentation and poor performance. This will happen for many
> > > applications (including iTunes) as the CIFS client issues these
> > > pre-allocates under the application layer.
> >
> > Oh my gawd, what a stupid hack. Now we know what the MS interoperability
> > lab has been working on.
>
> I wonder if they patented this technique as well, as one of the dozen
> or so patents they are filing every day? "A Method of Screwing Over
> Samba's Performance So that Windows Longhorn Can Compete On
> Performance", coming soon, to a patent database near you! :-)
>
> > > I've posted a brief paper on Intel's OSS website
> > > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > > it a read and let me know what you think. In particular, I'd like to
> > > arrive at the right place to fix this problem: is it in the filesystem,
> > > VFS, or Samba?
>
> The right place is clearly Samba. I can't think of any other program
> or filesystem protocol where writing a 1 byte write at 128k strides
> would be used to signal a desire to do preallocation. In fact, it's
> hard to think of a worse way of doing things.

In fact they don't need to do this - there's an explicit CIFS
set file allocation call to pre-allocate size they could use.

There's a specific Samba VFS module that has XFS specific calls
to do this - vfs_prealloc - but this won't work on ext3.

Jeremy.
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 08:40:35PM +0200, Andi Kleen wrote:
> Theodore Tso [EMAIL PROTECTED] writes:
> > On Wed, May 02, 2007 at 02:21:40PM +0200, Andi Kleen wrote:
> > > Andrew Morton [EMAIL PROTECTED] writes:
> > > > Conceivably we could address this in the filesystem without mucking other
> > > > things up. But I'd have thought the simplest damage-control would be to
> > > > detect this pattern in samba and to then use glibc's fallocate().
> > >
> > > The advantage of detecting it in kernel would be that it would handle
> > > Linux applications that do this (I suspect there are some) too.
> >
> > Um, which applications do you suspect? So we can hunt down those user
> > space application programmers and slap them silly? Or rather, unsilly,
> > since there's no good reason to ever suspect that writing a byte every
> > 128k would result in a good allocation layout on disk?
>
> Anything that uses glibc fallocate() ?

Glibc's fallocate currently writes all zeros, not 1 byte every
128kbytes. And once we wire up the new sys_fallocate() support, we'll
have the right preallocation support in ext4.

- Ted
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 11:08:10AM -0700, Jeremy Allison wrote:
> > The right place is clearly Samba. I can't think of any other program
> > or filesystem protocol where writing a 1 byte write at 128k strides
> > would be used to signal a desire to do preallocation. In fact, it's
> > hard to think of a worse way of doing things.
>
> In fact they don't need to do this - there's an explicit CIFS
> set file allocation call to pre-allocate size they could use.
>
> There's a specific Samba VFS module that has XFS specific calls
> to do this - vfs_prealloc - but this won't work on ext3.

Jeremy,

FYI, we are currently closing on a new system call so that glibc's
fallocate() will be able to call into the appropriate per-filesystem
routines in a portable way, since ext4 will have persistent
preallocation support. I think we mostly have consensus on a calling
convention which works for all of the architectures (s390, power, arm,
ia64, etc.); of course then we will need to get glibc to support the
new system call.

- Ted
Re: Ext3 vs NTFS performance
On Thu, May 03, 2007 at 01:44:14AM +1000, David Chinner wrote:
> On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > Hello all,
> >
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
> >
> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?
>
> As I commented on IRC to Val Henson - the XFS performance indicates
> that it is not a VFS or Samba problem.
>
> I'd say it's probably delayed allocation that is making the difference
> here - no allocation occurs on the single byte writes, it occurs when
> the larger data writes are flushed to disk. Hence no adverse
> fragmentation will occur and there will be no extra allocations being
> done.
>
> Hence I think it's probably a filesystem problem - it would be
> interesting to see how ext4 performs on this workload.

If we rely on delalloc for this, what happens if another proc on the
same fs is doing synchronous writes to other files? (say for mail
delivery). Will random FS commits force delayed allocations to become
real?

Also, I'd expect a sufficiently loaded server to break down eventually
as load/users increase. The cost of a bad delalloc decision gets much
higher if we're using it as a crutch for this kind of bad userland
coding.

-chris
Re: Ext3 vs NTFS performance
Theodore Tso wrote:
> FYI, we are currently closing on a new system call so that glibc's
> fallocate() will be able to call into the appropriate per-filesystem
> routines in a portable way, since ext4 will have persistent
> preallocation support.

Yep.

> I think we mostly have consensus on a calling
> convention which works for all of the architectures (s390, power, arm,
> ia64, etc.); of course then we will need to get glibc to support the
> new system call.

glibc has had support for a while, in emulated form:

http://www.uwsg.iu.edu/hypermail/linux/kernel/0004.1/1153.html

So when kernel support arrives, it should be easy and (hopefully)
seamless to plug in the new syscall.

Jeff
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 04:38:55PM -0400, Jeff Garzik wrote:
> > I think we mostly have consensus on a calling
> > convention which works for all of the architectures (s390, power, arm,
> > ia64, etc.); of course then we will need to get glibc to support the
> > new system call.
>
> glibc has had support for a while, in emulated form:
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0004.1/1153.html
>
> So when kernel support arrives, it should be easy and (hopefully)
> seamless to plug in the new syscall.

Yep. Although unfortunately given where we are in distro release
cycles (and I'm not sure where glibc is in its release cycle), it'll
probably be a year or so before most users will see the benefits. So
it would be nice if we can get samba using the fallocate() support
now, in the hopes that we can get all of the pieces aligned in time
for the next major enterprise distro releases.

- Ted
Re: Ext3 vs NTFS performance
On Wed, May 02, 2007 at 03:46:21PM -0400, Chris Mason wrote:
> On Thu, May 03, 2007 at 01:44:14AM +1000, David Chinner wrote:
> > On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > > Hello all,
> > >
> > > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > > video workloads. The Windows CIFS client will attempt a poor-man's
> > > pre-allocation of the file on the server by sending 1-byte writes at
> > > 128K-byte strides, breaking block allocation on ext3 and leading to
> > > fragmentation and poor performance. This will happen for many
> > > applications (including iTunes) as the CIFS client issues these
> > > pre-allocates under the application layer.
> > >
> > > I've posted a brief paper on Intel's OSS website
> > > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > > it a read and let me know what you think. In particular, I'd like to
> > > arrive at the right place to fix this problem: is it in the filesystem,
> > > VFS, or Samba?
> >
> > As I commented on IRC to Val Henson - the XFS performance indicates
> > that it is not a VFS or Samba problem.
> >
> > I'd say it's probably delayed allocation that is making the difference
> > here - no allocation occurs on the single byte writes, it occurs when
> > the larger data writes are flushed to disk. Hence no adverse
> > fragmentation will occur and there will be no extra allocations being
> > done.
> >
> > Hence I think it's probably a filesystem problem - it would be
> > interesting to see how ext4 performs on this workload.
>
> If we rely on delalloc for this, what happens if another proc on the
> same fs is doing synchronous writes to other files? (say for mail
> delivery). Will random FS commits force delayed allocations to become
> real?

Not on XFS.

> Also, I'd expect a sufficiently loaded server to break down eventually
> as load/users increase. The cost of a bad delalloc decision gets much
> higher if we're using it as a crutch for this kind of bad userland
> coding.

This only becomes a problem if the system has enough pages dirty to be
triggering throttling so that the 1-byte writes are converted before
the data actually hits the server.

Even then, if you are on an XFS filesystem with a sunit/swidth set,
the allocation alignments and speculative allocations will go a long
way to preventing fragmentation.

If that doesn't work, then set the extent allocation size hint on the
XFS inode to 128k or 256k to set the minimum allocation size for the
file to span the distance between the 1-byte writes. This attribute
can be inherited from the parent directory on create, so it's a set
and forget type of thing...

i.e. XFS has lots of ways to prevent performance from degrading on
these sorts of issues.

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: Ext3 vs NTFS performance
David Chinner wrote:
> On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
> >
> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?

It's a Samba problem. Samba doesn't do async writes, which v3.0 should
have fixed. Did you try that?

> As I commented on IRC to Val Henson - the XFS performance indicates
> that it is not a VFS or Samba problem.

XFS somewhat hides the Samba problem, by efficiently syncing to disk.

Thanks!

-- Al
Re: Ext3 vs NTFS performance
On Tue, 1 May 2007, Cabot, Mason B wrote:
> Hello all,
>
> I've been testing the NAS performance of ext3/Openfiler 2.2 against
> NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> video workloads. The Windows CIFS client will attempt a poor-man's
> pre-allocation of the file on the server by sending 1-byte writes at
> 128K-byte strides, breaking block allocation on ext3 and leading to
> fragmentation and poor performance. This will happen for many
> applications (including iTunes) as the CIFS client issues these
> pre-allocates under the application layer.
>
> I've posted a brief paper on Intel's OSS website
> (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> it a read and let me know what you think. In particular, I'd like to
> arrive at the right place to fix this problem: is it in the filesystem,
> VFS, or Samba?
>
> thanks,
> Mason

Just out of curiosity, do other filesystems (reiser, xfs) take the same
performance hit?

Gerhard

--
Gerhard Mack
[EMAIL PROTECTED]
<>< As a computer I find your faith in technology amusing.
Re: Ext3 vs NTFS performance
On Tue, 1 May 2007 13:43:18 -0700 "Cabot, Mason B" <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I've been testing the NAS performance of ext3/Openfiler 2.2 against
> NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> video workloads. The Windows CIFS client will attempt a poor-man's
> pre-allocation of the file on the server by sending 1-byte writes at
> 128K-byte strides, breaking block allocation on ext3 and leading to
> fragmentation and poor performance. This will happen for many
> applications (including iTunes) as the CIFS client issues these
> pre-allocates under the application layer.

Oh my gawd, what a stupid hack. Now we know what the MS interoperability
lab has been working on.

> I've posted a brief paper on Intel's OSS website
> (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> it a read and let me know what you think. In particular, I'd like to
> arrive at the right place to fix this problem: is it in the filesystem,
> VFS, or Samba?

Conceivably we could address this in the filesystem without mucking other
things up. But I'd have thought the simplest damage-control would be to
detect this pattern in samba and to then use glibc's fallocate(). At
present glibc will emulate fallocate() by writing zeroes. There are
patches floating about to implement fallocate in-kernel and if/when that
turns up and is supported in glibc, the modified samba will automatically
start to use it.

Are you sure there isn't some registry setting to prevent the CIFS client
from doing the client-side preallocation?
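For readers who want to try the fallocate() route suggested above, here is a minimal sketch. It assumes a Linux host and uses Python's os.posix_fallocate(), which wraps posix_fallocate(3); the helper name preallocate and the file name are hypothetical. Whether the space is reserved natively by the filesystem or emulated with writes depends on the C library and kernel in use, so this shows the call, not a performance claim.

```python
import os
import tempfile


def preallocate(path, size):
    """Ask the C library to reserve `size` bytes for `path` in one call,
    instead of poking single bytes at fixed strides.
    Returns the resulting file size."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # offset 0, length `size`: the whole file range is reserved
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
    return os.stat(path).st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "video.bin")  # hypothetical file name
        print(preallocate(target, 1024 * 1024))
```

A server that detects the stride pattern could, in principle, issue one such call for the final file size and ignore the individual 1-byte writes; the detection logic itself is not shown here.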
Ext3 vs NTFS performance
Hello all,

I've been testing the NAS performance of ext3/Openfiler 2.2 against
NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
video workloads. The Windows CIFS client will attempt a poor-man's
pre-allocation of the file on the server by sending 1-byte writes at
128K-byte strides, breaking block allocation on ext3 and leading to
fragmentation and poor performance. This will happen for many
applications (including iTunes) as the CIFS client issues these
pre-allocates under the application layer.

I've posted a brief paper on Intel's OSS website
(http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
it a read and let me know what you think. In particular, I'd like to
arrive at the right place to fix this problem: is it in the filesystem,
VFS, or Samba?

thanks,
Mason

(please CC responses to mason dot b dot cabot at intel dot com)
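To reproduce the write pattern described above on a local filesystem, something like the following sketch may be useful. The exact offsets the Windows client uses are not stated in the message, so the choice here (the last byte of each 128K chunk) is an assumption, as are the helper name stride_preallocate and the file name. It needs a Unix-like host for os.pwrite().

```python
import os
import tempfile

STRIDE = 128 * 1024  # 128K-byte stride, as reported in the message


def stride_preallocate(path, size, stride=STRIDE):
    """Write a single zero byte at the end of every `stride`-sized chunk.
    Returns the list of offsets written (assumed layout, see lead-in)."""
    offsets = list(range(stride - 1, size, stride))
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for offset in offsets:
            os.pwrite(fd, b"\0", offset)  # 1-byte write, leaves a hole before it
    finally:
        os.close(fd)
    return offsets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "clip.bin")  # hypothetical file name
        written = stride_preallocate(target, 1024 * 1024)
        print(len(written), "one-byte writes, file size", os.path.getsize(target))
```

Following this with a sequential write of the real data, and then inspecting the file's extent layout with a tool of your choice, gives a rough local stand-in for the CIFS workload.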