Re: [Pvfs2-developers] TroveSyncData settings
On Wed, Nov 29, 2006 at 04:05:35PM -0600, Sam Lang wrote:

> I had a note that we should change the default aio data-sync code to only sync at the end of an IO request, instead of for each trove operation (in FlowBufferSize chunks). Doing this at the end of an io.sm seemed a little messy, but if/when we have request ids (hints) being passed to the trove interface, we could use that as a way to know to flush at the end. In any case, it sounds like it's better to flush early and often than at the end of a request?

I have no data to back this up, but it feels like flushing early and often will only help if you have a good disk subsystem.

> From a user perspective, we usually tell people to enable data sync if they're concerned about losing data. Now we're talking about getting better performance with data sync enabled (at least in some cases). Does it make sense to sync even with data sync disabled if we can figure out that better performance would result?

I can't imagine a use case where someone would be upset if we were able to both deliver better performance and also sync data even when they didn't ask us to. If we can figure out what's best (maybe with a quick test when the server starts up?), then yes, sync even with sync disabled.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
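[To make the "quick test when the server starts up" idea concrete, here is a minimal standalone sketch -- not PVFS2 code; the probe path, chunk size, and chunk count are made-up illustration values -- that times sync-per-chunk against a single sync at the end, which is the measurement a server could run once at startup to pick a strategy:]

    #define _XOPEN_SOURCE 500
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define CHUNK   (512 * 1024)  /* one FlowBuffer-sized chunk */
    #define NCHUNKS 64

    /* write NCHUNKS chunks; sync per chunk, or once at the end */
    static double probe(const char *path, int sync_each_chunk)
    {
        char *buf = malloc(CHUNK);
        struct timeval t0, t1;
        int fd, i;

        memset(buf, 0xab, CHUNK);
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

        gettimeofday(&t0, NULL);
        for (i = 0; i < NCHUNKS; i++) {
            pwrite(fd, buf, CHUNK, (off_t)i * CHUNK);
            if (sync_each_chunk)
                fdatasync(fd);    /* flush early and often */
        }
        if (!sync_each_chunk)
            fdatasync(fd);        /* one flush at the end of the "request" */
        gettimeofday(&t1, NULL);

        close(fd);
        unlink(path);
        free(buf);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    }

    int main(void)
    {
        const char *path = "/tmp/sync-probe";  /* hypothetical probe file */
        printf("per-chunk sync: %.3fs  sync-at-end: %.3fs\n",
               probe(path, 1), probe(path, 0));
        return 0;
    }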
Re: [Pvfs2-developers] TroveSyncData settings
On Nov 29, 2006, at 3:44 PM, Rob Ross wrote:

> That's what I was thinking -- that we could ask the I/O thread to do the syncing rather than stalling out other progress. Wanna try it and see if it helps :)?
>
> Rob
>
> Phil Carns wrote:
>
>> No. Both alt aio and the normal dbpf method sync as a separate step after the aio list operation completes. This is technically possible with alt aio, though -- you would just need to pass a flag through to tell the I/O thread to sync after the pwrite(). That would probably be pretty helpful, so the trove worker thread doesn't get stuck waiting on the sync...
>>
>> -Phil
>>
>> Rob Ross wrote:
>>
>>> This is similar to using O_DIRECT, which has also shown benefits. With alt aio, do we sync in the context of the I/O thread?
>>>
>>> Thanks,
>>> Rob
>>>
>>> Phil Carns wrote:
>>>
>>>> One thing that we noticed while testing for storage challenge was that (and everyone correct me if I'm wrong here) enabling the data-sync causes a flush/sync to occur after every sizeof(FlowBuffer) bytes had been written. I can imagine how this would help a SAN, but I'm perplexed how it helps local disk; what buffer size are you playing with? We found that unless we were using HUGE (~size of cache on storage controller) flow buffers, this caused way too many syncs/seeks on the disks and hurt performance quite a bit (maybe as much as a 50% hit), because things were not being optimized for our disk subsystems and we were issuing many small ops instead of fewer large ones. Granted, I haven't been able to get 2.6.0 building properly yet to test the latest out, but this was definitely the case for us on the 2.5 releases.
>>>>
>>>> You are definitely right about the data sync option causing a flush/sync on every sizeof(FlowBuffer).

I had a note that we should change the default aio data-sync code to only sync at the end of an IO request, instead of for each trove operation (in FlowBufferSize chunks). Doing this at the end of an io.sm seemed a little messy, but if/when we have request ids (hints) being passed to the trove interface, we could use that as a way to know to flush at the end. In any case, it sounds like it's better to flush early and often than at the end of a request?

From a user perspective, we usually tell people to enable data sync if they're concerned about losing data. Now we're talking about getting better performance with data sync enabled (at least in some cases). Does it make sense to sync even with data sync disabled if we can figure out that better performance would result?

-sam

>>>> I don't really have a good explanation for why this doesn't seem to burn us anymore on local disk. Our settings are standard, except for:
>>>>
>>>> - 512KB flow buffer size
>>>> - alt aio method
>>>> - 512KB tcp buffers (with larger /proc tcp settings)
>>>>
>>>> This testing was done on some version prior to 2.6.0 also (I think it was a merge of some in-between release, so it is hard to pin down a version number). It may also have something to do with the controller and local disks being used? All of our local disk configurations are actually hardware raid 5 with some variety of the megaraid controller, and these are fairly new boxes.
>>>>
>>>> -Phil
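[Sam's end-of-request idea could look roughly like the following once request ids are available as trove hints. This is a hypothetical sketch, not the io.sm code; the struct and function names are invented, and real bookkeeping would need locking:]

    #include <stdint.h>
    #include <unistd.h>

    /* per-request bookkeeping, looked up via the request id hint */
    struct io_request {
        uint64_t req_id;    /* request id passed down as a trove hint */
        int      fd;        /* bstream file descriptor */
        int      ops_left;  /* FlowBufferSize-chunk writes outstanding */
    };

    /* called as each trove write operation for the request completes */
    static void trove_write_complete(struct io_request *req)
    {
        if (--req->ops_left == 0) {
            /* last chunk of the IO request: one flush at the end,
             * instead of a flush per FlowBufferSize chunk */
            fdatasync(req->fd);
        }
    }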
Re: [Pvfs2-developers] TroveSyncData settings
That's what I was thinking -- that we could ask the I/O thread to do the syncing rather than stalling out other progress. Wanna try it and see if it helps :)?

Rob

Phil Carns wrote:

> No. Both alt aio and the normal dbpf method sync as a separate step after the aio list operation completes. This is technically possible with alt aio, though -- you would just need to pass a flag through to tell the I/O thread to sync after the pwrite(). That would probably be pretty helpful, so the trove worker thread doesn't get stuck waiting on the sync...
>
> -Phil
>
> Rob Ross wrote:
>
>> This is similar to using O_DIRECT, which has also shown benefits. With alt aio, do we sync in the context of the I/O thread?
>>
>> Thanks,
>> Rob
>>
>> Phil Carns wrote:
>>
>>> One thing that we noticed while testing for storage challenge was that (and everyone correct me if I'm wrong here) enabling the data-sync causes a flush/sync to occur after every sizeof(FlowBuffer) bytes had been written. I can imagine how this would help a SAN, but I'm perplexed how it helps local disk; what buffer size are you playing with? We found that unless we were using HUGE (~size of cache on storage controller) flow buffers, this caused way too many syncs/seeks on the disks and hurt performance quite a bit (maybe as much as a 50% hit), because things were not being optimized for our disk subsystems and we were issuing many small ops instead of fewer large ones. Granted, I haven't been able to get 2.6.0 building properly yet to test the latest out, but this was definitely the case for us on the 2.5 releases.
>>>
>>> You are definitely right about the data sync option causing a flush/sync on every sizeof(FlowBuffer).
>>>
>>> I don't really have a good explanation for why this doesn't seem to burn us anymore on local disk. Our settings are standard, except for:
>>>
>>> - 512KB flow buffer size
>>> - alt aio method
>>> - 512KB tcp buffers (with larger /proc tcp settings)
>>>
>>> This testing was done on some version prior to 2.6.0 also (I think it was a merge of some in-between release, so it is hard to pin down a version number). It may also have something to do with the controller and local disks being used? All of our local disk configurations are actually hardware raid 5 with some variety of the megaraid controller, and these are fairly new boxes.
>>>
>>> -Phil
Re: [Pvfs2-developers] TroveSyncData settings
No. Both alt aio and the normal dbpf method sync as a separate step after the aio list operation completes. This is technically possible with alt aio, though -- you would just need to pass a flag through to tell the I/O thread to sync after the pwrite(). That would probably be pretty helpful, so the trove worker thread doesn't get stuck waiting on the sync...

-Phil

Rob Ross wrote:

> This is similar to using O_DIRECT, which has also shown benefits. With alt aio, do we sync in the context of the I/O thread?
>
> Thanks,
> Rob
>
> Phil Carns wrote:
>
>> One thing that we noticed while testing for storage challenge was that (and everyone correct me if I'm wrong here) enabling the data-sync causes a flush/sync to occur after every sizeof(FlowBuffer) bytes had been written. I can imagine how this would help a SAN, but I'm perplexed how it helps local disk; what buffer size are you playing with? We found that unless we were using HUGE (~size of cache on storage controller) flow buffers, this caused way too many syncs/seeks on the disks and hurt performance quite a bit (maybe as much as a 50% hit), because things were not being optimized for our disk subsystems and we were issuing many small ops instead of fewer large ones. Granted, I haven't been able to get 2.6.0 building properly yet to test the latest out, but this was definitely the case for us on the 2.5 releases.
>>
>> You are definitely right about the data sync option causing a flush/sync on every sizeof(FlowBuffer).
>>
>> I don't really have a good explanation for why this doesn't seem to burn us anymore on local disk. Our settings are standard, except for:
>>
>> - 512KB flow buffer size
>> - alt aio method
>> - 512KB tcp buffers (with larger /proc tcp settings)
>>
>> This testing was done on some version prior to 2.6.0 also (I think it was a merge of some in-between release, so it is hard to pin down a version number). It may also have something to do with the controller and local disks being used? All of our local disk configurations are actually hardware raid 5 with some variety of the megaraid controller, and these are fairly new boxes.
>>
>> -Phil
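[What "pass a flag through" might look like in the alt-aio style of doing pwrite() from a dedicated thread -- a minimal sketch, not the actual trove/alt-aio code; the struct and function names are invented and error handling is elided:]

    #include <sys/types.h>
    #include <unistd.h>

    struct io_op {
        int    fd;
        void  *buf;
        size_t count;
        off_t  offset;
        int    do_sync;  /* set by the caller when TroveSyncData is on */
    };

    /* runs in the I/O thread, not the trove worker thread */
    static void *io_thread(void *arg)
    {
        struct io_op *op = arg;

        ssize_t ret = pwrite(op->fd, op->buf, op->count, op->offset);
        if (ret == (ssize_t)op->count && op->do_sync) {
            /* pay for the sync here, in the I/O thread's context,
             * so the trove worker thread is free to keep making
             * progress on other operations */
            fdatasync(op->fd);
        }
        /* ... mark the operation complete ... */
        return NULL;
    }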
Re: [Pvfs2-developers] TroveSyncData settings
This is similar to using O_DIRECT, which has also shown benefits. With alt aio, do we sync in the context of the I/O thread?

Thanks,
Rob

Phil Carns wrote:

> One thing that we noticed while testing for storage challenge was that (and everyone correct me if I'm wrong here) enabling the data-sync causes a flush/sync to occur after every sizeof(FlowBuffer) bytes had been written. I can imagine how this would help a SAN, but I'm perplexed how it helps local disk; what buffer size are you playing with? We found that unless we were using HUGE (~size of cache on storage controller) flow buffers, this caused way too many syncs/seeks on the disks and hurt performance quite a bit (maybe as much as a 50% hit), because things were not being optimized for our disk subsystems and we were issuing many small ops instead of fewer large ones. Granted, I haven't been able to get 2.6.0 building properly yet to test the latest out, but this was definitely the case for us on the 2.5 releases.
>
> You are definitely right about the data sync option causing a flush/sync on every sizeof(FlowBuffer).
>
> I don't really have a good explanation for why this doesn't seem to burn us anymore on local disk. Our settings are standard, except for:
>
> - 512KB flow buffer size
> - alt aio method
> - 512KB tcp buffers (with larger /proc tcp settings)
>
> This testing was done on some version prior to 2.6.0 also (I think it was a merge of some in-between release, so it is hard to pin down a version number). It may also have something to do with the controller and local disks being used? All of our local disk configurations are actually hardware raid 5 with some variety of the megaraid controller, and these are fairly new boxes.
>
> -Phil
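[For comparison, the O_DIRECT approach Rob mentions avoids the buffer cache entirely rather than writing and then syncing. A standalone sketch with illustrative values only -- the file name is made up, the alignment must match the device's actual requirement, and some filesystems (tmpfs, for one) reject O_DIRECT outright:]

    #define _GNU_SOURCE   /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void  *buf;
        size_t len = 512 * 1024;  /* one flow-buffer-sized chunk */

        /* O_DIRECT requires aligned buffer, offset, and length */
        if (posix_memalign(&buf, 4096, len) != 0)
            return 1;
        memset(buf, 0, len);

        int fd = open("/tmp/datafile", O_WRONLY | O_CREAT | O_DIRECT, 0600);
        if (fd < 0)
            return 1;

        pwrite(fd, buf, len, 0);  /* bypasses the page cache */
        close(fd);
        free(buf);
        return 0;
    }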
Re: [Pvfs2-developers] TroveSyncData settings
> One thing that we noticed while testing for storage challenge was that (and everyone correct me if I'm wrong here) enabling the data-sync causes a flush/sync to occur after every sizeof(FlowBuffer) bytes had been written. I can imagine how this would help a SAN, but I'm perplexed how it helps local disk; what buffer size are you playing with? We found that unless we were using HUGE (~size of cache on storage controller) flow buffers, this caused way too many syncs/seeks on the disks and hurt performance quite a bit (maybe as much as a 50% hit), because things were not being optimized for our disk subsystems and we were issuing many small ops instead of fewer large ones. Granted, I haven't been able to get 2.6.0 building properly yet to test the latest out, but this was definitely the case for us on the 2.5 releases.

You are definitely right about the data sync option causing a flush/sync on every sizeof(FlowBuffer).

I don't really have a good explanation for why this doesn't seem to burn us anymore on local disk. Our settings are standard, except for:

- 512KB flow buffer size
- alt aio method
- 512KB tcp buffers (with larger /proc tcp settings)

This testing was done on some version prior to 2.6.0 also (I think it was a merge of some in-between release, so it is hard to pin down a version number). It may also have something to do with the controller and local disks being used? All of our local disk configurations are actually hardware raid 5 with some variety of the megaraid controller, and these are fairly new boxes.

-Phil
Re: [Pvfs2-developers] TroveSyncData settings
Phil Carns wrote:

> We recently ran some tests trying different sync settings in PVFS2. We ran into one pleasant surprise, although probably it is already obvious to others. Here is the setup:
>
> - 12 clients
> - 4 servers
> - read/write test application, 100 MB operations, large files
> - fibre channel SAN storage
>
> The test application is essentially the same as was used in the posting regarding kernel buffer sizes, although with different parameters in this environment. At any rate, to get to the point:
>
> with TroveSyncData=no (default settings): 173 MB/s
> with TroveSyncData=yes: 194 MB/s
>
> I think the issue is that if syncdata is turned off, then the buffer cache tends to get very full before it starts writing. This bursty behavior isn't doing the SAN any favors -- it has a big cache on the back end and probably performs better with sustained writes that don't put so much sudden peak traffic on the HBA card. There are probably more sophisticated variations of this kind of tuning around (/proc vm settings, using direct io, etc.), but this is an easy config file change to get an extra 12% throughput.
>
> This setting is a little more unpredictable for local scsi disks -- some combinations of application and node go faster but some go slower. Overall it seems better for our environment to just leave data syncing on for both SAN and local disk, but your mileage may vary.
>
> This is different from results that we have seen in the past (maybe a year ago or so) for local disk -- it used to be a big penalty to sync every data operation. I'm not sure what exactly happened to change this (new dbpf design? alt-aio? better kernels?) but I'm not complaining :)
>
> -Phil

One thing that we noticed while testing for storage challenge was that (and everyone correct me if I'm wrong here) enabling the data-sync causes a flush/sync to occur after every sizeof(FlowBuffer) bytes had been written. I can imagine how this would help a SAN, but I'm perplexed how it helps local disk; what buffer size are you playing with?

We found that unless we were using HUGE (~size of cache on storage controller) flow buffers, this caused way too many syncs/seeks on the disks and hurt performance quite a bit (maybe as much as a 50% hit), because things were not being optimized for our disk subsystems and we were issuing many small ops instead of fewer large ones. Granted, I haven't been able to get 2.6.0 building properly yet to test the latest out, but this was definitely the case for us on the 2.5 releases.

+=Kyle

-- 
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept. Energy
Scalable Computing Laboratory
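[For reference, the non-default settings Phil lists map onto server config entries along these lines. This is a sketch from memory of the 2.6-era fs.conf format -- the exact keyword spellings and section placement should be checked against your own config file rather than taken from here:]

    <Defaults>
        # threaded pwrite()-based trove implementation discussed above
        TroveMethod alt-aio
        # 512KB flow buffers, matching Phil's setup
        FlowBufferSizeBytes 524288
    </Defaults>

    <StorageHints>
        # sync file data on every write -- the setting under discussion;
        # worth 12% on Phil's SAN numbers (173 -> 194 MB/s)
        TroveSyncData yes
    </StorageHints>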