Re: [Pvfs2-developers] TroveSyncData settings

2006-12-04 Thread Robert Latham
On Wed, Nov 29, 2006 at 04:05:35PM -0600, Sam Lang wrote:
> I had a note that we should change the default aio data-sync code to  
> only sync at the end of an IO request, instead of for each trove  
> operation (in FlowBufferSize chunks).  Doing this at the end of an  
> io.sm seemed a little messy, but if/when we have request ids (hints)  
> being passed to the trove interface, we could use that as a way to  
> know to flush at the end.  In any case, it sounds like it's better to  
> flush early and often than at the end of a request?

I have no data to back this up, but it feels like flushing early and
often will only help if you have a good disk subsystem. 

> From a user perspective, we usually tell people to enable data sync  
> if they're concerned about losing data.  Now we're talking about  
> getting better performance with data sync enabled (at least in some  
> cases).  

> Does it make sense to sync even with data sync disabled if  
> we can figure out that better performance would result?

I can't imagine a use case where someone would be upset if we were
able to both deliver better performance and also sync data even if
they didn't ask us to.   If we can figure out what's best (maybe with
a quick test when the server starts up?), then yes,
sync even with sync disabled.  
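
A rough sketch of what such a startup probe might look like (purely
illustrative C, not existing PVFS2 code; the scratch path, chunk size,
and pass count are made up): time one pass that fdatasync()s after
every chunk against one pass that syncs only at the end, and prefer
per-chunk syncing only if it is no slower.

    /* Hypothetical startup probe, not PVFS2 code: compare per-chunk
     * syncing against a single sync at the end on a scratch file.
     * Path, chunk size, and pass count are made up for illustration. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define CHUNK   (512 * 1024)   /* stand-in for FlowBufferSize */
    #define NCHUNKS 64             /* 32 MB per pass */

    static double timed_pass(const char *path, int sync_each_chunk)
    {
        char *buf = malloc(CHUNK);
        struct timeval t0, t1;
        int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
        int i;

        memset(buf, 0, CHUNK);
        gettimeofday(&t0, NULL);
        for (i = 0; i < NCHUNKS; i++) {
            pwrite(fd, buf, CHUNK, (off_t)i * CHUNK);
            if (sync_each_chunk)
                fdatasync(fd);              /* flush early and often */
        }
        if (!sync_each_chunk)
            fdatasync(fd);                  /* one flush at the end */
        gettimeofday(&t1, NULL);
        close(fd);
        unlink(path);
        free(buf);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    }

    int main(void)
    {
        double eager = timed_pass("/tmp/sync_probe.dat", 1);
        double lazy  = timed_pass("/tmp/sync_probe.dat", 0);

        printf("per-chunk sync: %.2fs  sync-at-end: %.2fs  -> %s\n",
               eager, lazy,
               eager <= lazy ? "sync per chunk" : "sync at end");
        return 0;
    }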

==rob
 
-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
A215 0178 EA2D B059 8CDF  B29D F333 664A 4280 315B


Re: [Pvfs2-developers] TroveSyncData settings

2006-11-29 Thread Sam Lang


On Nov 29, 2006, at 3:44 PM, Rob Ross wrote:

That's what I was thinking -- that we could ask the I/O thread to  
do the syncing rather than stalling out other progress.


Wanna try it and see if it helps :)?

Rob



I had a note that we should change the default aio data-sync code to  
only sync at the end of an IO request, instead of for each trove  
operation (in FlowBufferSize chunks).  Doing this at the end of an  
io.sm seemed a little messy, but if/when we have request ids (hints)  
being passed to the trove interface, we could use that as a way to  
know to flush at the end.  In any case, it sounds like it's better to  
flush early and often than at the end of a request?
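
As a sketch of how a request-id hint might drive that end-of-request
flush (hypothetical struct and field names, not the actual trove
interface, which doesn't carry such hints yet): the write completion
path would skip the fdatasync() for every chunk except the operation
flagged as the last one of the request.

    /* Hypothetical sketch only -- the hint structure, field names, and
     * function below are made up to illustrate syncing once per I/O
     * request rather than once per FlowBufferSize chunk. */
    #include <stdint.h>
    #include <unistd.h>

    struct trove_write_hints {
        uint64_t request_id;  /* which client I/O request this op serves */
        int      last_op;     /* nonzero on the final trove op of the request */
    };

    static int complete_write(int fd, const struct trove_write_hints *hints,
                              int trove_sync_data)
    {
        if (!trove_sync_data)
            return 0;                 /* data syncing disabled entirely */
        if (hints && !hints->last_op)
            return 0;                 /* mid-request: defer the flush */
        return fdatasync(fd);         /* end of request: flush once */
    }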


From a user perspective, we usually tell people to enable data sync  
if they're concerned about losing data.  Now we're talking about  
getting better performance with data sync enabled (at least in some  
cases).  Does it make sense to sync even with data sync disabled if  
we can figure out that better performance would result?


-sam



Re: [Pvfs2-developers] TroveSyncData settings

2006-11-29 Thread Rob Ross
That's what I was thinking -- that we could ask the I/O thread to do the 
syncing rather than stalling out other progress.


Wanna try it and see if it helps :)?

Rob

Phil Carns wrote:
No.  Both alt aio and the normal dbpf method sync as a separate  
step after the aio list operation completes.


This is technically possible with alt aio, though- you would just need 
to pass a flag through to tell the I/O thread to sync after the 
pwrite().  That would probably be pretty helpful, so the trove worker 
thread doesn't get stuck waiting on the sync...


-Phil




Re: [Pvfs2-developers] TroveSyncData settings

2006-11-29 Thread Phil Carns
No.  Both alt aio and the normal dbpf method sync as a separate step 
after the aio list operation completes.


This is technically possible with alt aio, though; you would just need 
to pass a flag through to tell the I/O thread to sync after the 
pwrite().  That would probably be pretty helpful, so the trove worker 
thread doesn't get stuck waiting on the sync...
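
A minimal sketch of that shape (illustrative only; the struct and field
names below are invented, not the actual alt aio code): the operation
carries a sync flag, and the I/O thread pays for the fdatasync() right
after its pwrite(), so the trove worker never blocks on it.

    /* Illustrative sketch, not the real alt aio implementation. */
    #include <sys/types.h>
    #include <unistd.h>

    struct io_op {
        int         fd;
        const void *buf;
        size_t      len;
        off_t       offset;
        int         sync_after_write;  /* set when TroveSyncData is enabled */
    };

    static void *io_thread_fn(void *arg)
    {
        struct io_op *op = arg;

        pwrite(op->fd, op->buf, op->len, op->offset);
        if (op->sync_after_write)
            fdatasync(op->fd);         /* sync happens on the I/O thread */
        /* ...mark the op complete so the trove worker can move on... */
        return NULL;
    }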


-Phil


Rob Ross wrote:

This is similar to using O_DIRECT, which has also shown benefits.

With alt aio, do we sync in the context of the I/O thread?

Thanks,

Rob



Re: [Pvfs2-developers] TroveSyncData settings

2006-11-29 Thread Rob Ross

This is similar to using O_DIRECT, which has also shown benefits.

With alt aio, do we sync in the context of the I/O thread?

Thanks,

Rob
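
For reference, the O_DIRECT comparison (generic Linux usage, not PVFS2
code): direct I/O bypasses the page cache altogether, so like an
explicit per-chunk sync it keeps a large dirty-page backlog from
building up, at the cost of alignment requirements on the buffer,
offset, and length.

    /* Generic illustration of O_DIRECT, not PVFS2 code. */
    #define _GNU_SOURCE             /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int write_direct(const char *path, size_t len)
    {
        void *buf;
        int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);

        /* O_DIRECT needs an aligned buffer; len and the file offset must
         * usually be multiples of the block size as well. */
        posix_memalign(&buf, 4096, len);
        pwrite(fd, buf, len, 0);
        close(fd);
        free(buf);
        return 0;
    }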

Phil Carns wrote:


You are definitely right about the data sync option causing a 
flush/sync on every sizeof(FlowBuffer).  I don't really have a good 
why this doesn't seem to burn us anymore on local disk.  Our settings 
are standard, except for:


- 512KB flow buffer size
- alt aio method
- 512KB tcp buffers (with larger /proc tcp settings)

This testing was done on some version prior to 2.6.0 also (I think it 
was a merge of some in-between release, so it is hard to pin down a 
version number).


It may also have something to do with the controller and local disks 
being used?  All of our local disk configurations are actually hardware 
raid 5 with some variety of the megaraid controller, and these are 
fairly new boxes.


-Phil



Re: [Pvfs2-developers] TroveSyncData settings

2006-11-29 Thread Phil Carns


One thing that we noticed while testing for storage challenge  
was that (and everyone correct me if I'm wrong here) enabling  
the data-sync causes a flush/sync to occur after every  
sizeof(FlowBuffer) bytes have been written.  I can imagine how this  
would help a SAN, but I'm perplexed how it helps local disk; what  
buffer size are you playing with?
We found that unless we were using HUGE (~size of the cache on  
the storage controller) flow buffers, this caused way too many  
syncs/seeks on the disks and hurt performance quite a bit, maybe  
even as bad as a 50% performance loss, because things were not  
being optimized for our disk subsystems and we were issuing  
many small ops instead of fewer large ones.


Granted, I haven't been able to get 2.6.0 building properly yet  
to test the latest out, but this was definitely the case for us  
on the 2.5 releases.


You are definitely right about the data sync option causing a flush/sync 
on every sizeof(FlowBuffer).  I don't really have a good explanation for 
why this doesn't seem to burn us anymore on local disk.  Our settings 
are standard, except for:


- 512KB flow buffer size
- alt aio method
- 512KB tcp buffers (with larger /proc tcp settings)
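
For reference, these knobs live in the server config file.  A rough
sketch of what that might look like is below; the option names and
section placement are from memory and should be checked against the
fs.conf documentation for the release in use.

    # approximate fs.conf fragment -- option names/sections not verified
    <Defaults>
        FlowBufferSizeBytes 524288
        TCPBufferSend 524288
        TCPBufferReceive 524288
    </Defaults>

    <StorageHints>
        TroveSyncData yes
        TroveMethod alt-aio
    </StorageHints>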

This testing was done on some version prior to 2.6.0 also (I think it 
was a merge of some in-between release, so it is hard to pin down a 
version number).


It may also have something to do with the controller and local disks 
being used?  All of our local disk configurations are actually hardware 
raid 5 with some variety of the megaraid controller, and these are 
fairly new boxes.


-Phil



Re: [Pvfs2-developers] TroveSyncData settings

2006-11-29 Thread Kyle Schochenmaier

Phil Carns wrote:
We recently ran some tests trying different sync settings in PVFS2.  
We ran into one pleasant surprise, although it is probably already 
obvious to others.  Here is the setup:


12 clients
4 servers
read/write test application, 100 MB operations, large files
fibre channel SAN storage

The test application is essentially the same as was used in the 
posting regarding kernel buffer sizes, although with different 
parameters in this environment.


At any rate, to get to the point:

with TroveSyncData=no (default settings): 173 MB/s
with TroveSyncData=yes: 194 MB/s

I think the issue is that if syncdata is turned off, then the buffer 
cache tends to get very full before it starts writing.  This bursty 
behavior isn't doing the SAN any favors; it has a big cache on the 
back end and probably performs better with sustained writes that don't 
put so much sudden peak traffic on the HBA card.


There are probably more sophisticated variations of this kind of 
tuning around (/proc vm settings, using direct io, etc.) but this is 
an easy config file change to get an extra 12% throughput.
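
As one example of the "/proc vm settings" route (generic Linux
writeback knobs, nothing PVFS2-specific, and the values below are only
illustrative): lowering the dirty-page thresholds makes the kernel
start writeback sooner, which smooths out the same burstiness without
touching the server config.

    # illustrative values only; tune per system
    echo 5  > /proc/sys/vm/dirty_background_ratio  # start writeback sooner
    echo 10 > /proc/sys/vm/dirty_ratio             # cap dirty pages lower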


This setting is a little more unpredictable for local SCSI disks; some 
combinations of application and node go faster but some go slower. 
Overall it seems better for our environment to just leave data syncing 
on for both SAN and local disk, but your mileage may vary.


This is different from results that we have seen in the past (maybe a 
year ago or so) for local disk; it used to be a big penalty to sync 
every data operation.  I'm not sure what exactly happened to change 
this (new dbpf design?  alt-aio?  better kernels?) but I'm not 
complaining :)


One thing that we noticed while testing for storage challenge was that 
(and everyone correct me if I'm wrong here) enabling the data-sync 
causes a flush/sync to occur after every sizeof(FlowBuffer) bytes have 
been written.  I can imagine how this would help a SAN, but I'm 
perplexed how it helps local disk; what buffer size are you playing with? 

We found that unless we were using HUGE (~size of the cache on the 
storage controller) flow buffers, this caused way too many syncs/seeks 
on the disks and hurt performance quite a bit, maybe even as bad as a 
50% performance loss, because things were not being optimized for our 
disk subsystems and we were issuing many small ops instead of fewer 
large ones.


Granted, I haven't been able to get 2.6.0 building properly yet to test 
the latest out, but this was definitely the case for us on the 2.5 releases.


+=Kyle

-Phil




--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory 


___
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers