Re: [devel] checkpoint section create performance
Privyet, Mikhail!

I have the work done, but I haven't created the patch file yet. I was going to create a ticket and attach the patch to it. I will try to do it today and will let you know.

Alex

On 02/04/2014 09:24 AM, Domrachev, Mikhail wrote:
> Hi, Alex.
>
> Currently I'm fixing that issue too. You said that you are going to
> reimplement the cpnd database, so if it's done, could you please share
> the patch?
>
> Or we could fix that trouble together and divide the work between us.
> What do you think about it?
>
> Thanks.
>
> Mikhail Domrachev
> Software Engineer
> OpenEPC

--
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
Re: [devel] checkpoint section create performance
Well, except for the code in the agent library. That code must be written in C.

regards,
Anders Widell

2014-02-04 15:59, Anders Widell wrote:
> Hi guys!
>
> Just to let you know: we are trying to move away from the patricia tree
> code and use the C++ STL instead. So if you have already implemented this
> with an STL map, there is no need to port it to use the patricia tree.
>
> regards,
> Anders Widell
Re: [devel] checkpoint section create performance
Hi guys!

Just to let you know: we are trying to move away from the patricia tree code and use the C++ STL instead. So if you have already implemented this with an STL map, there is no need to port it to use the patricia tree.

regards,
Anders Widell

2014-02-04 15:24, Domrachev, Mikhail wrote:
> Hi, Alex.
>
> Currently I'm fixing that issue too. You said that you are going to
> reimplement the cpnd database, so if it's done, could you please share
> the patch?
> Or we could fix that trouble together and divide the work between us.
> What do you think about it?
>
> Thanks.
>
> Mikhail Domrachev
> Software Engineer
> OpenEPC
Re: [devel] checkpoint section create performance
Hi, Alex.

Currently I'm fixing that issue too. You said that you are going to reimplement the cpnd database, so if it's done, could you please share the patch?

Or we could fix that trouble together and divide the work between us. What do you think about it?

Thanks.

Mikhail Domrachev
Software Engineer
OpenEPC
Re: [devel] checkpoint section create performance
AVM,

I found another performance issue in the checkpoint subsystem. I was still not able to replicate 40k sections from each of 5 active blades to the 6th, standby blade (blade 6 ends up with 200k sections in 5 checkpoints). The writes on the active blades were fine, but the standby couldn't keep up. On the standby the CPU was pegged and all ckpt API functions were returning SA_AIS_ERR_TIMEOUT, including ActiveReplicaSet and CheckpointClose!

So I ran oprofile on the standby and found that 90% of the time was being spent in cpnd_ckpt_sec_get_create(). The function makes it clear that the section database is implemented as a linked list. This is horribly expensive with large numbers of sections, especially when those sections are being replicated at high rates.

As a proof of concept, I reimplemented the section database using the C++ STL map. This improved performance tenfold.

With the following changes I can now easily replicate 40k sections each from 5 blades to the standby (200k sections being replicated simultaneously on the backup blade):

1. Make the section create message asynchronous when SA_CKPT_WR_ACTIVE_REPLICA is specified.
2. Change the section database data structure from a linked list to an STL map.
3. Change MAX_SYNC_TRANSFER_SIZE in cpsv_evt.h from 30M to 3M.

I'll reimplement the section database patch using the internal patricia tree code and post the patch, unless you feel there's a better way to do it.

Alex

On 01/14/2014 05:13 PM, Alex Jones wrote:
> 3.7.1 saCkptSectionCreate()
> "If the checkpoint was created with the SA_CKPT_WR_ALL_REPLICAS
> property, the section is created in all of the checkpoint replicas
> when the invocation returns; otherwise, the section has been created
> at least in the active checkpoint replica when the invocation returns
> and will be created asynchronously in the other checkpoint replicas."
>
> It looks like the implementation behaves as if the checkpoint was
> created with SA_CKPT_WR_ALL_REPLICAS (even if
> SA_CKPT_WR_ACTIVE_REPLICA or SA_CKPT_WR_ACTIVE_REPLICA_WEAK is
> specified) for section creates.
>
> I've been digging into the code, and it looks like
> "cpnd_evt_proc_ckpt_sect_create" sends a synchronous message to each
> replica to create the section regardless of the property.
>
> I realize this is spec compliant, but performance could be greatly
> enhanced for large numbers of sections when using the
> SA_CKPT_WR_ACTIVE_REPLICA or SA_CKPT_WR_ACTIVE_REPLICA_WEAK property
> by making this asynchronous.
>
> Any chance we can change this behaviour, and make it like
> saCkptCheckpointWrite?
>
> Alex
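To illustrate the data-structure change described in this message, here is a minimal C++ sketch contrasting a list-backed get-or-create with a map-backed one. It is not the cpnd code or the posted patch: Section, ListSectionDb, MapSectionDb and GetOrCreate are hypothetical names, and a plain string stands in for the real section identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>

// Hypothetical stand-in for a checkpoint section record.
struct Section {
  std::string id;
  std::string data;
};

// List-backed store: every get-or-create walks the list, O(n) per call,
// so creating n sections costs O(n^2) comparisons in total.
class ListSectionDb {
 public:
  Section& GetOrCreate(const std::string& id) {
    for (Section& s : sections_) {
      if (s.id == id) return s;
    }
    sections_.push_back(Section{id, ""});
    return sections_.back();
  }
  std::size_t size() const { return sections_.size(); }

 private:
  std::list<Section> sections_;
};

// Map-backed store: std::map is an ordered tree keyed by section id,
// so each get-or-create costs O(log n) comparisons.
class MapSectionDb {
 public:
  Section& GetOrCreate(const std::string& id) {
    auto it = sections_.find(id);
    if (it == sections_.end()) {
      it = sections_.emplace(id, Section{id, ""}).first;
    }
    return it->second;
  }
  std::size_t size() const { return sections_.size(); }

 private:
  std::map<std::string, Section> sections_;
};
```

Both classes expose the same interface, so a caller could switch between them without other changes; only the lookup cost differs, which matters at 200k sections.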
Re: [devel] checkpoint section create performance
3.7.1 saCkptSectionCreate()
"If the checkpoint was created with the SA_CKPT_WR_ALL_REPLICAS property, the section is created in all of the checkpoint replicas when the invocation returns; otherwise, the section has been created at least in the active checkpoint replica when the invocation returns and will be created asynchronously in the other checkpoint replicas."

It looks like the implementation behaves as if the checkpoint was created with SA_CKPT_WR_ALL_REPLICAS (even if SA_CKPT_WR_ACTIVE_REPLICA or SA_CKPT_WR_ACTIVE_REPLICA_WEAK is specified) for section creates.

I've been digging into the code, and it looks like "cpnd_evt_proc_ckpt_sect_create" sends a synchronous message to each replica to create the section regardless of the property.

I realize this is spec compliant, but performance could be greatly enhanced for large numbers of sections when using the SA_CKPT_WR_ACTIVE_REPLICA or SA_CKPT_WR_ACTIVE_REPLICA_WEAK property by making this asynchronous.

Any chance we can change this behaviour, and make it like saCkptCheckpointWrite?

Alex

On 01/14/2014 03:16 PM, Alex Jones wrote:
> AVM,
>
> In my 5+1 setup, when I have the standby node open all the
> checkpoints and read from them, as well as open the hot-standby
> callback, the section creates done on the other active nodes can take
> a very long time. (For 40k sections, it can sometimes take over 2
> minutes).
>
> Once the sections have been created, however, subsequent writes
> and overwrites are very fast. (Writing 1k data into 40k sections
> takes 10 seconds).
>
> But, if I don't open the checkpoints on the standby, the section
> creates on the active nodes are fast (about 22 seconds for 40k
> sections), and the write and overwrite performance is basically
> unchanged.
>
> This suggests that there is some kind of synchronous mechanism
> going on between replicas when creating sections.
>
> Can you explain why I am seeing this performance degradation when
> creating sections when a standby replica is opened, but there is no
> performance hit for writing and overwriting?
>
> Thanks!
>
> Alex
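The behaviour requested in this message can be sketched as a small decision on the checkpoint creation flags: wait for every replica only when all-replicas semantics were requested, otherwise return once the active replica is updated. This is a hedged illustration, not OpenSAF code. CreationFlag, SendMode and SectionCreateSendMode are hypothetical names, and the flag values are illustrative; the real definitions come from the SA Forum checkpoint header.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values only (assumption); the real constants are
// SA_CKPT_WR_ALL_REPLICAS, SA_CKPT_WR_ACTIVE_REPLICA and
// SA_CKPT_WR_ACTIVE_REPLICA_WEAK.
enum CreationFlag : std::uint32_t {
  kWrAllReplicas = 0x1,
  kWrActiveReplica = 0x2,
  kWrActiveReplicaWeak = 0x4,
};

// How a section create would be propagated to the non-active replicas.
enum class SendMode { kSync, kAsync };

// Hypothetical helper: choose synchronous propagation only when the
// checkpoint was created with all-replicas semantics.
inline SendMode SectionCreateSendMode(std::uint32_t creation_flags) {
  return (creation_flags & kWrAllReplicas) ? SendMode::kSync
                                           : SendMode::kAsync;
}
```

With this rule, a checkpoint opened with either of the active-replica flags would no longer block the creating node on the standby, which is the same trade-off the spec text quoted above already allows.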
[devel] checkpoint section create performance
AVM,

In my 5+1 setup, when I have the standby node open all the checkpoints and read from them, as well as open the hot-standby callback, the section creates done on the other active nodes can take a very long time. (For 40k sections, it can sometimes take over 2 minutes.)

Once the sections have been created, however, subsequent writes and overwrites are very fast. (Writing 1k data into 40k sections takes 10 seconds.)

But if I don't open the checkpoints on the standby, the section creates on the active nodes are fast (about 22 seconds for 40k sections), and the write and overwrite performance is basically unchanged.

This suggests that there is some kind of synchronous mechanism going on between replicas when creating sections.

Can you explain why I am seeing this performance degradation when creating sections when a standby replica is opened, but there is no performance hit for writing and overwriting?

Thanks!

Alex
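For reference, the timings in this message translate into rough throughput figures as follows. "Over 2 minutes" is taken as 120 s, so the first rate is an upper bound and the slowdown a lower bound.

```latex
\begin{align*}
\text{creates, standby open:}   &\quad \frac{40\,000}{120\ \text{s}} \approx 333\ \text{sections/s} \\
\text{creates, standby closed:} &\quad \frac{40\,000}{22\ \text{s}}  \approx 1818\ \text{sections/s} \\
\text{writes (1k each):}        &\quad \frac{40\,000}{10\ \text{s}}  = 4000\ \text{sections/s} \\
\text{create slowdown:}         &\quad \frac{120\ \text{s}}{22\ \text{s}} \approx 5.5\times
\end{align*}
```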