Re: [devel] checkpoint section create performance

2014-02-04 Thread Alex Jones
Privyet Mikhail!

I have the work done, but I don't have the patch file created yet. I was 
going to create a ticket, and attach the patch to it.

I will try to do it today.

I will let you know.

Alex

On 02/04/2014 09:24 AM, Domrachev, Mikhail wrote:
>
> Hi, Alex.
>
> Currently I'm fixing that issue too. You said that you were going to 
> reimplement the cpnd database, so if it's done, could you please share 
> the patch?
>
> Or we could fix that problem together and divide the work between us. 
> What do you think?
>
> Thanks.
>
> _
>
> Mikhail Domrachev
>
> Software Engineer
>
> OpenEPC
>

--
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] checkpoint section create performance

2014-02-04 Thread Anders Widell
Well, except for the code in the agent library. That code must be 
written in C.

regards,
Anders Widell

2014-02-04 15:59, Anders Widell wrote:
> Hi guys!
>
> Just to let you know: we are trying to move away from the patricia tree
> code and use the C++ STL instead. So if you have already implemented this
> with an STL map, there is no need to port it to use the patricia tree.
>
> regards,
> Anders Widell
>
> 2014-02-04 15:24, Domrachev, Mikhail wrote:
>> Hi, Alex.
>>
>> Currently I'm fixing that issue too. You said that you were going to
>> reimplement the cpnd database, so if it's done, could you please share the patch?
>> Or we could fix that problem together and divide the work between us. What do
>> you think?
>>
>> Thanks.
>> _
>> Mikhail Domrachev
>> Software Engineer
>> OpenEPC


Re: [devel] checkpoint section create performance

2014-02-04 Thread Anders Widell
Hi guys!

Just to let you know: we are trying to move away from the patricia tree 
code and use the C++ STL instead. So if you have already implemented this 
with an STL map, there is no need to port it to use the patricia tree.

regards,
Anders Widell

2014-02-04 15:24, Domrachev, Mikhail wrote:
> Hi, Alex.
>
> Currently I'm fixing that issue too. You said that you were going to
> reimplement the cpnd database, so if it's done, could you please share the patch?
> Or we could fix that problem together and divide the work between us. What do
> you think?
>
> Thanks.
> _
> Mikhail Domrachev
> Software Engineer
> OpenEPC


Re: [devel] checkpoint section create performance

2014-02-04 Thread Domrachev, Mikhail
Hi, Alex.

Currently I'm fixing that issue too. You said that you were going to reimplement 
the cpnd database, so if it's done, could you please share the patch?
Or we could fix that problem together and divide the work between us. What do 
you think?

Thanks.
_
Mikhail Domrachev
Software Engineer
OpenEPC



Re: [devel] checkpoint section create performance

2014-01-17 Thread Alex Jones
AVM,

 I found another performance issue in the checkpoint subsystem.

 I was still not able to replicate 40k sections from each of the 5 active 
blades to the 6th, standby blade (so blade 6 holds 200k sections in 5 
checkpoints).  The writes on the active blades were fine, but the standby 
couldn't keep up.  On the standby the CPU was pegged and all ckpt API 
functions were returning SA_AIS_ERR_TIMEOUT, including ActiveReplicaSet 
and CheckpointClose!

 So, I ran oprofile on the standby and found the following.  90% of 
the time was being spent in cpnd_ckpt_sec_get_create().  In this 
function it is clearly seen that the section database is implemented as 
a linked list.  This is horribly expensive with large numbers of 
sections, especially when these sections are being replicated at high rates.

 As a proof of concept, I reimplemented the section database using 
the C++ STL map.  This improved performance tenfold.
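
 To illustrate the difference described above, here is a minimal sketch 
(not the actual cpnd code). The Section struct and both class names are 
hypothetical stand-ins. It assumes every create first scans for an existing 
section id, so creating n new sections in a list costs about n(n-1)/2 
comparisons, roughly 2x10^10 for 200k sections, while a std::map lookup is 
O(log n) per call.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>

// Hypothetical stand-in for a checkpoint section record.
struct Section {
    std::string id;
    std::string data;
};

// Linked-list database: get-or-create scans the nodes, O(n) per call.
class ListSectionDb {
public:
    Section& get_or_create(const std::string& id) {
        for (Section& s : sections_) {
            ++comparisons_;                 // count the work done by the scan
            if (s.id == id) return s;
        }
        sections_.push_back(Section{id, ""});
        return sections_.back();
    }
    std::size_t size() const { return sections_.size(); }
    std::size_t comparisons() const { return comparisons_; }

private:
    std::list<Section> sections_;
    std::size_t comparisons_ = 0;
};

// Map database: balanced tree keyed by section id, O(log n) per call.
class MapSectionDb {
public:
    Section& get_or_create(const std::string& id) {
        auto it = sections_.find(id);
        if (it == sections_.end()) {
            it = sections_.emplace(id, Section{id, ""}).first;
        }
        return it->second;
    }
    std::size_t size() const { return sections_.size(); }

private:
    std::map<std::string, Section> sections_;
};
```

 Creating 1000 distinct sections in ListSectionDb performs 499500 id 
comparisons; the map version never visits more than a logarithmic number of 
nodes per call.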

 With the following changes I can now easily replicate 40k sections 
from each of the 5 blades to the standby (200k sections replicated 
simultaneously on the standby blade):

 1. Make the section create message asynchronous when
SA_CKPT_WR_ACTIVE_REPLICA is specified.
 2. Change the section database data structure from linked list to STL map.
 3. Change MAX_SYNC_TRANSFER_SIZE in cpsv_evt.h from 30M to 3M.

 I'll reimplement the section database patch using the internal 
patricia tree code and post the patch, unless you feel there's a better 
way to do it.


Alex

On 01/14/2014 05:13 PM, Alex Jones wrote:
> 3.7.1 saCkptSectionCreate()
> "If the checkpoint was created with the SA_CKPT_WR_ALL_REPLICAS 
> property, the section is created in all of the checkpoint replicas 
> when the invocation returns; otherwise, the section has been created 
> at least in the active checkpoint replica when the invocation returns 
> and will be created asynchronously in the other checkpoint replicas."
>
> It looks like, for section creates, the implementation behaves as if 
> the checkpoint had been created with SA_CKPT_WR_ALL_REPLICAS, even 
> when SA_CKPT_WR_ACTIVE_REPLICA or SA_CKPT_WR_ACTIVE_REPLICA_WEAK is 
> specified.
>
> I've been digging into the code, and it looks like 
> "cpnd_evt_proc_ckpt_sect_create" sends a synchronous message to each 
> replica to create the section regardless of the property.
>
> I realize this is spec compliant, but performance could be greatly 
> enhanced for large numbers of sections when using the 
> SA_CKPT_WR_ACTIVE_REPLICA or SA_CKPT_WR_ACTIVE_REPLICA_WEAK property 
> by making this asynchronous.
>
> Any chance we can change this behaviour, and make it like 
> saCkptCheckpointWrite?
>
> Alex
>
> On 01/14/2014 03:16 PM, Alex Jones wrote:
>> AVM,
>>
>> In my 5+1 setup, when I have the standby node open all the 
>> checkpoints and read from them, as well as open the hot-standby 
>> callback, the section creates done on the other active nodes can take 
>> a very long time.  (For 40k sections, it can sometimes take over 2 
>> minutes).
>>
>> Once the sections have been created, however, subsequent writes 
>> and overwrites are very fast.  (Writing 1k data into 40k sections 
>> takes 10 seconds).
>>
>> But, if I don't open the checkpoints on the standby, the section 
>> creates on the active nodes are fast (about 22 seconds for 40k 
>> sections), and the write and overwrite performance is basically 
>> unchanged.
>>
>> This suggests that there is some kind of synchronous mechanism 
>> going on between replicas when creating sections.
>>
>> Can you explain why I am seeing this performance degradation when 
>> creating sections when a standby replica is opened, but there is no 
>> performance hit for writing and overwriting?
>>
>> Thanks!
>>
>> Alex
>>
>

--
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] checkpoint section create performance

2014-01-14 Thread Alex Jones
3.7.1 saCkptSectionCreate()
"If the checkpoint was created with the SA_CKPT_WR_ALL_REPLICAS 
property, the section is created in all of the checkpoint replicas when 
the invocation returns; otherwise, the section has been created at least 
in the active checkpoint replica when the invocation returns and will be 
created asynchronously in the other checkpoint replicas."

It looks like, for section creates, the implementation behaves as if the 
checkpoint had been created with SA_CKPT_WR_ALL_REPLICAS, even when 
SA_CKPT_WR_ACTIVE_REPLICA or SA_CKPT_WR_ACTIVE_REPLICA_WEAK is specified.

I've been digging into the code, and it looks like 
"cpnd_evt_proc_ckpt_sect_create" sends a synchronous message to each 
replica to create the section regardless of the property.

I realize this is spec compliant, but performance could be greatly 
enhanced for large numbers of sections when using the 
SA_CKPT_WR_ACTIVE_REPLICA or SA_CKPT_WR_ACTIVE_REPLICA_WEAK property by 
making this asynchronous.
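
The behaviour the spec text allows can be sketched as follows. This is a 
simplified model under stated assumptions, not the cpnd implementation: the 
Replica struct, create_section(), and flush() are hypothetical, the enum 
values are illustrative stand-ins for the SA_CKPT_WR_* flags, and message 
delivery is modelled as a plain in-memory queue.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Illustrative stand-ins for the creation flags named in this thread.
enum CreationFlag : std::uint32_t {
    kWrAllReplicas = 0x1,
    kWrActiveReplica = 0x2,
    kWrActiveReplicaWeak = 0x4,
};

// Hypothetical replica model.
struct Replica {
    std::vector<std::string> sections;  // creates already applied
    std::deque<std::string> pending;    // creates queued for later delivery
};

// Returns how many replicas the caller waited on before returning.
std::size_t create_section(std::uint32_t flags, const std::string& id,
                           Replica& active, std::vector<Replica>& others) {
    active.sections.push_back(id);      // active replica is always updated first
    std::size_t waited = 1;
    for (Replica& r : others) {
        if (flags & kWrAllReplicas) {
            r.sections.push_back(id);   // synchronous: applied before returning
            ++waited;
        } else {
            r.pending.push_back(id);    // asynchronous: only queued here
        }
    }
    return waited;
}

// Drains queued creates, as a background delivery step would.
void flush(Replica& r) {
    while (!r.pending.empty()) {
        r.sections.push_back(r.pending.front());
        r.pending.pop_front();
    }
}
```

With kWrAllReplicas the caller waits on every replica; with the other two 
flags it waits only on the active replica, and the remaining replicas catch 
up when flush() runs.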

Any chance we can change this behaviour, and make it like 
saCkptCheckpointWrite?

Alex

On 01/14/2014 03:16 PM, Alex Jones wrote:
> AVM,
>
> In my 5+1 setup, when I have the standby node open all the 
> checkpoints and read from them, as well as open the hot-standby 
> callback, the section creates done on the other active nodes can take 
> a very long time.  (For 40k sections, it can sometimes take over 2 
> minutes).
>
> Once the sections have been created, however, subsequent writes 
> and overwrites are very fast.  (Writing 1k data into 40k sections 
> takes 10 seconds).
>
> But, if I don't open the checkpoints on the standby, the section 
> creates on the active nodes are fast (about 22 seconds for 40k 
> sections), and the write and overwrite performance is basically 
> unchanged.
>
> This suggests that there is some kind of synchronous mechanism 
> going on between replicas when creating sections.
>
> Can you explain why I am seeing this performance degradation when 
> creating sections when a standby replica is opened, but there is no 
> performance hit for writing and overwriting?
>
> Thanks!
>
> Alex
>





[devel] checkpoint section create performance

2014-01-14 Thread Alex Jones
AVM,

 In my 5+1 setup, when I have the standby node open all the 
checkpoints and read from them, as well as open the hot-standby 
callback, the section creates done on the other active nodes can take a 
very long time.  (For 40k sections, it can sometimes take over 2 minutes).

 Once the sections have been created, however, subsequent writes and 
overwrites are very fast.  (Writing 1k data into 40k sections takes 10 
seconds).

 But, if I don't open the checkpoints on the standby, the section 
creates on the active nodes are fast (about 22 seconds for 40k 
sections), and the write and overwrite performance is basically unchanged.
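
 For comparison, the timings quoted above can be turned into rough 
throughput figures. This is only arithmetic on the numbers in this message; 
the helper and constant names are made up for the example, and "over 2 
minutes" is taken as exactly 120 seconds, so the first rate is an upper 
bound.

```cpp
// Operations per second for `count` operations completed in `seconds`.
constexpr double ops_per_second(double count, double seconds) {
    return count / seconds;
}

constexpr double kSections = 40000.0;

// Section creates with the standby replica open ("over 2 minutes").
constexpr double kCreateStandbyOpen = ops_per_second(kSections, 120.0);

// Section creates with the checkpoints not opened on the standby (22 s).
constexpr double kCreateStandbyClosed = ops_per_second(kSections, 22.0);

// Writes into existing sections (10 s).
constexpr double kWrite = ops_per_second(kSections, 10.0);

// How much slower creates become once the standby replica is open.
constexpr double kCreateSlowdown = kCreateStandbyClosed / kCreateStandbyOpen;
```

 That works out to at most about 333 creates per second with the standby 
open, about 1818 per second without it (a slowdown of roughly 5.5x), and 
4000 writes per second.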

 This suggests that there is some kind of synchronous mechanism 
going on between replicas when creating sections.

 Can you explain why I am seeing this performance degradation when 
creating sections when a standby replica is opened, but there is no 
performance hit for writing and overwriting?

 Thanks!

Alex


