After yesterday's discussion, I decided to do a basic benchmark of how 
scalable the method is - to answer the question "is this going to be a 
bottleneck, yes or no?".

This is a back-of-the-envelope, worst-case analysis, without _any_ attempt at 
optimisation (candidates would be: using faster protobuf encoding methods than 
per-scalar encoding, substituting member names with ID references, grouping 
values into submessages by reporting-interval needs, and so forth).
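To put a rough number on the ID-substitution idea: a minimal sketch in Python 
(plain struct packing as a stand-in for the actual protobuf encoding; the 
signal name and ID are made up) of what replacing member names with numeric 
references could save per scalar:

    import struct

    name, value = "joint.0.pos-fb", 1.2345

    # name-keyed: length-prefixed name string plus a double
    name_keyed = (struct.pack("!B", len(name)) + name.encode()
                  + struct.pack("!d", value))

    # ID-keyed: a 2-byte signal ID plus a double (the name->ID table
    # would be exchanged once, at subscribe time)
    id_keyed = struct.pack("!Hd", 17, value)

    print(len(name_keyed), len(id_keyed))   # 23 vs. 10 bytes per scalar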


what size problem?
------------------

Assume we were to report EMC_STAT contents through HAL messaging. 

Since part of the reporting cost is the per-value serialisation overhead of 
EMC_STAT members, we need an estimate of the number of scalars.

Looking at EMC_STAT with the following assumptions:
- count scalars only
- leave out strings (these occur only in EMC_TASK_STAT, which doesn't 
originate in HAL!)
- count each EmcPose member as 9 discrete scalars (x, y, z etc.), assuming it 
is unrolled - a short sketch after the totals below illustrates this
- unroll arrays
- leave out the tool table altogether
- assume the current array-size build parameters (NUM_DIO, NUM_AIO, MAX_AXES 
etc.)

EMC_STAT contains:
   EMC_TASK_STAT   = 73 scalars in total
   EMC_IO_STAT     = 8 scalars (plus the tool table, which will not live 
there in the future)
   EMC_MOTION_STAT = 508 scalars in total. NB: of these, 256 are 
digital/analog I/O pins; by default only 4 of the 64 possible are exported in 
HAL, though.

in total: 589 scalars.
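For reference, the EmcPose unrolling assumed above - one pose becomes 9 
discrete scalars (x, y, z plus the rotary/auxiliary axes); the naming scheme 
here is illustrative only:

    # one EmcPose unrolls into 9 discrete scalars
    POSE_FIELDS = ("x", "y", "z", "a", "b", "c", "u", "v", "w")

    def unroll_pose(prefix, pose):
        """Flatten a 9-tuple pose into (name, value) scalar pairs."""
        return [("%s.%s" % (prefix, f), v)
                for f, v in zip(POSE_FIELDS, pose)]

    print(unroll_pose("motion.position", (0.0,) * 9))
    # -> 9 entries, hence the factor of 9 per EmcPose member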

Test:
-----

I created a fake HAL group of 500 float signals and ran it through the 
halreport component, which is the sole CPU user in this scheme.

run variants:
- reporting at 100 and 1000 reports/sec
- running with and without protobuf encoding plus ZeroMQ message sending, to 
gauge the encoding/transmit cost (a rough sketch of such a measurement loop 
follows below)
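For concreteness, a minimal sketch of such a publisher-side loop, assuming 
pyzmq; struct packing stands in for the generated protobuf classes, and the 
endpoint is made up:

    import struct, time
    import zmq

    N_SIGNALS = 500      # size of the fake HAL group
    RATE_HZ   = 100      # reports/sec; also tested at 1000
    ENCODE    = True     # toggle to gauge the encoding/transmit cost

    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5556")          # endpoint is made up

    values = [0.0] * N_SIGNALS
    period = 1.0 / RATE_HZ
    next_t = time.monotonic()

    while True:
        if ENCODE:
            # per-scalar packing, standing in for per-scalar protobuf encoding
            payload = b"".join(struct.pack("<d", v) for v in values)
            pub.send(payload)
        next_t += period
        delay = next_t - time.monotonic()
        if delay > 0:
            time.sleep(delay)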

results:

- on an Intel Core2, at 100 reports/sec this creates about 3% load on one 
core with encoding/sending, and about 1-2% without encoding/sending the values.
- increasing the reporting frequency to 1000/sec (which is absurdly high) 
raises this to 18% CPU with encoding; connecting a client adds another 1-2%.
- removing the protobuf encoding and the ZeroMQ publish send drops this to 
about 11% at 1000/sec.

At 1000/sec it is actually the client side, not the publisher side, that 
needs optimisation: the simple client I referred to uses the (notoriously 
slow) pure-Python protobuf decoding, even though a compiled-C extension 
mapping is already supported - I just didn't build it yet. Unoptimised, the 
client maxes out a core at 1000 messages/second.
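For reference, a minimal subscriber sketch, again assuming pyzmq; 
'report_pb2.Report' stands in for whatever message class the .proto 
generates, and the endpoint is made up. Switching to the compiled protobuf 
backend (once built) is a one-line environment setting, which must happen 
before the generated module is imported:

    import os
    # select the compiled C++ protobuf implementation instead of the
    # pure-Python one (only effective if the extension is actually built)
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "cpp"

    import zmq
    import report_pb2                     # placeholder generated module

    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:5556")   # endpoint is made up
    sub.setsockopt(zmq.SUBSCRIBE, b"")    # subscribe to everything

    msg = report_pb2.Report()
    while True:
        msg.ParseFromString(sub.recv())   # this decode dominates client CPU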

---

Summary: this is entirely feasible even with a completely dumb brute-force 
approach to mapping EMC_STAT.

- Michael




