Re: [Engine-devel] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization

2013-03-13 Thread Saggi Mizrahi
I am completely against this.
It makes the return value differ according to the input, which
is a big no-no when talking about type-safe APIs.

The only reason we have this problem is that there is this
aversion to making multiple calls.

Just split it up.
getVmRuntimeStats() - transient things like mem and cpu%
getVmInformation() - (semi)static things like disk/networking layout, etc.
Each updated at different intervals.
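
A minimal sketch of what such a split could look like, so that each call
has one fixed return type. Only the two function names come from this
message; the record types, the assignment of fields to each call, the
stubbed sample values and the polling intervals are assumptions made for
illustration, not VDSM's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VmRuntimeStats:
    """Transient values; field names borrowed from the proposal's samples."""
    cpuSys: float
    cpuUser: float
    memUsage: int


@dataclass(frozen=True)
class VmInformation:
    """(Semi)static values; which fields belong here is an assumption."""
    guestName: str
    displayType: str
    netIfaces: List[Dict[str, object]]


def getVmRuntimeStats(vm_id: str) -> VmRuntimeStats:
    # Hypothetical stub: a real implementation would query the running VM.
    return VmRuntimeStats(cpuSys=2.32, cpuUser=1.34, memUsage=30)


def getVmInformation(vm_id: str) -> VmInformation:
    # Hypothetical stub returning sample values quoted in the proposal.
    return VmInformation(
        guestName="W864GUESTAGENTT",
        displayType="qxl",
        netIfaces=[{"name": "Realtek RTL8139C+ Fast Ethernet NIC",
                    "hw": "00:1a:4a:22:3c:db"}],
    )


def calls_due(tick: int, runtime_every: int = 1, info_every: int = 20) -> List[str]:
    """Return the calls a poller would make on this tick (intervals are made up)."""
    due = []
    if tick % runtime_every == 0:
        due.append("getVmRuntimeStats")
    if tick % info_every == 0:
        due.append("getVmInformation")
    return due
```

With this shape the caller always knows what type it gets back from each
call, and the update interval becomes a property of the caller's schedule
rather than of a parameter that changes the reply.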

- Original Message -
> From: "Vinzenz Feenstra" 
> To: vdsm-de...@lists.fedorahosted.org, engine-devel@ovirt.org
> Sent: Thursday, March 7, 2013 6:25:54 AM
> Subject: [Engine-devel] Proposal VDSM <=> Engine Data Statistics Retrieval
> Optimization
> 
> 
> Please find the prettier version on the wiki:
> http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval
> 
> Proposal VDSM - Engine Data Statistics Retrieval
> VDSM <=> Engine data retrieval optimization
> Motivation:
> 
> 
> Currently the RHEVM engine is polling a lot of data from VDSM
> every 15 seconds. This should be optimized and the amount of data
> requested should be more specific.
> 
> For each VM the data currently contains much more information than
> is actually needed, which considerably inflates the size of the XML
> content. We could optimize this by splitting the getVmStats reply
> into sections based on what the engine requests. For this reason
> Omer Frenkel and I have split up the data into parts based on their
> usage.
> 
> This data can and usually does change during the lifetime of the VM.
> Rarely Changed:
> 
> 
> This data does not change very frequently, and it should be enough
> to update it only once in a while. Most commonly this data changes
> after changes made in the UI or after a migration of the VM to
> another host.
> 
> Status = Running
> acpiEnable = true
> vmType = kvm
> guestName = W864GUESTAGENTT
> displayType = qxl
> guestOs = Win 8
> kvmEnable = true # this should be constant and never changed
> pauseCode = NOERR
> monitorResponse = 0
> session = Locked # unused
> netIfaces = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC',
> 'inet6': ['fe80::490c:92bb:bbcc:9f87'], 'inet': ['10.34.60.148'],
> 'hw': '00:1a:4a:22:3c:db'}]
> appsList = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3',
> 'RHEV-Serial64 3.2.3', 'RHEV-Network64 3.2.2', 'RHEV-Network64 3.2.3',
> 'RHEV-Block64 3.2.3', 'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2',
> 'RHEV-Agent64 3.2.2', 'RHEV-USB 3.2.3', 'RHEV-Block64 3.2.2',
> 'RHEV-Serial64 3.2.2']
> pid = 11314
> guestIPs = 10.34.60.148 # duplicated info
> displayIp = 0
> displayPort = 5902
> displaySecurePort = 5903
> username = user@W864GUESTAGENTT
> clientIp =
> lastLogin = 1361976900.67
> 
> Often Changed:
> 
> 
> This data changes quite often; however, it is not necessary to
> update it every 15 seconds. As this is cumulative data that
> reflects the current status, it does not need to be snapshotted
> every 15 seconds to retrieve statistics. The data can be retrieved
> in much more generous time slices (e.g. every 5 minutes).
> 
> network = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db', 'rxDropped': '0',
> 'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0',
> 'txErrors': '0', 'state': 'unknown', 'speed': '100', 'name':
> 'vnet1'}}
> disksUsage = [{'path': 'c:\\', 'total': '64055406592',
> 'fs': 'NTFS', 'used': '19223846912'}, {'path': 'd:\\', 'total':
> '3490912256', 'fs': 'UDF', 'used': '3490912256'}]
> timeOffset = 14422
> elapsedTime = 68591
> hash = 2335461227228498964
> statsAge = 0.09 # unused
> 
> Often Changed but unused
> 
> 
> This data does not seem to be used in the engine at all. It is not
> even used in the data warehouse.
> 
> memoryStats = {'swap_out': '0', 'majflt': '0', 'mem_free': '1466884',
> 'swap_in': '0', 'pageflt': '0', 'mem_total': '2096736',
> 'mem_unused': '1466884'}
> balloonInfo = {'balloon_max': 2097152, 'balloon_cur': 2097152}
> disks = {'vda': {'readLatency': '0', 'apparentsize': '64424509440',
> 'writeLatency': '1754496', 'imageID':
> '28abb923-7b89-4638-84f8-1700f0b76482', 'flushLatency': '156549',
> 'readRate': '0.00', 'truesize': '18855059456', 'writeRate':
> '952.05'}, 'hdc': {'readLatency': '0', 'apparentsize': '0',
> 'writeLatency': '0', 'flushLatency': '0', 'readRate': '0.00',
> 'truesize': '0', 'writeRate': '0.00'}}
> 
> Very frequent updates needed by webadmin portal:
> 
> 
> This data is mostly needed for the webadmin portal and might be
> required to be updated quite often. An exception here is the
> statsAge field, which seems to be unused by the Engine. This data
> could be requested every 15 seconds to keep things as they are now.
> 
> cpuSys = 2.32
> cpuUser = 1.34
> memUsage = 30
> 
> Proposed Solution for VDSM & Engine:
> 
> 
> We will introduce new optional parameters to getVmStats,
> getAllVmStats and list to allow a finer-grained specification of
> the data that should be included.
> 
> Parameter: statsType =  (getVmStats, getAllVmStats only)
> Allowed values:
> 
> * full (default to keep backwards compatibility)
> * app-list (Just send 

Re: [Engine-devel] Proposal VDSM <=> Engine Data Statistics Retrieval Optimization

2013-03-07 Thread Dan Kenigsberg
On Thu, Mar 07, 2013 at 12:25:54PM +0100, Vinzenz Feenstra wrote:
> Please find the prettier version on the wiki:
> http://www.ovirt.org/Proposal_VDSM_-_Engine_Data_Statistics_Retrieval
> 
> 
>  Proposal VDSM - Engine Data Statistics Retrieval
> 
> 
>VDSM <=> Engine data retrieval optimization
> 
> 
>  Motivation:
> 
> Currently the RHEVM engine is polling a lot of data from VDSM
> every 15 seconds. This should be optimized and the amount of data
> requested should be more specific.

It feels like a good idea, but do you have numbers? How much traffic
would be saved? Remember the added computation incurred on each host -
there's always a price to pay.
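
One rough way to get such numbers is to serialize the sample values from
the proposal and compare payload sizes. The sketch below uses Python's
standard xmlrpc.client module for the encoding; the split into "frequent"
and "rarely" groups and the helper name payload_size are assumptions made
for illustration, the field selection is only a small subset of the
samples, and the result ignores HTTP overhead, compression and the CPU
cost on the host.

```python
import xmlrpc.client

# Sample values copied from the proposal; the grouping is an assumption.
frequent = {"cpuSys": "2.32", "cpuUser": "1.34", "memUsage": "30"}
rarely = {
    "guestName": "W864GUESTAGENTT",
    "displayType": "qxl",
    "guestOs": "Win 8",
    "appsList": ["RHEV-Tools 3.2.4", "RHEV-Agent64 3.2.3",
                 "RHEV-Serial64 3.2.3", "RHEV-Network64 3.2.2"],
}


def payload_size(stats: dict) -> int:
    """Size in bytes of an XML-RPC method response carrying `stats`."""
    xml = xmlrpc.client.dumps((stats,), methodresponse=True)
    return len(xml.encode("utf-8"))


full = payload_size({**frequent, **rarely})
small = payload_size(frequent)
print(f"full: {full} bytes, frequent only: {small} bytes, "
      f"saved per VM per poll: {full - small} bytes")
```

Multiplying the per-VM difference by the number of VMs and the polling
rate gives a first estimate of the traffic at stake; measurements on a
real host would still be needed to weigh it against the added
computation.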

> 
> For each VM the data currently contains much more information than
> is actually needed, which considerably inflates the size of the XML
> content. We could optimize this by splitting the getVmStats reply
> into sections based on what the engine requests. For this reason
> Omer Frenkel and I have split up the data into parts based on their
> usage.
> 
> This data can and usually does change during the lifetime of the VM.
> 
> 
>Rarely Changed:
> 
> This data does not change very frequently, and it should be enough
> to update it only once in a while. Most commonly this data changes
> after changes made in the UI or after a migration of the VM to
> another host.
> 
>*Status*  = Running

Status does not change much, but when it does, it is important to report
that quickly.

>*acpiEnable*  = true
>*vmType*  = kvm
>*guestName*  = W864GUESTAGENTT
>*displayType*  = qxl
>*guestOs*  = Win 8
>*kvmEnable*  = true # this should be constant and never changed
>*pauseCode*  = NOERR
>*monitorResponse*  = 0
>*session*  = Locked # unused
>*netIfaces*  = [{'name': 'Realtek RTL8139C+ Fast Ethernet NIC', 'inet6':  
> ['fe80::490c:92bb:bbcc:9f87'], 'inet': ['10.34.60.148'], 'hw': 
> '00:1a:4a:22:3c:db'}]
>*appsList*  = ['RHEV-Tools 3.2.4', 'RHEV-Agent64 3.2.3', 'RHEV-Serial64 
> 3.2.3', 'RHEV-Network64 3.2.2', 'RHEV-Network64 3.2.3', 'RHEV-Block64 3.2.3', 
> 'RHEV-Balloon64 3.2.3', 'RHEV-Balloon64 3.2.2', 'RHEV-Agent64 3.2.2', 
> 'RHEV-USB 3.2.3', 'RHEV-Block64 3.2.2', 'RHEV-Serial64 3.2.2']
>*pid*  = 11314
>*guestIPs*  = 10.34.60.148 # duplicated info
>*displayIp*  = 0
>*displayPort*  = 5902
>*displaySecurePort*  = 5903
>*username*  = user@W864GUESTAGENTT
>*clientIp*  =
>*lastLogin*  = 1361976900.67
> 
> 
>Often Changed:
> 
> This data changes quite often; however, it is not necessary to
> update it every 15 seconds. As this is cumulative data that
> reflects the current status, it does not need to be snapshotted
> every 15 seconds to retrieve statistics. The data can be retrieved
> in much more generous time slices (e.g. every 5 minutes).
> 
>*network*  = {'vnet1': {'macAddr': '00:1a:4a:22:3c:db', 'rxDropped': '0', 
> 'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0', 
> 'txErrors': '0', 'state': 'unknown', 'speed': '100', 'name': 'vnet1'}}
>*disksUsage*  = [{'path': 'c:\\', 'total': '64055406592', 'fs': 'NTFS', 
> 'used': '19223846912'}, {'path': 'd:\\', 'total': '3490912256', 'fs': 'UDF', 
> 'used': '3490912256'}]
>*timeOffset*  = 14422
>*elapsedTime*  = 68591
>*hash*  = 2335461227228498964
>*statsAge*  = 0.09 # unused
> 
> 
>Often Changed but unused
> 
> This data does not seem to be used in the engine at all. It is *not*
> even used in the data warehouse.
> 
>*memoryStats*  = {'swap_out': '0', 'majflt': '0', 'mem_free': '1466884', 
> 'swap_in': '0', 'pageflt': '0', 'mem_total': '2096736', 'mem_unused': 
> '1466884'}
>*balloonInfo*  = {'balloon_max': 2097152, 'balloon_cur': 2097152}
>*disks*  = {'vda': {'readLatency': '0', 'apparentsize': '64424509440', 
> 'writeLatency': '1754496',  'imageID': 
> '28abb923-7b89-4638-84f8-1700f0b76482', 'flushLatency': '156549',  
> 'readRate': '0.00', 'truesize': '18855059456', 'writeRate': '952.05'}, 'hdc': 
> {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0', 
> 'flushLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeRate': 
> '0.00'}}

I am pretty sure that {read,write,flush}Latency is collected and
reported by Engine. `git grep writeLatency` reinforces my vague memory.
> 
> 
>Very frequent updates needed by webadmin portal:
> 
> This data is mostly needed for the webadmin portal and might be
> required to be updated quite often. An exception here is the
> statsAge field, which seems to be unused by the Engine. This data
> could be requested every 15 seconds to keep things as they are now.
> 
>*cpuSys*  = 2.32
>*cpuUser*  = 1.34
>*memUsage*  = 30
> 
> 
>Proposed Solution for VDSM & Engine:
> 
> We will introduce new optional parameters to getVmStats,
> getAllVmStats and list to allow a finer-grained specification of
> the data that should be included.
> 
> *Parameter:* *statsType*=/**/ (getVmStats, getAllVmStat