Re: [Openstack] [metering] resources metadata

2012-05-16 Thread Julien Danjou
On Wed, May 16 2012, Loic Dachary wrote:

 It makes sense and I updated the wiki accordingly:

 http://wiki.openstack.org/EfficientMetering?action=diff&rev2=81&rev1=80

 What do you think ?

I think we can remove the payload field, since it's stored in
resource_metadata.

-- 
Julien Danjou
// eNovance  http://enovance.com
// ✉ julien.dan...@enovance.com  ☎ +33 1 49 70 99 81

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [metering] resources metadata

2012-05-15 Thread Julien Danjou
On Mon, May 14 2012, Loic Dachary wrote:

 Each set of metering data will need to be associated with the appropriate
 metadata from the resource at the time the metering information was
 collected. The rate of change of metadata and metering events are
 different, though, so the timestamps of the metadata records are unlikely
 to match exactly with the values in the metering records. Depending on the
 clock resolution, it would be possible to have metadata changes and meter
 data with the same timestamp, resulting in an incorrect association.
 Indeed, good point.

 We can work around that by maintaining proper foreign key references using
 the metadata version field as you describe in the schema above (so the
 resource id and metadata version value point to the correct metadata
 record). It will make recording the metering data less efficient because
 we will need to determine the current version for the resource metadata,
 but we can optimize that eventually through indexes and caching.

 Aggregation will also need to take the metadata version into account, so 
 everywhere in the list of queries we say by resource_id we need to change 
 that to by resource_id and version.
 I added the idea of a format version for when the payload format changes and 
 tried to write down a description of the metadata storage matching this 
 thread in the wiki.

 http://wiki.openstack.org/EfficientMetering?action=diff&rev2=80&rev1=78

 What do you think ?

I'm jumping into the discussion a bit late, and I may be missing a point in
the current definition, but I think it's getting too complicated.

We now have 2 payload fields: one for meter and one for metadata.

For example, if you look at the c1 counter (instance), you need to store
the type as the payload of the meter. This is metadata of the instance,
but it's currently defined as being stored in the payload field of the
meter rather than in the metadata.
Moreover, I'm fairly sure there will soon be a counter that needs two
different pieces of payload information, and we'll have a problem since we
can only store one in the current meter schema, so we'll store the second
one as metadata or something. So clearly the initial payload
solution is not enough.

OTOH I find the metadata proposal in another table too
complicated. Why not store that metadata in the meter.payload field
in the same table (e.g. as a JSON string)?
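
To make the suggestion concrete, here is a minimal Python sketch, assuming a
single hypothetical `meter` table in SQLite whose `payload` column holds the
metadata serialized as a JSON string. The table layout, the column names and
the `record_sample` helper are illustrative assumptions, not the schema from
the wiki.

```python
import json
import sqlite3

# In-memory database with one hypothetical meter table (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meter ("
    " resource_id TEXT, counter TEXT, volume REAL,"
    " timestamp TEXT, payload TEXT)"
)


def record_sample(resource_id, counter, volume, timestamp, metadata):
    """Store one meter sample with its metadata as a JSON string."""
    conn.execute(
        "INSERT INTO meter VALUES (?, ?, ?, ?, ?)",
        (resource_id, counter, volume, timestamp,
         json.dumps(metadata, sort_keys=True)),
    )


record_sample("134123", "c1", 1.0, "2012-05-15T12:00:00",
              {"type": "m1.small", "name": "foobar"})

# Reading the sample back gives the metadata without a second table.
row = conn.execute(
    "SELECT payload FROM meter WHERE resource_id = ?", ("134123",)
).fetchone()
stored_metadata = json.loads(row[0])
```

Because the payload is free-form JSON, a counter that needs two pieces of
information can simply store two keys.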

I don't see the point of introducing a dedicated metadata table with a
version string. It sounds to me like premature optimization, which is the
root of all evil. :) But I might be missing something.

-- 
Julien Danjou
// eNovance  http://enovance.com
// ✉ julien.dan...@enovance.com  ☎ +33 1 49 70 99 81



Re: [Openstack] [metering] resources metadata

2012-05-15 Thread Loic Dachary
On 05/15/2012 12:05 PM, Julien Danjou wrote:

 OTOH I find the metadata proposal in another table too
 complicated. Why not store that metadata in the meter.payload field
 in the same table (e.g. as a JSON string)?
It would be much simpler to store the metadata in the resource_id field, which
could be renamed to a resource field.
Instead of resource_id=134123 we could have resource={ 'id': 134123, 'name': 
'foobar', 'flavor': 'm1.small' etc.. }
There would be no need for versioning, format, separate table, etc. etc. The 
only convention would be that it's a hash with at least one field : the id of 
the resource. The rest is metadata.

It will use a lot of disk space with highly redundant information.
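
A rough Python sketch of that convention, assuming a meter record is a plain
dict. The `make_meter_record` helper and the field names other than `id`,
`name` and `flavor` are hypothetical. It also gives a feel for the redundancy
by comparing serialized sizes.

```python
import json


def make_meter_record(counter, volume, resource):
    """Build a meter record that embeds the whole resource hash.

    The only convention enforced is that the hash has an 'id' field;
    everything else in it is treated as metadata.
    """
    if "id" not in resource:
        raise ValueError("resource must contain an 'id' field")
    return {"counter": counter, "volume": volume, "resource": dict(resource)}


resource = {"id": 134123, "name": "foobar", "flavor": "m1.small"}

# Three samples for the same resource repeat the same metadata three times.
embedded = [make_meter_record("c1", 1, resource) for _ in range(3)]
id_only = [{"counter": "c1", "volume": 1, "resource_id": 134123}
           for _ in range(3)]

embedded_size = len(json.dumps(embedded))
id_only_size = len(json.dumps(id_only))
```

Here `embedded_size` is larger than `id_only_size`, and the gap grows with
every sample recorded for the same resource.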

Cheers

-- 
Loïc Dachary Chief Research Officer
// eNovance labs   http://labs.enovance.com
// ✉ l...@enovance.com  ☎ +33 1 49 70 99 82




Re: [Openstack] [metering] resources metadata

2012-05-15 Thread Julien Danjou
On Tue, May 15 2012, Loic Dachary wrote:

 On 05/15/2012 12:05 PM, Julien Danjou wrote:

 OTOH I find the metadata proposal in another table too
 complicated. Why not store that metadata in the meter.payload field
 in the same table (e.g. as a JSON string)?
 It would be much simpler to store the metadata in the resource_id field,
 which could be renamed to a resource field.

That'd be even more radical.

 Instead of resource_id=134123 we could have resource={ 'id': 134123,
 'name': 'foobar', 'flavor': 'm1.small' etc.. } There would be no need
 for versioning, format, separate table, etc. etc. The only convention
 would be that it's a hash with at least one field : the id of the
 resource. The rest is metadata.

 It will use a lot of disk space with highly redundant information.

OK, so the current proposal is just premature optimization, as I understood it.

If you want to optimize the storage, why not use resource_id as a
foreign key to the metadata table, which would contain unique records
of metadata?

That would allow identical metadata to be stored once (and therefore
optimize space) and would be much simpler. There would be no need for a
version, timestamp, or anything else on the metadata.
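
One possible reading of this, as a small Python sketch: each distinct
metadata hash is stored once and meter records point at it. Deriving the key
from a SHA-1 of the canonical JSON is my assumption for the illustration, and
the `metadata_table`, `meter_table` and `record_sample` names are
hypothetical.

```python
import hashlib
import json

metadata_table = {}  # key -> metadata dict, each distinct value stored once
meter_table = []     # meter records referencing metadata by key


def metadata_key(metadata):
    """Derive a stable key from the metadata content (assumed scheme)."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def record_sample(volume, metadata):
    """Store a sample; identical metadata is only stored the first time."""
    key = metadata_key(metadata)
    metadata_table.setdefault(key, dict(metadata))
    meter_table.append({"volume": volume, "metadata_key": key})


small = {"id": 134123, "name": "foobar", "flavor": "m1.small"}
large = {"id": 134123, "name": "foobar", "flavor": "m1.large"}

record_sample(1, small)
record_sample(1, small)  # same metadata, no new metadata row
record_sample(1, large)  # metadata changed, one new metadata row
```

After these three samples `metadata_table` holds two entries and
`meter_table` holds three, with no version or timestamp on the metadata
itself.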

-- 
Julien Danjou
// eNovance  http://enovance.com
// ✉ julien.dan...@enovance.com  ☎ +33 1 49 70 99 81



Re: [Openstack] [metering] resources metadata

2012-05-15 Thread Doug Hellmann
Copying the list...

On Tue, May 15, 2012 at 10:26 AM, Doug Hellmann doug.hellm...@dreamhost.com
 wrote:



 On Tue, May 15, 2012 at 8:21 AM, Julien Danjou julien.dan...@enovance.com
  wrote:

 On Tue, May 15 2012, Loic Dachary wrote:

  On 05/15/2012 12:05 PM, Julien Danjou wrote:
 
  OTOH I find the metadata proposal in another table too
  complicated. Why not store that metadata in the meter.payload field
  in the same table (e.g. as a JSON string)?
  It would be much simpler to store the metadata in the resource_id field,
  which could be renamed to a resource field.

 That'd be even more radical.


 I like it because it would simplify the messaging. We can leave the
 storage optimization question to the daemon that stores the data.



  Instead of resource_id=134123 we could have resource={ 'id': 134123,
  'name': 'foobar', 'flavor': 'm1.small' etc.. } There would be no need
  for versioning, format, separate table, etc. etc. The only convention
  would be that it's a hash with at least one field : the id of the
  resource. The rest is metadata.
 
  It will use a lot of disk space with highly redundant information.

 OK, so the current proposal is just premature optimization, as I understood it.

 If you want to optimize the storage, why not use resource_id as a
 foreign key to the metadata table, which would contain unique records
 of metadata?

 That would allow identical metadata to be stored once (and therefore
 optimize space) and would be much simpler. There would be no need for a
 version, timestamp, or anything else on the metadata.

 --
 Julien Danjou
 // eNovance  http://enovance.com
 // ✉ julien.dan...@enovance.com  ☎ +33 1 49 70 99 81



Re: [Openstack] [metering] resources metadata (was: public API design)

2012-05-14 Thread Doug Hellmann
On Fri, May 11, 2012 at 3:55 PM, Loic Dachary l...@enovance.com wrote:


  - The interesting metadata for a resource may depend on the type of
  resource. Do we need separate tables for that or can we normalize
  somehow?
  - How do we map a resource to the correct version of its metadata at
  any given time? Timestamps seem brittle.
  - Do we need to reflect the metadata in the aggregation API?
 
 Hi,

 I started a new thread for the metadata topic. I suspect it deserves it.
 Although I was reluctant to acknowledge that the metadata should be stored
 by the metering, yesterday's meeting made me realize that it was mandatory.
 The compelling reason (for me ;-) is that it would be much more
 difficult to implement a billing system if the metering does not provide a
 simple way to extract metadata and display it in a human-readable way (or
 one meaningful to accountants?).

 I see two separate questions:

 a) how to store and query metadata?
 b) what are the semantics of the metadata for a given resource?

 My hunch is that there will never be a definitive answer to b) and that
 the best we can do is to provide a format and leave the semantics to the
 documentation of the metering system, explaining the metadata of a resource.

 Regarding the storage of the metadata, the metering could listen / poll
 events creating / updating / deleting a given resource and store a history
 log indexed by the resource id. Something like:

 { meter_type: TTT,
   resource_id: RRR,
   metadata: [{ version: VVV,
                timestamp: TIME1,
                payload: PAYLOAD1 },
              { version: VVV,
                timestamp: TIME3,
                payload: PAYLOAD2 }]
 }

 Each payload (PAYLOAD1, PAYLOAD2) is the resource-dependent metadata, whose
 content depends on the type of the resource. The metadata array is an
 ordered list of the successive states of the resource over time. The VVV
 version accounts for changes in the format of the payload.

 The query would be :

 GET /resource/meter_type/resource_id/TIME2

 and it would return PAYLOAD1 if TIME2 is in the range [TIME1,TIME3[

 I'm not sure why you think timestamp is brittle. Maybe I'm missing
 something.


Each set of metering data will need to be associated with the appropriate
metadata from the resource at the time the metering information was
collected. The rate of change of metadata and metering events are
different, though, so the timestamps of the metadata records are unlikely
to match exactly with the values in the metering records. Depending on the
clock resolution, it would be possible to have metadata changes and meter
data with the same timestamp, resulting in an incorrect association.
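
A small Python sketch of the failure mode, assuming timestamps are stored
with one-second resolution and that the lookup picks the latest metadata
change at or before the sample time. The `truncate` and `metadata_at`
helpers and the flavor values are hypothetical.

```python
from datetime import datetime


def truncate(ts):
    """Simulate a clock or column with one-second resolution."""
    return ts.replace(microsecond=0)


# The sample is taken 0.1 s before the metadata change.
sample_time = datetime(2012, 5, 14, 12, 0, 0, 100000)
change_time = datetime(2012, 5, 14, 12, 0, 0, 200000)

# Metadata history as stored: (truncated timestamp, payload), oldest first.
history = [
    (truncate(datetime(2012, 5, 14, 11, 0, 0)), {"flavor": "m1.small"}),
    (truncate(change_time), {"flavor": "m1.large"}),
]


def metadata_at(history, ts):
    """Return the payload of the latest change at or before ts."""
    current = None
    for changed_at, payload in history:
        if changed_at <= ts:
            current = payload
    return current


associated = metadata_at(history, truncate(sample_time))
```

Both stored timestamps collapse to 12:00:00, so `associated` is the
`m1.large` payload even though the sample was taken while the resource was
still `m1.small`.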

We can work around that by maintaining proper foreign key references using
the metadata version field as you describe in the schema above (so the
resource id and metadata version value point to the correct metadata
record). It will make recording the metering data less efficient because we
will need to determine the current version for the resource metadata, but
we can optimize that eventually through indexes and caching.
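
As a sketch of what that could look like, here is a minimal SQLite example
driven from Python. The table and column names, the `current_version` and
`record_sample` helpers, and the use of `MAX(version)` to find the current
version are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
CREATE TABLE resource_metadata (
    resource_id TEXT NOT NULL,
    version     INTEGER NOT NULL,
    payload     TEXT NOT NULL,
    PRIMARY KEY (resource_id, version)
);
CREATE TABLE meter (
    resource_id      TEXT NOT NULL,
    metadata_version INTEGER NOT NULL,
    counter          TEXT NOT NULL,
    volume           REAL NOT NULL,
    FOREIGN KEY (resource_id, metadata_version)
        REFERENCES resource_metadata (resource_id, version)
);
""")


def current_version(resource_id):
    """Extra lookup needed on every sample; the primary key index helps."""
    row = conn.execute(
        "SELECT MAX(version) FROM resource_metadata WHERE resource_id = ?",
        (resource_id,),
    ).fetchone()
    return row[0]


def record_sample(resource_id, counter, volume):
    """Record a sample pointing at the metadata version current right now."""
    conn.execute(
        "INSERT INTO meter VALUES (?, ?, ?, ?)",
        (resource_id, current_version(resource_id), counter, volume),
    )


conn.execute("INSERT INTO resource_metadata VALUES ('134123', 1, 'm1.small')")
record_sample("134123", "c1", 1.0)
conn.execute("INSERT INTO resource_metadata VALUES ('134123', 2, 'm1.large')")
record_sample("134123", "c1", 1.0)

versions = [row[0] for row in conn.execute(
    "SELECT metadata_version FROM meter ORDER BY rowid")]
```

The association no longer depends on clock resolution: `versions` is
`[1, 2]` regardless of how close together the change and the samples are.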

Aggregation will also need to take the metadata version into account, so
everywhere in the list of queries where we say "by resource_id" we need to
change that to "by resource_id and version".
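
In Python terms, the change to the grouping key might look like the sketch
below. The sample layout and the `total_by_resource_and_version` helper are
hypothetical.

```python
from collections import defaultdict

# Samples already carry the metadata version they were recorded against.
samples = [
    {"resource_id": "134123", "version": 1, "volume": 10},
    {"resource_id": "134123", "version": 1, "volume": 5},
    {"resource_id": "134123", "version": 2, "volume": 7},
]


def total_by_resource_and_version(samples):
    """Sum volumes per (resource_id, version) instead of per resource_id."""
    totals = defaultdict(int)
    for sample in samples:
        totals[(sample["resource_id"], sample["version"])] += sample["volume"]
    return dict(totals)


totals = total_by_resource_and_version(samples)
# totals == {("134123", 1): 15, ("134123", 2): 7}
```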

Doug



 Cheers

 --
 Loïc Dachary Chief Research Officer
 // eNovance labs   http://labs.enovance.com
 // ✉ l...@enovance.com  ☎ +33 1 49 70 99 82




Re: [Openstack] [metering] resources metadata

2012-05-14 Thread Loic Dachary
On 05/14/2012 04:15 PM, Doug Hellmann wrote:


 On Fri, May 11, 2012 at 3:55 PM, Loic Dachary l...@enovance.com 
 mailto:l...@enovance.com wrote:


  - The interesting metadata for a resource may depend on the type of
  resource. Do we need separate tables for that or can we normalize
  somehow?
  - How do we map a resource to the correct version of its metadata at
  any given time? Timestamps seem brittle.
  - Do we need to reflect the metadata in the aggregation API?
 
 Hi,

 I started a new thread for the metadata topic. I suspect it deserves
 it. Although I was reluctant to acknowledge that the metadata should be
 stored by the metering, yesterday's meeting made me realize that it was
 mandatory. The compelling reason (for me ;-) is that it would be much
 more difficult to implement a billing system if the metering does not provide
 a simple way to extract metadata and display it in a human-readable way (or
 one meaningful to accountants?).

 I see two separate questions:

 a) how to store and query metadata?
 b) what are the semantics of the metadata for a given resource?

 My hunch is that there will never be a definitive answer to b) and that
 the best we can do is to provide a format and leave the semantics to the
 documentation of the metering system, explaining the metadata of a resource.

 Regarding the storage of the metadata, the metering could listen / poll 
 events creating / updating / deleting a given resource and store a history 
 log indexed by the resource id. Something like:

 { meter_type: TTT,
   resource_id: RRR,
   metadata: [{ version: VVV,
                timestamp: TIME1,
                payload: PAYLOAD1 },
              { version: VVV,
                timestamp: TIME3,
                payload: PAYLOAD2 }]
 }

 Each payload (PAYLOAD1, PAYLOAD2) is the resource-dependent metadata, whose
 content depends on the type of the resource. The metadata array is an
 ordered list of the successive states of the resource over time. The VVV
 version accounts for changes in the format of the payload.

 The query would be :

 GET /resource/meter_type/resource_id/TIME2

 and it would return PAYLOAD1 if TIME2 is in the range [TIME1,TIME3[

 I'm not sure why you think timestamp is brittle. Maybe I'm missing 
 something.


 Each set of metering data will need to be associated with the appropriate 
 metadata from the resource at the time the metering information was 
 collected. The rate of change of metadata and metering events are different, 
 though, so the timestamps of the metadata records are unlikely to match 
 exactly with the values in the metering records. Depending on the clock 
 resolution, it would be possible to have metadata changes and meter data with 
 the same timestamp, resulting in an incorrect association.
Indeed, good point.

 We can work around that by maintaining proper foreign key references using 
 the metadata version field as you describe in the schema above (so the 
 resource id and metadata version value point to the correct metadata record). 
 It will make recording the metering data less efficient because we will need 
 to determine the current version for the resource metadata, but we can 
 optimize that eventually through indexes and caching.

 Aggregation will also need to take the metadata version into account, so 
 everywhere in the list of queries we say by resource_id we need to change 
 that to by resource_id and version.
I added the idea of a format version for when the payload format changes and 
tried to write down a description of the metadata storage matching this thread 
in the wiki.

http://wiki.openstack.org/EfficientMetering?action=diff&rev2=80&rev1=78

What do you think ?

-- 
Loïc Dachary Chief Research Officer
// eNovance labs   http://labs.enovance.com
// ✉ l...@enovance.com  ☎ +33 1 49 70 99 82



Re: [Openstack] [metering] resources metadata

2012-05-14 Thread Doug Hellmann
On Mon, May 14, 2012 at 1:04 PM, Loic Dachary l...@enovance.com wrote:

  On 05/14/2012 04:15 PM, Doug Hellmann wrote:



 On Fri, May 11, 2012 at 3:55 PM, Loic Dachary l...@enovance.com wrote:


  - The interesting metadata for a resource may depend on the type of
  resource. Do we need separate tables for that or can we normalize
  somehow?
  - How do we map a resource to the correct version of its metadata at
  any given time? Timestamps seem brittle.
  - Do we need to reflect the metadata in the aggregation API?
 
 Hi,

 I started a new thread for the metadata topic. I suspect it deserves
 it. Although I was reluctant to acknowledge that the metadata should be
 stored by the metering, yesterday's meeting made me realize that it was
 mandatory. The compelling reason (for me ;-) is that it would be much
 more difficult to implement a billing system if the metering does not
 provide a simple way to extract metadata and display it in a human-readable
 way (or one meaningful to accountants?).

 I see two separate questions:

 a) how to store and query metadata?
 b) what are the semantics of the metadata for a given resource?

 My hunch is that there will never be a definitive answer to b) and that
 the best we can do is to provide a format and leave the semantics to the
 documentation of the metering system, explaining the metadata of a resource.

 Regarding the storage of the metadata, the metering could listen / poll
 events creating / updating / deleting a given resource and store a history
 log indexed by the resource id. Something like:

 { meter_type: TTT,
   resource_id: RRR,
   metadata: [{ version: VVV,
                timestamp: TIME1,
                payload: PAYLOAD1 },
              { version: VVV,
                timestamp: TIME3,
                payload: PAYLOAD2 }]
 }

 Each payload (PAYLOAD1, PAYLOAD2) is the resource-dependent metadata, whose
 content depends on the type of the resource. The metadata array is an
 ordered list of the successive states of the resource over time. The VVV
 version accounts for changes in the format of the payload.

 The query would be :

 GET /resource/meter_type/resource_id/TIME2

 and it would return PAYLOAD1 if TIME2 is in the range [TIME1,TIME3[

 I'm not sure why you think timestamp is brittle. Maybe I'm missing
 something.


  Each set of metering data will need to be associated with the
 appropriate metadata from the resource at the time the metering information
 was collected. The rate of change of metadata and metering events are
 different, though, so the timestamps of the metadata records are unlikely
 to match exactly with the values in the metering records. Depending on the
 clock resolution, it would be possible to have metadata changes and meter
 data with the same timestamp, resulting in an incorrect association.

 Indeed, good point.


Although it turns out the case I was actually worried about, resizing
instances, may be supported by only some hypervisors. As a result, this is
less of a concern and I could afford to have us postpone handling changing
metadata until a later version of ceilometer. We still need to collect the
initial data, in case the resource is deleted, but that is far less
complicated and there is no sense making extra trouble for ourselves if
other users of 1.0 will not need the feature, either. Does anyone else in
the group have feedback on how important it is?


  We can work around that by maintaining proper foreign key references
 using the metadata version field as you describe in the schema above (so
 the resource id and metadata version value point to the correct metadata
 record). It will make recording the metering data less efficient because we
 will need to determine the current version for the resource metadata, but
 we can optimize that eventually through indexes and caching.

  Aggregation will also need to take the metadata version into account, so
 everywhere in the list of queries we say by resource_id we need to change
 that to by resource_id and version.

 I added the idea of a format version for when the payload format changes
 and tried to write down a description of the metadata storage matching this
 thread in the wiki.

 http://wiki.openstack.org/EfficientMetering?action=diff&rev2=80&rev1=78

 What do you think ?


That looks good. I am looking forward to getting Julien's code merged in so
I can start working with it.




 --
 Loïc Dachary Chief Research Officer
 // eNovance labs   http://labs.enovance.com
 // ✉ l...@enovance.com  ☎ +33 1 49 70 99 82




Re: [Openstack] [metering] resources metadata (was: public API design)

2012-05-11 Thread Loic Dachary

 - The interesting metadata for a resource may depend on the type of
 resource. Do we need separate tables for that or can we normalize
 somehow?
 - How do we map a resource to the correct version of its metadata at
 any given time? Timestamps seem brittle.
 - Do we need to reflect the metadata in the aggregation API?

Hi,

I started a new thread for the metadata topic. I suspect it deserves it.
Although I was reluctant to acknowledge that the metadata should be stored by
the metering, yesterday's meeting made me realize that it was mandatory. The
compelling reason (for me ;-) is that it would be much more difficult to
implement a billing system if the metering does not provide a simple way to
extract metadata and display it in a human-readable way (or one meaningful to
accountants?).

I see two separate questions:

a) how to store and query metadata?
b) what are the semantics of the metadata for a given resource?

My hunch is that there will never be a definitive answer to b) and that the
best we can do is to provide a format and leave the semantics to the
documentation of the metering system, explaining the metadata of a resource.

Regarding the storage of the metadata, the metering could listen / poll events 
creating / updating / deleting a given resource and store a history log indexed 
by the resource id. Something like:

{ meter_type: TTT,
  resource_id: RRR,
  metadata: [{ version: VVV,
               timestamp: TIME1,
               payload: PAYLOAD1 },
             { version: VVV,
               timestamp: TIME3,
               payload: PAYLOAD2 }]
}

Each payload (PAYLOAD1, PAYLOAD2) is the resource-dependent metadata, whose
content depends on the type of the resource. The metadata array is an ordered
list of the successive states of the resource over time. The VVV version
accounts for changes in the format of the payload.

The query would be :

GET /resource/meter_type/resource_id/TIME2

and it would return PAYLOAD1 if TIME2 is in the range [TIME1,TIME3[
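
A minimal Python sketch of that lookup, assuming the history is kept as a
list of (timestamp, payload) pairs sorted by timestamp. The `payload_at`
helper and the integer timestamps are hypothetical stand-ins.

```python
import bisect


def payload_at(history, when):
    """Return the payload valid at `when`, or None if nothing is known yet.

    Each entry is valid from its own timestamp (inclusive) up to the next
    entry's timestamp (exclusive), i.e. the half-open range [TIME1, TIME3[.
    """
    timestamps = [ts for ts, _ in history]
    index = bisect.bisect_right(timestamps, when) - 1
    if index < 0:
        return None
    return history[index][1]


TIME1, TIME2, TIME3 = 100, 150, 200
history = [(TIME1, "PAYLOAD1"), (TIME3, "PAYLOAD2")]

payload_at(history, TIME2)  # "PAYLOAD1", since TIME1 <= TIME2 < TIME3
payload_at(history, TIME3)  # "PAYLOAD2", the upper bound is excluded
payload_at(history, 50)     # None, before the first recorded state
```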

I'm not sure why you think timestamp is brittle. Maybe I'm missing something.

Cheers

-- 
Loïc Dachary Chief Research Officer
// eNovance labs   http://labs.enovance.com
// ✉ l...@enovance.com  ☎ +33 1 49 70 99 82

