On 8/20/13 1:48 PM, "Sandy Walsh" <sandy.wa...@rackspace.com> wrote:
>On 08/20/2013 10:42 AM, Thomas Maddox wrote:
>> On 8/19/13 8:21 AM, "Sandy Walsh" <sandy.wa...@rackspace.com> wrote:
>>
>>> On 08/18/2013 04:04 PM, Jay Pipes wrote:
>>>> On 08/17/2013 03:10 AM, Julien Danjou wrote:
>>>>> On Fri, Aug 16 2013, Jay Pipes wrote:
>>>>>
>>>>>> Actually, that's the opposite of what I'm suggesting :) I'm
>>>>>> suggesting getting rid of the resource_metadata column in the
>>>>>> meter table and using the resource table in joins...
>>>>>
>>>>> I think there are a lot of scenarios where this would fail; for
>>>>> example, instances being resized, since the flavor is metadata.
>>>>
>>>> I'm proposing that in these cases, a *new* resource would be added
>>>> to the resource table (and its ID inserted into the meter table)
>>>> with the new flavor/instance's metadata.
>>>>
>>>>> Though changing the schema to improve performance is a good idea,
>>>>> this needs to be thought through from the sample being sent to
>>>>> storage, through the whole chain. This is something that will break
>>>>> a lot of current assumptions; that doesn't mean it's bad or we
>>>>> can't do it, just that we need to think it through. :)
>>>>
>>>> Yup, understood completely. The change I am proposing would not
>>>> affect any assumptions made from the point of view of a sample sent
>>>> to storage. The current assumption is that a sample's *exact* state
>>>> at time of sampling would be stored, so that the exact sample state
>>>> could be reflected even if the underlying resource that triggered
>>>> the sample changed over time.
>>>>
>>>> All I am proposing is a change to the existing implementation of
>>>> that assumption: instead of storing the original resource metadata
>>>> in the meter table, we instead ensure that we store the resource in
>>>> the resource table, and upon new sample records being inserted into
>>>> the meter table, we check to see if the resource for the sample is
>>>> the same as it was last time.
>>>> If it is, we simply insert the resource ID from last time. If it
>>>> isn't, we add a new record to the resource table that describes the
>>>> new resource attributes, and we insert that new resource ID into the
>>>> meter table for that sample...
>>>
>>> I'm assuming we wouldn't need a backlink to the older resource?
>>>
>>> I'm thinking about how this would work with Events and Request IDs.
>>> The two most common reports we run from StackTach are based on a
>>> Request ID and some resource ID:
>>>
>>> "Show me all the events related to this Request UUID"
>>> "Show me all the events related to this <Instance/Image/Network/etc>
>>> UUID"
>>>
>>> A new Resource entry would be fine so long as it was still associated
>>> with the underlying Resource UUID (instance, image, etc). We could
>>> get back a list of all the Resources with the same UUID and, if
>>> needed, look up the metadata for each. This would allow us to see how
>>> the resource changed over time.
>>>
>>> I think that's what you're suggesting ... if so, yep.
>>>
>>> As for the first query ("... for this Request ID"), we'd have to map
>>> an Event to many related Resources, since one event could have a
>>> related instance/image/network/volume/host/scheduler, etc.
>>>
>>> These relationships would have to get mapped when the Event is turned
>>> into Meters. Changing the Resource ID might not be a problem if we
>>> keep a common Resource UUID. I have to think about that some more.
>>>
>>> Would we use timestamps to determine which Resource is the most
>>> recent?
>>>
>>> -S
>>
>> Are we going to be incurring significant performance cost from this?
>
>There's certainly a storage cost and potentially a race condition (read
>the current metadata, change something and, at the same time, someone
>else makes another metadata change). But the performance overhead
>should be slight.

Hmmmm, okay. Yeah, it's not currently checking before merging metadata
right now.
It's just merging whichever notification happens to show up last.

>> Let me see if I understand how a query will work for this, based on
>> the current way CM gets billing:
>>
>> Scenario: Monthly billing for Winston, who built 12 machines this
>> month; we don't want to bill for failed/stalled builds that weren't
>> cleaned up yet, either.
>>
>> 1. Filter the Meter table for the time range in the samples to get
>> the Resources that were updated
>
>I would do like we do in StackTach: query for the unique set of Request
>IDs over that time range by Tenant. From there, determine which of
>those operations were billable actions (CUD).
>
>I think Dragon's Trigger work will make this far less expensive than it
>is in StackTach currently, since we'll be able to create a "Request"
>Resource (each Request will have a related Resource) with metadata
>saying this was a billable event, plus the tenant ID. The corresponding
>events for the request ID can link to this Request resource. The query
>should be pretty tight.

That's an interesting point about triggers; I'm not sure I fully
understand. The criteria would be something like a 'completed billable
request' and could be stored as satisfied after the notifications come
through for a specific tenant? So we would just aggregate each tenant's
billable (triggered 'bill this') events for the period?

>> 2. Because the metadata changes a few times throughout the build
>> process, we have samples referencing several different metadata
>> states over time for each instance
>> 3. Because of the metadata over time, we filter the Resource table to
>> provide distinct resources
>
>Yeah, knowing which is the "latest" Resource is my concern as well.
>Getting the metadata from that resource should be the same cost, but
>getting the resource might be tricky.

That's the main issue I'm seeing if we were to create a new resource
for each resource state (metadata change), yep.
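To make the scheme being discussed concrete: a minimal sketch of the compare-then-insert-or-reuse step Jay describes, including the read-then-write race Sandy notes. All table and column names here are hypothetical simplifications, not Ceilometer's actual schema; SQLite is used purely for illustration.

```python
import json
import sqlite3

# Hypothetical, simplified tables -- the real schema differs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (
        id INTEGER PRIMARY KEY,
        resource_uuid TEXT NOT NULL,  -- underlying instance/image/etc. UUID
        metadata TEXT NOT NULL        -- JSON blob of this metadata state
    );
    CREATE TABLE meter (
        id INTEGER PRIMARY KEY,
        resource_id INTEGER NOT NULL REFERENCES resource(id),
        counter_volume REAL,
        timestamp TEXT
    );
""")

def record_sample(resource_uuid, metadata, volume, timestamp):
    """Reuse the latest resource row if metadata is unchanged, else add one.

    The read-then-write below is exactly the race mentioned above: two
    concurrent writers can each miss the other's new resource row.
    """
    meta_json = json.dumps(metadata, sort_keys=True)
    row = conn.execute(
        "SELECT id, metadata FROM resource WHERE resource_uuid = ? "
        "ORDER BY id DESC LIMIT 1", (resource_uuid,)).fetchone()
    if row and row[1] == meta_json:
        resource_id = row[0]          # same state: reuse the existing row
    else:
        cur = conn.execute(
            "INSERT INTO resource (resource_uuid, metadata) VALUES (?, ?)",
            (resource_uuid, meta_json))
        resource_id = cur.lastrowid   # new state: new resource row
    conn.execute(
        "INSERT INTO meter (resource_id, counter_volume, timestamp) "
        "VALUES (?, ?, ?)", (resource_id, volume, timestamp))
    return resource_id

# Two samples with identical metadata share one resource row; a resize
# (flavor change) creates a second row for the same instance UUID.
a = record_sample("inst-1", {"flavor": "m1.small"}, 1.0, "t0")
b = record_sample("inst-1", {"flavor": "m1.small"}, 1.0, "t1")
c = record_sample("inst-1", {"flavor": "m1.large"}, 2.0, "t2")
```

This also shows the "which Resource is latest?" question: here it is answered by `ORDER BY id DESC`, which is the timestamp-ordering idea Sandy raises in a different guise.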
>> 4. We then correlate the resulting resources with their aggregate
>> samples to have the measurements for each resource
>>
>> A thought for what may make this easier: apply Jay's idea to a
>> normalized version of resource_metadata in a separate table that is
>> then referenced from the samples' resource_metadata attribute, plus a
>> latest_metadata column in the Resource table. That way we're not
>> repeating ourselves, and we're not incurring any more complication
>> than we already have with the current implementation (it would keep
>> the FK to the Resource table). This way we can easily get to the
>> latest state (hit the Resource table from the sample) and the
>> associated measurements derived from the samples. We then only have
>> to deal with metadata over time when it's important, which seems like
>> a relatively infrequent request to trace, but still a use case that
>> needs to be satisfied.
>>
>> I hope I'm not seeing a problem that doesn't exist, but either way
>> I'll learn something, so... thoughts? =]
>>
>> Cheers!
>>
>> -Thomas
>>
>>>> Best,
>>>> -jay

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
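One possible shape for the normalized-metadata layout Thomas floats above: a distinct metadata-state table, samples that pin the state they were taken under, and a latest_metadata pointer on the Resource row. Every name here is a hypothetical sketch, not Ceilometer's real schema.

```python
import sqlite3

# Hypothetical normalized layout; SQLite for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource_metadata (
        id INTEGER PRIMARY KEY,
        metadata TEXT NOT NULL           -- one row per distinct state
    );
    CREATE TABLE resource (
        id INTEGER PRIMARY KEY,
        uuid TEXT UNIQUE NOT NULL,       -- one row per instance/image/etc.
        latest_metadata INTEGER REFERENCES resource_metadata(id)
    );
    CREATE TABLE meter (
        id INTEGER PRIMARY KEY,
        resource_id INTEGER NOT NULL REFERENCES resource(id),
        resource_metadata INTEGER NOT NULL REFERENCES resource_metadata(id),
        counter_volume REAL
    );
""")

# A resize: two metadata states, a single resource row whose
# latest_metadata pointer moves, and one sample taken under each state.
conn.execute("INSERT INTO resource_metadata (id, metadata) VALUES (?, ?)",
             (1, '{"flavor": "m1.small"}'))
conn.execute("INSERT INTO resource (id, uuid, latest_metadata) VALUES (?, ?, ?)",
             (1, "inst-1", 1))
conn.execute("INSERT INTO meter (resource_id, resource_metadata, "
             "counter_volume) VALUES (1, 1, 1.0)")
conn.execute("INSERT INTO resource_metadata (id, metadata) VALUES (?, ?)",
             (2, '{"flavor": "m1.large"}'))
conn.execute("UPDATE resource SET latest_metadata = 2 WHERE id = 1")
conn.execute("INSERT INTO meter (resource_id, resource_metadata, "
             "counter_volume) VALUES (1, 2, 2.0)")

# The latest state is one join away from any sample...
latest = conn.execute("""
    SELECT rm.metadata FROM meter m
    JOIN resource r ON r.id = m.resource_id
    JOIN resource_metadata rm ON rm.id = r.latest_metadata
    WHERE m.id = 1
""").fetchone()[0]

# ...while the sample itself still pins the state it was taken under,
# preserving metadata-over-time for the infrequent tracing case.
pinned = conn.execute("""
    SELECT rm.metadata FROM meter m
    JOIN resource_metadata rm ON rm.id = m.resource_metadata
    WHERE m.id = 1
""").fetchone()[0]
```

This keeps the exact-state-at-sampling guarantee discussed upthread while avoiding repeated metadata blobs: distinct states are stored once, and only the pointer on the Resource row is updated.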