Hi Anqi, see my comments below. Regards --Roland

From: An Qi YL Lu <l...@cn.ibm.com>
Date: Sunday, February 12, 2017 at 8:29 PM
To: Roland Hochmuth <roland.hochm...@hpe.com>
Cc: OpenStack List <openstack-dev@lists.openstack.org>
Subject: Re: [monasca] Ideas to work on

Hi Roland

I am not sure whether you received my last email because I got a delivery 
failure notification. I am sending this again to ensure that you can see this 
email.

Best,
Anqi

----- Original message -----
From: An Qi YL Lu/China/IBM
To: roland.hochm...@hpe.com
Cc: openstack-dev@lists.openstack.org
Subject: Re: [monasca] Ideas to work on
Date: Fri, Feb 10, 2017 5:14 PM

Hi Roland

Thanks for your suggestions. The list you made is useful and gives me clues 
about areas I can work on. I spent some time investigating the bps that you 
mentioned.

I am most interested in data retention and metrics deleting.

Data retention: I had a quick look at the data retention policy in InfluxDB. 
It apparently supports different retention policies for different series. As I 
understand it, the whiteboard in this bp describes a straightforward design for 
this feature, and I don't quite see what makes it complex. Could you please 
shed some light on where the complicated part is?
The retention policy specified in the bp, 
https://blueprints.launchpad.net/monasca/+spec/per-project-data-retention, is 
per project. InfluxDB allows retention policies to be set per database: 
https://docs.influxdata.com/influxdb/v1.2/query_language/database_management/#create-retention-policies-with-create-retention-policy.

Currently, we store all metrics for all tenants in one database. One approach, 
which would involve a bit of re-engineering if we choose to do it, would be to 
store each project's metrics in its own database.
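A minimal sketch of what the per-project layout could involve, assuming a hypothetical naming scheme of one database per project called `monasca_<project_id>`. The function only builds InfluxQL statements following the `CREATE DATABASE` / `CREATE RETENTION POLICY` syntax in the InfluxDB 1.2 docs linked above; the persister and API would still need to route writes and queries to the right database.

```python
import re

# Hypothetical naming scheme: one InfluxDB database per project.
DB_PREFIX = "monasca_"

# Keystone project IDs are typically hex strings; reject anything that
# could break out of a quoted InfluxQL identifier.
_SAFE_ID = re.compile(r"[0-9A-Za-z_-]+")


def project_retention_statements(project_id, days, rp_name="project_rp"):
    """Build InfluxQL statements that create a per-project database whose
    default retention policy keeps data for `days` days."""
    if not _SAFE_ID.fullmatch(project_id):
        raise ValueError("unexpected characters in project id: %r" % project_id)
    if days < 1:
        raise ValueError("retention must be at least one day")
    db = DB_PREFIX + project_id
    return [
        'CREATE DATABASE "%s"' % db,
        # DEFAULT makes writes that don't name a policy land in this one.
        'CREATE RETENTION POLICY "%s" ON "%s" DURATION %dd REPLICATION 1 DEFAULT'
        % (rp_name, db, days),
    ]


if __name__ == "__main__":
    for stmt in project_retention_statements("abc123", 45):
        print(stmt)
```

Changing one project's retention later would then be a matter of altering the policy on that project's database only, without touching other tenants.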

I could also imagine having retention policies per metric per tenant. For 
example, metering metrics might need to be stored for a longer period than 
operational metrics. There isn't a way to do this directly with InfluxDB's 
built-in data retention policies. However, it could possibly be done with 
scheduled jobs that periodically prune the database using deletes.

For the Vertica database, we (HPE) simulate retention policies by running a 
cron job that drops partitions after some period of time, such as 45 days. 
Charter has a more sophisticated cron job that deletes metrics for specific 
tenants on different schedules than the operational metrics. For example, the 
cloud's tenants might have their metrics deleted every two weeks, while 
metering metrics might be deleted every 13 months.

The problem with deleting specific metrics is performance. Dropping partitions 
is extremely fast. However, deleting metrics might be slow and might also lock 
the database, blocking writes and/or queries. Therefore, to delete metrics, you 
could trickle the deletes in, reducing the impact during any given period, or, 
as in the Charter case, run the deletion script at 2:00 AM, when usage of the 
system is light.
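To make the "trickle" idea concrete, here is a sketch that splits one large delete into many small time-bounded deletes with a pause between them, so that no single statement holds locks for long. It assumes metric names are InfluxDB measurements and the tenant is a `_tenant_id` tag. The `execute` callable (for example, something that posts the statement to InfluxDB's `/query` HTTP endpoint), the window size, and the pause are hypothetical placeholders. A script like this could equally be started from cron at 2:00 AM.

```python
import time
from datetime import datetime, timedelta, timezone


def time_windows(start, end, step):
    """Split [start, end) into consecutive windows no longer than `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        windows.append((lo, hi))
        lo = hi
    return windows


def _ts(dt):
    """Format an aware datetime as an RFC 3339 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def trickle_delete(execute, measurement, tenant_id, start, end,
                   step=timedelta(hours=6), pause_seconds=5.0):
    """Issue one small DELETE per window, sleeping between them.

    Names are assumed to be pre-validated (no quotes to escape).
    Returns the number of statements issued.
    """
    count = 0
    for lo, hi in time_windows(start, end, step):
        execute(
            "DELETE FROM \"%s\" WHERE \"_tenant_id\" = '%s' "
            "AND time >= '%s' AND time < '%s'"
            % (measurement, tenant_id, _ts(lo), _ts(hi)))
        count += 1
        time.sleep(pause_seconds)  # give writes and queries room to run
    return count


if __name__ == "__main__":
    end = datetime(2017, 1, 29, tzinfo=timezone.utc)
    trickle_delete(print, "cpu.idle_perc", "t1", end - timedelta(days=1), end,
                   pause_seconds=0)
```

Smaller windows and longer pauses spread the load further at the cost of a longer total run; the right values would have to be found by measuring against a real deployment.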

Metrics deletion: InfluxDB 1.1 (and any version after 0.9) supports deleting 
series, though you cannot specify a time interval for this operation. It simply 
deletes all points of a series in a database. I think one of the tricky parts 
is deciding which data depends on a metric being deleted, such as measurements 
and alarms. Please correct me if my understanding is not precise.
The problem, I believe, is that a single series in InfluxDB holds the data for 
multiple tenants, so deleting a single series would delete the data for all 
tenants. As with data retention policies, supporting deletion of metrics by 
metric name and optional dimensions would require storing metrics differently 
and/or designing some other solution.
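If tenants and dimensions are stored as tags (an assumption about the schema: the tenant as a `_tenant_id` tag and each dimension as a tag of the same name), a tenant-scoped delete by metric name and optional dimensions could be expressed with tag predicates instead of dropping whole series. The sketch below only builds the statement; `build_metric_delete` is a hypothetical name, and it does not address the performance or alarm-dependency questions raised above.

```python
def _ident(name):
    """Quote an InfluxQL identifier (measurement or tag key)."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _literal(value):
    """Quote an InfluxQL string literal (tag value)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_metric_delete(tenant_id, name, dimensions=None):
    """Build a DELETE limited to one tenant's points for a metric name,
    optionally narrowed by exact-match dimensions."""
    if not tenant_id or not name:
        raise ValueError("tenant_id and name are required")
    # Always scope by tenant so other tenants' data in the same
    # measurement is left untouched.
    conditions = ["%s = %s" % (_ident("_tenant_id"), _literal(tenant_id))]
    dims = dimensions or {}
    for key in sorted(dims):  # sorted for a deterministic statement
        conditions.append("%s = %s" % (_ident(key), _literal(dims[key])))
    return "DELETE FROM %s WHERE %s" % (_ident(name), " AND ".join(conditions))


if __name__ == "__main__":
    print(build_metric_delete("t1", "cpu.idle_perc", {"hostname": "devstack"}))
```

The tenant condition is added unconditionally so that a delete issued through the API can never reach beyond the caller's own project.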


I would like to look at log publishing as well, but unfortunately I could not 
find the monasca-log-api documentation, which is supposed to be at 
https://github.com/openstack/monasca-log-api/tree/master/docs, so I don't know 
how the log API currently works. Please share a copy of the documentation if 
you have one.
The new changes proposed by Steve Simpson are in the review that he just 
published: https://review.openstack.org/#/c/433016/.

The current documentation is in a slightly different directory than the link 
above: 
https://github.com/openstack/monasca-log-api/blob/master/documentation/monasca-log-api-spec.md.

Best,
Anqi

----- Original message -----
From: "Hochmuth, Roland M" <roland.hochm...@hpe.com>
To: OpenStack List <openstack-dev@lists.openstack.org>, 
An Qi YL Lu/China/IBM@IBMCN
Cc:
Subject: [monasca] Ideas to work on
Date: Fri, Feb 10, 2017 11:13 AM

Hi Anqi, you had expressed a strong interest in working on Monasca the other 
day in our Weekly Monasca Team Meeting, and I owed you a response. The team also 
asked me to keep them in the loop. Here is a list of items that I feel are 
interesting, neither trivial nor extremely complex (just right, hopefully), and 
that don't overlap with areas other developers are working on, which would make 
them difficult to coordinate in a limited time.

  1.  RBAC: Currently, the Python API doesn't fully support Role Based Access 
Control (RBAC). We've had discussions on this topic, but oddly, no blueprint 
has been written for it. It would be very useful to implement in the APIs, 
similar to what other OpenStack projects support.
  2.  Data retention: 
https://blueprints.launchpad.net/monasca/+spec/per-project-data-retention. We 
haven't completely reviewed and/or approved this blueprint, but it would be 
very useful to add support for per-project or per-metric data retention. This 
would involve understanding how data retention works in InfluxDB. We would also 
want to have some design discussion before proceeding, as it is probably more 
complex than described in the bp.
  3.  Publish logs and/or metrics to topics selectively: 
https://blueprints.launchpad.net/monasca/+spec/publish-logs-to-topic-selectively.
 In the context of metrics, this would be useful for identifying specific 
metrics as metering metrics, as opposed to monitoring metrics, and publishing 
them to different Kafka topics as a result. For example, the downstream Monasca 
Transform Engine would only receive the metrics that it transforms and would 
therefore not need to filter them, which would help improve performance 
dramatically. For logging, it would help distinguish operational logs from 
audit logs. It could also be used to identify high-priority metrics so that 
they could be published to a high-priority metrics topic in Kafka. There are 
several more contexts in which this is useful.
  4.  Delete metrics: 
https://blueprints.launchpad.net/monasca/+spec/delete-metrics. This would add 
the ability to delete metrics using the Monasca API. Typically, time series 
databases are not very good at deletes. We haven't tried to do this with 
InfluxDB, and while this might seem an easy task, it is a lot more involved 
than issuing the obvious and straightforward DELETE command.
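For item 3, a sketch of the kind of selective routing the blueprint describes, as it might run just before metrics are published to Kafka. The glob rules, the first-match ordering, and every topic name other than `metrics` (assumed here to be the existing catch-all topic) are hypothetical; the blueprint may well settle on a different mechanism, such as configuration-driven rules or a marker dimension on the metric.

```python
from fnmatch import fnmatchcase

# Hypothetical routing rules, checked in order; the first match wins.
# Each rule is (metric-name glob pattern, Kafka topic).
ROUTING_RULES = [
    ("vm.cpu.utilization_perc", "metrics-metering"),
    ("storage.objects.*", "metrics-metering"),
    ("host_alive_status", "metrics-high-priority"),
]
DEFAULT_TOPIC = "metrics"  # assumed existing catch-all topic


def topic_for(metric_name, rules=ROUTING_RULES, default=DEFAULT_TOPIC):
    """Pick the Kafka topic a metric should be published to."""
    for pattern, topic in rules:
        if fnmatchcase(metric_name, pattern):
            return topic
    return default


def group_by_topic(metrics):
    """Group metric dicts (each with a 'name' key) by destination topic,
    so that each batch can be sent to its topic in one producer call."""
    batches = {}
    for metric in metrics:
        batches.setdefault(topic_for(metric["name"]), []).append(metric)
    return batches


if __name__ == "__main__":
    sample = [{"name": "cpu.idle_perc"}, {"name": "storage.objects.size"}]
    for topic, batch in sorted(group_by_topic(sample).items()):
        print(topic, [m["name"] for m in batch])
```

With routing done at publish time, a consumer such as the Transform Engine could subscribe only to the metering topic and skip its own filtering step.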

I hope this helps. Let me know if you want to discuss further or want more 
ideas.

Regards --Roland




__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
