[openstack-dev] [monasca] Ideas to work on

2017-02-09 Thread Hochmuth, Roland M
Hi Anqi, You expressed a strong interest in working on Monasca the other 
day in our weekly Monasca team meeting, and I owed you a response. The team 
also asked me to keep them in the loop. Here is a list of ideas that I feel 
are interesting, neither trivial nor extremely complex (just right, 
hopefully), and that don't overlap with areas other developers are working 
on, which would be difficult to coordinate in a limited time.

  1.  RBAC: Currently, the Python API doesn't fully support Role Based Access 
Control (RBAC). We've had discussions on this topic, but oddly, there isn't a 
blueprint written for it. It would be very useful to implement this in the 
APIs, similar to what other OpenStack projects support.
  2.  Data retention: 
https://blueprints.launchpad.net/monasca/+spec/per-project-data-retention. We 
haven't completely reviewed or approved this blueprint, but it would be 
very useful to add support for per-project or per-metric data retention. This 
would involve understanding how data retention works in InfluxDB. We would also 
want to have some design discussion before proceeding, as it is probably more 
complex than described in the bp.
  3.  Publish logs and/or metrics to topics selectively: 
https://blueprints.launchpad.net/monasca/+spec/publish-logs-to-topic-selectively.
 In the context of metrics, this would be useful for identifying specific 
metrics as metering rather than monitoring metrics, so that they can be 
published to different Kafka topics. The downstream Monasca Transform Engine 
would then receive only the metrics that it will transform and would not need 
to filter them, which would help improve performance dramatically. For 
logging, it would help distinguish operational logs from audit logs. It could 
also be used to identify high-priority metrics so that they could be published 
to a high-priority metrics topic in Kafka. There are several more contexts in 
which this is useful.
  4.  Delete metrics: 
https://blueprints.launchpad.net/monasca/+spec/delete-metrics. This means adding 
the ability to delete metrics using the Monasca API. Typically, time series 
databases are not very good at deletes. We haven't tried to do this with 
InfluxDB, and while it might seem an easy task, it is a lot more involved 
than issuing the obvious and straightforward DELETE command.
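
As a rough illustration of idea 3, selective publishing could come down to a small routing function that picks a Kafka topic for each metric before it is sent. The sketch below is only an assumption about how that might look: the topic names, the "metering" and "priority" dimension keys, and the select_topic and group_by_topic helpers are all hypothetical and are not part of Monasca.

```python
# Hypothetical topic names; Monasca does not define these.
DEFAULT_TOPIC = "metrics"
METERING_TOPIC = "metrics-metering"
HIGH_PRIORITY_TOPIC = "metrics-high-priority"


def select_topic(metric):
    """Pick the Kafka topic for one metric.

    Assumes the metric is a dict with a "dimensions" dict, and that the
    made-up dimensions "priority" and "metering" carry the classification.
    """
    dimensions = metric.get("dimensions", {})
    if dimensions.get("priority") == "high":
        return HIGH_PRIORITY_TOPIC
    if dimensions.get("metering") == "true":
        return METERING_TOPIC
    return DEFAULT_TOPIC


def group_by_topic(metrics):
    """Group metrics by destination topic so each topic gets one batch."""
    batches = {}
    for metric in metrics:
        batches.setdefault(select_topic(metric), []).append(metric)
    return batches
```

With routing done on the publishing side, a consumer such as the transform engine would subscribe only to the metering topic and would not have to filter.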

I hope this helps. Let me know if you want to discuss further or want more 
ideas.

Regards --Roland
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [monasca] Ideas to work on

2017-02-12 Thread An Qi YL Lu
Hi Roland
 
I am not sure whether you received my last email because I got a delivery failure notification. I am sending this again to ensure that you can see this email.
 
Best,
Anqi
 
- Original message -
From: An Qi YL Lu/China/IBM
To: roland.hochm...@hpe.com
Cc: openstack-dev@lists.openstack.org
Subject: Re: [monasca] Ideas to work on
Date: Fri, Feb 10, 2017 5:14 PM
Hi Roland
 
Thanks for your suggestions. The list you made is useful and gives me clues about areas that I can work on. I spent some time investigating the bps that you introduced.
 
I am most interested in data retention and metrics deleting.
 
Data retention: I had a quick look into the data retention policy of InfluxDB. It apparently supports different retention policies for different series. To my understanding, the whiteboard in this bp has a straightforward design for this feature. I didn't quite get where the complexity is. Could you please shed some light on the complicated part?
 
Metrics deleting: InfluxDB 1.1 (or any version after 0.9) supports deleting series, though you cannot specify a time interval for this operation. It simply deletes all points from a series in a database. I think one of the tricky parts is deciding which data that depends on a metric should also be deleted, such as measurements and alarms. Please point it out if my understanding is not precise.
 
I would like to look at logs publishing as well. Unfortunately, I did not find the monasca-log-api doc, which is supposed to be at https://github.com/openstack/monasca-log-api/tree/master/docs . I don't know how the log-api works now. Please share a copy of the doc with me if you have one.
 
Best,
Anqi
 


Re: [openstack-dev] [monasca] Ideas to work on

2017-02-13 Thread witold.be...@est.fujitsu.com
Hi,

Here is the URL to the monasca-log-api documentation [1].

Cheers
Witek


[1] https://github.com/openstack/monasca-log-api/tree/master/documentation





Re: [openstack-dev] [monasca] Ideas to work on

2017-02-13 Thread Hochmuth, Roland M
Hi Anqi, See my comments listed below. Regards --Roland

From: An Qi YL Lu <l...@cn.ibm.com>
Date: Sunday, February 12, 2017 at 8:29 PM
To: Roland Hochmuth <roland.hochm...@hpe.com>
Cc: OpenStack List <openstack-dev@lists.openstack.org>
Subject: Re: [monasca] Ideas to work on

Data retention: I had a quick look into the data retention policy of InfluxDB. 
It apparently supports different retention policies for different series. To my 
understanding, the whiteboard in this bp has a straightforward design for this 
feature. I didn't quite get where the complexity is. Could you please shed 
some light on the complicated part?

The retention policy specified in the bp, 
https://blueprints.launchpad.net/monasca/+spec/per-project-data-retention, is 
per project. InfluxDB allows retention policies to be set per database; see 
https://docs.influxdata.com/influxdb/v1.2/query_language/database_management/#create-retention-policies-with-create-retention-policy.

Currently, we store all metrics for all tenants in one database. One approach, 
which would involve a bit of re-engineering if we choose to do it, would be to 
store each project's metrics in a separate database.
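
To make the per-project database approach a little more concrete, here is a minimal sketch that builds the InfluxQL statements for one project. It only generates strings and does not talk to InfluxDB. The "monasca_" database prefix, the policy naming, and both helper functions are hypothetical and not taken from the blueprint or from Monasca.

```python
import re


def project_database(project_id):
    """Derive a database name for one project (naming scheme is made up)."""
    if not re.fullmatch(r"[0-9a-zA-Z_-]+", project_id):
        raise ValueError("unexpected characters in project id")
    return "monasca_" + project_id


def retention_statements(project_id, days):
    """Return InfluxQL that creates the database and its default policy."""
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    db = project_database(project_id)
    return [
        'CREATE DATABASE "{}"'.format(db),
        'CREATE RETENTION POLICY "{}d" ON "{}" DURATION {}d '
        'REPLICATION 1 DEFAULT'.format(days, db, days),
    ]
```

For example, retention_statements("abc123", 45) would yield one CREATE DATABASE statement and one CREATE RETENTION POLICY statement with a 45 day duration for the database "monasca_abc123".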

I could also imagine having retention policies per metric per tenant. For 
example, there might be metering metrics that should be stored for a longer 
period than operational metrics. There isn't a way to do this directly in 
InfluxDB using the built-in data retention policy. However, it could possibly 
be done with deletes and scheduled jobs that run periodically to prune the 
database.

For the Vertica database, we, as in HPE, simulate retention policies by running 
a cron job that drops partitions after some period of time, such as 45 days. 
Charter has a more sophisticated cron job that deletes metrics from specific 
tenants on a different schedule than the operational metrics. For example, 
tenants of the cloud might have their metrics deleted every two weeks. Metering 
metrics might be deleted every 13 months.
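
A pruning job of this kind mostly needs to know, per category of metric, how far back to keep data. The sketch below computes those cutoffs. It is only an illustration under stated assumptions: the periods are the examples from this message read as retention periods, the category names are made up, and 13 months is approximated as 13 * 30 days.

```python
from datetime import datetime, timedelta

# Periods from the examples above, read as "keep data this long" (an
# assumption). The category names are made up, and 13 months is
# approximated as 13 * 30 days.
RETENTION = {
    "operational": timedelta(days=45),
    "tenant": timedelta(days=14),
    "metering": timedelta(days=13 * 30),
}


def prune_cutoffs(now):
    """Return, per category, the timestamp before which data is pruned."""
    return {name: now - period for name, period in RETENTION.items()}


def is_expired(category, timestamp, now):
    """True if a measurement of this category is older than its cutoff."""
    return timestamp < prune_cutoffs(now)[category]
```

A cron job could call prune_cutoffs(datetime.utcnow()) once per run and pass each cutoff to whatever delete mechanism the database offers.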

The problem with deleting specific metrics is performance. Dropping 
partitions is extremely fast. However, deleting metrics might be slow and might 
also lock the database and prevent writes and/or queries to it. Therefore, to 
delete metrics, you could trickle the deletes in, reducing the impact at any 
given time, or, as in the Charter case, run the deletion script at 2:00 AM, 
when usage of the system is light.
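
Trickling deletes in can be sketched as deleting in small batches with a pause between them. The helper below is a minimal, database-agnostic illustration and not the HPE or Charter script: delete_batch is a hypothetical caller-supplied function that removes one batch, and the batch size and pause are arbitrary defaults.

```python
import time


def trickle_delete(keys, delete_batch, batch_size=100, pause_seconds=1.0,
                   sleep=time.sleep):
    """Delete keys in small batches, pausing between batches.

    delete_batch is a caller-supplied function (hypothetical here) that
    removes one batch from the database. Returns the number of batches run.
    """
    batches = 0
    for start in range(0, len(keys), batch_size):
        if batches:
            # Give writers and queries room between batches.
            sleep(pause_seconds)
        delete_batch(keys[start:start + batch_size])
        batches += 1
    return batches
```

Smaller batches and longer pauses spread the load more thinly, at the cost of a longer overall run.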

Metrics deleting: InfluxDB 1.1 (or any version after 0.9) supports deleting 
series, though you cannot specify a time interval for this operation. It simply 
deletes all points from a series in a database. I think one of the tricky parts 
is deciding which data that depends on a metric should also be deleted, such as 
measurements and alarms. Please point it out if my understanding is not precise.

The problem, I believe, is that a single series in InfluxDB has the data for 
multiple tenants. Deleting a single series would then result in deleting the 
data for all tenants. As with data retention policies, supporting deletion of 
metrics by metric name and optional dimensions would require handling the 
storage of metrics differently and/or designing some other solution.
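
For illustration only, suppose the storage were redesigned so that the tenant is stored as its own tag. Deleting by metric name and optional dimensions could then map onto an InfluxQL DROP SERIES statement, as in the sketch below. The "tenant_id" tag name and the helper are hypothetical, this is not how metrics are stored today, and the function only builds the statement text.

```python
def drop_series_statement(metric_name, tenant_id, dimensions=None):
    """Build a DROP SERIES statement scoped to one tenant.

    Assumes a hypothetical layout in which "tenant_id" is a tag on every
    series. Quotes and backslashes are rejected rather than escaped.
    """
    tags = {"tenant_id": tenant_id}
    tags.update(dimensions or {})
    for text in [metric_name] + list(tags) + list(tags.values()):
        if any(ch in text for ch in "\"'\\"):
            raise ValueError("quotes and backslashes are not supported")
    where = " AND ".join(
        "\"{}\" = '{}'".format(key, tags[key]) for key in sorted(tags))
    return 'DROP SERIES FROM "{}" WHERE {}'.format(metric_name, where)
```

Scoping every statement by the tenant tag is what would keep one tenant's delete from touching another tenant's data.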


I would like to look at logs publishing as well. Unfortunately, I did not 
find the monasca-log-api doc, which is supposed to be at 
https://github.com/openstack/monasca-log-api/tree/master/docs . I don't know 
how the log-api works now. Please share a copy of the doc with me if you have 
one.

The new changes proposed by Steve Simpson are in the review that he just 
published at https://review.openstack.org/#/c/433016/.

The current documentation is now under a slightly different directory than the 
link above, at 
https://github.com/openstack/monasca-log-api/blob/master/documentation/monasca-log-api-spec.md.


Re: [openstack-dev] [monasca] Ideas to work on

2017-02-19 Thread An Qi YL Lu
Hi Roland
I was on holiday last week. I’ll send you a notification next time. Sorry for being absent from the last weekly meeting.
According to your comments, I can see that the retention policy partly depends on metrics deleting. Another option would be to patch InfluxDB to add basic support for retention per series, but that seems to take great effort. So in my opinion, we should implement metrics deleting before the retention policy.
If we decide to run a scheduled job to simulate the retention policy, what tools would you suggest to host the job? I know there is a sched module in Python (https://docs.python.org/2/library/sched.html), but it doesn’t seem appropriate for our case.
I think we can begin with metrics deleting. You talked about changing the storage of metrics. Shall we start by redesigning the data storage structure? What do you have in mind? I am going to study the current storage structure a bit further now.
What should I do next if I decide to work on metrics deleting? Do I need to write a whiteboard describing the detailed design?
Best,
Anqi
 
 