Re: [openstack-dev] [Ceilometer][Gnocchi] question on integration with time-series databases
On 18 Jun 2015 at 04:44:18, gordon chung (g...@live.ca) wrote:
> On 17/06/2015 12:57 PM, Chris Dent wrote:
>> On Tue, 16 Jun 2015, Simon Pasquier wrote:
>>> I'm still struggling to see how these optimizations would be implemented, since the current Gnocchi design has separate backends for indexing and storage, which means that datapoints (id + timestamp + value) and metric metadata (tenant_id, instance_id, server group, ...) are stored in different places. I'd be interested to hear from the Gnocchi team how this is going to be tackled. For instance, does it imply modifications or extensions to the existing Gnocchi API?
>>
>> I think there are three things to keep in mind:
>>
>> a) The plan is to figure it out and make it work well, production ready even. That will require some iteration. At the moment the overlap between InfluxDB python driver maturity and someone-to-do-the-work is not great. When it is, I'm sure the full variety of optimizations will be explored, with actual working code and test cases.
>
> just curious but what bugs are we waiting on for the influxdb driver? i'm hoping Paul Dix has prioritised them?
>
>> b) Gnocchi has separate _interfaces_ for indexing and storage. This is not the same as having separate _backends_[1]. If it turns out that the right way to get InfluxDB working is for it to be the same backend to the two separate interfaces then that will be okay.
>
> i'll straddle the middle line here and say i think we need to wait for a viable driver before we can start making the appropriate adjustments. having said that, i think once we have the gaps resolved, i think we should make every effort to conform to the rules of the db (whether it is influxdb, kairosdb, opentsdb). we faced a similar issue with the previous data storage design where we generically applied a design for one driver across all drivers and that led to terribly inefficient design everywhere.

I'd like to emphasise that using the same backend for both the data-point time series and the identification of the resources linked to those time series is not only the right way, it is the mandatory way. The most salient reason is that we shall not require other applications that consume time series produced through Gnocchi to use anything other than the time-series backend's native API. Operators who want to use InfluxDB, OpenTSDB or something else as their time-series backend do it for a reason. The choice of an API that best suits their needs is key to that decision. It is also a question of effectiveness: there are plenty of applications out there, like Grafana, that plug into those time-series databases out of the box. I don't think we want to force those applications to use the Gnocchi API instead.

- Patrick

>> c) The future is unknown and the present is not made of stone. There could be modifications and extensions to the existing stuff. We don't know. Yet.
>>
>> [1] Yes the existing implementations use SQL for the indexer and various subclasses of the carbonara abstraction as two backends for the two interfaces. That's an accident of history not a design requirement.
>
> --
> gord

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
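The distinction Chris draws in (b), and Patrick's case for a single backend, can be sketched in a few lines. Everything here is hypothetical: `Indexer`, `Storage`, their method names and `TaggedTSDBBackend` are illustrative stand-ins, not Gnocchi's actual driver classes, and an in-memory list plays the role of a tag-aware time-series database such as InfluxDB.

```python
import abc


class Indexer(abc.ABC):
    """Hypothetical stand-in for an indexing interface (resource metadata)."""

    @abc.abstractmethod
    def create_resource(self, resource_id, attributes):
        """Register a resource and its metadata (tenant_id, server_group, ...)."""

    @abc.abstractmethod
    def search_resources(self, **filters):
        """Return the IDs of resources whose attributes match all filters."""


class Storage(abc.ABC):
    """Hypothetical stand-in for a storage interface (datapoints)."""

    @abc.abstractmethod
    def add_measures(self, resource_id, measures):
        """Store (timestamp, value) pairs for a resource."""

    @abc.abstractmethod
    def get_measures(self, resource_id):
        """Return the (timestamp, value) pairs stored for a resource."""


class TaggedTSDBBackend(Indexer, Storage):
    """One backend implementing both interfaces.

    Resource attributes are copied onto every datapoint as tags, mimicking
    how a tag-aware time-series database would store them, so a single
    store serves indexer calls, storage calls and native tag queries.
    """

    def __init__(self):
        self._resources = {}  # resource_id -> attributes
        self._points = []     # (tags, timestamp, value)

    def create_resource(self, resource_id, attributes):
        self._resources[resource_id] = dict(attributes)

    def search_resources(self, **filters):
        return [rid for rid, attrs in self._resources.items()
                if all(attrs.get(k) == v for k, v in filters.items())]

    def add_measures(self, resource_id, measures):
        # Tag each datapoint with the resource's metadata at write time.
        tags = {"resource_id": resource_id, **self._resources[resource_id]}
        for timestamp, value in measures:
            self._points.append((tags, timestamp, value))

    def get_measures(self, resource_id):
        return [(ts, v) for tags, ts, v in self._points
                if tags["resource_id"] == resource_id]

    def query_by_tags(self, **filters):
        """Tag-filtered read, standing in for the backend's native query API
        that an external tool could use without going through Gnocchi."""
        return [(tags["resource_id"], ts, v) for tags, ts, v in self._points
                if all(tags.get(k) == val for k, val in filters.items())]


if __name__ == "__main__":
    backend = TaggedTSDBBackend()
    backend.create_resource("vm-1", {"server_group": "web"})
    backend.create_resource("vm-2", {"server_group": "db"})
    backend.add_measures("vm-1", [(0, 10.0), (60, 12.0)])
    backend.add_measures("vm-2", [(0, 50.0)])
    print(backend.query_by_tags(server_group="web"))
```

The two interfaces stay separate in code, but nothing forces them onto separate stores. Copying the resource attributes onto the datapoints as tags is what lets `query_by_tags` answer a group query without consulting a second system.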
Re: [openstack-dev] [Ceilometer][Gnocchi] question on integration with time-series databases
On 17/06/2015 12:57 PM, Chris Dent wrote:
> On Tue, 16 Jun 2015, Simon Pasquier wrote:
>> I'm still struggling to see how these optimizations would be implemented, since the current Gnocchi design has separate backends for indexing and storage, which means that datapoints (id + timestamp + value) and metric metadata (tenant_id, instance_id, server group, ...) are stored in different places. I'd be interested to hear from the Gnocchi team how this is going to be tackled. For instance, does it imply modifications or extensions to the existing Gnocchi API?
>
> I think there are three things to keep in mind:
>
> a) The plan is to figure it out and make it work well, production ready even. That will require some iteration. At the moment the overlap between InfluxDB python driver maturity and someone-to-do-the-work is not great. When it is, I'm sure the full variety of optimizations will be explored, with actual working code and test cases.

just curious but what bugs are we waiting on for the influxdb driver? i'm hoping Paul Dix has prioritised them?

> b) Gnocchi has separate _interfaces_ for indexing and storage. This is not the same as having separate _backends_[1]. If it turns out that the right way to get InfluxDB working is for it to be the same backend to the two separate interfaces then that will be okay.

i'll straddle the middle line here and say i think we need to wait for a viable driver before we can start making the appropriate adjustments. having said that, i think once we have the gaps resolved, i think we should make every effort to conform to the rules of the db (whether it is influxdb, kairosdb, opentsdb). we faced a similar issue with the previous data storage design where we generically applied a design for one driver across all drivers and that led to terribly inefficient design everywhere.

> c) The future is unknown and the present is not made of stone. There could be modifications and extensions to the existing stuff. We don't know. Yet.
>
> [1] Yes the existing implementations use SQL for the indexer and various subclasses of the carbonara abstraction as two backends for the two interfaces. That's an accident of history not a design requirement.

--
gord
Re: [openstack-dev] [Ceilometer][Gnocchi] question on integration with time-series databases
On Tue, 16 Jun 2015, Simon Pasquier wrote:
> I'm still struggling to see how these optimizations would be implemented, since the current Gnocchi design has separate backends for indexing and storage, which means that datapoints (id + timestamp + value) and metric metadata (tenant_id, instance_id, server group, ...) are stored in different places. I'd be interested to hear from the Gnocchi team how this is going to be tackled. For instance, does it imply modifications or extensions to the existing Gnocchi API?

I think there are three things to keep in mind:

a) The plan is to figure it out and make it work well, production ready even. That will require some iteration. At the moment the overlap between InfluxDB python driver maturity and someone-to-do-the-work is not great. When it is, I'm sure the full variety of optimizations will be explored, with actual working code and test cases.

b) Gnocchi has separate _interfaces_ for indexing and storage. This is not the same as having separate _backends_[1]. If it turns out that the right way to get InfluxDB working is for it to be the same backend to the two separate interfaces then that will be okay.

c) The future is unknown and the present is not made of stone. There could be modifications and extensions to the existing stuff. We don't know. Yet.

[1] Yes the existing implementations use SQL for the indexer and various subclasses of the carbonara abstraction as two backends for the two interfaces. That's an accident of history not a design requirement.

--
Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent
[openstack-dev] [Ceilometer][Gnocchi] question on integration with time-series databases
Hi,

Originally, I posted this question on the review [0] that adds InfluxDB support to Gnocchi, but Julien felt that it wasn't relevant in the scope of the review. Still, I think that it deserves some discussion...

The current implementation of the InfluxDB driver for Gnocchi doesn't follow the recommendations for InfluxDB 0.9 [1], as it doesn't use tags at all. As a result, each metric will be stored in an individual series, which makes aggregation across metrics suboptimal from the InfluxDB point of view. With tags properly implemented, a query like 'return the cpu.util measures for this group of servers in this given interval' is a single InfluxDB query, whereas with the proposed change it would take N queries, one per server's metric series. In fact, the same issue can be seen in the OpenTSDB [2] and KairosDB [3] reviews too. And my guess is that all production-grade backends will provide the same type of semantics on metrics (call them tags, labels or dimensions).

Julien's answer to this was:

> There's no point in talking about optimizing a driver until it's implemented. For now, neither InfluxDB nor Kairos nor OpenTSDB drivers are ready for Gnocchi. Once they are, we'll be able to talk about changing the implementation of the storage/driver API to leverage their abilities such as tags.

I'm still struggling to see how these optimizations would be implemented, since the current Gnocchi design has separate backends for indexing and storage, which means that datapoints (id + timestamp + value) and metric metadata (tenant_id, instance_id, server group, ...) are stored in different places. I'd be interested to hear from the Gnocchi team how this is going to be tackled. For instance, does it imply modifications or extensions to the existing Gnocchi API?
BR,
Simon

[0] https://review.openstack.org/#/c/165407/
[1] http://influxdb.com/docs/v0.9/concepts/schema_and_data_layout.html
[2] https://review.openstack.org/#/c/107986
[3] https://review.openstack.org/#/c/159476
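Simon's N-queries-versus-one argument can be made concrete with a small sketch that only builds InfluxQL strings and never contacts a server. The measurement name `cpu.util`, the tag key `server_group`, the field key `value`, the per-metric series names and the sample data are all hypothetical, chosen for illustration rather than taken from the driver under review [0]; the syntax follows InfluxQL as documented for 0.9 and should be checked against the version actually deployed.

```python
# Hypothetical sample data: one cpu.util metric per server, plus the
# resource metadata that a separate indexer would hold. All IDs are made up.
SERVERS = [
    {"metric_id": "m-1", "instance_id": "vm-1", "server_group": "web"},
    {"metric_id": "m-2", "instance_id": "vm-2", "server_group": "web"},
    {"metric_id": "m-3", "instance_id": "vm-3", "server_group": "db"},
]

# NOTE: plain string interpolation is for illustration only; real code
# must escape identifiers and string literals.


def untagged_queries(servers, group, start, end):
    """Layout without tags: one series per metric.

    Group membership is known only to the indexer, so the caller resolves
    the matching metric IDs first, then sends one query per series and has
    to aggregate the N results itself.
    """
    metric_ids = [s["metric_id"] for s in servers if s["server_group"] == group]
    return [
        f"SELECT value FROM \"{mid}\" "
        f"WHERE time >= '{start}' AND time < '{end}'"
        for mid in metric_ids
    ]


def tagged_query(group, start, end, interval="5m"):
    """Layout with tags: a single "cpu.util" measurement whose series are
    distinguished by tags, so the group filter is a tag predicate and one
    query aggregates across every matching server."""
    return (
        "SELECT mean(value) FROM \"cpu.util\" "
        f"WHERE \"server_group\" = '{group}' "
        f"AND time >= '{start}' AND time < '{end}' "
        f"GROUP BY time({interval})"
    )


if __name__ == "__main__":
    start, end = "2015-06-16T00:00:00Z", "2015-06-16T01:00:00Z"
    per_series = untagged_queries(SERVERS, "web", start, end)
    print(len(per_series), "queries without tags")
    print("1 query with tags:", tagged_query("web", start, end))
```

The query count in the untagged layout grows with the size of the server group, and the cross-server aggregation moves to the client; in the tagged layout the database does both the filtering and the aggregation.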