Re: [openstack-dev] [Ceilometer][Gnocchi] question on integration with time-series databases

2015-06-18 Thread Patrick Petit
On 18 Jun 2015 at 04:44:18, gordon chung (g...@live.ca) wrote:


On 17/06/2015 12:57 PM, Chris Dent wrote: 
 On Tue, 16 Jun 2015, Simon Pasquier wrote: 
 
 I'm still struggling to see how these optimizations would be implemented 
 since the current Gnocchi design has separate backends for indexing and 
 storage which means that datapoints (id + timestamp + value) and metric 
 metadata (tenant_id, instance_id, server group, ...) are stored into 
 different places. I'd be interested to hear from the Gnocchi team how this
 is going to be tackled. For instance, does it imply modifications or
 extensions to the existing Gnocchi API? 
 
 I think there's three things to keep in mind: 
 
 a) The plan is to figure it out and make it work well, production 
 ready even. That will require some iteration. At the moment the 
 overlap between InfluxDB python driver maturity and someone-to-do-the- 
 work is not great. When it is I'm sure the full variety of 
 optimizations will be explored, with actual working code and test 
 cases. 

just curious but what bugs are we waiting on for the influxdb driver? 
i'm hoping Paul Dix has prioritised them? 

 
 b) Gnocchi has separate _interfaces_ for indexing and storage. This 
 is not the same as having separate _backends_[1]. If it turns out 
 that the right way to get InfluxDB working is for it to be the 
 same backend to the two separate interfaces then that will be 
 okay. 

i'll straddle the middle line here and say i think we need to wait for a 
viable driver before we can start making the appropriate adjustments. 
having said that, once we have the gaps resolved, i think we 
should make every effort to conform to the rules of the db (whether it is 
influxdb, kairosdb, opentsdb). we faced a similar issue with the 
previous data storage design where we generically applied a design for 
one driver across all drivers and that led to terribly inefficient 
design everywhere. 

I'd like to emphasise that using the same backend for both the data-point
time-series and the identification of the resources linked to those time-series
is not only the right way, it is the mandatory way. The most salient reason is
that we must not require other applications consuming time-series produced
through Gnocchi to use anything other than the time-series backend's native
API. Operators who choose InfluxDB, OpenTSDB or something else as their
time-series backend do so for a reason, and an API that best suits their needs
is key to that decision. It is also a question of effectiveness: plenty of
applications out there, like Grafana, plug into those time-series databases
out of the box. I don't think we want to force those applications to use the
Gnocchi API instead.

 - Patrick



 
 c) The future is unknown and the present is not made of stone. There 
 could be modifications and extensions to the existing stuff. We 
 don't know. Yet. 
 
 [1] Yes the existing implementations use SQL for the indexer and 
 various subclasses of the carbonara abstraction as two backends 
 for the two interfaces. That's an accident of history not a design 
 requirement. 

-- 
gord 


__ 
OpenStack Development Mailing List (not for usage questions) 
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe 
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev 


Re: [openstack-dev] [Ceilometer][Gnocchi] question on integration with time-series databases

2015-06-17 Thread gordon chung



On 17/06/2015 12:57 PM, Chris Dent wrote:

On Tue, 16 Jun 2015, Simon Pasquier wrote:


I'm still struggling to see how these optimizations would be implemented
since the current Gnocchi design has separate backends for indexing and
storage which means that datapoints (id + timestamp + value) and metric
metadata (tenant_id, instance_id, server group, ...) are stored into
different places. I'd be interested to hear from the Gnocchi team how this
is going to be tackled. For instance, does it imply modifications or
extensions to the existing Gnocchi API?


I think there's three things to keep in mind:

a) The plan is to figure it out and make it work well, production
   ready even. That will require some iteration. At the moment the
   overlap between InfluxDB python driver maturity and someone-to-do-the-
   work is not great. When it is I'm sure the full variety of
   optimizations will be explored, with actual working code and test
   cases.


just curious but what bugs are we waiting on for the influxdb driver? 
i'm hoping Paul Dix has prioritised them?




b) Gnocchi has separate _interfaces_ for indexing and storage. This
   is not the same as having separate _backends_[1]. If it turns out
   that the right way to get InfluxDB working is for it to be the
   same backend to the two separate interfaces then that will be
   okay.


i'll straddle the middle line here and say i think we need to wait for a 
viable driver before we can start making the appropriate adjustments. 
having said that, once we have the gaps resolved, i think we 
should make every effort to conform to the rules of the db (whether it is 
influxdb, kairosdb, opentsdb). we faced a similar issue with the 
previous data storage design where we generically applied a design for 
one driver across all drivers and that led to terribly inefficient 
design everywhere.




c) The future is unknown and the present is not made of stone. There
   could be modifications and extensions to the existing stuff. We
   don't know. Yet.

[1] Yes the existing implementations use SQL for the indexer and
various subclasses of the carbonara abstraction as two backends
for the two interfaces. That's an accident of history not a design
requirement.


--
gord




Re: [openstack-dev] [Ceilometer][Gnocchi] question on integration with time-series databases

2015-06-17 Thread Chris Dent

On Tue, 16 Jun 2015, Simon Pasquier wrote:


I'm still struggling to see how these optimizations would be implemented
since the current Gnocchi design has separate backends for indexing and
storage which means that datapoints (id + timestamp + value) and metric
metadata (tenant_id, instance_id, server group, ...) are stored into
different places. I'd be interested to hear from the Gnocchi team how this
is going to be tackled. For instance, does it imply modifications or
extensions to the existing Gnocchi API?


I think there's three things to keep in mind:

a) The plan is to figure it out and make it work well, production
   ready even. That will require some iteration. At the moment the
   overlap between InfluxDB python driver maturity and someone-to-do-the-
   work is not great. When it is I'm sure the full variety of
   optimizations will be explored, with actual working code and test
   cases.

b) Gnocchi has separate _interfaces_ for indexing and storage. This
   is not the same as having separate _backends_[1]. If it turns out
   that the right way to get InfluxDB working is for it to be the
   same backend to the two separate interfaces then that will be
   okay.

c) The future is unknown and the present is not made of stone. There
   could be modifications and extensions to the existing stuff. We
   don't know. Yet.

[1] Yes the existing implementations use SQL for the indexer and
various subclasses of the carbonara abstraction as two backends
for the two interfaces. That's an accident of history not a design
requirement.
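
To make point b) concrete, here is a minimal sketch of one backend object
serving both interfaces. It is only an illustration under stated assumptions:
the class names (Indexer, Storage, SingleBackend) and method names are
hypothetical and are not Gnocchi's actual classes or signatures, and an
in-memory dict stands in for a real time-series database.

```python
import abc


class Indexer(abc.ABC):
    """Hypothetical indexing interface: metric metadata only."""

    @abc.abstractmethod
    def create_metric(self, metric_id, attributes):
        ...

    @abc.abstractmethod
    def search(self, **attributes):
        ...


class Storage(abc.ABC):
    """Hypothetical storage interface: datapoints only."""

    @abc.abstractmethod
    def add_measure(self, metric_id, timestamp, value):
        ...

    @abc.abstractmethod
    def get_measures(self, metric_id):
        ...


class SingleBackend(Indexer, Storage):
    """One backend implementing both interfaces (in-memory stand-in)."""

    def __init__(self):
        self._attributes = {}  # metric_id -> metadata dict
        self._measures = {}    # metric_id -> list of (timestamp, value)

    def create_metric(self, metric_id, attributes):
        self._attributes[metric_id] = dict(attributes)
        self._measures.setdefault(metric_id, [])

    def search(self, **attributes):
        # Return ids of metrics whose metadata matches every given attribute.
        return sorted(
            metric_id
            for metric_id, attrs in self._attributes.items()
            if all(attrs.get(k) == v for k, v in attributes.items())
        )

    def add_measure(self, metric_id, timestamp, value):
        self._measures[metric_id].append((timestamp, value))

    def get_measures(self, metric_id):
        return sorted(self._measures[metric_id])


# The same object can be handed to code expecting either interface.
backend = SingleBackend()
indexer, storage = backend, backend
```

Callers written against Indexer or Storage do not need to know whether the
two names point at one object or at two different ones, which is the
distinction between separate interfaces and separate backends.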
--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent



[openstack-dev] [Ceilometer][Gnocchi] question on integration with time-series databases

2015-06-16 Thread Simon Pasquier
Hi,

Originally, I posted this question on the review [0] that adds InfluxDB
support to Gnocchi but Julien felt that it wasn't relevant in the scope of
the review. Still I think that it deserves some discussion...

The current implementation of the InfluxDB driver for Gnocchi doesn't
follow the recommendations for InfluxDB 0.9 [1] as it doesn't use tags at
all. As a result, each metric will be stored in an individual series which
makes aggregation across metrics suboptimal from the InfluxDB point of
view. With tags properly implemented, a query like 'return the cpu.util
measures for this group of servers in this given interval' is only one
InfluxDB query while it would result in N queries with the proposed change.
In fact, the same issue can be seen in the OpenTSDB [2] and KairosDB [3]
reviews too. And my guess is that all production-grade backends will
provide the same type of semantic on metrics (call it tags, labels or
dimensions).
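
As a rough illustration of the difference, the sketch below only builds
InfluxQL-style query strings and counts them; it does not talk to a server
and has not been checked against a real InfluxDB 0.9 instance. The function
names, the "value" field, and the "server_group" tag are assumptions made for
the example, not what the driver under review does.

```python
def queries_without_tags(metric_ids, start, end):
    """One series per metric: one query is needed per metric id."""
    return [
        "SELECT value FROM \"{m}\" WHERE time >= '{s}' AND time < '{e}'".format(
            m=metric_id, s=start, e=end
        )
        for metric_id in metric_ids
    ]


def query_with_tags(measurement, tags, start, end):
    """One shared measurement filtered by tags: a single query."""
    conditions = ["{0} = '{1}'".format(k, v) for k, v in sorted(tags.items())]
    conditions.append("time >= '{0}'".format(start))
    conditions.append("time < '{0}'".format(end))
    return "SELECT value FROM \"{0}\" WHERE {1}".format(
        measurement, " AND ".join(conditions)
    )


start, end = "2015-06-16T00:00:00Z", "2015-06-16T01:00:00Z"

# Hypothetical metric ids, one per server in the group.
per_metric = queries_without_tags(["metric-a", "metric-b", "metric-c"], start, end)

# Hypothetical tag identifying the group of servers.
tagged = query_with_tags("cpu.util", {"server_group": "web"}, start, end)
```

With three servers in the group, per_metric holds three query strings while
tagged is a single one; the gap grows linearly with the size of the group.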

Julien's answer to this was:

There's no point in talking about optimizing a driver until it's
implemented. For now, none of the InfluxDB, KairosDB or OpenTSDB drivers are
ready for Gnocchi. Once they are, we'll be able to talk about changing the
implementation of the storage/driver API to leverage their abilities such
as tags.

I'm still struggling to see how these optimizations would be implemented
since the current Gnocchi design has separate backends for indexing and
storage which means that datapoints (id + timestamp + value) and metric
metadata (tenant_id, instance_id, server group, ...) are stored into
different places. I'd be interested to hear from the Gnocchi team how this
is going to be tackled. For instance, does it imply modifications or
extensions to the existing Gnocchi API?

BR,
Simon

[0] https://review.openstack.org/#/c/165407/
[1] http://influxdb.com/docs/v0.9/concepts/schema_and_data_layout.html
[2] https://review.openstack.org/#/c/107986
[3] https://review.openstack.org/#/c/159476