Re: [Gluster-devel] Metrics: and how to get them out from gluster

2017-09-08 Thread Amar Tumballi
Thanks all for the feedback!

On Sat, Sep 2, 2017 at 12:21 AM, John Strunk  wrote:

>
>
> On Fri, Sep 1, 2017 at 1:27 AM, Amar Tumballi  wrote:
>
>> Disclaimer: This email is long, and took significant time to write.
>> Do take the time to read, review and give feedback, so we can have some
>> metrics-related tasks done by Gluster 4.0.
>>
>> ---
>> ** History:*
>>
>> To understand what is happening inside a GlusterFS process, over the years
>> we have opened many bugs and also coded a few things with regard to
>> statedump, and put some effort into the io-stats translator to improve
>> Gluster's monitoring capabilities.
>>
>> But surely more is required! Some glimpses of this are captured in
>> [1], [2], [3] & [4]. Also, I sent an email to this group [5] about the
>> possibilities of capturing this information.
>>
>> ** Current problem:*
>>
>> When we talk about metrics or monitoring, we have to consider giving this
>> data to a tool which can preserve the readings over time; without a time
>> series, no metrics will make sense! So the first challenge is how to get
>> them out. Should getting the metrics out of each process require
>> 'glusterd' to be involved, or should we use signals? This leads us to
>> *'challenge #1'.*
>>
>> Next: should we depend on io-stats to do the reporting? If yes, how do we
>> get information from between any two layers? Should we place io-stats
>> between all the nodes of the translator graph? Or should we utilize the
>> STACK_WIND/UNWIND framework to get the details? This is our *'challenge
>> #2'*.
>>
>> Once the above decision is taken, the next question is: "What about
>> 'metrics' from other translators? Who gives them out (i.e., dumps them)?
>> Why do we need something similar to statedump; can't we read the info from
>> statedump itself?". But when we say 'metrics', we should have a key and a
>> number associated with it; statedump has a lot more, and no fixed format.
>> If it's different from statedump, then what is our answer for translator
>> code to give out metrics? This is our *'challenge #3'*.
>>
>> If we get a solution to the above challenges, then I guess we are in
>> decent shape for further development. Let's go through them one by one, in
>> detail.
>>
>> ** Problems and proposed solutions:*
>>
>> *a) how to dump metrics data?*
>>
>> Currently, I propose the signal-handler way, as it gives us control over
>> which processes we capture information from, and it will be much faster
>> than communicating through another tool. Also, considering we need these
>> metrics taken every 10 seconds or so, there will be a need for an
>> efficient way to get them out.
>>
>> But even there we have challenges, because we have already taken both the
>> USR1 and USR2 signal handlers: one for statedump, the other for toggling
>> latency monitoring. It makes sense to continue to have statedump use
>> USR1, but toggling options should technically (and for correctness) be
>> handled by glusterd volume-set options, and there should be a better way
>> to handle this via our 'reconfigure()' framework in graph-switch. A
>> proposal was sent in GitHub issue #303 [6].
>>
>> If we are good with the above proposal, then we can make use of USR2 for
>> the metrics dump. The next issue is the format of the file itself, which
>> we will discuss at the end of the email.
>>
>> NOTE: The above approach is already implemented in the 'experimental'
>> branch, excluding the handling of [6].
>>
>
>
This was done with SIGUSR2 mainly because, for the 'implementation' used to
test out other things, it was just a one-line change :-)

We should surely plan something else here too, IMO. I will wait for some more
time before doing anything here.
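
For reference, a rough sketch (not the actual 'experimental' branch code) of
how a USR2-triggered metrics dump can be wired up; gf_dump_metrics(), the
metric name and the output path below are made-up placeholders:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t dump_requested;

static void
metrics_signal_handler(int sig)
{
        (void)sig;
        dump_requested = 1;  /* only set a flag; do the real work outside the handler */
}

/* hypothetical dumper: one "key value timestamp" line per metric */
static void
gf_dump_metrics(void)
{
        FILE *fp = fopen("/var/run/gluster/metrics.dump", "w");

        if (!fp)
                return;
        fprintf(fp, "gluster.brick0.fop.write.count %llu %ld\n",
                123456ULL, (long)time(NULL));
        fclose(fp);
}

int
main(void)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = metrics_signal_handler;
        sigaction(SIGUSR2, &sa, NULL);

        /* stand-in for the process's real main/event loop */
        for (;;) {
                pause();
                if (dump_requested) {
                        dump_requested = 0;
                        gf_dump_metrics();
                }
        }
        return 0;
}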



> I'm going to pile on with the others discouraging the use of signals and
> put a vote in favor of using a network socket.
> In a previous project [10], we used a listening TCP socket to provide
> metrics to graphite. This has the ability to support multiple receivers by
> just sending a copy to each currently open connection, and if there is a
> concern about overwhelming receivers and/or slowing down the gluster
> sending side, these could be non-blocking sockets that simply drop data if
> there is no room in the outbound buffer. The data format we used was
> exactly the Graphite text format [11], which includes a timestamp directly
> with each metric. The downside is extra data, but it removes
> transmission/processing/queuing latency concerns. In practice, we
> calculated the timestamp once and used it for all metrics sent in the
> interval to minimize the overhead imposed by repeated gettimeofday().
> Another reason I like the socket approach is that in containerized
> environments, I can easily run a sidecar that grabs the metrics and
> forwards or processes them and it doesn't have to share anything more than
> a network port.
>
> The biggest drawback to the socket approach is its passive nature. The
> 

Re: [Gluster-devel] Metrics: and how to get them out from gluster

2017-09-01 Thread John Strunk
On Fri, Sep 1, 2017 at 1:27 AM, Amar Tumballi  wrote:

> Disclaimer: This email is long, and took significant time to write. Do
> take the time to read, review and give feedback, so we can have some
> metrics-related tasks done by Gluster 4.0.
>
> ---
> ** History:*
>
> To understand what is happening inside a GlusterFS process, over the years
> we have opened many bugs and also coded a few things with regard to
> statedump, and put some effort into the io-stats translator to improve
> Gluster's monitoring capabilities.
>
> But surely more is required! Some glimpses of this are captured in
> [1], [2], [3] & [4]. Also, I sent an email to this group [5] about the
> possibilities of capturing this information.
>
> ** Current problem:*
>
> When we talk about metrics or monitoring, we have to consider giving this
> data to a tool which can preserve the readings over time; without a time
> series, no metrics will make sense! So the first challenge is how to get
> them out. Should getting the metrics out of each process require
> 'glusterd' to be involved, or should we use signals? This leads us to
> *'challenge #1'.*
>
> Next: should we depend on io-stats to do the reporting? If yes, how do we
> get information from between any two layers? Should we place io-stats
> between all the nodes of the translator graph? Or should we utilize the
> STACK_WIND/UNWIND framework to get the details? This is our *'challenge
> #2'*.
>
> Once the above decision is taken, the next question is: "What about
> 'metrics' from other translators? Who gives them out (i.e., dumps them)?
> Why do we need something similar to statedump; can't we read the info from
> statedump itself?". But when we say 'metrics', we should have a key and a
> number associated with it; statedump has a lot more, and no fixed format.
> If it's different from statedump, then what is our answer for translator
> code to give out metrics? This is our *'challenge #3'*.
>
> If we get a solution to the above challenges, then I guess we are in
> decent shape for further development. Let's go through them one by one, in
> detail.
>
> ** Problems and proposed solutions:*
>
> *a) how to dump metrics data?*
>
> Currently, I propose the signal-handler way, as it gives us control over
> which processes we capture information from, and it will be much faster
> than communicating through another tool. Also, considering we need these
> metrics taken every 10 seconds or so, there will be a need for an efficient
> way to get them out.
>
> But even there we have challenges, because we have already taken both the
> USR1 and USR2 signal handlers: one for statedump, the other for toggling
> latency monitoring. It makes sense to continue to have statedump use
> USR1, but toggling options should technically (and for correctness) be
> handled by glusterd volume-set options, and there should be a better way
> to handle this via our 'reconfigure()' framework in graph-switch. A
> proposal was sent in GitHub issue #303 [6].
>
> If we are good with the above proposal, then we can make use of USR2 for
> the metrics dump. The next issue is the format of the file itself, which
> we will discuss at the end of the email.
>
> NOTE: The above approach is already implemented in the 'experimental'
> branch, excluding the handling of [6].
>


I'm going to pile on with the others discouraging the use of signals and
put a vote in favor of using a network socket.
In a previous project [10], we used a listening TCP socket to provide
metrics to graphite. This has the ability to support multiple receivers by
just sending a copy to each currently open connection, and if there is a
concern about overwhelming receivers and/or slowing down the gluster
sending side, these could be non-blocking sockets that simply drop data if
there is no room in the outbound buffer. The data format we used was
exactly the Graphite text format [11], which includes a timestamp directly
with each metric. The downside is extra data, but it removes
transmission/processing/queuing latency concerns. In practice, we
calculated the timestamp once and used it for all metrics sent in the
interval to minimize the overhead imposed by repeated gettimeofday().
Another reason I like the socket approach is that in containerized
environments, I can easily run a sidecar that grabs the metrics and
forwards or processes them and it doesn't have to share anything more than
a network port.

The biggest drawback to the socket approach is its passive nature. The
receiver is stuck with whatever stat frequency Gluster chooses, though this
could be configured either globally or per connection.

[10] https://github.com/NTAP/chronicle/
[11]
https://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol
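
To make this concrete, here is a rough single-threaded sketch of such a
metrics socket. The port, metric name and counter value are arbitrary
placeholders; a real implementation would also prune closed connections and
feed from gluster's actual counters rather than a constant:

#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MAX_CLIENTS 16

int
main(void)
{
        int clients[MAX_CLIENTS];
        int nclients = 0;
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9284);            /* arbitrary metrics port */
        bind(listener, (struct sockaddr *)&addr, sizeof(addr));
        listen(listener, 8);
        fcntl(listener, F_SETFL, O_NONBLOCK);   /* accept() must never block */

        for (;;) {
                /* pick up any newly connected receiver */
                int c = accept(listener, NULL, NULL);
                if (c >= 0 && nclients < MAX_CLIENTS) {
                        fcntl(c, F_SETFL, O_NONBLOCK);
                        clients[nclients++] = c;
                }

                /* timestamp computed once per interval, reused for every line */
                long now = (long)time(NULL);
                char line[256];
                int len = snprintf(line, sizeof(line),
                                   "gluster.vol0.brick0.fop.write.count %llu %ld\n",
                                   123456ULL /* placeholder counter */, now);

                /* send a copy to every receiver; silently drop if its buffer is full */
                for (int i = 0; i < nclients; i++)
                        (void)send(clients[i], line, len,
                                   MSG_DONTWAIT | MSG_NOSIGNAL);

                sleep(10);                      /* the ~10s interval discussed above */
        }
}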



>
> *b) where to measure the latency and fops counts?*
>
> One possible way is to load io-stats between all the nodes, but
> it has its own limitations. Mainly, how to configure 

Re: [Gluster-devel] Metrics: and how to get them out from gluster

2017-09-01 Thread Xavier Hernandez

Hi Amar,

I haven't had time to review the changes in the experimental branch yet, but 
here are some comments about these ideas...


On 01/09/17 07:27, Amar Tumballi wrote:
Disclaimer: This email is long, and took significant time to write. 
Do take the time to read, review and give feedback, so we can have some 
metrics-related tasks done by Gluster 4.0.


---
** History:*

To understand what is happening inside a GlusterFS process, over the 
years we have opened many bugs and also coded a few things with regard to 
statedump, and put some effort into the io-stats translator to improve 
Gluster's monitoring capabilities.


But surely more is required! Some glimpses of this are captured in 
[1], [2], [3] & [4]. Also, I sent an email to this group [5] about the 
possibilities of capturing this information.


** Current problem:*

When we talk about metrics or monitoring, we have to consider giving this 
data to a tool which can preserve the readings over time; without a time 
series, no metrics will make sense! So the first challenge is how to get 
them out. Should getting the metrics out of each process require 
'glusterd' to be involved, or should we use signals? 
This leads us to *'challenge #1'.*


One problem I see here is that we will have multiple bricks and multiple 
clients (including FUSE and gfapi).


I assume we want to be able to monitor whole volume performance 
(aggregate values of all mount points), specific mount performance, and 
even specific brick performance.


In this case, the signal approach seems quite difficult to me, especially 
for gfapi-based clients. Even for FUSE mounts and brick processes we 
would need to connect to each place where one of these processes runs and 
send the signal there. Some clients may not be prepared to be accessed 
remotely in an easy way.


Using glusterd this problem could be minimized, but I'm not sure that 
the interface would be easy to implement (basically because we would 
need some kind of filtering syntax to avoid huge outputs) and the output 
could be complex for other tools to parse, especially considering that 
the amount of data could be significant and it can change with the 
addition or modification of translators.


I propose a third approach. It's based on a virtual directory similar to 
/sys and /proc on Linux. We already have /.meta in Gluster. We could 
extend that in a way that we could have data there from each mount point 
(FUSE or gfapi) and each brick. Then we could define an API to allow 
each xlator to publish information in that directory in a simple way.


Using this approach, monitoring tools can check only the interesting data 
by directly mounting the volume like any other client and reading the 
desired values.


To implement this we could centralize all statistics capturing in 
libglusterfs itself, and create a new translator (or reuse meta) to 
gather this information from libglusterfs and publish it in the 
virtual directory (we would probably need a server-side and a 
client-side xlator to be able to combine data from all mounts and bricks).
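
As an illustration of what a consumer could look like under this layout, a
monitoring tool would simply read plain files from a mounted volume; the path
below is invented for the example and does not match what the current meta
xlator exposes:

#include <stdio.h>

int
main(void)
{
        /* invented path: the key is a file under the volume's virtual directory */
        const char *key =
            "/mnt/vol0/.meta/metrics/brick0/storage-posix/fop/write/count";
        char value[64];
        FILE *fp = fopen(key, "r");

        if (fp && fgets(value, sizeof(value), fp))
                printf("%s = %s", key, value);
        if (fp)
                fclose(fp);
        return 0;
}

The key is simply the file path and the value is its contents, which also
matches the key-plus-number requirement from challenge #3.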




Next: should we depend on io-stats to do the reporting? If yes, how 
do we get information from between any two layers? Should we place 
io-stats between all the nodes of the translator graph?


I wouldn't depend on io-stats for reporting all the data. Monitoring 
seems to me a deeper thing than what a single translator can do.


Using the virtual directory approach, io-stats can place its statistics 
there, but it doesn't need to be aware of all other possible statistics 
from other xlators because each one will report its own statistics 
independently.


Or should we 
utilize the STACK_WIND/UNWIND framework to get the details? This is our 
*'challenge #2'*


I think that the Gluster core itself (basically libglusterfs) should keep 
its own details on global things like this. These details could also be 
published in the virtual directory. From my point of view, io-stats 
should be left to provide global timings for the fops, or be merged with 
the STACK_WIND/UNWIND framework and removed as an xlator.
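
As a rough sketch of what folding the timing into the wind/unwind path itself
could look like (the frame and metric types below are simplified stand-ins,
not the real call_frame_t or the STACK_WIND/STACK_UNWIND macros):

#include <stdint.h>
#include <time.h>

struct metric {
        uint64_t count;         /* number of completed fops */
        uint64_t total_ns;      /* cumulative latency for this fop */
};

struct frame {
        struct timespec wound_at;
        struct metric  *fop_metric;
};

/* called where STACK_WIND would dispatch the fop downwards */
static void
on_wind(struct frame *frame, struct metric *m)
{
        frame->fop_metric = m;
        clock_gettime(CLOCK_MONOTONIC, &frame->wound_at);
}

/* called where STACK_UNWIND would return the result upwards */
static void
on_unwind(struct frame *frame)
{
        struct timespec now;

        clock_gettime(CLOCK_MONOTONIC, &now);
        frame->fop_metric->count++;
        frame->fop_metric->total_ns +=
            (now.tv_sec - frame->wound_at.tv_sec) * 1000000000ULL +
            (now.tv_nsec - frame->wound_at.tv_nsec);
}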




Once the above decision is taken, the next question is: "What about 
'metrics' from other translators? Who gives them out (i.e., dumps them)? 
Why do we need something similar to statedump; can't we read the info 
from statedump itself?".


I think it would be better and easier to move the information from the 
statedump to the virtual directory instead of trying to use the 
statedump to report everything.


But when we say 'metrics', we should have a key and 
a number associated with it; statedump has a lot more, and no fixed 
format. If it's different from statedump, then what is our answer for 
translator code to give out metrics? This is our *'challenge #3'*.


Using the virtual directory structure, our key would be a specific file 
name in some directory that represents the hierarchical structure of the 
volume (xlators), and the value would be its