Re: [PATCH] MEDIUM/RFC: Implement time-based server latency metrics

2017-01-04 Thread Willy Tarreau
Hi Krishna,

On Thu, Jan 05, 2017 at 11:15:46AM +0530, Krishna Kumar (Engineering) wrote:
> Hi Willy,
> 
> If required, I can try to make the "hard-coded periods" changes too, but
> want to hear your opinion as the code gets very complicated and, IMHO,
> may not give correct results depending on when the request is made. All
> the other changes are doable.
> 
> Hoping to hear from you on this topic, please let me know your opinion.

I've started to think about it but had to stop, I'm just busy dealing with
some painful bugs so it takes me more time to review code additions. I'm
intentionally keeping your mail marked unread in order to get back to it
ASAP.

Thanks,
Willy



ALERT:sendmsg logger #1 failed: Resource temporarily unavailable (errno=11)

2017-01-04 Thread Igor Cicimov
Hi all,

On one of my haproxy instances I get the following message on reload:

[ALERT] 004/070949 (21440) : sendmsg logger #1 failed: Resource temporarily unavailable (errno=11)

Has anyone seen this before, or any pointers on where to look to correct
this?

Thanks,
Igor


Re: 400 error on cookie string

2017-01-04 Thread cas
> Yes, please apply Willy's patch to the 1.7.1 release and tell us what
> happens.

Everything looks good. No errors.

> About 1.6.11. It was in the first days of December. Maybe I've mixed up
> something. I can test again tomorrow if you want.

It was 1.6.10. I think I made some mistakes in paths, so maybe I started
1.7 as a service while "haproxy -vv" showed 1.6. I'm not a pro in *nix,
sorry for confusing you.

Re: [PATCH] MEDIUM/RFC: Implement time-based server latency metrics

2017-01-04 Thread Krishna Kumar (Engineering)
Hi Willy,

If required, I can try to make the "hard-coded periods" changes too, but
want to hear your opinion as the code gets very complicated and, IMHO, may
not give correct results depending on when the request is made. All the
other changes are doable.

Hoping to hear from you on this topic, please let me know your opinion.

Regards,
- Krishna


On Tue, Jan 3, 2017 at 3:07 PM, Krishna Kumar (Engineering)
<krishna...@flipkart.com> wrote:

> Hi Willy,
>
> Sorry for the late response as I was out during the year end, and thanks
> once again for your review comments.
>
> I explored your suggestion of "hard-coded periods", and have some
> problems: code complexity seems to be very high at updates (as well
> as retrievals possibly), and I may not be able to get accurate results.
> E.g. I have data for 1, 4, 16 seconds; and at 18 seconds, a request is
> made for retrieval of the last 16 seconds (or 1,4,16). At this time I have
> values for the last 18 seconds, not 16 seconds. I explored using timers to
> cascade (will not work as it may run into races with the setters, and
> also adds too much overhead) vs doing this synchronously when the
> event happens. Both are complicated and have the above issue of not
> being able to get accurate information depending on when the request is
> made.
>
> To implement your suggestion of say histograms, the retrieval code can
> calculate the 4 values (1, 4, 16, and 64 seconds) by averaging across
> the correct intervals. In this case, the new CLI command is not required,
> and by default it prints all 4 values. Would this work in your opinion?
>
> Ack all your other suggestions, will incorporate those changes and
> re-send. Please let me know if this sounds reasonable.
>
> Thanks,
> - Krishna
>
>
> On Thu, Dec 22, 2016 at 4:23 PM, Willy Tarreau  wrote:
>
>> Hi Krishna,
>>
>> On Thu, Dec 22, 2016 at 09:41:49AM +0530, Krishna Kumar (Engineering)
>> wrote:
>> > We have found that the current mechanism of qtime, ctime, rtime, and
>> > ttime based on the last 1024 requests is not the most suitable to
>> > debug/visualize latency issues with servers, especially if they happen
>> > to last a very short time. For live dashboards showing server timings,
>> > we found an additional last-'n' seconds metric useful. The logs could
>> > also be parsed to derive these values, but that suffers from delays at
>> > high volume, and requires higher processing power and enabling logs.
>> >
>> > The 'last-n' seconds metrics per server/backend can be configured as
>> > follows in the HAProxy configuration file:
>> > backend backend-1
>> > stats-period 32
>> > ...
>> >
>> > To retrieve these stats at the CLI (in addition to existing metrics),
>> > run:
>> > echo show stat-duration time 3 | socat /var/run/admin.sock stdio
>> >
>> > These are also available on the GUI.
>> >
>> > The justification for this patch is:
>> > 1. Allows capturing spikes for a server during a short period. This
>> >helps having dashboards that show server response times every few
>> >seconds (e.g. every 1 second), so as to be able to chart it across
>> >timelines.
>> > 2. Be able to get an average across different time intervals, e.g. the
>> >configuration file may specify to save the last 32 seconds, but the
>> >cli interface can request an average across any interval up to 32
>> >seconds. E.g. the following command prints the existing metrics
>> >appended by the time-based ones for the last 1 second:
>> > echo show stat-duration time 1 | socat /var/run/admin.sock stdio
>> >Running the following existing command appends the time-based
>> >metric values based on the time period configured in the
>> >configuration file per backend/server:
>> > echo show stat | socat /var/run/admin.sock stdio
>> > 3. Option per backend for configuring the server stats' time interval,
>> >and no API breakage to stats (new metrics are added at end of line).
>> >
>> > Please review; any feedback on the code/usability/extensibility is
>> > very much appreciated.
>>
>> First, thanks for this work. I have several concerns and comments about
>> it, however.
>>
>> The first one is that the amount of storage is overkill if the output can
>> only emit an average over a few periods. I mean, the purpose of stats is
>> to emit what we know internally. Some people might want to see
>> histograms, and while we have everything internally with your patch, it's
>> not possible to produce them.
>>
>> For this reason I think we should proceed differently and always emit
>> these stats over a few hard-coded periods. You proved that they don't
>> take that much space, and I think it would probably make sense to emit
>> them over a small series of powers of 4 seconds: 1s, 4s, 16s, 64s. That's
>> quite cheap to store and easy to compute because it's not needed anymore
>> to store all individual values, you can cascade them while filling a
>> bucket.
>>
>> And if you go down 
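A minimal sketch of that cascading scheme, assuming one sum/count pair per
period and closing each larger bucket after four completed buckets of the
level below; the names (lat_bucket, lat_sample, lat_tick) are made up for
illustration and are not haproxy internals:

  #include <stdint.h>

  #define NB_PERIODS 4  /* 1s, 4s, 16s, 64s: each level is 4x the previous */

  struct lat_bucket {
      uint64_t cur_sum, cur_cnt;   /* bucket currently being filled */
      uint64_t last_sum, last_cnt; /* last completed period (what stats report) */
      unsigned subs;               /* completed sub-buckets received so far */
  };

  static struct lat_bucket bk[NB_PERIODS];

  /* record one response-time sample (in ms) into the 1s bucket */
  void lat_sample(uint64_t ms)
  {
      bk[0].cur_sum += ms;
      bk[0].cur_cnt++;
  }

  /* called once per second: close the 1s bucket and cascade upwards;
   * no individual sample ever needs to be stored */
  void lat_tick(void)
  {
      uint64_t sum = bk[0].cur_sum, cnt = bk[0].cur_cnt;
      int i;

      bk[0].last_sum = sum;
      bk[0].last_cnt = cnt;
      bk[0].cur_sum = bk[0].cur_cnt = 0;

      for (i = 1; i < NB_PERIODS; i++) {
          bk[i].cur_sum += sum;
          bk[i].cur_cnt += cnt;
          if (++bk[i].subs < 4)
              break;                            /* period not complete yet */
          bk[i].subs = 0;
          sum = bk[i].last_sum = bk[i].cur_sum; /* cascades one level up */
          cnt = bk[i].last_cnt = bk[i].cur_cnt;
          bk[i].cur_sum = bk[i].cur_cnt = 0;
      }
  }

  /* average over the last completed period <i> (0=1s ... 3=64s), in ms */
  uint64_t lat_avg(int i)
  {
      return bk[i].last_cnt ? bk[i].last_sum / bk[i].last_cnt : 0;
  }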

Re: TLS-PSK support for haproxy?

2017-01-04 Thread Nenad Merdanovic
I have a working patch for this, but it's very ugly currently (minimal
error checking, no warnings/messages, no docs, very basic tests done
only, etc.)

I expect to have a version for review by EOW (depending on the workload,
maybe a bit sooner).

Regards,
Nenad

On 1/2/2017 10:11 AM, Gil Bahat wrote:
> yes, stunnel was my original inspiration for this request, I wanted
> HAproxy to communicate with stunnel-backed services. actually, stunnel
> implements both PSK server and PSK client and it would make sense for
> HAproxy to have both. TLS 1.3 also appears to significantly improve PSK
> with combinations such as RSA-PSK and ECDHE-PSK, so that appears to have
> future usability as well.
> 
> Regards,
> 
> Gil
> 
> On Sun, Jan 1, 2017 at 5:41 PM, Igor Pav wrote:
> 
> Stunnel supports it, https://www.stunnel.org/auth.html, quite simple.
> 
> On Sun, Jan 1, 2017 at 4:34 PM, Willy Tarreau wrote:
> > On Sun, Jan 01, 2017 at 01:16:37AM +0800, Igor Pav wrote:
> >> Sounds good for SSL backend, is this possible?
> >
> > Indeed that sounds interesting for such use cases. I have no idea what
> > it requires to set it up nor what needs to be configurable. Does anyone
> > have any pointer to any product supporting it?
> >
> > Willy
> 
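For reference, the PSK setup documented on that stunnel page boils down to
a config sketch like this (the service name, ports and paths below are
made-up examples, not taken from the page):

  ; server side: accept TLS-PSK and forward in clear to a local backend
  [psk-svc]
  accept = 8443
  connect = 127.0.0.1:8080
  ciphers = PSK
  PSKsecrets = /etc/stunnel/psk.txt

  ; psk.txt holds one identity:key pair per line, e.g.:
  ;   client1:0123456789abcdef0123456789abcdef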
> 




Re: 400 error on cookie string

2017-01-04 Thread Lukas Tribus

Hi Willy,


On 03.01.2017 at 21:10, Willy Tarreau wrote:


It was not dropped since the server SACKed it (but until it's within the
window the stack is free to change its mind). In fact following a TCP
stream in wireshark never gives any useful information. You *always* need
the absolute sequence numbers and ack numbers on each and every packet and
you cannot even trust the payload reported on a packet because payload
spanning over multiple packets is generally reported at the end. That's
why in the end, the much dumber tcpdump is always much more reliable.

So here's what we have :


Thanks for taking the time to explain, it makes complete sense now.


Sounds like the MSS "clamp" works in just one direction, and in the other
direction real PathMTU discovery comes into play (the router sends back
"Fragmentation needed"). Usually MSS clamping applies the clamp in both
directions (SYN and SYN/ACK).
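For illustration, the canonical Linux clamp matches any forwarded SYN,
which is why it normally covers both directions:

  # clamp the advertised MSS to the path MTU on every forwarded SYN;
  # the SYN,RST/SYN flag match catches SYN and SYN/ACK alike
  iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
           -j TCPMSS --clamp-mss-to-pmtu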

Anyway, the TCP is fine, that's the important thing.



> About 1.6.11. It was in the first days of December. Maybe I've mixed up
> something.
>
> I can test again tomorrow if you want

Yes, please apply Willy's patch to the 1.7.1 release and tell us what
happens.




Regards,

Lukas




Re: HAProxy's health checks and maxconn limits

2017-01-04 Thread Lukas Tribus

Hi Jiri,


On 04.01.2017 at 11:38, Jiri Mencak wrote:

> Hi,
>
> we are using HAProxy with its default 2000 maxconn limit and a listen
> block:
>
> listen stats :1936
> mode http
> monitor-uri /healthz
>
> which we use to check HAProxy's "health" by external HTTP probes.  The
> behaviour I'm seeing is that once the default 2000 maxconn limit is
> reached, HAProxy stops listening
> [...]
> As far as I can see, staying with the HTTP probe model, we can increase
> the global maxconn limit or/and increase the health-check's timeout
> period.
>
> Do you see any other options?


Increasing global maxconn is the correct thing to do.

Here's why: there are multiple maxconn "levels" that work independently
from each other. We have maxconn at global (process) level, at
listen/frontend level, and at server level.

What you wanna do is make sure that the queuing happens in the lower
tiers first:
1. maxconn at server level is exhausted: haproxy will queue or redispatch
   to another backend server
2. maxconn at bind/frontend/listen level is exhausted: haproxy will stop
   accepting new connections and the kernel will queue for this particular
   level (e.g. the specific frontend)
3. maxconn at the process level is exhausted: the kernel will queue for
   all frontends of the specific process

The reason is impact. If one frontend/listen section is overloaded and we
are hitting its "own" maxconn, other sections remain unaffected (your http
probes in the listen section, for example). But if the global maxconn is
exhausted, everything is gonna be affected (which is exactly what you are
seeing).



> What are the dangers of setting the global maxconn limits really high
> apart from increased HAProxy's memory usage?

It's really only memory (haproxy and kernel). But that also means that
when you go beyond what your box can handle in RAM and swap, you're gonna
get haproxy OOM-killed.

If you have 1GB of memory to spare on this box, you can set global maxconn
to 20000 without any problems (estimate from the docs). And because each
listen/frontend section is still bound to the per-section maxconn default
(2000), your monitor section won't be affected by exhaustion in the other
sections.

Consider tuning per listen/frontend maxconn as well. maxconn is the single
most important performance setting in haproxy, and it should be adjusted
for each use case, considering available memory.


Also see:
https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#4.2-maxconn
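
As a config sketch of that tiering (all names and numbers below are
illustrative only; size them for your memory):

global
    maxconn 20000       # process-wide ceiling, sized for available RAM

frontend web
    bind :80
    maxconn 9500        # exhausting this only queues "web" in the kernel
    default_backend app

listen stats :1936
    mode http
    monitor-uri /healthz
    maxconn 100         # probes keep answering while "web" is saturated

backend app
    server srv1 10.0.0.1:8080 maxconn 500  # beyond this: queue/redispatch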



Regards,

Lukas




Re: (No subject)

2017-01-04 Thread cas
About 1.6.11. It was in the first days of December. Maybe I've mixed up
something. I can test again tomorrow if you want.

04.01.2017, 17:43, "c...@xmonetize.net":
> Hi, thank you very much for the fix. I just want to mention that I had
> this issue in 1.6.11 too. My name is Aleksey Gordeev. I'm glad that my
> information was useful. Also I will wait for any commit, branch or tag to
> test it. It is very easy to test it.
>
>> I found it and fixed it! [...]
>>
>> Cheers,
>> Willy

[no subject]

2017-01-04 Thread cas
Hi,

Thank you very much for the fix. I just want to mention that I had this
issue in 1.6.11 too. My name is Aleksey Gordeev. I'm glad that my
information was useful. Also I will wait for any commit, branch or tag to
test it. It is very easy to test it.

> I found it and fixed it!
>
> It was me again who added a bug in 1.7 with the optimizations for large
> headers and large requests. [...]
>
> Cheers,
> Willy

SOLVED! (Was: 400 error on cookie string)

2017-01-04 Thread Willy Tarreau
I found it and fixed it!

It was me again who added a bug in 1.7 with the optimizations for large
headers and large requests. If certain conditions are met, we could read
the \r from previous data (as we guessed) and complain that the next byte
was not an LF once the remaining part arrived.

I'm intending to merge the attached patch. "cas", I'm willing to add you
as the reporter here since you provided lots of very valuable information,
but for this it would be nice if you had a name :-)

I'll backport it to 1.7. Unfortunately there's no easy workaround on an
existing configuration, so I'll produce 1.7.2 soon I guess. I'll try to
add the remaining missing information to make dumps more accurate regarding
the expected state (this would definitely had helped here).

Cheers,
Willy
From 5afd6967423caa2d55d4c5a5ae0e296b35e2292d Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Wed, 4 Jan 2017 14:44:46 +0100
Subject: BUG/MAJOR: http: fix risk of getting invalid reports of bad requests
X-Bogosity: Ham, tests=bogofilter, spamicity=0.00, version=1.2.4

Commits 5f10ea3 ("OPTIM: http: improve parsing performance of long URIs")
and 0431f9d ("OPTIM: http: improve parsing performance of long header lines")
introduced a bug in the HTTP parser : when a partial request is read, the
first part ends up on an 8-byte boundary (or 4-byte on 32-bit machines), the
end lies in the header field value part, and the buffer used to contain a CR
character exactly after the last block, then the parser could be confused and
read this CR character as being part of the current request, then switch to a
new state waiting for an LF character. Then when the next part of the request
appeared, it would read the character following what was erroneously mistaken
for a CR, see that it is not an LF and fail on a bad request. The reason is
that there's no control of end of parsing just after breaking out of the loop.

One way to reproduce it is with this config :

  global
  stats socket /tmp/sock1 mode 666 level admin
  stats timeout 1d

  frontend  px
  bind :8001
  mode http
  timeout client 10s
  redirect location /

And sending requests this way :

  $ tcploop 8001 C P S:"$(dd if=/dev/zero bs=16384 count=1 2>/dev/null | tr '\000' '\r')"
  $ tcploop 8001 C P S:"$(dd if=/dev/zero bs=16384 count=1 2>/dev/null | tr '\000' '\r')"
  $ tcploop 8001 C P \
S:"GET / HTTP/1.0\r\nX-padding: 0123456789.123456789.123456789.123456789.123456789.123456789.1234567" P \
S:"89.123456789\r\n\r\n" P

Then a "show errors" on the socket will report :

  $ echo "show errors" | socat - /tmp/sock1
  Total events captured on [04/Jan/2017:15:09:15.755] : 32

  [04/Jan/2017:15:09:13.050] frontend px (#2): invalid request
backend  (#-1), server  (#-1), event #31
src 127.0.0.1:59716, session #91, session flags 0x0080
HTTP msg state 17, msg flags 0x, tx flags 0x
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00808002, out 0 bytes, total 111 bytes
pending 111 bytes, wrapping at 16384, error at position 107:

0  GET / HTTP/1.0\r\n
00016  X-padding: 0123456789.123456789.123456789.123456789.123456789.12345678
00086+ 9.123456789.123456789\r\n
00109  \r\n

This fix must be backported to 1.7.
---
 src/proto_http.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/proto_http.c b/src/proto_http.c
index aa8d997..d894ef6 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -1557,6 +1557,10 @@ const char *http_parse_reqline(struct http_msg *msg,
ptr += sizeof(int);
}
 #endif
+   if (ptr >= end) {
+   state = HTTP_MSG_RQURI;
+   goto http_msg_ood;
+   }
http_msg_rquri2:
	if (likely((unsigned char)(*ptr - 33) <= 93)) /* 33 to 126 included */
		EAT_AND_JUMP_OR_RETURN(http_msg_rquri2, HTTP_MSG_RQURI);
@@ -1983,6 +1987,10 @@ void http_msg_analyzer(struct http_msg *msg, struct hdr_idx *idx)
ptr += sizeof(int);
}
 #endif
+   if (ptr >= end) {
+   state = HTTP_MSG_HDR_VAL;
+   goto http_msg_ood;
+   }
http_msg_hdr_val2:
	if (likely(!HTTP_IS_CRLF(*ptr)))
		EAT_AND_JUMP_OR_RETURN(http_msg_hdr_val2, HTTP_MSG_HDR_VAL);
-- 
1.7.12.1



Re: 400 error on cookie string

2017-01-04 Thread Willy Tarreau
Hi guys,

On Tue, Jan 03, 2017 at 09:10:21PM +0100, Willy Tarreau wrote:
> However when looking at the error capture, the request is correct. But as
> you can see, it's reported as wrong from offset 1453, hence one byte past
> the end of the first segment. Thus I suspect that we have something wrong
> here when receiving an incomplete request in this particular state. Maybe
> we read a value too far, find a \0 and abort, and by the time we get to
> the error dump the rest of the request comes in and appears in the capture.

So I tried to inject these data into haproxy 1.7.1 and no luck, it never
fails. I even tried to start it with -dM0 to ensure the buffer was filled
with forbidden chars and it never triggers, as can be seen below.

I'll see what to change in the code to get a copy of the parser's state in
the dump, hoping that will shed a bit more light on the problem.

Willy

--
11:38:56.314991 accept4(5, {sa_family=AF_INET, sin_port=htons(60288), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_NONBLOCK) = 6
11:38:56.315073 setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
11:38:56.315139 accept4(5, 0x7ffd8be8fc40, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
11:38:56.315214 recvfrom(6, 0xaa3214, 15360, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:38:56.315277 epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLIN|0x2000, {u32=6, u64=6}}) = 0
11:38:56.315327 gettimeofday({1483526336, 315340}, NULL) = 0
11:38:56.315371 epoll_wait(3, {{EPOLLIN, {u32=6, u64=6}}}, 200, 1000) = 1
11:38:56.415320 gettimeofday({1483526336, 415353}, NULL) = 0
11:38:56.415407 recvfrom(6, "GET /?fm=1_url=is-there-caffeine-in-chocolate_type=question=1e4aa05d-f9a2-46f8-8d4f-a59bf9cdb3ad=43049_rd=_s=sr7imz238=sr7imz238=3048187=xmonetize="..., 15360, 0, NULL, NULL) = 1452
11:38:56.415667 setsockopt(6, SOL_TCP, TCP_QUICKACK, [1], 4) = 0
11:38:56.415778 gettimeofday({1483526336, 415800}, NULL) = 0
11:38:56.415840 epoll_wait(3, {{EPOLLIN, {u32=6, u64=6}}}, 200, 1000) = 1
11:38:57.415765 gettimeofday({1483526337, 415806}, NULL) = 0
11:38:57.415857 recvfrom(6, "0%22Daily%20Quiz%20%7C%20Tests%20Clickers%200-14d%20Any%2024k%2Fd%20%7C%2011d%2B%20%7C%20Theme%20%7C%201%20PM%20LiveIntent%22%2C%22tref4%22%3A%20%22NA%22%2C%22emaildomain%22%3A%20%22gmail.com%22%2C%22"..., 13908, 0, NULL, NULL) = 1453
11:38:57.415988 setsockopt(6, SOL_TCP, TCP_QUICKACK, [1], 4) = 0
11:38:57.416072 gettimeofday({1483526337, 416089}, NULL) = 0
11:38:57.416126 epoll_wait(3, {}, 200, 1000) = 0
11:38:58.417279 gettimeofday({1483526338, 417344}, NULL) = 0
--




HAProxy's health checks and maxconn limits

2017-01-04 Thread Jiri Mencak
Hi,

we are using HAProxy with its default 2000 maxconn limit and a listen block:

listen stats :1936
mode http
monitor-uri /healthz

which we use to check HAProxy's "health" by external HTTP probes.  The
behaviour I'm seeing is that once the default 2000 maxconn limit is
reached, HAProxy stops listening for new connections and these are queued
in the kernel.  This includes the ":1936/healthz" probes, which time out,
and HAProxy's state is then interpreted as "unhealthy", rather than "busy".
While we can health-check HAProxy in other ways (e.g. stats UNIX domain
socket in stream mode), HTTP health-checks are preferred since they better
reflect HAProxy's ability to process requests.

As far as I can see, staying with the HTTP probe model, we can increase the
global maxconn limit or/and increase the health-check's timeout period.

Do you see any other options?  What are the dangers of setting the global
maxconn limits really high apart from increased HAProxy's memory usage?
I've seen reports of people going as high as maxconn=200 and achieving
300k concurrent TCP connections on a reportedly outdated PC.

Many thanks.

Jiri


[PATCH 1/2] MEDIUM: stats: Add JSON output option to show (info|stat)

2017-01-04 Thread Simon Horman
Add a json parameter to show (info|stat) which will output information
in JSON format. A follow-up patch will add a JSON schema which describes
the format of the JSON output of these commands.

The JSON output is without any extra whitespace in order to reduce the
volume of output. For human consumption passing the output through a
pretty printer may be helpful.

e.g.:
$ echo "show info json" | socat /var/run/haproxy.stat stdio | \
 python -m json.tool

STAT_STARTED has been added in order to track if show output has begun or
not. This is used in order to allow the JSON output routines to only insert
a "," between elements when needed. I would value any feedback on how this
might be done better.
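
In case it helps the discussion, the idiom reduces to something like this
standalone sketch (names made up for illustration, not the actual patch):

  #include <stdio.h>

  /* emit a JSON array, adding "," only between elements -- the same job
   * STAT_STARTED does across successive dump calls */
  static void dump_json_list(const char **items, int n)
  {
      int started = 0;  /* set once the first element has been emitted */
      int i;

      putchar('[');
      for (i = 0; i < n; i++) {
          if (started)
              putchar(',');
          started = 1;
          printf("\"%s\"", items[i]);
      }
      puts("]");
  }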

Signed-off-by: Simon Horman 
---

For the simple configuration below a comparison of the size of info
and stats output is as follows:

$ show stat   =>  1654 bytes
$ show stat typed =>  7081 bytes
$ show stat json  => 45331 bytes
$ show stat json (pretty printed[*]) => 113390 bytes

$ show info   =>  527 bytes
$ show info typed =>  937 bytes
$ show info json  => 5330 bytes
$ show info json (pretty printed[*]) => 11456 bytes

[*] pretty printed using python -m json.tool

--- begin config ---
global
daemon
stats socket /tmp/haproxy.stat mode 600 level admin
pidfile /tmp/haproxy.pid
log /dev/log local4
tune.bufsize 16384
tune.maxrewrite 1024

defaults
mode http
balance roundrobin
timeout connect 4000
timeout client 42000
timeout server 43000
log global

listen VIP_Name
bind 127.0.0.1:10080 transparent
mode http
balance leastconn
cookie SERVERID insert nocache indirect
server backup 127.0.0.1:9081 backup  non-stick
option http-keep-alive
option forwardfor
option redispatch
option abortonclose
maxconn 4
log global
option httplog
option log-health-checks
server RIP_Name 127.0.0.1  weight 100  cookie RIP_Name agent-check agent-port 12345 agent-inter 2000 check port 80 inter 2000 rise 2 fall 3  minconn 0 maxconn 0 on-marked-down shutdown-sessions disabled
server RIP_Name 127.0.0.1  weight 100  cookie RIP_Name agent-check agent-port 12345 agent-inter 2000 check port 80 inter 2000 rise 2 fall 3  minconn 0 maxconn 0 on-marked-down shutdown-sessions
--- end config ---

Changes since RFC:
* Handle cases where output exceeds available buffer space
* Document that consideration should be given to updating
  dump functions if struct field is updated
* Limit JSON integer values to the range [-(2**53)+1, (2**53)-1] as per
  the recommendation for interoperable integers in section 6 of RFC 7159.
---
 doc/management.txt|  45 +++--
 include/types/stats.h |   5 +
 src/stats.c   | 272 +-
 3 files changed, 306 insertions(+), 16 deletions(-)

diff --git a/doc/management.txt b/doc/management.txt
index 683b99790160..623ac6375552 100644
--- a/doc/management.txt
+++ b/doc/management.txt
@@ -1760,16 +1760,18 @@ show errors [|] [request|response]
 show backend
   Dump the list of backends available in the running process
 
-show info [typed]
+show info [typed|json]
   Dump info about haproxy status on current process. If "typed" is passed as an
   optional argument, field numbers, names and types are emitted as well so that
   external monitoring products can easily retrieve, possibly aggregate, then
   report information found in fields they don't know. Each field is dumped on
-  its own line. By default, the format contains only two columns delimited by a
-  colon (':'). The left one is the field name and the right one is the value.
-  It is very important to note that in typed output format, the dump for a
-  single object is contiguous so that there is no need for a consumer to store
-  everything at once.
+  its own line. If "json" is passed as an optional argument then
+  information provided by "typed" output is provided in JSON format as a
+  list of JSON objects. By default, the format contains only two columns
+  delimited by a colon (':'). The left one is the field name and the right
+  one is the value.  It is very important to note that in typed output
+  format, the dump for a single object is contiguous so that there is no
+  need for a consumer to store everything at once.
 
   When using the typed output format, each line is made of 4 columns delimited
  by colons (':'). The first column is a dot-delimited series of 3 elements. The
@@ -1846,6 +1848,16 @@ show info [typed]
   6.Uptime.2:MDP:str:0d 0h01m28s
   (...)
 
+  The format of JSON output is described in a schema which may be output
+  using "show schema json" (to be implemented).
+
+  The JSON output contains no extra 

[PATCH 0/2] MEDIUM: stats: Add JSON output option to show (info|stat)

2017-01-04 Thread Simon Horman
Hi,

this short series is an RFC implementation of adding JSON format
output to show (info|stat). It also adds a new "show schema json"
stats command to allow retrieval of the schema which describes
the JSON output of show (info|stat).

Some areas for possible discussion:
* Use of STAT_STARTED in first patch
* Possible automatic generation of (part of) the schema in 2nd patch
* Improved documentation

Some discussion of the size of JSON output is included as an appendix
to the changelog of the first patch.

Changes since RFC noted in per-patch changelogs.

Simon Horman (2):
  MEDIUM: stats: Add JSON output option to show (info|stat)
  MEDIUM: stats: Add show json schema

 doc/management.txt|  74 ++--
 include/types/stats.h |   6 +
 src/stats.c   | 506 +-
 3 files changed, 570 insertions(+), 16 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344




[PATCH 2/2] MEDIUM: stats: Add show json schema

2017-01-04 Thread Simon Horman
This may be used to output the JSON schema which describes the output of
show info json and show stats json.

The JSON output is without any extra whitespace in order to reduce the
volume of output. For human consumption passing the output through a
pretty printer may be helpful.

e.g.:
$ echo "show schema json" | socat /var/run/haproxy.stat stdio | \
 python -m json.tool

The implementation does not generate the schema. Some consideration could
be given to integrating the output of the schema with the output of
typed and json info and stats. In particular the types (u32, s64, etc...)
and tags.

A sample verification of show info json and show stats json using
the schema is as follows. It uses the jsonschema python module:

cat > jschema.py << __EOF__
import json

from jsonschema import validate
from jsonschema.validators import Draft3Validator

with open('schema.txt', 'r') as f:
schema = json.load(f)
Draft3Validator.check_schema(schema)

with open('instance.txt', 'r') as f:
instance = json.load(f)
validate(instance, schema, Draft3Validator)
__EOF__

$ echo "show schema json" | socat /var/run/haproxy.stat stdio > schema.txt
$ echo "show info json" | socat /var/run/haproxy.stat stdio > instance.txt
$ python ./jschema.py
$ echo "show stats json" | socat /var/run/haproxy.stat stdio > instance.txt
$ python ./jschema.py

Signed-off-by: Simon Horman 
---

In this case the pretty printer increases the size of the output by
about 200% illustrating the value of output without whitespace.

  $ echo "show schema json" | socat /var/run/haproxy.stat stdio | wc -c
  2690
  $ echo "show schema json" | socat /var/run/haproxy.stat stdio | \
  python -m json.tool | wc -c
  8587

Changes since RFC:
* Add errors to schema and use in the case where output exceeds
  available buffer space
* Document that consideration should be given to updating
  schema function if struct field is updated
* Correct typos
* Register "show", "schema", json" rather than "show", "schema".
  This allows the parse callback to be omitted and some simplification
  of the dump callback.
* Limit integer values to the range [-(2**53)+1, (2**53)-1] as
  per the recommendation for interoperable integers in
  section 6 of RFC 7159.
---
 doc/management.txt|  33 ++-
 include/types/stats.h |   5 +-
 src/stats.c   | 234 ++
 3 files changed, 268 insertions(+), 4 deletions(-)

diff --git a/doc/management.txt b/doc/management.txt
index 623ac6375552..70af03b07271 100644
--- a/doc/management.txt
+++ b/doc/management.txt
@@ -1849,7 +1849,14 @@ show info [typed|json]
   (...)
 
   The format of JSON output is described in a schema which may be output
-  using "show schema json" (to be implemented).
+  using "show schema json".
+
+  The JSON output contains no extra whitespace in order to reduce the
+  volume of output. For human consumption passing the output through a
+  pretty printer may be helpful. Example :
+
+  $ echo "show info json" | socat /var/run/haproxy.sock stdio | \
+python -m json.tool
 
   The JSON output contains no extra whitespace in order to reduce the
   volume of output. For human consumption passing the output through a
@@ -2128,7 +2135,14 @@ show stat [{|}  ] [typed|json]
 (...)
 
   The format of JSON output is described in a schema which may be output
-  using "show schema json" (to be implemented).
+  using "show schema json".
+
+  The JSON output contains no extra whitespace in order to reduce the
+  volume of output. For human consumption passing the output through a
+  pretty printer may be helpful. Example :
+
+  $ echo "show stat json" | socat /var/run/haproxy.sock stdio | \
+python -m json.tool
 
   The JSON output contains no extra whitespace in order to reduce the
   volume of output. For human consumption passing the output through a
@@ -2237,6 +2251,21 @@ show tls-keys [id|*]
   specified as parameter, it will dump the tickets, using * it will dump every
   keys from every references.
 
+show schema json
+  Dump the schema used for the output of "show info json" and "show stat json".
+
+  The schema contains no extra whitespace in order to reduce the volume of output.
+  For human consumption passing the output through a pretty printer may be
+  helpful. Example :
+
+  $ echo "show schema json" | socat /var/run/haproxy.sock stdio | \
+python -m json.tool
+
+  The schema follows "JSON Schema" (json-schema.org) and accordingly
+  verifiers may be used to verify the output of "show info json" and "show
+  stat json" against the schema.
+
+
 shutdown frontend 
   Completely delete the specified frontend. All the ports it was bound to will
   be released. It will not be possible to enable the frontend anymore after
diff --git a/include/types/stats.h b/include/types/stats.h
index aad694c203c3..70224687123b 100644
--- a/include/types/stats.h
+++ b/include/types/stats.h
@@ -215,8 +215,9 @@ enum field_scope {