Re: [PATCH] MINOR: sample: add json_string

Willy Tarreau Tue, 13 Apr 2021 02:26:49 -0700

Hi Aleks,

On Mon, Apr 12, 2021 at 10:09:08PM +0200, Aleksandar Lazic wrote:
> Hi.
> 
> another patch which honer the feedback.


Thank you. FWIW I agree with all the points reported by Tim. I'll add
a few comments and/or suggestions below. On a general note, please be
careful about your indenting, as it very can quickly become a total
mess. Similarly please pay attention not to leave trailing spaces
that may Git complain:

  Applying: MINOR: sample: converter: add JSON Path handling
  .git/rebase-apply/patch:39: trailing whitespace.
     - number  : When the JSON value is a number then will the value be 
  .git/rebase-apply/patch:40: trailing whitespace.
                 converted to a string. If you know that the value is a 
  .git/rebase-apply/patch:41: trailing whitespace.
                 integer then can you help haproxy to convert the value 
  .git/rebase-apply/patch:46: trailing whitespace.
    This converter extracts the value located at <json_path> from the JSON 
  .git/rebase-apply/patch:47: trailing whitespace.
    string in the input value. 
  warning: squelched 10 whitespace errors
  warning: 15 lines add whitespace errors.

All these lines are easily noticed this way:

    $ git show | grep -c '^+.*\s$'
    15

A good way to avoid this once for all it to enable colors in Git and to
always make sure not to leave red areas in "git diff" or "git show" :

    $ git config --global color.ui true

And even if it's of low importance for the code itself, it's particularly
important in a review because such cosmetic issues constantly remind the
reader that the patch is far from being final, so it's possibly not yet
the moment to focus on not critically important stuff. Thus in the end
they increase the number of round trips.

> The doc will be enhanced but I have a question about that sequence.
> This should write the double value to the string but I think I have here some
> issue.
> 
> ```
>                       printf("\n>>>DOUBLE rc:%d: double:%f:\n",rc, 
> double_val);
>                       trash->size = snprintf(trash->area,
>                                                               trash->data,
>                                                               
> "%g",double_val);
>                       smp->data.u.str = *trash;
>                       smp->data.type = SMP_T_STR;
> ```

Yeah, as Tim mentioned, you mixed size and data. "data" is the amount of
data bytes used in a chunk. "size" is its allocated size.


> >From 8cb1bc4aaedd17c7189d4985a57f662ab1b533a4 Mon Sep 17 00:00:00 2001
> From: Aleksandar Lazic <al-hapr...@none.at>
> Date: Mon, 12 Apr 2021 22:01:04 +0200
> Subject: [PATCH] MINOR: sample: converter: add JSON Path handling
> 
> With json_path can a JSON value be extacted from a Header
> or body

In the final version, please add a few more lines to describe the name
of the added converter and what it's used for. As a reminder, think that
you're trying to sell your artwork to me or anyone else who would make
you proud bybackporting your work into their version :-)

> +json_query(<json_path>,[<output_type>])
> +  The <json_path> is mandatory.
> +  By default will the follwing JSON types recognized.
> +   - string  : This is the default search type and returns a string;
> +   - number  : When the JSON value is a number then will the value be 
> +               converted to a string. If you know that the value is a 
> +               integer then can you help haproxy to convert the value 
> +               to a integer when you add "sint" to the <output_type>;

Just thinking loud, I looked at the rest of the doc and noticed we never
mention "sint" anywhere else, so I think it's entirely an internal type.
However we do mention "int" which is used as the matching method for
integers, so we could have:

     ... json_query("blah",sint) -m int 12

As such I would find it more natural to call this type "int" so that it
matches the same as the one used in the match. Maps already use "int" as
the output type name by the way.

In any case, I too am a bit confused by the need to force an output type.
As a user, I'd expect the type to be implicit and not to have to know
about it in the configuration. Of course we can imagine situations where
we'd want to force the type (like we sometimes do by adding 0 or
concatenating an empty string for example) but this is still not very
clear to me if we want it by default. Or maybe when dealing with floats
where we'd have to decide whether to emit them verbatim as strings or
to convert them to integers.

But then, could it make sense to also support "strict integers": values
that can accurately be represented as integers and which are within the
JSON valid range for integers (-2^52 to 2^52 with no decimal part) ?
This would then make the converter return nothing in case of violation
(i.e. risk of losing precision). This would also reject NaN and infinite
that the lib supports.

A small detail, in general, prefer a passive form in the text rather than
speaking to the reader using "you". You'll notice that using this more
descriptive approach helps better understand what a configuration does.
A technical documentation should be factual and work regardless of the
existence of a human. It's a difficult exercise but it results in better
quality overall. Here, instead of "if you know that the value is an
integer then you can help haproxy..." it's possible to write something
long "Specifuing an output type will avoid possibly costly or inaccurate
conversions of the returned value; the only supported type at the moment
is 'int' for integers".

> +   - boolean : If the JSON value is not a String
> +
> +  This converter uses the mjson library https://github.com/cesanta/mjson
> +  This converter extracts the value located at <json_path> from the JSON 
> +  string in the input value. 

While it's nice to advertise the library, this doesn't bring anything to
the user to have this in the configuration, and implicitly makes it a
binding decision for the long term, because by appearing in the doc, a
use could decide to rely on one of its specific behavior or extensions,
causing trouble if in the future for any reason we'd have to turn to any
other one.

> +  <json_path> must be a valid JsonPath string as defined at 
> +  https://goessner.net/articles/JsonPath/

It's nice to put a link, but better use the link to the most up-to-date
specification which is ongoing the standardization process:
   https://datatracker.ietf.org/doc/draft-ietf-jsonpath-base/

> +  A floating point value will always be returned as string!
                                                             ^
No need for this exclamation mark :-)
I also suggest that you mention that non-representable floating point
values like NaNs and infinites *may* be returned as short words or be
dropped depending on the underlying JSON library. This way nobody is
surprised to see a "nan" or "-infinite" in their logs.

> +  Example:
> +     # get the value of the key kubernetes.io/serviceaccount/namespace
> +     # => openshift-logging
> +     http-request set-var(sess.json) 
> req.hdr(Authorization),b64dec,json_string('$.kubernetes\\.io/serviceaccount/namespace')
> +     
> +     # get the value of the key 'iss' from a JWT
> +     # => kubernetes/serviceaccount
> +     http-request set-var(sess.json) 
> req.hdr(Authorization),b64dec,json_string('$.iss')

I've seen Tim's cmoments about these as being specific. I partially agree. I
think it's better to start with a very simple example (extract a number or
string from a sequence, maybe extract a user ID from its name). But your
examples above are useful in your specific context and show how to enter
these always annoying escape sequences, so they have their place here, just
as a more advanced example.

> diff --git a/include/import/mjson.h b/include/import/mjson.h
> new file mode 100644
> index 000000000..b96fd3fbb
> --- /dev/null
> +++ b/include/import/mjson.h
(...)

Just a detail, but it will significantly simplify long-term maintenance,
please split this in two patches, one whicih adds mjson.{c,h} and the
corresponding Makefile entry, and a second patch which adds your code.
This will ease updates and backports to stable in the future, particularly
if for any reason it is sometimes easier to revert and re-apply a patch
than to backport an upgrade patch.

The first patch can simply be justified as "this will be used by a
subsequent patch". And if you have to perform small changes there
(add/change an include), please do mention it in the commit message
so that future updates are easier to handle by just repeating the same
operation if needed.

> diff --git a/src/sample.c b/src/sample.c
> index 835a18115..d6cb6379d 100644
> --- a/src/sample.c
> +++ b/src/sample.c
> @@ -16,6 +16,7 @@
>  #include <arpa/inet.h>
>  #include <stdio.h>
>  
> +#include <import/mjson.h>
>  #include <import/sha1.h>
>  #include <import/xxhash.h>
>  
> @@ -3653,6 +3654,97 @@ static int sample_conv_rtrim(const struct arg *arg_p, 
> struct sample *smp, void *
>       return 1;
>  }
>  
> +/* This function checks the "json_query" converter's arguments. */
> +static int sample_check_json_query(struct arg *arg, struct sample_conv *conv,
> +                           const char *file, int line, char **err)
> +{
> +     int result;
> +
> +     if (arg[0].data.str.data == 0) { /* empty */
> +             memprintf(err, "json_path must not be empty");
> +             return 0;
> +     }
> +
> +     if (arg[1].data.str.data != 0) {
> +                     result = strncmp(arg[1].data.str.area, "sint", 
> sizeof("sint"));
> +                     if (result != 0) {
> +                             memprintf(err, "output_type only supports 
> \"sint\" as argument");
> +                             return 0;
> +                     }
> +     }

So as Tim suggested, please take this opportunity for replacing the
argument with an integer so that you don't need to parse it again at
run time. Please have a look at sample_conv_json_check() which does
exactly this already.

Hmmmm that's not your fault but now I'm seeing that we already have a
converter inappropriately called "json", so we don't even know in which
direction it works by just looking at its name :-(  Same issue as for
base64.

May I suggest that you call yours "json_decode" or maybe shorter
"json_dec" so that it's more explicit that it's the decode one ? Because
for me "json_string" is the one that will emit a json string from some
input (which it is not). Then we could later create "json_enc" and warn
when "json" alone is used. Or even "jsdec" and "jsenc" which are much
shorter and still quite explicit.

> +static int sample_conv_json_query(const struct arg *args, struct sample 
> *smp, void *private)
> +{
> +     struct buffer *trash = get_trash_chunk();
> +     int rc;            /* holds the return value of mjson* functions */
> +     int bool;          /* holds the value of mjson_get_bool */
> +     int result;        /* holds the return value of strncmp */
> +     double double_val; /* holds the value of mjson_get_number */
> +
> +     /* if the second argument is "sint" then are the other checks not 
> necessary. */
> +     if (args[1].data.str.data != 0) {
> +                     result = strncmp(args[1].data.str.area, "sint", 
> sizeof("sint"));

So here is the place where you'd use args->data.sint and retrieve
your output type (look for IT_ASCII or IT_UTF8S in the code for
example).

> +                     if (result != 0) {
> +                             /* "output_type only supports \"sint\" as 
> argument"); */
> +                             return 0;

And that would save you from having to deal with this.

> +                     } else {
> +                             rc = mjson_get_number(smp->data.u.str.area, 
> smp->data.u.str.data, args[0].data.str.area, &double_val);
> +                             if (rc == 0) {
> +                                     /* mjson does not recognized a number */

OK now I understand why you wanted to handle numbers separately,
but see below.

> +                                     return 0;
> +                             } else {
> +                                     smp->data.type = SMP_T_SINT;
> +                                     smp->data.u.sint = (unsigned long 
> long)double_val;
> +                                     return 1;
> +                             }
> +                             
> +                     }
> +     }
> +     
> +     /* No output_type was given try to guess the type */
> +
> +     rc = mjson_get_string(smp->data.u.str.area, smp->data.u.str.data, 
> args[0].data.str.area, trash->area, trash->size);
> +
> +     if (rc == -1 ) {

I'm seeing that there's a very nice mjson_find() which does *exactly* what
you need:

   "In a JSON string s, len, find an element by its JSONPATH path. Save
    found element in tokptr, toklen. If not found, return JSON_TOK_INVALID.
    If found, return one of: MJSON_TOK_STRING, MJSON_TOK_NUMBER,
    MJSON_TOK_TRUE, MJSON_TOK_FALSE, MJSON_TOK_NULL, MJSON_TOK_ARRAY,
    MJSON_TOK_OBJECT.
    If a searched key contains ., [ or ] characters, they should be escaped
    by a backslash."

So you get the type in return. I think you can then call one of the
related functions depending on what is found, which is more reliable
than iterating over multiple attempts.

Regards,
Willy

Re: [PATCH] MINOR: sample: add json_string

Reply via email to