Hi Aleks, On Mon, Apr 12, 2021 at 10:09:08PM +0200, Aleksandar Lazic wrote: > Hi. > > another patch which honer the feedback.
Thank you. FWIW I agree with all the points reported by Tim. I'll add a few comments and/or suggestions below. On a general note, please be careful about your indenting, as it very can quickly become a total mess. Similarly please pay attention not to leave trailing spaces that may Git complain: Applying: MINOR: sample: converter: add JSON Path handling .git/rebase-apply/patch:39: trailing whitespace. - number : When the JSON value is a number then will the value be .git/rebase-apply/patch:40: trailing whitespace. converted to a string. If you know that the value is a .git/rebase-apply/patch:41: trailing whitespace. integer then can you help haproxy to convert the value .git/rebase-apply/patch:46: trailing whitespace. This converter extracts the value located at <json_path> from the JSON .git/rebase-apply/patch:47: trailing whitespace. string in the input value. warning: squelched 10 whitespace errors warning: 15 lines add whitespace errors. All these lines are easily noticed this way: $ git show | grep -c '^+.*\s$' 15 A good way to avoid this once for all it to enable colors in Git and to always make sure not to leave red areas in "git diff" or "git show" : $ git config --global color.ui true And even if it's of low importance for the code itself, it's particularly important in a review because such cosmetic issues constantly remind the reader that the patch is far from being final, so it's possibly not yet the moment to focus on not critically important stuff. Thus in the end they increase the number of round trips. > The doc will be enhanced but I have a question about that sequence. > This should write the double value to the string but I think I have here some > issue. > > ``` > printf("\n>>>DOUBLE rc:%d: double:%f:\n",rc, > double_val); > trash->size = snprintf(trash->area, > trash->data, > > "%g",double_val); > smp->data.u.str = *trash; > smp->data.type = SMP_T_STR; > ``` Yeah, as Tim mentioned, you mixed size and data. "data" is the amount of data bytes used in a chunk. "size" is its allocated size. > >From 8cb1bc4aaedd17c7189d4985a57f662ab1b533a4 Mon Sep 17 00:00:00 2001 > From: Aleksandar Lazic <al-hapr...@none.at> > Date: Mon, 12 Apr 2021 22:01:04 +0200 > Subject: [PATCH] MINOR: sample: converter: add JSON Path handling > > With json_path can a JSON value be extacted from a Header > or body In the final version, please add a few more lines to describe the name of the added converter and what it's used for. As a reminder, think that you're trying to sell your artwork to me or anyone else who would make you proud bybackporting your work into their version :-) > +json_query(<json_path>,[<output_type>]) > + The <json_path> is mandatory. > + By default will the follwing JSON types recognized. > + - string : This is the default search type and returns a string; > + - number : When the JSON value is a number then will the value be > + converted to a string. If you know that the value is a > + integer then can you help haproxy to convert the value > + to a integer when you add "sint" to the <output_type>; Just thinking loud, I looked at the rest of the doc and noticed we never mention "sint" anywhere else, so I think it's entirely an internal type. However we do mention "int" which is used as the matching method for integers, so we could have: ... json_query("blah",sint) -m int 12 As such I would find it more natural to call this type "int" so that it matches the same as the one used in the match. Maps already use "int" as the output type name by the way. In any case, I too am a bit confused by the need to force an output type. As a user, I'd expect the type to be implicit and not to have to know about it in the configuration. Of course we can imagine situations where we'd want to force the type (like we sometimes do by adding 0 or concatenating an empty string for example) but this is still not very clear to me if we want it by default. Or maybe when dealing with floats where we'd have to decide whether to emit them verbatim as strings or to convert them to integers. But then, could it make sense to also support "strict integers": values that can accurately be represented as integers and which are within the JSON valid range for integers (-2^52 to 2^52 with no decimal part) ? This would then make the converter return nothing in case of violation (i.e. risk of losing precision). This would also reject NaN and infinite that the lib supports. A small detail, in general, prefer a passive form in the text rather than speaking to the reader using "you". You'll notice that using this more descriptive approach helps better understand what a configuration does. A technical documentation should be factual and work regardless of the existence of a human. It's a difficult exercise but it results in better quality overall. Here, instead of "if you know that the value is an integer then you can help haproxy..." it's possible to write something long "Specifuing an output type will avoid possibly costly or inaccurate conversions of the returned value; the only supported type at the moment is 'int' for integers". > + - boolean : If the JSON value is not a String > + > + This converter uses the mjson library https://github.com/cesanta/mjson > + This converter extracts the value located at <json_path> from the JSON > + string in the input value. While it's nice to advertise the library, this doesn't bring anything to the user to have this in the configuration, and implicitly makes it a binding decision for the long term, because by appearing in the doc, a use could decide to rely on one of its specific behavior or extensions, causing trouble if in the future for any reason we'd have to turn to any other one. > + <json_path> must be a valid JsonPath string as defined at > + https://goessner.net/articles/JsonPath/ It's nice to put a link, but better use the link to the most up-to-date specification which is ongoing the standardization process: https://datatracker.ietf.org/doc/draft-ietf-jsonpath-base/ > + A floating point value will always be returned as string! ^ No need for this exclamation mark :-) I also suggest that you mention that non-representable floating point values like NaNs and infinites *may* be returned as short words or be dropped depending on the underlying JSON library. This way nobody is surprised to see a "nan" or "-infinite" in their logs. > + Example: > + # get the value of the key kubernetes.io/serviceaccount/namespace > + # => openshift-logging > + http-request set-var(sess.json) > req.hdr(Authorization),b64dec,json_string('$.kubernetes\\.io/serviceaccount/namespace') > + > + # get the value of the key 'iss' from a JWT > + # => kubernetes/serviceaccount > + http-request set-var(sess.json) > req.hdr(Authorization),b64dec,json_string('$.iss') I've seen Tim's cmoments about these as being specific. I partially agree. I think it's better to start with a very simple example (extract a number or string from a sequence, maybe extract a user ID from its name). But your examples above are useful in your specific context and show how to enter these always annoying escape sequences, so they have their place here, just as a more advanced example. > diff --git a/include/import/mjson.h b/include/import/mjson.h > new file mode 100644 > index 000000000..b96fd3fbb > --- /dev/null > +++ b/include/import/mjson.h (...) Just a detail, but it will significantly simplify long-term maintenance, please split this in two patches, one whicih adds mjson.{c,h} and the corresponding Makefile entry, and a second patch which adds your code. This will ease updates and backports to stable in the future, particularly if for any reason it is sometimes easier to revert and re-apply a patch than to backport an upgrade patch. The first patch can simply be justified as "this will be used by a subsequent patch". And if you have to perform small changes there (add/change an include), please do mention it in the commit message so that future updates are easier to handle by just repeating the same operation if needed. > diff --git a/src/sample.c b/src/sample.c > index 835a18115..d6cb6379d 100644 > --- a/src/sample.c > +++ b/src/sample.c > @@ -16,6 +16,7 @@ > #include <arpa/inet.h> > #include <stdio.h> > > +#include <import/mjson.h> > #include <import/sha1.h> > #include <import/xxhash.h> > > @@ -3653,6 +3654,97 @@ static int sample_conv_rtrim(const struct arg *arg_p, > struct sample *smp, void * > return 1; > } > > +/* This function checks the "json_query" converter's arguments. */ > +static int sample_check_json_query(struct arg *arg, struct sample_conv *conv, > + const char *file, int line, char **err) > +{ > + int result; > + > + if (arg[0].data.str.data == 0) { /* empty */ > + memprintf(err, "json_path must not be empty"); > + return 0; > + } > + > + if (arg[1].data.str.data != 0) { > + result = strncmp(arg[1].data.str.area, "sint", > sizeof("sint")); > + if (result != 0) { > + memprintf(err, "output_type only supports > \"sint\" as argument"); > + return 0; > + } > + } So as Tim suggested, please take this opportunity for replacing the argument with an integer so that you don't need to parse it again at run time. Please have a look at sample_conv_json_check() which does exactly this already. Hmmmm that's not your fault but now I'm seeing that we already have a converter inappropriately called "json", so we don't even know in which direction it works by just looking at its name :-( Same issue as for base64. May I suggest that you call yours "json_decode" or maybe shorter "json_dec" so that it's more explicit that it's the decode one ? Because for me "json_string" is the one that will emit a json string from some input (which it is not). Then we could later create "json_enc" and warn when "json" alone is used. Or even "jsdec" and "jsenc" which are much shorter and still quite explicit. > +static int sample_conv_json_query(const struct arg *args, struct sample > *smp, void *private) > +{ > + struct buffer *trash = get_trash_chunk(); > + int rc; /* holds the return value of mjson* functions */ > + int bool; /* holds the value of mjson_get_bool */ > + int result; /* holds the return value of strncmp */ > + double double_val; /* holds the value of mjson_get_number */ > + > + /* if the second argument is "sint" then are the other checks not > necessary. */ > + if (args[1].data.str.data != 0) { > + result = strncmp(args[1].data.str.area, "sint", > sizeof("sint")); So here is the place where you'd use args->data.sint and retrieve your output type (look for IT_ASCII or IT_UTF8S in the code for example). > + if (result != 0) { > + /* "output_type only supports \"sint\" as > argument"); */ > + return 0; And that would save you from having to deal with this. > + } else { > + rc = mjson_get_number(smp->data.u.str.area, > smp->data.u.str.data, args[0].data.str.area, &double_val); > + if (rc == 0) { > + /* mjson does not recognized a number */ OK now I understand why you wanted to handle numbers separately, but see below. > + return 0; > + } else { > + smp->data.type = SMP_T_SINT; > + smp->data.u.sint = (unsigned long > long)double_val; > + return 1; > + } > + > + } > + } > + > + /* No output_type was given try to guess the type */ > + > + rc = mjson_get_string(smp->data.u.str.area, smp->data.u.str.data, > args[0].data.str.area, trash->area, trash->size); > + > + if (rc == -1 ) { I'm seeing that there's a very nice mjson_find() which does *exactly* what you need: "In a JSON string s, len, find an element by its JSONPATH path. Save found element in tokptr, toklen. If not found, return JSON_TOK_INVALID. If found, return one of: MJSON_TOK_STRING, MJSON_TOK_NUMBER, MJSON_TOK_TRUE, MJSON_TOK_FALSE, MJSON_TOK_NULL, MJSON_TOK_ARRAY, MJSON_TOK_OBJECT. If a searched key contains ., [ or ] characters, they should be escaped by a backslash." So you get the type in return. I think you can then call one of the related functions depending on what is found, which is more reliable than iterating over multiple attempts. Regards, Willy