On Wed, Apr 14, 2021 at 03:02:20AM +0200, Aleksandar Lazic wrote:
> > But then, could it make sense to also support "strict integers": values
> > that can accurately be represented as integers and which are within the
> > JSON valid range for integers (-2^52 to 2^52 with no decimal part) ?
> > This would then make the converter return nothing in case of violation
> > (i.e. risk of losing precision). This would also reject NaN and infinite
> > that the lib supports.
>
> You mean the same check which is done in arith_add().
Not exactly because arith_add only checks for overflows after addition
and tries to cap the result, but I'd rather just say that if the decoded
number is <= -2^53 or >= 2^53 then the converter should return a no match
in case an integer was requested.
> > Hmmmm that's not your fault but now I'm seeing that we already have a
> > converter inappropriately called "json", so we don't even know in which
> > direction it works by just looking at its name :-( Same issue as for
> > base64.
> >
> > May I suggest that you call yours "json_decode" or maybe shorter
> > "json_dec" so that it's more explicit that it's the decode one ? Because
> > for me "json_string" is the one that will emit a json string from some
> > input (which it is not). Then we could later create "json_enc" and warn
> > when "json" alone is used. Or even "jsdec" and "jsenc" which are much
> > shorter and still quite explicit.
>
> How about "json_query" because it's exactly what it does :-)
I'm not familiar with the notion of "query" to decode and extract contents
but I'm not the most representative user and am aware of the "jq" command-
line utility that does this. So if it sounds natural to others I'm fine
with this.
> > I'm seeing that there's a very nice mjson_find() which does *exactly* what
> > you need:
> >
> > "In a JSON string s, len, find an element by its JSONPATH path. Save
> > found element in tokptr, toklen. If not found, return JSON_TOK_INVALID.
> > If found, return one of: MJSON_TOK_STRING, MJSON_TOK_NUMBER,
> > MJSON_TOK_TRUE, MJSON_TOK_FALSE, MJSON_TOK_NULL, MJSON_TOK_ARRAY,
> > MJSON_TOK_OBJECT.
> > If a searched key contains ., [ or ] characters, they should be escaped
> > by a backslash."
> >
> > So you get the type in return. I think you can then call one of the
> > related functions depending on what is found, which is more reliable
> > than iterating over multiple attempts.
>
> Oh yes, this sounds like a better approach.
> I have now used this suggestion and I hope you can help me to fix the double
> parsing
> issue or is it acceptable to parse the input twice?
>From what I've seen in the code in the lib you have no other option.
I thought it might be possible to call mjson_get_string() on the
resulting pointer but you would need to find a way to express that
you want to extract the immediate content, maybe by having an empty
key designation or something like this. This point is not clear to
me and the unit tests in the project all re-parse the input string
after mjson_find(), so probably this is the way to do it.
> The check functions handles the int arg now as suggested.
>
> ```
> /* This function checks the "json_query" converter's arguments. */
> static int sample_check_json_query(struct arg *arg, struct sample_conv *conv,
> const char *file, int line, char **err)
> {
> if (arg[0].data.str.data == 0) { /* empty */
> memprintf(err, "json_path must not be empty");
> return 0;
> }
>
> if (arg[1].data.str.data != 0) {
> if (strncmp(arg[1].data.str.area, "int", sizeof("int"))
> != 0) {
> memprintf(err, "output_type only supports
> \"int\" as argument");
> return 0;
> } else {
> arg[1].type = ARGT_SINT;
> arg[1].data.sint = 0;
OK!
> I use now the token enum but I have some troubles to handle the c types
> properly.
OK that's better, but be careful about this:
> trash->data = mjson_get_string(smp->data.u.str.area,
> smp->data.u.str.data, args[0].data.str.area, trash->area, trash->size);
> if (trash->data == -1 ) {
You're sending the result into trash->data whose type you don't know,
and expect it to match against -1. I'd rather store the result into
a temporary variable of the same type as the function's return (int),
test it, and only if equal, I'd assign this value to the return sample's
length.
Willy