Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-23 Thread Jeff Hostetler



On 3/23/2018 4:11 PM, René Scharfe wrote:

Am 23.03.2018 um 20:55 schrieb Jeff Hostetler:

+struct json_writer_level
+{
+    unsigned level_is_array : 1;
+    unsigned level_is_empty : 1;
+};
+
+struct json_writer
+{
+    struct json_writer_level *levels;
+    int nr, alloc;
+    struct strbuf json;
+};


A simpler and probably more compact representation of is_array would
be a strbuf with one char per level, e.g. '[' for an array and '{'
for an object (or ']' and '}').

I don't understand the need to track emptiness per level.  Only the
top level array/object can ever be empty, can it?


My expectation was that any sub-object or sub-array could be empty.
That is, this should be valid (and the JSON parser in Python allows):

      {"a":{}, "b":[], "c":[[]], "d":[{}]}


Sure, but the emptiness of finished arrays and objects doesn't matter
for the purposes of error checking, comma setting or closing.  At most
one of them is empty *and* unclosed while writing the overall JSON
object -- the last one opened:

{
{"a":{
{"a":{}, "b":[
{"a":{}, "b":[], "c":[
{"a":{}, "b":[], "c":[[
{"a":{}, "b":[], "c":[[]], "d":[
{"a":{}, "b":[], "c":[[]], "d":[{

Any of the earlier written arrays/objects are either closed or contain
at least a half-done sub-array/object, which makes them non-empty.

René



good point.  i'll revisit.  thanks.
Jeff


Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-23 Thread René Scharfe
Am 23.03.2018 um 20:55 schrieb Jeff Hostetler:
>>> +struct json_writer_level
>>> +{
>>> +    unsigned level_is_array : 1;
>>> +    unsigned level_is_empty : 1;
>>> +};
>>> +
>>> +struct json_writer
>>> +{
>>> +    struct json_writer_level *levels;
>>> +    int nr, alloc;
>>> +    struct strbuf json;
>>> +};
>>
>> A simpler and probably more compact representation of is_array would
>> be a strbuf with one char per level, e.g. '[' for an array and '{'
>> for an object (or ']' and '}').
>>
>> I don't understand the need to track emptiness per level.  Only the
>> top level array/object can ever be empty, can it?
> 
> My expectation was that any sub-object or sub-array could be empty.
> That is, this should be valid (and the JSON parser in Python allows):
> 
>      {"a":{}, "b":[], "c":[[]], "d":[{}]}

Sure, but the emptiness of finished arrays and objects doesn't matter
for the purposes of error checking, comma setting or closing.  At most
one of them is empty *and* unclosed while writing the overall JSON
object -- the last one opened:

{
{"a":{
{"a":{}, "b":[
{"a":{}, "b":[], "c":[
{"a":{}, "b":[], "c":[[
{"a":{}, "b":[], "c":[[]], "d":[
{"a":{}, "b":[], "c":[[]], "d":[{

Any of the earlier written arrays/objects are either closed or contain
at least a half-done sub-array/object, which makes them non-empty.

René


Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-23 Thread Jeff Hostetler



On 3/23/2018 2:01 PM, René Scharfe wrote:

Am 21.03.2018 um 20:28 schrieb g...@jeffhostetler.com:

From: Jeff Hostetler 

Add basic routines to generate data in JSON format.

Signed-off-by: Jeff Hostetler 
---
   Makefile|   2 +
   json-writer.c   | 321 +
   json-writer.h   |  86 +
   t/helper/test-json-writer.c | 420 

   t/t0019-json-writer.sh  | 102 +++
   5 files changed, 931 insertions(+)
   create mode 100644 json-writer.c
   create mode 100644 json-writer.h
   create mode 100644 t/helper/test-json-writer.c
   create mode 100755 t/t0019-json-writer.sh

diff --git a/Makefile b/Makefile
index 1a9b23b..57f58e6 100644
--- a/Makefile
+++ b/Makefile
@@ -662,6 +662,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh
   TEST_PROGRAMS_NEED_X += test-genrandom
   TEST_PROGRAMS_NEED_X += test-hashmap
   TEST_PROGRAMS_NEED_X += test-index-version
+TEST_PROGRAMS_NEED_X += test-json-writer
   TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
   TEST_PROGRAMS_NEED_X += test-line-buffer
   TEST_PROGRAMS_NEED_X += test-match-trees
@@ -815,6 +816,7 @@ LIB_OBJS += hashmap.o
   LIB_OBJS += help.o
   LIB_OBJS += hex.o
   LIB_OBJS += ident.o
+LIB_OBJS += json-writer.o
   LIB_OBJS += kwset.o
   LIB_OBJS += levenshtein.o
   LIB_OBJS += line-log.o
diff --git a/json-writer.c b/json-writer.c
new file mode 100644
index 000..89a6abb
--- /dev/null
+++ b/json-writer.c
@@ -0,0 +1,321 @@
+#include "cache.h"
+#include "json-writer.h"
+
+static char g_ch_open[2]  = { '{', '[' };
+static char g_ch_close[2] = { '}', ']' };
+
+/*
+ * Append JSON-quoted version of the given string to 'out'.
+ */
+static void append_quoted_string(struct strbuf *out, const char *in)
+{
+   strbuf_addch(out, '"');
+   for (/**/; *in; in++) {
+   unsigned char c = (unsigned char)*in;
+   if (c == '"')
+   strbuf_add(out, "\\\"", 2);
+   else if (c == '\\')
+   strbuf_add(out, "", 2);
+   else if (c == '\n')
+   strbuf_add(out, "\\n", 2);
+   else if (c == '\r')
+   strbuf_add(out, "\\r", 2);
+   else if (c == '\t')
+   strbuf_add(out, "\\t", 2);
+   else if (c == '\f')
+   strbuf_add(out, "\\f", 2);
+   else if (c == '\b')
+   strbuf_add(out, "\\b", 2);


Using strbuf_addstr() here would result in the same object code (its
strlen() call is inlined for constants) and avoid having to specify
the redundant length 2.


sure. thanks.



+   else if (c < 0x20)
+   strbuf_addf(out, "\\u%04x", c);
+   else
+   strbuf_addch(out, c);
+   }
+   strbuf_addch(out, '"');
+}
+

[...]

+
+void jw_object_double(struct json_writer *jw, const char *fmt,
+ const char *key, double value)
+{
+   assert_in_object(jw, key);
+   maybe_add_comma(jw);
+
+   if (!fmt || !*fmt)
+   fmt = "%f";
+
+   append_quoted_string(>json, key);
+   strbuf_addch(>json, ':');
+   strbuf_addf(>json, fmt, value);


Hmm.  Can compilers check such a caller-supplied format matches the
value's type?  (I don't know how to specify a format attribute for
GCC and Clang for this function.)


I'll look into this.  thanks.



+}

[...]

+
+void jw_object_sub(struct json_writer *jw, const char *key,
+  const struct json_writer *value)
+{
+   assert_is_terminated(value);
+
+   assert_in_object(jw, key);
+   maybe_add_comma(jw);
+
+   append_quoted_string(>json, key);
+   strbuf_addch(>json, ':');
+   strbuf_addstr(>json, value->json.buf);


strbuf_addbuf() would be a better fit here -- it avoids a strlen() call
and NUL handling issues.


sure. thanks. i always forget about _addbuf().



+}
+
+void jw_object_inline_begin_array(struct json_writer *jw, const char *key)
+{
+   assert_in_object(jw, key);
+   maybe_add_comma(jw);
+
+   append_quoted_string(>json, key);
+   strbuf_addch(>json, ':');
+
+   jw_array_begin(jw);
+}


Those duplicate calls in the last ten functions feel mind-numbing.  A
helper function for adding comma, key and colon might be a good idea.


I'll see if I can add another macro to do the dirty work here.

[...]


diff --git a/json-writer.h b/json-writer.h
new file mode 100644
index 000..ad38c95
--- /dev/null
+++ b/json-writer.h
@@ -0,0 +1,86 @@
+#ifndef JSON_WRITER_H
+#define JSON_WRITER_H
+
+/*
+ * JSON data structures are defined at:
+ *  http://json.org/
+ *  http://www.ietf.org/rfc/rfc7159.txt
+ *
+ * The JSON-writer API allows one to build JSON data structures using a
+ * simple wrapper around a "struct strbuf" buffer.  It is intended as a
+ * simple API to build output 

Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-23 Thread René Scharfe
Am 21.03.2018 um 20:28 schrieb g...@jeffhostetler.com:
> From: Jeff Hostetler 
> 
> Add basic routines to generate data in JSON format.
> 
> Signed-off-by: Jeff Hostetler 
> ---
>   Makefile|   2 +
>   json-writer.c   | 321 +
>   json-writer.h   |  86 +
>   t/helper/test-json-writer.c | 420 
> 
>   t/t0019-json-writer.sh  | 102 +++
>   5 files changed, 931 insertions(+)
>   create mode 100644 json-writer.c
>   create mode 100644 json-writer.h
>   create mode 100644 t/helper/test-json-writer.c
>   create mode 100755 t/t0019-json-writer.sh
> 
> diff --git a/Makefile b/Makefile
> index 1a9b23b..57f58e6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -662,6 +662,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh
>   TEST_PROGRAMS_NEED_X += test-genrandom
>   TEST_PROGRAMS_NEED_X += test-hashmap
>   TEST_PROGRAMS_NEED_X += test-index-version
> +TEST_PROGRAMS_NEED_X += test-json-writer
>   TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
>   TEST_PROGRAMS_NEED_X += test-line-buffer
>   TEST_PROGRAMS_NEED_X += test-match-trees
> @@ -815,6 +816,7 @@ LIB_OBJS += hashmap.o
>   LIB_OBJS += help.o
>   LIB_OBJS += hex.o
>   LIB_OBJS += ident.o
> +LIB_OBJS += json-writer.o
>   LIB_OBJS += kwset.o
>   LIB_OBJS += levenshtein.o
>   LIB_OBJS += line-log.o
> diff --git a/json-writer.c b/json-writer.c
> new file mode 100644
> index 000..89a6abb
> --- /dev/null
> +++ b/json-writer.c
> @@ -0,0 +1,321 @@
> +#include "cache.h"
> +#include "json-writer.h"
> +
> +static char g_ch_open[2]  = { '{', '[' };
> +static char g_ch_close[2] = { '}', ']' };
> +
> +/*
> + * Append JSON-quoted version of the given string to 'out'.
> + */
> +static void append_quoted_string(struct strbuf *out, const char *in)
> +{
> + strbuf_addch(out, '"');
> + for (/**/; *in; in++) {
> + unsigned char c = (unsigned char)*in;
> + if (c == '"')
> + strbuf_add(out, "\\\"", 2);
> + else if (c == '\\')
> + strbuf_add(out, "", 2);
> + else if (c == '\n')
> + strbuf_add(out, "\\n", 2);
> + else if (c == '\r')
> + strbuf_add(out, "\\r", 2);
> + else if (c == '\t')
> + strbuf_add(out, "\\t", 2);
> + else if (c == '\f')
> + strbuf_add(out, "\\f", 2);
> + else if (c == '\b')
> + strbuf_add(out, "\\b", 2);

Using strbuf_addstr() here would result in the same object code (its
strlen() call is inlined for constants) and avoid having to specify
the redundant length 2.

> + else if (c < 0x20)
> + strbuf_addf(out, "\\u%04x", c);
> + else
> + strbuf_addch(out, c);
> + }
> + strbuf_addch(out, '"');
> +}
> +
> +
> +static inline void begin(struct json_writer *jw, int is_array)
> +{
> + ALLOC_GROW(jw->levels, jw->nr + 1, jw->alloc);
> +
> + jw->levels[jw->nr].level_is_array = !!is_array;
> + jw->levels[jw->nr].level_is_empty = 1;
> +
> + strbuf_addch(>json, g_ch_open[!!is_array]);
> +
> + jw->nr++;
> +}
> +
> +/*
> + * Assert that we have an open object at this level.
> + */
> +static void inline assert_in_object(const struct json_writer *jw, const char 
> *key)
> +{
> + if (!jw->nr)
> + BUG("object: missing jw_object_begin(): '%s'", key);
> + if (jw->levels[jw->nr - 1].level_is_array)
> + BUG("object: not in object: '%s'", key);
> +}
> +
> +/*
> + * Assert that we have an open array at this level.
> + */
> +static void inline assert_in_array(const struct json_writer *jw)
> +{
> + if (!jw->nr)
> + BUG("array: missing jw_begin()");
> + if (!jw->levels[jw->nr - 1].level_is_array)
> + BUG("array: not in array");
> +}
> +
> +/*
> + * Add comma if we have already seen a member at this level.
> + */
> +static void inline maybe_add_comma(struct json_writer *jw)
> +{
> + if (jw->levels[jw->nr - 1].level_is_empty)
> + jw->levels[jw->nr - 1].level_is_empty = 0;
> + else
> + strbuf_addch(>json, ',');
> +}
> +
> +/*
> + * Assert that the given JSON object or JSON array has been properly
> + * terminated.  (Has closing bracket.)
> + */
> +static void inline assert_is_terminated(const struct json_writer *jw)
> +{
> + if (jw->nr)
> + BUG("object: missing jw_end(): '%s'", jw->json.buf);
> +}
> +
> +void jw_object_begin(struct json_writer *jw)
> +{
> + begin(jw, 0);
> +}
> +
> +void jw_object_string(struct json_writer *jw, const char *key, const char 
> *value)
> +{
> + assert_in_object(jw, key);
> + maybe_add_comma(jw);
> +
> + append_quoted_string(>json, key);
> + strbuf_addch(>json, ':');
> + append_quoted_string(>json, value);
> +}
> +
> +void 

Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-23 Thread Jeff Hostetler



On 3/21/2018 5:25 PM, Junio C Hamano wrote:

g...@jeffhostetler.com writes:


From: Jeff Hostetler 

Add basic routines to generate data in JSON format.


And the point of having capability to write JSON data in our
codebase is...?


diff --git a/json-writer.c b/json-writer.c
new file mode 100644
index 000..89a6abb
--- /dev/null
+++ b/json-writer.c
@@ -0,0 +1,321 @@
+#include "cache.h"
+#include "json-writer.h"
+
+static char g_ch_open[2]  = { '{', '[' };
+static char g_ch_close[2] = { '}', ']' };


What's "g_" prefix?


Global.

Sorry, very old habits.


+
+/*
+ * Append JSON-quoted version of the given string to 'out'.
+ */
+static void append_quoted_string(struct strbuf *out, const char *in)
+{
+   strbuf_addch(out, '"');
+   for (/**/; *in; in++) {
+   unsigned char c = (unsigned char)*in;


It is clear enough to lose /**/, i.e.

for (; *in; in++) {

but for this one. I wonder if

unsigned char c;
strbuf_addch(out, '"');
while ((c = *in++) != '\0') {
...

is easier to follow, though.


either way is fine.  will fix.




+static inline void begin(struct json_writer *jw, int is_array)
+{
+   ALLOC_GROW(jw->levels, jw->nr + 1, jw->alloc);
+
+   jw->levels[jw->nr].level_is_array = !!is_array;
+   jw->levels[jw->nr].level_is_empty = 1;


An element of this array is a struct that represents a level, and
everybody who accesses an element of that type knows it is talking
about a level by the field that has the array being named as
.levels[] (also [*1*]).  In such a context, it is a bit too loud to
name the fields with level_$blah.  IOW,

struct json_writer_level
{
unsigned is_array : 1;
unsigned is_empty : 1;
};


make sense.  will fix.


+struct json_writer_level
+{
+   unsigned level_is_array : 1;
+   unsigned level_is_empty : 1;
+};
+
+struct json_writer
+{
+   struct json_writer_level *levels;
+   int nr, alloc;
+   struct strbuf json;
+};


[Footnote]

*1* I personally prefer to call an array of things "thing[]", not
 "things[]", because then you can refer to an individual element
 e.g. "thing[4]" and read it as "the fourth thing".

 Unless the code often treats an array as a whole, that is, in
 which case, things[] is OK as you'll be calling the whole thing
 with the plural name (e.g. call that function and give all the
 things by passing things[]).

 In this case, one level instance is an element of a stack, and
 the code would be accessing one level at a time most of the
 time, so "writer.level[4].is_empty" would read more naturally
 than "writer.levels[4].level_is_empty".


yeah, that makes sense.

Thanks
Jeff




Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-21 Thread Junio C Hamano
g...@jeffhostetler.com writes:

> From: Jeff Hostetler 
>
> Add basic routines to generate data in JSON format.

And the point of having capability to write JSON data in our
codebase is...?

> diff --git a/json-writer.c b/json-writer.c
> new file mode 100644
> index 000..89a6abb
> --- /dev/null
> +++ b/json-writer.c
> @@ -0,0 +1,321 @@
> +#include "cache.h"
> +#include "json-writer.h"
> +
> +static char g_ch_open[2]  = { '{', '[' };
> +static char g_ch_close[2] = { '}', ']' };

What's "g_" prefix?

> +
> +/*
> + * Append JSON-quoted version of the given string to 'out'.
> + */
> +static void append_quoted_string(struct strbuf *out, const char *in)
> +{
> + strbuf_addch(out, '"');
> + for (/**/; *in; in++) {
> + unsigned char c = (unsigned char)*in;

It is clear enough to lose /**/, i.e.

for (; *in; in++) {

but for this one. I wonder if

unsigned char c;
strbuf_addch(out, '"');
while ((c = *in++) != '\0') {
...

is easier to follow, though.

> +static inline void begin(struct json_writer *jw, int is_array)
> +{
> + ALLOC_GROW(jw->levels, jw->nr + 1, jw->alloc);
> +
> + jw->levels[jw->nr].level_is_array = !!is_array;
> + jw->levels[jw->nr].level_is_empty = 1;

An element of this array is a struct that represents a level, and
everybody who accesses an element of that type knows it is talking
about a level by the field that has the array being named as
.levels[] (also [*1*]).  In such a context, it is a bit too loud to
name the fields with level_$blah.  IOW,

struct json_writer_level
{
unsigned is_array : 1;
unsigned is_empty : 1;
};

> +struct json_writer_level
> +{
> + unsigned level_is_array : 1;
> + unsigned level_is_empty : 1;
> +};
> +
> +struct json_writer
> +{
> + struct json_writer_level *levels;
> + int nr, alloc;
> + struct strbuf json;
> +};

[Footnote]

*1* I personally prefer to call an array of things "thing[]", not
"things[]", because then you can refer to an individual element
e.g. "thing[4]" and read it as "the fourth thing".

Unless the code often treats an array as a whole, that is, in
which case, things[] is OK as you'll be calling the whole thing
with the plural name (e.g. call that function and give all the
things by passing things[]).

In this case, one level instance is an element of a stack, and
the code would be accessing one level at a time most of the
time, so "writer.level[4].is_empty" would read more naturally
than "writer.levels[4].level_is_empty".