On 19 November 2014 09:13, Francois Billard <francois.bill...@alyseo.com> wrote:
> we print the standardized column names in the 'zfs_do_list' function:
> static char default_fields[] = "name,used,available,referenced,mountpoint";
> The names of the properties MUST never change, or else any code that
> uses them will break.

I agree, and this is what I was attempting to convey: that they be the
standard, lowercase names as provided to "-o".  Sorry for the
confusion.

> Your suggestion about parseable values and human-readable values is
> already reflected (the natural zfs way):
>
> With human-readable values:
>
>> zfs list -J -o used | python -m json.tool
> {
>     "cmd": "zfs list -J -o used",
>     "stdout": [
>         {
>             "used": "55K"
>         },
>         {
>             "used": "56,5K"
>         }
>     ]
> }
>
> and with byte values (the -p option):
>
>> zfs list -pJ -o used | python -m json.tool
> {
>     "cmd": "zfs list -pJ -o used",
>     "stdout": [
>         {
>             "used": "56320"
>         },
>         {
>             "used": "57856"
>         }
>     ]
> }

So, I actually think that "-J" should _imply_ (i.e. force) "-p".  It
does not make sense to provide non-parseable values in a
machine-readable format, especially if we are aiming for a strict,
well-documented schema for the resultant output that we commit to
supporting over time.
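As a minimal sketch of why, using the two values from the output quoted
above: the "-p" form is a number a consumer can use directly, while the
human-readable form (note the locale-dependent comma!) forces every
consumer to re-implement zfs's formatting in reverse:

    import json

    parseable = json.loads('{"used": "56320"}')   # from "zfs list -pJ"
    human = json.loads('{"used": "56,5K"}')       # from "zfs list -J"

    print(int(parseable["used"]))  # 56320 -- usable immediately
    # int(human["used"]) raises ValueError; turning "56,5K" back into a
    # number means re-implementing zfs's (locale-dependent) humanisation
    # on the consumer side.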

> Concerning the streaming manner (a JSON object on each line): if you
> do that, you will not have JSON output, but a block of text containing
> several JSON objects, and you will have to parse it with regexps to
> load each JSON object: very complicated.

No, this is absolutely not true.  The format I'm referring to is often
described as LDJSON or "Line Delimited JSON"[1], a kind of JSON
streaming format[2].  Critically, no newline characters (the byte
0x0A) appear anywhere within a JSON record -- only _between_ records.
This makes it trivial to read and parse in basically any modern
environment:

  - In C, use getline(3C) to read lines from a FILE * and then pass each
    one into a JSON parsing library

  - In node.js, use the "lstream" module to read one line at a time and
    JSON.parse()

  - In shell, use a sed(1)-like utility that understands line-delimited
    JSON, like json[3] or jq[4]; these make it trivial to manipulate
    each JSON object into some filtered or transformed version as part
    of a shell pipeline

  - Other environments such as Python, Ruby and Java all have similar
    library routines to read one line at a time from a file or other
    input source; each line is then run through the JSON parser to
    produce an object describing the current filesystem or other record
    (see the Python sketch after the references below)
[1] http://en.wikipedia.org/wiki/Line_Delimited_JSON
[2] http://en.wikipedia.org/wiki/JSON_Streaming
[3] https://github.com/trentm/json
[4] http://stedolan.github.io/jq
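
To make the Python case concrete, here is a minimal sketch of a
jq-style filter over an LDJSON stream on stdin.  It assumes records
shaped like the "zfs list -pJ -o name,used" output quoted earlier, but
with one object per line rather than a single wrapper object:

    import json
    import sys

    # Each line is a complete, well-formed JSON object, so a plain
    # line-by-line read is all the stream handling we need -- no
    # regexps, and no buffering of the whole output.
    for line in sys.stdin:
        record = json.loads(line)
        if int(record["used"]) > 1024 * 1024:     # keep datasets > 1MB
            sys.stdout.write(json.dumps(record) + "\n")

Such a filter slots directly into a shell pipeline between "zfs list"
and any other LDJSON-aware tool, exactly as the json and jq utilities
above do.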

> A well formed JSON object must have root element (as list, dict),
> which is easily loaded by code that will use the json output on server
> side (python, java,..)

In contrast, each _line_ in an LDJSON stream is a well-formed JSON
object containing just the data pertaining to the current record.
This enables the consumer to work on one record at a time, if that is
what they require, or to collate incoming records into whatever
application-specific data structure makes sense to them.  Most
importantly, it requires neither zfs(1M) nor the application consuming
the stream to produce (and subsequently parse) all of the data at one
time.
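
A concrete (and necessarily hypothetical, since "-J" is still under
discussion) sketch of that record-at-a-time consumption, assuming "-J"
emits LDJSON and implies "-p" as argued above:

    import json
    import subprocess

    # Hypothetical invocation: assumes "-J" emits one JSON object per
    # line and implies "-p", per the proposal in this thread.
    proc = subprocess.Popen(
        ["zfs", "list", "-J", "-o", "name,used"],
        stdout=subprocess.PIPE,
        text=True,
    )

    total = 0
    for line in proc.stdout:
        record = json.loads(line)      # parse one record...
        total += int(record["used"])   # ...use it, then let it go
    proc.wait()

    print("total used:", total)

At no point does either side need to hold more than one record's worth
of JSON in memory.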

This is akin to the difference between scandir(3C) and readdir(3C).
The former will load the entire directory into memory, sort it, then
return it in one result to the user.  That's fine for small
directories, but for larger directories with millions of files it can
take a very long time, and consume a considerable amount of memory and
cycles in doing so.  Using an interface like scandir(3C) has the
unfortunate result that processes with memory constraints (e.g. Java
with a fixed VM heap cap, or Node.js with its ~1.5GB heap limitation)
are unable to process directories beyond a certain size at all.  In
contrast, a streaming interface like readdir(3C) allows the program to
read a few directory entries, do some processing, and then throw that
storage away.
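
For what it's worth, Python's standard library exposes the same
trade-off, though the naming is confusingly reversed relative to the C
interfaces above: os.listdir() materialises the entire listing at once,
like scandir(3C), while os.scandir() returns a lazy iterator, like
readdir(3C):

    import os

    # Batch: the whole directory listing is built and held in memory,
    # as with scandir(3C) above.
    names = os.listdir("/var/tmp")

    # Streaming: entries arrive one at a time and can be discarded as
    # soon as they have been processed, as with readdir(3C).
    with os.scandir("/var/tmp") as it:
        for entry in it:
            print(entry.name)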

By using LDJSON for the output here, we are allowing for more flexible
usage of the tooling -- especially on large systems with thousands or
tens of thousands of filesystems, volumes or snapshots.  I speak from
painful experience processing large JSON datasets, from on the order
of 50MB up to a couple of gigabytes, often in programming environments
that simply cannot parse and store the entire object tree in memory.


Cheers.

-- 
Joshua M. Clulow
UNIX Admin/Developer
http://blog.sysmgr.org