On 19 November 2014 09:13, Francois Billard <francois.bill...@alyseo.com> wrote:
> we print the standardized column names in the 'zfs_do_list' function:
>
>     static char default_fields[] = "name,used,available,referenced,mountpoint";
>
> the names of the properties MUST never change, or else the code that
> uses them will break every time.
I agree, and this is what I was attempting to convey: that they be the
standard, lowercase names as provided to "-o". Sorry for the confusion.

> Your suggestion about parseable values and human-readable values is
> already reflected (the natural zfs way):
>
> with human-readable values:
>
> >> zfs list -J -o used | python -m json.tool
> {
>     "cmd": "zfs list -J -o used",
>     "stdout": [
>         {
>             "used": "55K"
>         },
>         {
>             "used": "56,5K"
>         }
>     ]
> }
>
> and with byte values (-p option):
>
> >> zfs list -pJ -o used | python -m json.tool
> {
>     "cmd": "zfs list -pJ -o used",
>     "stdout": [
>         {
>             "used": "56320"
>         },
>         {
>             "used": "57856"
>         }
>     ]
> }

So, I actually think that "-J" should _imply_ (i.e. force) "-p". It
does not make sense to provide non-parseable values in a
machine-readable format, especially if we are aiming for a strict,
well-documented schema for the resultant output that we commit to
supporting over time.

> Concerning the streaming manner (a JSON object on each line): if you
> do that, you will not have JSON output, but a block of text
> containing several JSON objects, and you will have to parse it with
> regexps to load each JSON object: very complicated.

No, this is absolutely not true. The format I'm referring to is often
described as LDJSON or "Line Delimited JSON"[1], a kind of JSON
streaming format[2]. Critically, no newline characters (the byte 0x0A)
appear anywhere within a JSON record -- only _between_ records.
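To make the distinction concrete, here is a minimal Python sketch of
producing such a stream. The record shape ("name"/"used" keys) is
hypothetical, simply mirroring the "-p" example above, and is not a
committed schema:

```python
import json

# Hypothetical records, mirroring the "zfs list -pJ -o used" output
# above; the real schema is still under discussion.
records = [
    {"name": "tank/a", "used": "56320"},
    {"name": "tank/b", "used": "57856"},
]

# json.dumps() never emits a raw newline (0x0A) inside a record, so
# writing one record per line yields a valid LDJSON stream.
for rec in records:
    line = json.dumps(rec)
    assert "\n" not in line
    print(line)
```

Each printed line is a complete, independently parseable JSON document;
the newline is purely a record separator.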
This makes it trivial to read and parse in basically any modern
environment:

- In C, use getline(3C) to read lines from a FILE * and then pass each
  one into a JSON parsing library.
- In node.js, use the "lstream" module to read one line at a time and
  JSON.parse().
- In shell, use a sed(1)-like utility that understands line-delimited
  JSON, like json[3] or jq[4]; these make it trivial to manipulate
  each JSON object into some filtered or transformed version as part
  of a shell pipeline.
- Other environments such as Python, Ruby and Java all have similar
  library routines to read one line at a time from a file or other
  input source; each line is then run through the JSON parser to
  produce an object describing the current filesystem or other record.

[1] http://en.wikipedia.org/wiki/Line_Delimited_JSON
[2] http://en.wikipedia.org/wiki/JSON_Streaming
[3] https://github.com/trentm/json
[4] http://stedolan.github.io/jq

> A well-formed JSON object must have a root element (e.g. a list or a
> dict), which is easily loaded by the code that will consume the JSON
> output on the server side (Python, Java, ...).

In contrast, each _line_ in an LDJSON stream is a well-formed JSON
object containing just the data pertaining to the current record. This
enables the consumer to work on one record at a time, if that is what
they require, or to collate incoming records into whatever
application-specific data structure makes sense to them. Of the utmost
importance, it requires neither zfs(1M) nor the application consuming
the stream to produce (and subsequently parse) all of the data at one
time.

This is akin to the difference between scandir(3C) and readdir(3C).
The former will load the entire directory into memory, sort it, then
return it in one result to the user. That's fine for small
directories, but for larger directories with millions of files it can
take a very long time, and consume a considerable amount of memory and
cycles in doing so.
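For instance, a consumer in Python (one of the environments mentioned
above) could be as simple as the sketch below; the input records are
again hypothetical, standing in for whatever schema "zfs list -J"
ultimately emits:

```python
import io
import json

# Simulated LDJSON input, as a hypothetical "zfs list -pJ" might emit
# it; in a real pipeline this would simply be sys.stdin.
stream = io.StringIO(
    '{"name": "tank/a", "used": "56320"}\n'
    '{"name": "tank/b", "used": "57856"}\n'
)

# One record at a time: memory use stays constant no matter how many
# filesystems, volumes or snapshots the system holds.
for line in stream:
    rec = json.loads(line)
    print(rec["name"], int(rec["used"]))
```

No regexps are involved: the line-reading machinery of the language
does the record framing, and the stock JSON parser handles each record.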
Using an interface like scandir(3C) has the unfortunate result that
processes with memory constraints (e.g. Java with a fixed VM heap cap,
or Node.js with its ~1.5GB heap limitation) are unable to process
directories beyond a certain size at all. In contrast, a streaming
interface like readdir(3C) allows the program to read a few directory
entries, do some processing, and then throw that storage away.

By using LDJSON for the output here, we are allowing for more flexible
usage of the tooling -- especially on large systems with thousands or
tens of thousands of filesystems, volumes or snapshots. I speak from
painful experience processing large JSON datasets, ranging from around
50MB up to a couple of gigabytes, often in programming environments
that simply cannot parse and store the entire object tree in memory.

Cheers.

-- 
Joshua M. Clulow
UNIX Admin/Developer
http://blog.sysmgr.org

_______________________________________________
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer