On 09/08/12 12:50, Philip Brown wrote:
On 9/8/12 12:30 PM, Danek Duvall wrote:

Turns out the number of newlines (and indentation, which is just as
critical to being able to sight-read JSON) ends up being a hefty
amount of both disk space and parsing time (I don't remember whether
the time is I/O or CPU).

One linebreak per pkg... which amounts to either 4000 or 8000 in total
in a Solaris catalog... would just cripple parsing time?

It's somewhat possible for a human to sight-read IPS catalog JSON, as
long as this standard of linebreak use is met.
Adding indentation is just being "pretty". But linebreak-per-pkg is
critical. After that, a semi-decent sysadmin can easily add indents
themselves, once they've used grep to narrow things down to a line they
care about.
But right now, the single line is so long that it even does weird
things to "less" (search doesn't work right).

I would also guess that if a few extra chars bog down parsing so badly,
then all the overhead of the extra "" that are theoretically
unnecessary also has an impact.

The lack of linebreaks is intentional for a number of reasons:

  * performance of the json implementation we're using when linebreaks
    are enabled

  * significant increase in I/O

  * to discourage users from being tempted to edit the files by hand

Given that you can pretty-print the entire file with a single line of Python code from the command line, and that there are various third-party utilities and libraries to parse and pretty-print JSON, it seems like a non-issue.
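
As a rough illustration of that point, the sketch below re-serializes single-line JSON with indentation using only the standard json module. The pretty_print helper and the sample data are hypothetical, chosen for illustration; the sample is not the real catalog layout. From a shell, the standard library's json.tool module (python -m json.tool < file) should do much the same thing.

```python
import json


def pretty_print(text, indent=2):
    """Return the JSON in `text` re-serialized with indentation and sorted keys."""
    return json.dumps(json.loads(text), indent=indent, sort_keys=True)


# Hypothetical single-line sample; NOT the actual IPS catalog structure.
sample = '{"pkg-b": [{"version": "1.0"}], "pkg-a": [{"version": "2.1"}]}'

print(pretty_print(sample))
```

The output has one key or value per line, which line-oriented tools such as grep and less handle without trouble.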



Given that the catalog is used primarily by the packaging tools, and
not by humans or line-oriented unix utilities, it seemed like a pretty
decent tradeoff, especially given how easy it is to write something to
pretty-print it:

#!/usr/bin/python

"easy"... as long as you write in python.

Sorry, but for a target audience that should be "general UNIX
sysadmins", that does not count as easy.

Sysadmins shouldn't need to parse this data. pkg(5) supplies all of the supported tools and interfaces for consumers of packaging.

Some of this data is not in a committed format; that is, how it is interpreted and which values are present are subject to change.

If it were a set of Perl libraries (that shipped with Solaris already),
that would almost count.
Note that I personally detest Perl, but it is at least arguable that
most sysadmins "should" know Perl.
The same cannot be said about Python, however. It's still primarily a
programmer's language.

Python has been around almost as long as Perl and you'd be surprised how few system administrators seem to know Perl these days.

Python is a more appropriate language when developing large-scale projects amongst a team compared to Perl.

-Shawn
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
