On 21/09/2021 16.21, Pete Forman wrote:
"Michael F. Stemper" <michael.stem...@gmail.com> writes:
On 21/09/2021 13.49, alister wrote:
On Tue, 21 Sep 2021 13:12:10 -0500, Michael F. Stemper wrote:
It's my own research, so I can give myself the data in any format that I
like.

as far as I can see the main issue with XML is bloat, it tries to do
too many things & is a very verbose format, often the quantity of
mark-up can easily exceed the data contained within it. other formats
such a JSON & csv have far less overhead, although again not always
suitable.

I've heard of JSON, but never done anything with it.

Then you should certainly try to get a basic understanding of it. One
thing JSON shares with XML is that it is best left to machines to
produce and consume. Because both can be viewed in a text editor there
is a common misconception that they are easy to edit. Not so, commas are
a common bugbear in JSON and non-trivial edits in (XML unaware) text
editors are tricky.

Okay, after playing around with the example in Lubanovic's book[1]
I've managed to create a dict of dicts of dicts and write it to a
json file. It seems to me that this is how json handles hierarchical
data. Is that understanding correct?

Is this then the process that I would use to create a *.json file
to provide data to my various programs? Copy and paste the current
hard-coded assignment statements into REPL, use json.dump(dict,fp)
to write it to a file, and then read the file into each program
with json.load(fp)? (Actually, I'd write a function to do that,
just as I would with XML.)

Consider what overhead you should worry about. If you are concerned
about file sizes then XML, JSON and CSV should all compress to a similar
size.

Not a concern at all for my current application.

How does CSV handle hierarchical data? For instance, I have
generators[1], each of which has a name, a fuel and one or more
incremental heat rate curves. Each fuel has a name, UOM, heat content,
and price. Each incremental cost curve has a name, and a series of
ordered pairs (representing a piecewise linear curve).

Can CSV files model this sort of situation?

The short answer is no. CSV files represent spreadsheet row-column
values with nothing fancier such as formulas or other redirections.

Okay, that was what I suspected.

CSV is quite good as a lowest common denominator exchange format. I say
quite because I would characterize it by 8 attributes and you need to
pick a dialect such as MS Excel which sets out what those are. XML and
JSON are controlled much better. You can easily verify that you conform
to those and guarantee that *any* conformant parser can read your
content. XML is more powerful in that repect than JSON in that you can
define and enforce schemas. In your case the fuel name, UOM, etc. can be
validated with standard tools.

Yeah, validating against a DTD is pretty easy, since lxml.etree does all
of the work.

  In JSON all that checking is entirely
handled by the consuming program(s).
Well, the consumer's (almost) always going to need to do *some*
validation. For instance, as far as I can tell, a DTD can't specify
that there must be at least two of a particular item.

The designers of DTD seem to have taken the advice of MacLennan[2]:
  "The only reasonable numbers are zero, one, or infinity."

Which is great until you need to make sure that you have enough
points to define at least one line segment.

As in all such cases it is a matter of choosing the most apropriate tool
for the job in hand.

Naturally. That's what I'm exploring.

You might also like to consider HDF5. It is targeted at large volumes of
scientific data and its capabilities are well above what you need.

Yeah, I won't be looking at more than five or ten generators at most. A
small number is enough to confirm or refute the behavior that I'm
testing.


[1] _Introducing Python: Modern Computing in Simple Packages_,
Second Release, (c) 2015, Bill Lubanovic, O'Reilly Media, Inc.
[2] _Principles of Programming Languages: Design, Evaluation,
and Implementation_, Second Edition, (c) 1987, Bruce J. MacLennan,
Holt, Rinehart, & Winston
--
Michael F. Stemper
No animals were harmed in the composition of this message.
--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to