On Fri, Jan 24, 2014 at 8:21 PM, Frank Millman <fr...@chagford.com> wrote: > I find that I am using JSON and XML more and more in my project, so I > thought I would explain what I am doing to see if others think this is an > acceptable approach or if I have taken a wrong turn.
Please don't take this the wrong way, but the uses of JSON and XML that you describe are still completely inappropriate. Python does not need this sort of thing. > I store database metadata in the database itself. I have a table that > defines each table in the database, and I have a table that defines each > column. Column definitions include information such as data type, allow > null, allow amend, maximum length, etc. Some columns require that the value > is constrained to a subset of allowable values (e.g. 'title' must be one of > 'Mr', 'Mrs', etc.). I know that this can be handled by a 'check' constraint, > but I have added some additional features which can't be handled by that. So > I have a column in my column-definition table called 'choices', which > contains a JSON'd list of allowable choices. It is actually more complex > than that, but this will suffice for a simple example. This is MySQL-style of thinking, that a database is a thing that's interpreted by an application. The better way to do it is to craft a check constraint; I don't know what features you're trying to use that can't be handled by that, but (at least in PostgreSQL) I've never had anything that can't be expressed that way. With the simple example you give, incidentally, it might be better to use an enumeration rather than a CHECK constraint, but whichever way. > For my more complex example, I should explain that my project involves > writing a generalised business/accounting system. Anyone who has worked on > these knows that you quickly end up with hundreds of database tables storing > business data, and hundreds of forms allowing users to CRUD the data > (create/read/update/delete). Each form is unique, and yet they all share a > lot of common characteristics. I have abstracted the contents of a form > sufficiently that I can represent at least 90% of it in XML. This is not > just the gui, but all the other elements - the tables required, any input > parameters, any output parameters, creating any variables to be used while > entering the form, any business logic to be applied at each point, etc. Each > form definition is stored as gzip'd XML in a database, and linked to the > menu system. There is just one python program that responds to the selection > of a menu option, retrieves the form from the database, unpacks it and runs > it. Write your rendering engine as a few simple helper functions, and then put all the rest in as code instead of XML. The easiest way to go about it is to write three forms, from scratch, and then look at the common parts and figure out which bits can go into helper functions. You'll find that you can craft a powerful system with just a little bit of code, if you build it right, and it'll mean _less_ library code than you currently have as XML. If you currently represent 90% of it in XML, then you could represent 90% of it with simple calls to helper functions, which is every bit as readable (probably more so), and much *much* easier to tweak. Does your XML parsing engine let you embed arbitrary expressions into your code? Can you put a calculated value somewhere? With Python, everything's code, so there's no difference between saying "foo(3)" and saying "foo(1+2)". > Incidentally, I would take issue with the comment that 'JSON is easily > readable by humans (UNLIKE XML)'. Here is a more complete example of my > 'choices' definition. > > [true, true, [["admin", "System administrator", [], []], ["ind", > "Individual", [["first_name", true], ["surname", true]], [["first_name", " > "], ["surname", ""]]], ["comp", "Company", [["comp_name", true], ["reg_no", > true], ["vat_no", false]], [["comp_name", ""]]]]] > > You can read it, but what does it mean? > > This is what it would look like if I stored it in XML - > > <choices use_subtypes="true" use_displaynames="true"> > <choice code="admin" descr="System administrator"> > <subtype_columns/> > <displaynames/> > </choice> > <choice code="ind" descr="Individual"> > <subtype_columns> > <subtype_column col_name="first_name" required="true"/> > <subtype_column col_name="surname" required="true"/> > </subtype_columns> > <displaynames> > <displayname col_name="first_name" separator=" "/> > <displayname col_name="surname" separator=""/> > </displaynames> > </choice> > <choice code="comp" descr="Company"> > <subtype_columns> > <subtype_column col_name="comp_name" required="true"/> > <subtype_column col_name="reg_no" required="true"/> > <subtype_column col_name="vat_no" required="false"/> > </subtype_columns> > <displaynames> > <displayname col_name="comp_name" separator=""/> > </displaynames> > </choice> > </choices> > > More verbose - sure. Less human-readable - I don't think so. Well, that's because you elided all the names in the JSON version. Of course that's not a fair comparison. :) Here's how I'd render that XML in JSON. I assume that this is stored in a variable, database field, or whatever, named "choices", and so I skip that first-level name. {"use_subtypes":true, "use_displaynames":true, "choice":[ {"code":"admin", "descr":"System administrator", "subtype_columns":[], "displaynames":[]}, {"code":"ind", "descr":"Individual", "subtype_columns":[ {"col_name":"first_name", "required":true}, {"col_name":"surname", "required":true} ], "displaynames":[ {"col_name":"first_name", "separator":" "}, {"col_name":"surname", "separator":""} ] }, {"code":"comp", "descr":"Company", "subtype_columns":[ {"col_name":"comp_name", "required":true}, {"col_name":"reg_no", "required":true}, {"col_name":"vat_no", "required":false} ], "displaynames":[ {"col_name":"comp_name", "separator":""} ] } ]} You can mix and match the two styles to get the readability level you need. I think the "col_name" and "required" tags are probably better omitted, which would make the JSON block look like this: {"use_subtypes":true, "use_displaynames":true, "choice":[ {"code":"admin", "descr":"System administrator", "subtype_columns":[], "displaynames":[]}, {"code":"ind", "descr":"Individual", "subtype_columns":[ ["first_name", "required"], ["surname", "required"] ], "displaynames":[ ["first_name", " "] ["surname", ""] ] }, {"code":"comp", "descr":"Company", "subtype_columns":[ ["comp_name", "required"] ["reg_no", "required"] ["vat_no", "optional"] ], "displaynames":[ ["comp_name", ""] ] } ]} Note that instead of "true" or "false", I've simply used "required" or "optional". Easy readability without excessive verbosity. And you could probably make "required" the default, so you just have [["comp_name"], ["reg_no"], ["vat_no", "optional"]] for the last block. Now, here's the real killer. You can take this JSON block and turn it into Python code like this: choices = ... block of JSON ... And now it's real code. It's that simple. You can't do that with XML. And once it's code, you can add functionality to it to improve readability even more. > Also, intuitively one would think it would take much longer to process the > XML version compared with the JSON version. I have not done any benchmarks, > but I use lxml, and I am astonished at the speed. Admittedly a typical > form-processor spends most of its time waiting for user input. Even so, for > my purposes, I have never felt the slightest slowdown caused by XML. On this, I absolutely agree. You will almost never see a performance difference between any of the above. Don't pick based on performance - pick based on what makes sense and is readable. The pure-code version might suffer terribly (interpreted Python with several levels of helper functions, each one having its overhead - though personally, I expect it'd be comparable to the others), but you would still do better with it, because the difference will be on the scale of microseconds. I've built GUIs in a number of frameworks, including building frameworks myself. I've built a system for creating CRUD databases. (It's still in use, incidentally, though only on our legacy OS/2 systems. One of the costs of writing in REXX rather than Python, but I didn't know Python back in the early 1990s when I wrote Hudson.) Frameworks that boast that it doesn't take code to use them tend to suffer from the Inner-Platform Effect [1] [2] if they get sufficiently powerful, and it quickly becomes necessary to drop to code somewhere. (Or, if they DON'T get powerful enough to hit the IPE, they overly restrain what you can do with them. That's fine if all you want is simple, but what if 99.9% of what you want fits into the framework and 0.1% simply can't be done?) With Hudson, I handled the "easy bits" with a single program (basic table view display with some options, database/table selection, etc, etc), and then dropped to actual GUI code to handle the custom parts (the main form for displaying one record; optionally the Search dialog, though a basic one was provided "for free"), with a library of handy functions available to call on. Okay, I did a terrible job of the "library" part back in those days - some of them were copied and pasted into each file :| - but the concept is there. In Gypsum (a current project, MUD client for Linux/Windows/Mac OS), all GUI work is done through GTK. But there are a few parts that get really clunky, like populating a Table layout, so I made a helper function that creates a table based on a list of lists: effectively, a 2D array of cell values, where a cell could have a string (becomes a label), a widget (gets placed there), or nothing (the label/widget to the left spans this cell too). That doesn't cover _every_ possible use-case (there's no way to span multiple rows), but if I need something it can't do, I'm writing code already right there, so I can just use the Table methods directly. There was actually one extremely common sub-case where even my GTK2Table function got clunky, so I made a second level of wrapper that takes a simple list of elements and creates a 2D array of elements, with labels right-justified. Once again, if I'm doing the same thing three times, it's a prime candidate for a helper. The cool thing about doing everything in code is that you can create helpers at the scope where they're most needed. To build the character sheet module for Gypsum (one of its uses is Dungeons and Dragons), I had to build up a more complex and repetitive GUI than most, so I ended up making a handful of helpers that existed right there in the charsheet module. (A couple of them have since been promoted to global; GTK2Table mentioned above started out in charsheet.) Anyone reading code anywhere _else_ in the project doesn't need to understand those functions; but they improve readability in that module enormously. That's something that's fundamentally impossible in an interpreted/structured system like an XML GUI - at least, impossible without some horrible form of inner-platform effect. Python code is more readable than most data structures you could come up with, because you don't have to predict everything in advance. ChrisA [1] http://en.wikipedia.org/wiki/Inner-platform_effect [2] http://thedailywtf.com/Articles/The_Inner-Platform_Effect.aspx -- https://mail.python.org/mailman/listinfo/python-list