Hello,

In writing a non-trivial amount of Python code, I keep running into an organizational issue. I will try to state the problem fairly generally, and follow up with a (contrived) example.

The root cause of my difficulties is that, by default, the relationship between the module hierarchy and the structure of files on disk is too strong for my taste. I want to separate the two as much as possible, but I do not want to resort to non-conventional "hacks" to do it. I am posting this to present what I perceive to be a practical problem, and to get suggestions for solutions, or opinions on the most practical policy for dealing with it.

As I said, I would like a weaker relationship between file system structure and module hierarchy. In particular, there are two things I want to avoid:

* Least importantly, I don't like jamming code into __init__.py, as a personal preference.
* Most importantly, I do not like to jam large amounts of code into a single source file just for the purpose of keeping the public interface in the same package.

A contrived but hopefully illustrative example:

We have an organization "Org", which has a library, and part of that library is code for doing something with animals. As a result, the interesting top-level package for this example is:

    org.lib.animal

Suppose now that I want an initial implementation of the most important animal. I want to create the class (but see [1]):

    org.lib.animal.Monkey

The public interface consists of that class only (and possibly a small handful of functions). The implementation is quite significant, however: it is 500 lines of code long. At this point, we have to jam those 500 lines of code into __init__.py. Let's ignore my personal preference of not liking to put code in __init__.py; the fact remains that we have 500 lines of code in a single source file.

Now, we want to continue working on this library, adding ten additional animals. At this point, we have these choices (it seems to me):

(1) Simply add these to __init__.py, resulting in __init__.py being 5000 lines long[2].
(2) Put each animal into its own file, resulting in org.lib.animal.Monkey now becoming org.lib.animal.monkey.Monkey, and animal X becoming org.lib.animal.x.X.

The problem I have is that both of these solutions are, in my opinion, very ugly:

* (1) is ugly from a source code management perspective, because jamming 5000 lines of code for ten different animals into a single file is bad for obvious reasons.
* (2) is ugly because we introduce org.lib.animal.x.X for animal X, which:
  (a) is redundant in terms of naming
  (b) is redundant in function, since we have a single module for each animal containing nothing but a single class of the same name

Clearly, (1) is bad for file/source structure reasons, and (2) is bad for module organization reasons. So we are back to my original wish: I want to separate the two, so that I can solve (1) independently of (2).

Now, I realize that __init__.py can contain arbitrary code, and that one can override __import__. However, I do not want to resort to "hacks" just to solve this problem; I would prefer some established convention in the community, or at least something that is elegant. What are people's thoughts on this problem?

Let me just shoot down one possible suggestion right away, to show you what I am trying to accomplish:

I do *not* want to simply break out X into org.lib.animal.x, and have org.lib.animal import org.lib.animal.x.X as X. While this naively solves the problem of being able to refer to X as org.lib.animal.X, the solution is anything but consistent, because the *identity* of X is still org.lib.animal.x.X. Examples of ways this breaks things:

* X().__class__.__module__ gives unexpected results (org.lib.animal.x rather than org.lib.animal).
* Automatically generated documentation will document using the "real" package name.
* Moving the *actual* classes around by way of this aliasing would break things like pickled data structures as a result of the change of actual identity, unless one *always* pre-emptively maintains this shadow hierarchy (which is a problem in and of itself).

Thus, it's not clean. It breaks the module abstraction and as a result has unintended consequences. I am looking for some kind of clean solution. What do people do about this in practice?

[1] Optionally, we might introduce an "animals" package such that it would become org.lib.animal.animals.Monkey, if we thought we were going to have a lot of public API outside of the animals themselves. This does not affect this discussion, however, as the exact same thing would apply to org.lib.animal.animals as applies to org.lib.animal in the above example.

[2] Ignoring for now that it may not be realistic that every animal implementation would be that long; in many cases a lot of code would be in common. But feel free to substitute something else (a Zoo, say).

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED]
Web: http://www.scode.org
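
P.S. To make the aliasing objection above concrete, here is a minimal sketch. It builds a throwaway package in memory so that it runs without any files on disk; the names animal and animal.monkey are hypothetical stand-ins for org.lib.animal and org.lib.animal.x, and real code would of course use ordinary source files with an import in __init__.py instead.

```python
import pickle
import sys
import types

# Hypothetical stand-ins: "animal" plays org.lib.animal, "animal.monkey" plays
# org.lib.animal.x. Both are built in memory so the sketch needs no files.
package = types.ModuleType("animal")
package.__path__ = []  # a __path__ attribute is what marks a module as a package
submodule = types.ModuleType("animal.monkey")
sys.modules["animal"] = package
sys.modules["animal.monkey"] = submodule

# Equivalent of animal/monkey.py: the class is defined inside the submodule.
exec("class Monkey:\n    pass\n", submodule.__dict__)

# Equivalent of "from animal.monkey import Monkey" in animal/__init__.py.
package.monkey = submodule
package.Monkey = submodule.Monkey

from animal import Monkey  # the short public name works

# The class still records the module it was defined in, not the alias.
real_module = Monkey.__module__
class_name = Monkey.__name__

# pickle stores that real module path, so moving the class later
# would break loading of previously pickled data.
payload = pickle.dumps(Monkey())
restored = pickle.loads(payload)

print(real_module)
print(class_name)
print(b"animal.monkey" in payload)
```

This prints animal.monkey, Monkey, and True: the alias gives a shorter name to import from, but the identity recorded on the class (and written into the pickle) is still the longer module path.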