Richard Wackerbarth writes:

> There seems to be two fundamental design strategies being discussed.
> One of them has a monolithic data store and the other has a
> distributed store.
> Barry has expressed some reservations about overloading a
> monolithic data store with data extraneous to the fundamental
> mission of message handling.
>
> I have expressed concern in requiring any implementation to
> maintain related data in a split format.
> I recognize that there will be cases where this is necessary (for
> example the Launchpad case as described by Barry in another
> message). But, as he notes, such implementations tend to be
> "brittle". Especially where there are multiple components which can
> alter the data. But, unless it is a constraint external to MM, I
> do not believe that such a restriction should be introduced.

I don't think that anybody is considering requiring a monolithic
store, in the sense of putting everything into a single backend DBMS,
because we all agree that it should be possible to take member lists
from an external database and augment them with Mailman-specific
properties that we may not be permitted to store in the external
database.

(Aside: I don't think we should assume that external databases are
necessarily read-only. For example, I can imagine informal
organizations that would allow Mailman to add new subscribers to the
member directory, or sales organizations that would allow people to
subscribe to product announcement lists and automatically add them to
the CRM database.)

What I propose as a requirement is that any data added to any
databases used by Mailman be accessible via standard Python
introspection techniques. (In principle and by default, that is;
Mailman already hides some data from some interfaces, such as the
member list.) For example, if we use the "user as Python object"
model, then the introspection method would simply be the 'dir'
function. Another possibility would be to have components register
mutators and accessors for "their" data.

> There is also an issue of what the term "core" means. Perhaps you
> have been referring to a distribution package. I have been
> referring to one component of such a package, in particular that
> component which interacts with the MTAs and redistributes
> messages.

I find that highly unintuitive. The core is the set of functions that
are essential. The "distribution package" description is a heuristic,
i.e., "these are the functions that would completely stop the show if
you installed Mailman and discovered any one of them was not present."

> I consider the processing of administrative messages to be a
> separate component. And I consider the storage of configuration
> information to be yet another component.
> In my view, each component
> extends only as far as its parts interact with the same private
> data representation.

I don't think that's a useful definition, to be honest. On the one
hand, most functions have local variables, but surely that doesn't
make them components by themselves. On the other, pretty much
everything in Mailman interacts with mailing lists in one way or
another, but surely none of us thinks of Mailman as a one-component
application.

I think of "component" as a concept that belongs to the art of
programming, one that does not have a technical definition. A
component is any body of code and content that is a convenient unit of
creation, maintenance, and administration. Of course issues of
coherence and coupling will help determine what is "convenient", but I
don't think they're sufficient in themselves.

> I do mean the latter. But, if the real underlying database is a
> RDBMS, then, within the "black box", these queries probably should
> be implemented by translating them to real SQL queries and passing
> those to the RDBMS.

Sure. But this is more likely if we have a good ORM (which is a more
Pythonic way of thinking about things) as an interface to the RDBMS,
and all of that is wrapped in a convenient, powerful API that allows
the programmer to delegate data persistence to some component of
Mailman.

> First, we seem to have a different conceptual model of MM.
> I view that which is being called "core", not as a single entity,
> but a collection of components, most of which are critical to the
> operation of the system.

That's not what you said above; above you restrict it to the message
routing and distribution component. I believe that is the definition
you have been pretty consistently using throughout the thread. No?

Anyway, I find this one very close to my own thinking.
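
To make the introspection requirement proposed earlier in this message
a little more concrete, here is a minimal sketch of both variants: the
"user as Python object" model inspected with the built-in dir(), and a
registry where components declare accessors and mutators for "their"
data. Every name in it (User, PropertyRegistry, digest_mode,
archive_optout) is hypothetical and chosen for illustration only; none
of it is existing Mailman code.

```python
class User:
    """Hypothetical user record; not Mailman's actual model."""

    def __init__(self, address):
        self.address = address


def public_attributes(obj):
    """Names visible through plain dir(), minus underscore-prefixed ones."""
    return [name for name in dir(obj) if not name.startswith("_")]


class PropertyRegistry:
    """Hypothetical registry of per-component accessors and mutators."""

    def __init__(self):
        self._accessors = {}
        self._mutators = {}

    def register(self, name, getter, setter=None):
        # A property registered without a setter is read-only.
        self._accessors[name] = getter
        if setter is not None:
            self._mutators[name] = setter

    def names(self):
        """Everything any component has declared, in sorted order."""
        return sorted(self._accessors)

    def get(self, user, name):
        return self._accessors[name](user)

    def set(self, user, name, value):
        if name not in self._mutators:
            raise AttributeError(f"{name!r} is read-only")
        self._mutators[name](user, value)


# Variant 1: a component adds a property directly to the object.
user = User("anne@example.com")
user.digest_mode = True

# Variant 2: a component keeps its own storage but registers access to it.
_archiver_prefs = {}  # private storage of an imagined archiver component
registry = PropertyRegistry()
registry.register("address", getter=lambda u: u.address)
registry.register(
    "archive_optout",
    getter=lambda u: _archiver_prefs.get(u.address, False),
    setter=lambda u, value: _archiver_prefs.__setitem__(u.address, value),
)
registry.set(user, "archive_optout", True)
```

In either variant another component can discover the data without
knowing who stored it: public_attributes(user) lists what was attached
to the object, and registry.names() lists what was registered. The
registry form also leaves room for the "hidden by default from some
interfaces" case, since a component can simply omit the mutator.
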

> > You started this thread with the observation that various
> > components are keeping data in different places, and that this
> > data is often redundant but not synced or inaccessible. To me
> > this suggests a design principle: a single conceptual database
> > managed by a core component (i.e., one that is present in every
> > Mailman 3 system).
>
> Yes, that is how I started the thread. However, you misinterpret
> the requirement for a monolithic database.

I think you're misinterpreting my words, actually, though I'm open to
correction by a third party. By a "single conceptual database", I
mean that there is a single API for accessing persistent Mailman data,
and that you don't have to specify a connection to a database to
access data. The implementation knows where all the data is stored,
whether that happens to be a single humongous ZODB or a heterogeneous
array of LDAP, SQL, and flatfile data stores.

> Certainly a monolithic database would be one way to accomplish DRY
> storage of the data, but it can also be accomplished in a
> distributed manner. What I am suggesting is that in a distributed
> system, no component of the system has the right to demand that it
> have the exclusive right to be the keeper of certain shared
> data. But, further, that any component taking on that
> responsibility should also be responsible for the storage of any
> related items.

I question whether the pain of having an (explicitly) distributed
system is worth the gain. As you've explained it here, I see it as
setting us up for a situation where each component (including
components that are substitutes performing the same conceptual
function) will make its own decisions about what to store and where,
and what is private and what is public, so that components will
continually need to negotiate with each other over who has authority
and responsibility for certain data.
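
The "single conceptual database" described above could be sketched
roughly as follows. This is only an illustration under stated
assumptions: DictBackend stands in for whatever real store is used
(ZODB, LDAP, SQL, flat files), and Store, its routes argument, and the
field names are all hypothetical rather than any actual Mailman
interface.

```python
class DictBackend:
    """In-memory stand-in for a real data store; hypothetical."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def load(self, key, field):
        return self._rows[(key, field)]

    def save(self, key, field, value):
        self._rows[(key, field)] = value


class Store:
    """Single entry point: callers never name a backend or a connection."""

    def __init__(self, default, routes=None):
        self._default = default
        # Maps a field name to the backend that owns it.
        self._routes = dict(routes or {})

    def _backend_for(self, field):
        return self._routes.get(field, self._default)

    def get(self, key, field):
        return self._backend_for(field).load(key, field)

    def put(self, key, field, value):
        self._backend_for(field).save(key, field, value)

    def where(self, field):
        """Report which backend holds a field (for administration only)."""
        return self._backend_for(field).name


# One store backed by two places: an external directory and local data.
directory = DictBackend("external-directory")
local = DictBackend("mailman-local")
store = Store(default=local, routes={"display_name": directory})

# Callers use the same two calls regardless of where the data lives.
store.put("anne@example.com", "display_name", "Anne")
store.put("anne@example.com", "delivery_mode", "digest")
```

The point of the sketch is that the routing decision lives in one
place. A component asking for delivery_mode does not need to know, or
negotiate over, which store holds it.
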

From the responses of several people in this thread, I strongly
suspect that most implementers will decide that most of the data they
use is not interesting to other modules and make it private, rather
than spend the effort needed to generalize. So I think the costs will
be higher and the amount of shared data lower than for a system where
one component is responsible for all connections to databases.

> I would agree only if you drop the "non-core". Each component may
> have "private" data. But that data cannot include any data that
> needs to be exposed by the API.

And how do we know what "needs" to be exposed? We don't.

I'm sure we can make a killer MLM with a distributed database and each
component storing private data. What I don't think we can do is make
an MLM that's capable of killing web fora and Usenet, too, that way.
I think it's worth the extra effort to keep things general.
_______________________________________________
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org
Security Policy: http://wiki.list.org/x/QIA9